february 2014 hug : pig on tez

The Apache Software Foundation

Pig on Tez

C h e o l s o o P a r k , N e t f l i x

R o h i n i P a l a n i s w a m y , Ya h o o !

P R E S E N T E D B Y

The Apache Software Foundation 2

Apache Pig on Tez Team

Name Role Company

Alex BainApache Pig Contributor Linkedin

Cheolsoo Park VP. Apache Pig Netflix

Daniel Dai Apache Pig PMC Hortonworks

Mark Wagner Apache Pig Committer Linkedin

Olga NatkovichApache Pig PMC, Pig on Tez Project Manager

Yahoo!

Rohini Palaniswamy Apache Pig PMC Yahoo!


Agenda Overview Pig and Hive Pig on Tez

Why Tez? Benefits of Tez Design Operator DAGs Performance Known Issues Where are we? What next?


Pig Overview Apache top-level project for ETL on hadoop. PIG Latin - Procedural scripting language that translates sequence of data processing

steps into MapReduce jobs. Easy to write, read and reuse and very extensible. Feature parity with SQL (FILTER BY, CROSS, JOIN (OUTER, INNER), ORDER BY, LIMIT, RANK, ROLLUP,

CUBE), Custom Loader and Storer, User defined functions (java and non-java), Nested ForEach, Streaming, macros and much more

PAGEVIEWS = LOAD ‘/data/pageviews’ as (user, url);

GRP = GROUP PAGEVIEWS BY user;

CNT = FOREACH GRP GENERATE group, COUNT(url) as numvisits;

STORE CNT into ‘/data/visited’ using PigStorage(‘,’);


Pig and HivePig Hive

LanguagePIG Latin - Procedural SQL - Declarative

Features

Feature rich. Can easily add new operators and constructs. For eg: Nested Foreach, Switch case, Macros, Scalars.

Limited to SQL operators

Developer code

Load/StoreFunc, Algebraic and Accumulator java UDFs, non-java UDFs (jython, python, javascript, groovy, jruby), Custom Partitioners.

StorageHandler, java UDFs

Complex ProcessingWell suited. Multi-query works well with 1000s of lines of pig script.

Not a good fit

ServerOnly client. Can work with Hive Metastore using HCatLoader/Storer.

Requires Metastore server and data has to be registered in it. HiverServer2 for jdbc


Pig and Hive - Continued

Pig Hive

Tez as execution enginePlanned for 0.14 Planned for Hive 0.13

ORCFile SupportPatch available. Currently through HCatLoader

From Hive 0.12 onwards. Huge performance gains

Vectorization No. May be in future.Yes. Huge performance gains

Transactions No Yes. In works

Cost-based optimizer No Yes. In works

JDBC support, Integration with BI tools

NoYes. HiveServer2 with Microstrategy/Tableau

Area of applicationPipeline processing language standard

Interactive Analytics /Reporting Platform


Why Tez? Built on top of YARN

Multi-tenancy (queues, capacity management) Resource allocation

DAG execution framework Natural fit for Pig and Hive than MR as their execution plans are DAGs. Better than running a DAG of MR jobs passing data in between jobs using HDFS as intermediate store.

Different types of edges ONE_ONE, BROADCAST, SHUFFLE

Flexible Input-Processor-Output runtime model Custom Vertex Processors. For eg: Map Processor, Reduce Processor, Pig Processor Custom Inputs. For eg: MRFileInput (input to map), ShuffledMergedInput (input to reduce) Custom Outputs. For eg: OnFileSortedOutput (output of map), MRFileOutput (output of reduce)

Multiple inputs and outputs Highly extensible Security Support from Tez Community and Hive Community


Why Tez? – As a end user Better Performance Reduced Resource Usage (Containers/Memory/CPU) Reduced Network I/O Reduced Namenode and Datanode load


Benefits of Tez

Features Benefits

No intermediate data storage

• Less pressure on Namenode- Lesser calls for listing and getting block locations- Smaller namespace usage - Cuts down on GC

• Less pressure on Datanode- Cuts down on IO in network for both writing and reading.- Saves space as there are no 3 replicas

• Eliminates extra step of map reads from HDFS in every intermediate job in DAG

- Saves on capacity by eliminating the need for map task containers

Single AM for whole DAG

• Saves on capacity. For a 5 stage MR job, there would be 5 AM containers launched.

• Eliminates issue of queue and resource contention faced in MR for jobs started after previous job in DAG completes.


Benefits of Tez - Continued

Features Benefits

Container reuse

• Reduced launch overhead- Container request and release overhead- Resource localization overhead- JVM launch time overhead

• Reduced network IO- Reduce tasks can be launched on same node as Map- 1-1 edge tasks can be launched on same node

Vertex caching• Memory structures like small tables used for join can be cached

in jvm and reused for next task on container reuse. Provides significant performance speedup.

Custom inputs and outputs• Using unsorted input and output where possible saves a lot of

CPU usage and increases performance

Dynamic reducer estimation• Saves on capacity. Can have reducers based on data size

instead of having fixed number of reducers.


Pig on Tez - Design

Logical Plan

Tez Plan MR Plan

Physical Plan

Tez Execution Engine MR Execution Engine

LogToPhyTranslationVisitor

MRCompilerTezCompiler


Pig on Tez – Join

Leftsplit

Load L and R

Rightsplit

Leftsplit

Rightsplit

l = LOAD ‘left’ AS (x, y);r = LOAD ‘right’ AS (x, z);j = JOIN l BY x, r BY x;

Load L

Leftsplit

Leftsplit

Rightsplit

Rightsplit

Load R

Join Join

Configurationper input

Configurationper job


Pig on Tez – Split + Group-by

f = LOAD ‘foo’ AS (x, y, z);g1 = GROUP f BY y;g2 = GROUP f BY z;j = JOIN g1 BY group, g2 BY group;Group by y Group by z

Load foo

Join

Load g1 and Load g2

Group by y Group by z

Load foo

Join

Multiple outputs

Reduce followsreduce

HDFS HDFS

Split multiplex de-multiplex


Pig on Tez – Order-by

Aggregate

Sample

Sort

Partition

f = LOAD ‘foo’ AS (x, y);o = ORDER f BY x;

Stage sample mapon distributed cache

Load & Sample

Aggregate

Partition

Sort

Broadcast sample mapHDFS

Pass through inputvia 1-1 edge


Pig on Tez – Skewed join

Aggregate

Sample L

Join

Stage sample mapon distributed cache

l = LOAD ‘left’ AS (x, y);r = LOAD ‘right’ AS (x, z);j = JOIN l BY x, r BY x USING ‘skewed’;

Load &Sample

Aggregate

Partition L

Join

Pass through inputvia 1-1 edge

Partition R

HDFS

Broadcastsample map

Partition L and Partition R


Performance numbers

Replicated Join (2.8x)

Join + Groupby

(1.5x)

Join + Groupby +

Orderby (1.5x)

3 way Split + Join +

Groupby + Orderby

(2.6x)

0500

100015002000250030003500400045005000

MRTez

Tim

e in

se

cs


Factors affecting performance Number of stages in the DAG

Higher the number of stages in the DAG, performance of Tez over MR will be better.

Cluster/queue capacity More congested a queue is, the performance of Tez over MR will be better due to container reuse.

Size of intermediate output More the size of intermediate output, the performance of Tez over MR will be better due to reduced

HDFS usage.

Size of data in the job For smaller data and more stages, the performance of Tez over MR will be better as percentage of

launch overhead in the total time is high for smaller jobs.

Vertex caching


Container usage

Query MR Tez SavingsTez with

container reuse

Replicated Join7563 7562 1 180

Join + Groupby + Orderby

7655 7603 52 180

Join + Groupby + Orderby

7663 7609 54 180

3 way Split + Join + Groupby + Order by

622 563 59 180

Note. The cluster size was 25 nodes with 180 containers (1.5G each) and Tez reused them again and again for tasks.


Known issues Container reuse will have issues when there are

Static variables in LoadFunc, StoreFunc, UDFs Memory leaks in LoadFunc, StoreFunc, UDFs

With single DAG execution of whole script, AM retries can be very costly until Tez supports checkpointing and resuming.


Where are we? Major operators

Load, Store, Filter-by, Foreach Split, Union Group-by, Distinct, Limit Order-by Hash join, Replicated join, Skewed join, Merge join

UDFs (Java and non-Java) Streaming Multi-query on and off Macros Scalars 95% of e2e tests pass for finished features.


What next? Feature Parity with MR

Local mode Port all unit and e2e tests Support for remaining Operators

CROSS, RANK, CUBE, ROLLUP

Support for Native Mapreduce (Low priority) Merge tez branch with trunk Stability

Handling failures Testing and tuning for large data and DAGs with > 10 stages

Usability Counters Progress Information Log information and debuggability


What next? – Performance Improvements› Dynamic Reducer Estimation› Better memory management› Calculate input splits in AM and let Tez do combining of input splits for

pig.maxCombinedSplitSize› Vertex Grouping to write data directly into one output directory from multiple vertices in

case of union› Using unsorted shuffle in Union, Orderby, Skewed Join, etc to improve performance.› Shared Edges for multiple outputs if same data has to go to multiple downstream

vertices. For eg: Multi-query off, skewed join sample aggregation output.› HDFS Caching


Contr ibutors Welcome


Pig User Group Meetup at LinkedIn 14 t h March 2014


Quest ions ???

february 2014 hug : pig on tez

Education