DEVIEW 2014 [2C1] Developing a Tez Execution Engine for Apache Pig (final)
Cheolsoo Park, Engineer / Netflix Big Data Platform Team
Developing a Tez Execution Engine for Apache Pig
CONTENTS
1. Background
2. What is Pig on Tez?
3. Why Apache Tez?
4. Shortcomings and What’s Next
1. Background
1.1 Netflix Data Pipeline
Events Data Pipeline: Cloud apps -> Suro -> Ursula -> S3 DW (every 15 min)
Stateful Data Pipeline: Cassandra SSTables -> Aegisthus -> S3 DW (daily)
1.2 Netflix Big Data Platform
• S3 DW
• Hadoop clusters
• Federated execution engine
• Federated metadata service
• Data Lineage
• Data Visualization
• Data Movement
• Data Quality
• Pig Workflow Visualization
• Job/Cluster Performance Visualization
1.3 Data Volume
~200 billion events/day
~40 TB incoming data/day (compressed)
~1.2 PB data read/day
~100 TB data written/day
10+ PB DW on S3
1.4 Netflix Big Data Platform
With ever-growing data, ETL runs slower and slower.
1.5 ETL Completion Trend
1.6 Common Problems
Common problems across organizations
1. Similar data platform architecture
• Pig for ETL jobs
• Hive/Presto for ad-hoc queries
1.7 Pig on Tez Team
• Alex Bain (LinkedIn: 2013/08~2014/01, Dev)
• Mark Wagner (LinkedIn: 2013/08~2014/01, Dev)
• Cheolsoo Park (Netflix: 2013/08~2014/08, Dev)
• Olga Natkovich (Yahoo: 2013/08~present, PM)
• Rohini Palaniswamy (Yahoo: 2013/08~present, Dev)
• Daniel Dai (Hortonworks: 2013/08~present, Dev)
2. What is Pig on Tez?
2.1 Pig Concepts
Non-blocking operators
1. LOAD / STORE
2. FOREACH __ GENERATE __
3. FILTER __ BY __
Blocking operators (translated to a MapReduce shuffle)
1. GROUP __ BY __
2. ORDER __ BY __
3. JOIN __ BY __
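As a rough illustration of the blocking versus non-blocking distinction on this slide, the Python sketch below classifies the operators of a script and counts the shuffle boundaries they imply. The `BLOCKING` and `NON_BLOCKING` sets and the `count_shuffles` helper are hypothetical names chosen for this example, not part of Pig, and the count is only a lower bound on the real number of stages (ORDER BY, for example, also needs a sampling pass).

```python
# Hypothetical helper, not part of Pig: classify operators by whether
# they force a shuffle (blocking) or can be pipelined (non-blocking).
BLOCKING = {"GROUP", "ORDER", "JOIN"}
NON_BLOCKING = {"LOAD", "STORE", "FOREACH", "FILTER"}


def count_shuffles(operators):
    """Return how many operators in the list are blocking."""
    shuffles = 0
    for op in operators:
        name = op.upper()
        if name in BLOCKING:
            shuffles += 1
        elif name not in NON_BLOCKING:
            raise ValueError(f"unknown operator: {op}")
    return shuffles


# LOAD -> FOREACH -> GROUP BY -> FOREACH -> STORE
pipeline = ["LOAD", "FOREACH", "GROUP", "FOREACH", "STORE"]
print(count_shuffles(pipeline))
```

Every blocking operator ends one pipelined stage and starts another, which is why a script with several GROUP, ORDER, or JOIN statements turns into a chain of MapReduce jobs.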
2.2 MapReduce Plan
Logical Plan: LOAD -> FOREACH -> GROUP BY -> FOREACH -> STORE
Physical Plan: LOAD -> FOREACH -> LOCAL REARRANGE -> GLOBAL REARRANGE -> PACKAGE -> FOREACH -> STORE
MR Plan: LOAD -> FOREACH -> LOCAL REARRANGE -> Shuffle -> PACKAGE -> FOREACH -> STORE
2.3 What’s the Problem?
Restrictions by MapReduce
1. Extra intermediate output on HDFS
2. Artificial synchronization barriers
3. Inefficient use of resources
4. Multi-query optimization
2.4 Tez Concepts
Low-level DAG Framework
1. Build DAG by defining vertices and edges.
2. Customize scheduling of DAG and movement of data.
• Sequential and concurrent
• 1-1, broadcasting, scatter and gather
Flexible Input-Processor-Output Model
1. Thin API layer to wrap around arbitrary application code.
2. Compose inputs, processor, and outputs to execute arbitrary processing.
Input: initialize, getReader, handleEvents, close
Processor: initialize, run, handleEvents, close
Output: initialize, getWriter, handleEvents, close
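The Input-Processor-Output model can be sketched in a few lines of Python. The classes below are hypothetical stand-ins that mirror the method names on the slide (`initialize`, `getReader`, `run`, `getWriter`, `close`, written here in snake_case); they are not the Tez API, which is Java, and event handling (`handleEvents`) is left out to keep the sketch short.

```python
# Illustrative stand-ins for the Tez Input/Processor/Output roles.
# These classes are hypothetical; the real Tez API is Java.
class ListInput:
    def __init__(self, records):
        self._records = records

    def initialize(self):
        self.opened = True

    def get_reader(self):
        return iter(self._records)

    def close(self):
        self.opened = False


class ListOutput:
    def initialize(self):
        self.records = []

    def get_writer(self):
        return self.records.append

    def close(self):
        pass


class UppercaseProcessor:
    """Arbitrary application code wrapped by a thin run() method."""

    def initialize(self):
        pass

    def run(self, inputs, outputs):
        write = outputs["out"].get_writer()
        for record in inputs["in"].get_reader():
            write(record.upper())

    def close(self):
        pass


def run_task(processor, inputs, outputs):
    """Compose inputs, a processor, and outputs into one task."""
    parts = list(inputs.values()) + [processor] + list(outputs.values())
    for part in parts:
        part.initialize()
    processor.run(inputs, outputs)
    for part in parts:
        part.close()


out = ListOutput()
run_task(UppercaseProcessor(), {"in": ListInput(["pig", "tez"])}, {"out": out})
print(out.records)
```

The processor never learns where its records come from or go to, so the same application code can be wired to different inputs and outputs.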
2.5 Pig on Tez
Logical Plan -> (LogToPhyTranslationVisitor) -> Physical Plan
Physical Plan -> (MRCompiler) -> MR Plan -> MR Execution Engine
Physical Plan -> (TezCompiler) -> Tez Plan -> Tez Execution Engine
2.6 Tez DAG: Split + Group By + Join
a = LOAD 'foo' AS (x, y, z);
b = GROUP a BY y;
c = GROUP a BY z;
d = JOIN b BY group, c BY group;
MR: Load 'foo' -> Group by y, Group by z (split multiplex, de-multiplex) -> HDFS -> Load g1, Load g2 -> Join g1, g2
Tez: Load 'foo' (multiple outputs) -> Group by y, Group by z -> Join g1, g2 (reducer follows reducer)
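The Tez plan for the script above can be written down as vertices and edges. The following Python sketch is a minimal DAG builder with a topological sort; the `Dag` class and the `"scatter-gather"` edge labels are assumptions made for illustration and do not reproduce the Tez DAG API.

```python
from collections import defaultdict, deque


class Dag:
    """Tiny DAG: named vertices plus typed edges (hypothetical, not the Tez API)."""

    def __init__(self):
        self.vertices = []
        self.edges = []  # (source, target, edge_type)

    def add_vertex(self, name):
        self.vertices.append(name)
        return name

    def add_edge(self, source, target, edge_type):
        self.edges.append((source, target, edge_type))

    def topological_order(self):
        """Kahn's algorithm: a vertex is emitted after all of its inputs."""
        indegree = {v: 0 for v in self.vertices}
        successors = defaultdict(list)
        for source, target, _ in self.edges:
            indegree[target] += 1
            successors[source].append(target)
        ready = deque(v for v in self.vertices if indegree[v] == 0)
        order = []
        while ready:
            vertex = ready.popleft()
            order.append(vertex)
            for nxt in successors[vertex]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
        if len(order) != len(self.vertices):
            raise ValueError("graph has a cycle")
        return order


dag = Dag()
load = dag.add_vertex("Load 'foo'")
group_y = dag.add_vertex("Group by y")
group_z = dag.add_vertex("Group by z")
join = dag.add_vertex("Join g1, g2")
# One load vertex feeds two group-by vertices (multiple outputs) ...
dag.add_edge(load, group_y, "scatter-gather")
dag.add_edge(load, group_z, "scatter-gather")
# ... and both feed the join directly (reducer follows reducer).
dag.add_edge(group_y, join, "scatter-gather")
dag.add_edge(group_z, join, "scatter-gather")
print(dag.topological_order())
```

No vertex in this graph writes to HDFS before the join, which is the difference from the two-job MapReduce plan.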
2.7 Tez DAG: Order By
a = LOAD 'foo' AS (x, y);
b = FILTER a BY y is not null;
c = ORDER b BY x;
MR: Sample -> Aggregate -> HDFS (stage sample map on distributed cache) -> Load, Partition -> Sort
Tez: Load, Sample -> Aggregate -> Partition -> Sort
• Broadcast sample map
• 1-1 unsorted edge
• Cache sample map
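The sample, aggregate, partition, and sort steps of ORDER BY amount to a sample-based range partitioner. The Python sketch below shows the idea on a single machine; the function names, sample size, and partition count are assumptions for illustration and do not reproduce Pig's actual sampler or partitioner.

```python
import bisect
import random


def sample_boundaries(sample, num_partitions):
    """Pick num_partitions - 1 split points from a sorted sample."""
    ordered = sorted(sample)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def partition(records, boundaries):
    """Route each record to the key range it falls into."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for key in records:
        parts[bisect.bisect_right(boundaries, key)].append(key)
    return parts


def order_by(records, num_partitions, sample_size, seed=0):
    """Sample, derive boundaries, partition by range, sort each partition."""
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    boundaries = sample_boundaries(sample, num_partitions)
    # Each partition is sorted independently; concatenating them in
    # partition order yields a globally sorted result.
    return [sorted(p) for p in partition(records, boundaries)]


data_rng = random.Random(42)
data = [data_rng.randint(0, 999) for _ in range(200)]
parts = order_by(data, num_partitions=4, sample_size=20)
flat = [key for part in parts for key in part]
print(flat == sorted(data))
```

The boundaries play the role of the sample map: every partitioning task needs the same copy, which is why the MR plan stages it on the distributed cache and the Tez plan broadcasts it instead.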
3. Why Apache Tez?
3.1 DAG Execution
DAG Execution
1. Eliminate HDFS writes between workflow jobs.
2. Eliminate job launch overhead of workflow jobs.
3. Eliminate identity mappers in every workflow job.
Benefits
1. Faster execution and higher predictability.
3.2 MR vs. Tez
3.3 AM / Container Reuse
AM Reuse
1. Grunt shell uses one AM for all commands until timeout.
2. More than one DAG is submitted for merge join, collected group, and exec.
Container Reuse
1. Run new tasks on an already warmed-up JVM.
Benefits
1. Reduce container launch overhead.
2. Reduce network I/O.
• 1-1 edge tasks are launched on the same node.
3.4 Broadcast Edge / Object Cache
Broadcast Edge
1. Broadcast the same data to all tasks in the successor vertex.
Object Cache
1. Share in-memory objects within the scope of a vertex or DAG.
Benefits
1. Replace use of the distributed cache.
2. Avoid fetching input if the cache is available on container reuse.
• Replicated join runs faster on a small cluster.
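To illustrate how an object cache avoids repeated input fetching when a container is reused, here is a minimal Python sketch. `ObjectCache`, its scope strings, and `replicated_join` are hypothetical names for this example only; they do not reproduce the Tez object cache interface.

```python
class ObjectCache:
    """Toy per-container cache keyed by (scope, key); hypothetical, not the Tez API."""

    def __init__(self):
        self._store = {}
        self.fetches = 0

    def get_or_fetch(self, scope, key, fetch):
        cache_key = (scope, key)
        if cache_key not in self._store:
            self.fetches += 1
            self._store[cache_key] = fetch()
        return self._store[cache_key]

    def clear_scope(self, scope):
        """Drop everything cached for a finished vertex or DAG."""
        self._store = {k: v for k, v in self._store.items() if k[0] != scope}


def replicated_join(cache, big_rows, fetch_small):
    # The small relation would arrive over a broadcast edge; it is built
    # into a hash table once and then reused by later tasks.
    small = cache.get_or_fetch("dag-1", "small_table", fetch_small)
    return [(key, value, small[key]) for key, value in big_rows if key in small]


def fetch_small():
    return {1: "a", 2: "b"}


cache = ObjectCache()
first = replicated_join(cache, [(1, "x"), (3, "y")], fetch_small)
second = replicated_join(cache, [(2, "z")], fetch_small)
print(first, second, cache.fetches)
```

The second task running in the same container finds the hash table already in memory, so the small relation is fetched only once.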
3.5 Vertex Group
Vertex Group
1. Group multiple vertices into a vertex group and produce a combined output.
Benefits
1. Better performance due to elimination of an additional vertex.
a = LOAD 'a';
b = LOAD 'b';
c = UNION a, b;
d = GROUP c BY $0;
Without vertex group: Load a, Load b -> Union -> Group
With vertex group: Load a, Load b -> Group
3.6 Slow Start/Pre-launch
Slow Start/Pre-launch
1. Pluggable vertex manager pre-launches the reducers before all maps have completed so that shuffle can start (e.g. LIMIT not following ORDER BY).
Benefits
1. Better performance due to parallel execution of multiple vertices.
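A slow-start policy can be expressed as a function from map progress to the number of reducers to launch. The sketch below uses a simple linear ramp between two thresholds; the function name and the default values 0.25 and 0.75 are assumptions for illustration, not a description of what any particular Tez vertex manager does.

```python
def reducers_to_launch(completed_maps, total_maps, total_reducers,
                       min_fraction=0.25, max_fraction=0.75):
    """Linear ramp: launch no reducers below min_fraction of completed maps,
    all of them at or above max_fraction, and a proportional share in between.
    The two thresholds are illustrative values."""
    if total_maps == 0:
        return total_reducers
    done = completed_maps / total_maps
    if done < min_fraction:
        return 0
    if done >= max_fraction:
        return total_reducers
    share = (done - min_fraction) / (max_fraction - min_fraction)
    return int(total_reducers * share)


for completed in (10, 50, 80):
    print(completed, reducers_to_launch(completed, 100, 40))
```

Launching reducers early lets them start fetching map output while the remaining maps are still running, at the cost of holding containers that may sit idle if the maps are slow.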
3.7 Performance Numbers
Job     MR      Tez     Speedup
Job 1   20m     10m     2x
Job 2   1h22m   28m     3x
Job 3   2h17m   1h15m   1.7x
Job 4   33m     28m     1.2x
Job 5   3h57m   3h54m   1.0x
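The speedup factors can be recomputed from the reported run times. The Python sketch below parses the durations and divides them, assuming that the first time in each pair is the MR run and the second is the Tez run; the helper names are hypothetical. Recomputed ratios may differ slightly from the rounded labels on the slide.

```python
import re


def to_minutes(text):
    """Parse durations such as '1h22m' or '20m' into minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", text)
    if not text or not match:
        raise ValueError(f"bad duration: {text!r}")
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes or 0)


def speedup(mr, tez):
    """Ratio of MR run time to Tez run time."""
    return to_minutes(mr) / to_minutes(tez)


# (MR, Tez) pairs as reported on the slide; the MR-first order is an assumption.
runs = {
    "Job 1": ("20m", "10m"),
    "Job 2": ("1h22m", "28m"),
    "Job 3": ("2h17m", "1h15m"),
    "Job 4": ("33m", "28m"),
    "Job 5": ("3h57m", "3h54m"),
}

for job, (mr, tez) in runs.items():
    print(f"{job}: {speedup(mr, tez):.2f}x")
```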
3.8 Performance Deep Dive
This MR job blocks the DAG.
3.9 Performance Deep Dive
A huge number of intermediate files is written to HDFS.
4. Shortcomings And What’s Next
4.1 Shortcomings
Auto Parallelism
1. Eliminating mappers without adjusting parallelism can make jobs run slower.
• In MR, combiners run with 1600 tasks.
• In Tez, combiners run with 500 tasks.
4.2 Shortcomings
Current Status
1. User-specified parallelism always takes precedence.
2. If no parallelism is specified, Pig estimates it using static rules. For example, if a vertex contains a FILTER BY, its parallelism is reduced by 50%.
3. At execution time, parallelism is adjusted again based on per-vertex sampling.
Problems
1. In legacy Pig jobs, parallelism is tuned for MR, so honoring user-specified parallelism can hurt performance in Tez.
2. Static-rule-based estimation cannot always be accurate.
3. Sample-based estimation cannot always be accurate.
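The three-step precedence described on this slide can be sketched as two small functions. The function names, the operator list format, and the bytes-per-task rule used for the run-time adjustment are assumptions for illustration; only the "user value wins" rule and the 50% filter rule come from the slide.

```python
def estimate_parallelism(input_tasks, operators, user_parallel=None):
    """Compile-time estimate following the precedence on the slide."""
    if user_parallel is not None:
        return user_parallel  # user-specified value always wins
    estimate = input_tasks
    for op in operators:
        if op == "FILTER":
            # Static rule from the slide: a filter halves the parallelism.
            estimate = max(1, estimate // 2)
    return estimate


def adjust_at_runtime(estimate, sampled_bytes, bytes_per_task):
    """Run-time correction from sampled output size (hypothetical rule):
    never exceed the compile-time estimate, never drop below one task."""
    needed = -(-sampled_bytes // bytes_per_task)  # ceiling division
    return max(1, min(estimate, needed))


static = estimate_parallelism(1600, ["FOREACH", "FILTER"])
pinned = estimate_parallelism(1600, ["FILTER"], user_parallel=500)
runtime = adjust_at_runtime(static, sampled_bytes=10 * 2**30, bytes_per_task=2**30)
print(static, pinned, runtime)
```

The `pinned` case shows the first problem listed above: a value tuned for MR is returned unchanged, so neither the static rule nor the run-time sample gets a chance to correct it.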
4.3 Shortcomings
Web UI and Tools Integration
1. Tez AM has no UI (i.e. no job page).
2. Tez is not yet integrated with YARN ATS (i.e. no job history page).
3. Tez is not yet integrated with Netflix internal tools such as Inviso and Lipstick.
4.4 What’s Next?
Tez
1. Resolve TEZ-8: Tez UI for progress tracking and history.
• The Tez 0.5.x release (latest) doesn’t include TEZ-8.
Pig on Tez
1. Improve auto parallelism and usability.
• Pig on Tez will be included in the Pig 0.14 release, but these issues might still be there.
Q&A
THANK YOU