TRANSCRIPT
1 © Cloudera, Inc. All rights reserved.
Configuring and Optimizing Spark Applications With Ease
Nishkam Ravi Cloudera
Ethan Chan Stanford, Cloudera
Slide 2
Spark vs. MapReduce
Slide 3
val dataRDD = sparkContext.textFile("hdfs://...")
dataRDD.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).collect()

(flatMap, map, and reduceByKey are transformations on the RDD; collect() is the action)

• Simple high-level API for data transformation
Slide 4
[Diagram: the driver distributes tasks to executors running on the worker nodes]

• General task graphs
  • filter, distinct, union, sortBy, ...
  • Transformations pipelined where possible
Slide 5
• In-memory compute
  • Great performance: orders of magnitude faster than MapReduce
  • Fault tolerance
• One stack to rule them all
  • Procedural, ML, Graph, SQL, Streaming, etc.
Slide 6
Cloudera Customer Use Cases

Core Spark:
• Financial Services: Portfolio Risk Analysis; ETL Pipeline Speed-Up; 20+ years of stock data
• Health: Identify disease-causing genes in the full human genome; calculate Jaccard scores on health care data sets
• ERP: Optical Character Recognition and Bill Classification
• Data Services: Trend analysis; document classification (LDA); fraud analytics

Spark Streaming:
• Financial Services: Online Fraud Detection
• Health: Incident Prediction for Sepsis
• Retail: Online Recommendation Systems; Real-Time Inventory Management
• Ad Tech: Real-Time Ad Performance Analysis
Slide 7
Problem
• Too many knobs
• Developers don't always write good code
• Debugging is hard

[Chart: Cloudera Spark support issues]
Slide 8
Need for tools
• Not enough tools in the Big Data space
• The develop-deploy-debug cycle is complex
Slide 9
Preventive Care: Auto-configuration, Performance Optimization
Corrective Care: Debugging
Slide 10
Common pitfalls
• Executor (mis)configuration: number, size
• Insufficient parallelism: number of partitions
• YARN memory overhead
• Fetch failures: timeout values, ulimit
• Caching and serialization
• Use of collect() and such
• Use of groupByKey() instead of reduceByKey()
• Use of rdd.foreach() instead of rdd.foreachPartition()
• GC tuning
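The groupByKey()-vs-reduceByKey() pitfall above can be sketched with plain Scala collections (a Seq stands in for a Spark RDD so the sketch is self-contained; the object and data are illustrative, not from the talk). groupByKey gathers every (key, value) pair before aggregating, while reduceByKey combines values per key first, which in a real Spark job means far fewer records cross the shuffle.

```scala
object GroupVsReduce {
  val pairs = Seq(("a", 1), ("b", 1), ("a", 1), ("a", 1))

  // groupByKey-style: collect every value for a key (in Spark, all of them
  // cross the shuffle), then sum afterwards.
  def viaGroup: Map[String, Int] =
    pairs.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2).sum) }

  // reduceByKey-style: fold values into a running sum per key (in Spark,
  // the map-side combine shrinks the shuffle dramatically).
  def viaReduce: Map[String, Int] =
    pairs.foldLeft(Map.empty[String, Int]) { case (acc, (k, v)) =>
      acc.updated(k, acc.getOrElse(k, 0) + v)
    }
}
```

Both produce the same result; only the shuffle volume differs, which is why reduceByKey() is preferred for aggregations.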
Slide 11
Auto-configuration (Design Options)

• Use runtime information: run the application and monitor runtime info
  • Pros: more accurate
  • Cons: high internal and external complexity; initial run needs a starter configuration
• Use load-time information: input data size, cluster information, application code
  • Pros: zero external complexity; out-of-the-box usability & performance; configuration recommendations for performance
  • Cons: may not be as accurate
• Configuration language, e.g. num-exec = 2 * num-nodes
  • Pros: more expressive
  • Cons: user needs to specify heuristics; most heuristics are non-trivial
Slide 14
Interface

Input:
• Cluster info
• Input data size
• Deploy mode
• Application code path
• ...

SparkAid (Auto-config + Optimizer)

Output:
1. spark-final.conf (configuration settings)
2. spark-conf.advice (configuration recommendations)
3. Command line to execute the job
4. optimizedCode.scala (optimized code)
5. spark-code.advice (code recommendations)
6. optimization-report.txt (optimization report)
Slide 15
Input: Interactive Command Line
Slide 16
Output (1): Configuration File
• ~100 default settings
• 15-20 settings configured with heuristics
  • Executor memory
  • Executor cores
  • Storage level
  • ...
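For illustration, a generated spark-final.conf might contain entries like the following (all values hypothetical; the actual output depends on the cluster and the input data):

```
spark.executor.memory      8g
spark.executor.cores       4
spark.default.parallelism  320
```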
Slide 17
Output (2): Recommendations File
• Recommendations for Spark and CM configurations
  • spark.yarn.executor.memoryOverhead: increase this if YARN containers fail/run out of memory
  • spark.akka.timeout: increase if GC pauses cause problems
  • spark.default.parallelism: try doubling this value for potential performance gains
Slide 18
Output (3): Command Line
• spark-submit --master yarn --deploy-mode cluster --class pagerank --properties-file spark-final.conf --driver-cores 16 --driver-memory 32g /path/to/spark.jar
Slide 19
Design Overview

Auto-config:
• Implements a set of heuristics, drawing on past experience and experimentation
• Written in Java

Optimizer:
• Parses source code
• Looks for performance bugs and fixes them
• Modifies source code and generates advice
• Written in Python

Debugger:
• Analyzes runtime information (logs, stack, etc.)
• Helps visualize
• Generates suggestions (configuration, code changes)
• Not yet implemented
Slide 20
Integrate with Spark?
• Optimizer and Debugger are completely external to Spark
• Config defaults in Spark will likely improve over time
• Adding non-essential features to the Spark repo increases complexity and maintenance effort
• spark-packages.org and Cloudera Labs exist for that purpose
Slide 21
Auto-config Heuristics
• Example #1: spark.executor.memory

int calculatedNumExecutorsPerNode = (int)(effectiveMemoryPerNode / idealExecutorMemory);
double finalExecutorMemory = idealExecutorMemory;
boolean recalculateFlag = false;
if (calculatedNumExecutorsPerNode > upperBound) {
    numExecutorsPerNode = upperBound;
    recalculateFlag = true;
} else if (calculatedNumExecutorsPerNode < lowerBound) {
    numExecutorsPerNode = lowerBound;
    recalculateFlag = true;
} else {
    numExecutorsPerNode = calculatedNumExecutorsPerNode;
    double currMemSizePerNode = idealExecutorMemory * numExecutorsPerNode;
    double leftOverMemPerNode = effectiveMemoryPerNode - currMemSizePerNode;
    if (leftOverMemPerNode > (idealExecutorMemory / 2)) {
        recalculateFlag = true;
    }
}
if (recalculateFlag) {
    finalExecutorMemory = effectiveMemoryPerNode / numExecutorsPerNode;
}
Slide 22
Auto-config Heuristics
• Example #2: spark.storage.level

if (inputDeserialized < availableMemory) {
    storageLevel = "MEMORY_ONLY";
} else if (inputUncompressedSerialized < availableMemory) {
    storageLevel = "MEMORY_ONLY_SER";
} else if (inputCompressedSerialized < availableMemory) {
    storageLevel = "MEMORY_ONLY_SER";
    rddCompress = "true";
} else {
    storageLevel = "MEMORY_AND_DISK_SER";
}
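The spark.storage.level heuristic above is a simple cascade over estimated input sizes; a minimal Scala sketch of it, for experimentation (the object, parameter names, and return shape are illustrative, not the tool's actual implementation):

```scala
object StorageLevelHeuristic {
  // Returns (storage level, whether to set spark.rdd.compress).
  // Sizes are estimated bytes; `available` is the memory available for caching.
  def choose(deserialized: Long, uncompressedSer: Long,
             compressedSer: Long, available: Long): (String, Boolean) =
    if (deserialized < available) ("MEMORY_ONLY", false)
    else if (uncompressedSer < available) ("MEMORY_ONLY_SER", false)
    else if (compressedSer < available) ("MEMORY_ONLY_SER", true) // also compress
    else ("MEMORY_AND_DISK_SER", false)
}
```

Each branch trades CPU for memory: deserialized caching is fastest, serialized and compressed forms fit more data, and spilling to disk is the last resort.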
Slide 23
Auto-config Heuristics
• Example #3: spark.default.parallelism

int totalCoresAvailable = (int)(numCoresPerNode * (numNodes - numJobs) * resourceFraction);
int calculatedParallelism = (int)(parallelismFactor * totalCoresAvailable);

• Example #4: spark.shuffle.consolidateFiles

if (inputsTable.get("fileSystem").equals("ext4") || inputsTable.get("fileSystem").equals("xfs")) {
    shuffleConsolidateFiles = "true";
}
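To make the spark.default.parallelism heuristic of Example #3 concrete, here is a Scala port with illustrative numbers plugged in (an 11-node cluster of 16-core nodes, one concurrent job, resource fraction 1.0, parallelism factor 2 — all assumed values, not from the talk):

```scala
object ParallelismHeuristic {
  // Scala port of the Java heuristic above; all parameters are assumptions.
  def defaultParallelism(numCoresPerNode: Int, numNodes: Int, numJobs: Int,
                         resourceFraction: Double, parallelismFactor: Double): Int = {
    // Cores left after reserving one node's worth per concurrent job.
    val totalCoresAvailable = (numCoresPerNode * (numNodes - numJobs) * resourceFraction).toInt
    (parallelismFactor * totalCoresAvailable).toInt
  }
}
```

With these inputs, defaultParallelism(16, 11, 1, 1.0, 2.0) yields 320, i.e. roughly two tasks per available core.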
Slide 24
Performance Optimizer
Slide 25
Performance Optimizer
• Find performance bugs using poor man's static analysis
• Identify patterns and generate optimized code and recommendations
  • RDD-cache optimization
  • Storage-fraction optimization
  • Parallelism-level optimization
  • GroupByKey-replace optimization
Slide 26
Preprocessing
• Classes, functions, nested functions
• Loops, comments, line numbers, etc.

Analysis
• RDD identification
• UD (use-definition) analysis

Optimization
• Four optimizations

Code generation
• Separate phase
Slide 27
Optimizations
#1: RDD-cache optimization
• Find RDDs that can be cached: being read in a loop, not being assigned in the loop
• Insert rdd.cache() before the loop
#2: Storage-fraction optimization
• Look for RDD cache instances
• If no RDD is being cached, set spark.storage.memoryFraction to 0.2
Slide 29
Optimizations
#3: Parallelism-level optimization
• Modify RDD instantiations to be consistent with the spark.default.parallelism value
#4: GroupByKey-replace optimization
• Recommend reduceByKey() instead of groupByKey()
Slide 31
Optimizer Output Files
Slide 32
Output (1): Optimized Code

object PageRank {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("PageRank")
    val sc = new SparkContext(conf)
    val key_links_txt = sc.textFile("hdfs://cloudera.com:8020/user/pagerank_input", 3136)
    val key_links = key_links_txt.map(line => line.split("\t"))
      .map(array => (array(0), array(1))).distinct().groupByKey()
    var ranks = key_links.mapValues(v => 1.0)
    key_links.cache()
    for (i <- 0 until 2) {
      val contributions = key_links.join(ranks).flatMap {
        case (pageId, (links, rank)) =>
          links.map(dest => (dest, rank / links.size))
      }
      ranks = contributions.reduceByKey((x, y) => x + y).mapValues(v => 0.15 + 0.85 * v)
    }
    ranks.collect()
  }
}
Slide 33
Output (2): Code Advice

===================== GroupByKey() Recommendation ========================
Consider using reduceByKey() instead of groupByKey() if possible at Line 12:
key_links_txt.map(line => line.split("\t")).map(array => (array(0), array(1))).distinct().groupByKey()

Output (3): Optimization Report
Slide 34
Thank you
Ethan Chan | Performance Team (UIUC 2015, Stanford 2017)
Nishkam Ravi | Mentor
Silvius Rus | Manager
Demo
Slide 35
Uniting Spark and Hadoop: The One Platform Initiative
• Management: leverage Hadoop-native RM
• Usability
• Security: full support for Hadoop security and beyond
• Scale: enable 10k-node clusters
• Streaming: support for 80% of common stream processing workloads
Slide 36
Exploratory Effort
No plan to release in the immediate future
Slide 37
Thanks! Nishkam Ravi, Ethan Chan [email protected]