![Page 1: Yang PENG Network and System Lab CSE, HKUST Monday, March 11, 2013 ypengab@cse.ust.hk](https://reader031.vdocuments.net/reader031/viewer/2022013012/56816724550346895ddbaeeb/html5/thumbnails/1.jpg)
Comp6611 Course Lecture: Big data applications
Yang PENG, Network and System Lab, CSE, HKUST
Monday, March 11, 2013, ypengab@cse.ust.hk
Material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under the Creative Commons Attribution 3.0 License)
Today's Topics
- MapReduce
  - Background information/overview
  - Map and Reduce: from a programmer's perspective
  - Architecture and workflow: a global overview
  - Virtues and defects
- Improvement: Spark
MapReduce Background
Before MapReduce, large-scale data processing was difficult:
- Managing parallelization and distribution
  - Application development is tedious and hard to debug
  - Resource scheduling and load balancing
- Data storage and distribution
  - Distributed file system
  - “Moving computation is cheaper than moving data.”
- Fault/crash tolerance
- Scalability
Does the “divide and conquer” paradigm still work for big data?
[Figure: the Work is partitioned into pieces w1, w2, w3; each piece is handled by a Worker, and the partial results are combined into the final Result.]
Programming Model
- Opportunity: design a software abstraction that handles the divide and conquer and reduces programmers' workload for:
  - resource management
  - task scheduling
  - distributed synchronization and communication
- Functional programming, which has a long history, provides higher-order functions that support divide and conquer:
  - Map: do something to everything in a list
  - Fold: combine the results of a list in some way
[Figure: an Application runs on top of an Abstraction layer that spans many Computers.]
Map
Map is a higher-order function. How map works:
- The function is applied to every element in a list.
- The result is a new list.
[Figure: f applied separately to each element of the list.]
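As a minimal illustration of the definition above, here is a recursive map written in Scala, the language of the Spark examples later in this lecture. `MapSketch` and `myMap` are hypothetical names used only for this sketch; Scala's built-in `List.map` behaves the same way.

```scala
object MapSketch {
  // Apply f to every element of xs and collect the results in a new list;
  // the input list itself is not modified.
  // (Recursive for clarity; not stack-safe for very long lists.)
  def myMap[A, B](xs: List[A])(f: A => B): List[B] = xs match {
    case Nil          => Nil
    case head :: tail => f(head) :: myMap(tail)(f)
  }
}
```

For example, `MapSketch.myMap(List(1, 2, 3, 4, 5))(x => x * x)` returns `List(1, 4, 9, 16, 25)`. Each call to `f` depends only on its own element, so in principle the calls could run on different machines. This is the property MapReduce exploits.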
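Fold
Fold is also a higher-order function. How fold works:
- The accumulator is set to an initial value.
- The function is applied to a list element and the accumulator.
- The result is stored in the accumulator.
- This is repeated for every item in the list.
- The result is the final value in the accumulator.
[Figure: starting from the initial value, f is applied element by element, passing the accumulator along until it holds the final value.]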
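The steps above can be sketched as a loop over an explicit accumulator. `FoldSketch` and `myFold` are hypothetical names. The helper passes (element, accumulator) to `f`, which matches the argument order of the SRFI-1 `fold` used in the Scheme examples. Scala's own `foldLeft` uses the opposite order and passes (accumulator, element).

```scala
object FoldSketch {
  // Set the accumulator to `init`. Then, for every element (left to right),
  // apply f to the element and the accumulator and store the result back in
  // the accumulator. The final accumulator value is the result.
  def myFold[A, B](xs: List[A], init: B)(f: (A, B) => B): B = {
    var acc = init
    for (x <- xs) acc = f(x, acc)
    acc
  }
}
```

`FoldSketch.myFold(List(1, 2, 3, 4, 5), 0)(_ + _)` returns `15`. `FoldSketch.myFold(List("a", "b", "c"), "")((x, acc) => x + acc)` returns `"cba"`, which shows that the elements are visited from left to right.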
Map/Fold in Action
Simple map example:
```scheme
(map (lambda (x) (* x x)) '(1 2 3 4 5))   ; → '(1 4 9 16 25)
```
Fold examples (`fold` as defined in SRFI-1):
```scheme
(fold + 0 '(1 2 3 4 5))   ; → 15
(fold * 1 '(1 2 3 4 5))   ; → 120
```
Sum of squares:
```scheme
(define (sum-of-squares v)
  (fold + 0 (map (lambda (x) (* x x)) v)))

(sum-of-squares '(1 2 3 4 5))   ; → 55
```
MapReduce
Programmers specify two functions:
- map (k1, v1) → list(k2, v2)
- reduce (k2, list(v2)) → list(v2)

```
function map(String name, String document):
  // K1 name: document name
  // V1 document: document contents
  for each word w in document:
    emit (w, 1)

function reduce(String word, Iterator partialCounts):
  // K2 word: a word
  // list(V2) partialCounts: a list of aggregated partial counts
  sum = 0
  for each pc in partialCounts:
    sum += ParseInt(pc)
  emit (word, sum)
```
An implementation of WordCount
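To make the data flow concrete, here is a single-process Scala sketch of the same WordCount. It runs the map phase, the group-by-key barrier, and the reduce phase one after another in memory. This is an illustration, not Hadoop's API. `WordCountSketch`, `mapFn`, `reduceFn`, and `run` are hypothetical names, and the counts are kept as `Int` rather than parsed from strings.

```scala
object WordCountSketch {
  // map: (document name, document contents) => list of (word, 1)
  def mapFn(name: String, document: String): List[(String, Int)] =
    document.split("\\s+").toList.filter(_.nonEmpty).map(w => (w, 1))

  // reduce: (word, partial counts) => (word, total count)
  def reduceFn(word: String, partialCounts: List[Int]): (String, Int) =
    (word, partialCounts.sum)

  def run(docs: Map[String, String]): Map[String, Int] = {
    // Map phase: each document could be handled by a different worker.
    val intermediate: List[(String, Int)] =
      docs.toList.flatMap { case (name, text) => mapFn(name, text) }
    // Barrier/shuffle: aggregate the values by key.
    val grouped: Map[String, List[Int]] =
      intermediate.groupBy(_._1).map { case (w, pairs) => (w, pairs.map(_._2)) }
    // Reduce phase: one reduce call per distinct key.
    grouped.map { case (w, counts) => reduceFn(w, counts) }
  }
}
```

For example, `WordCountSketch.run(Map("d1" -> "the cat sat", "d2" -> "the cat"))` returns `Map("the" -> 2, "cat" -> 2, "sat" -> 1)`.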
It's just divide and conquer!
[Figure: a Data Store supplies initial key/value pairs to several map tasks. Each map emits values for keys k1, k2, and k3. A barrier aggregates the values by key, and then one reduce per key produces the final k1, k2, and k3 values.]
Behind the scenes…
Programming interface
- input reader
- Map function
- partition function: given the key and the number of reducers, it returns the index of the desired reducer. It is used for load balancing, e.g., a hash function.
- compare function: used to sort the computed output, which provides an ordering guarantee.
- Reduce function
- output writer
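A hash partition function can be as small as the sketch below. It mirrors the formula used by Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The object and method names here are hypothetical. Masking with `Int.MaxValue` clears the sign bit, so a negative `hashCode` still yields a valid reducer index. `math.abs` would not work here, because it stays negative for `Int.MinValue`.

```scala
object PartitionSketch {
  // Map a key to one of numReducers partitions.
  // `& Int.MaxValue` clears the sign bit, so the result is never negative.
  def partition(key: String, numReducers: Int): Int =
    (key.hashCode & Int.MaxValue) % numReducers

  // Group map-output keys by the reducer that will receive them.
  def assign(keys: Seq[String], numReducers: Int): Map[Int, Seq[String]] =
    keys.groupBy(k => partition(k, numReducers))
}
```

The job shown next launches 24 reduce tasks. With that setting, every key lands in one of partitions 0 to 23, and all occurrences of the same word go to the same reducer.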
Output of a Hadoop job
```
ypeng@vm115:~/hadoop-0.20.2$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount /user/hduser/wordcount/15G-enwiki-input /user/hduser/wordcount/15G-enwiki-output
13/01/16 07:00:48 INFO input.FileInputFormat: Total input paths to process : 1
13/01/16 07:00:49 INFO mapred.JobClient: Running job: job_201301160607_0003
13/01/16 07:00:50 INFO mapred.JobClient: map 0% reduce 0%
.........................
13/01/16 07:01:50 INFO mapred.JobClient: map 18% reduce 0%
13/01/16 07:01:52 INFO mapred.JobClient: map 19% reduce 0%
13/01/16 07:02:06 INFO mapred.JobClient: map 20% reduce 0%
13/01/16 07:02:08 INFO mapred.JobClient: map 20% reduce 1%
13/01/16 07:02:10 INFO mapred.JobClient: map 20% reduce 2%
.........................
13/01/16 07:06:41 INFO mapred.JobClient: map 99% reduce 32%
13/01/16 07:06:47 INFO mapred.JobClient: map 100% reduce 33%
13/01/16 07:06:55 INFO mapred.JobClient: map 100% reduce 39%
.........................
13/01/16 07:07:21 INFO mapred.JobClient: map 100% reduce 99%
13/01/16 07:07:31 INFO mapred.JobClient: map 100% reduce 100%
13/01/16 07:07:43 INFO mapred.JobClient: Job complete: job_201301160607_0003
```
(Continued on the next slide.)
(The `map x% reduce y%` lines show the job's progress.)
Counters in a Hadoop job
```
13/01/16 07:07:43 INFO mapred.JobClient: Counters: 18
13/01/16 07:07:43 INFO mapred.JobClient: Job Counters
13/01/16 07:07:43 INFO mapred.JobClient: Launched reduce tasks=24
13/01/16 07:07:43 INFO mapred.JobClient: Rack-local map tasks=17
13/01/16 07:07:43 INFO mapred.JobClient: Launched map tasks=249
13/01/16 07:07:43 INFO mapred.JobClient: Data-local map tasks=203
13/01/16 07:07:43 INFO mapred.JobClient: FileSystemCounters
13/01/16 07:07:43 INFO mapred.JobClient: FILE_BYTES_READ=12023025990
13/01/16 07:07:43 INFO mapred.JobClient: HDFS_BYTES_READ=15492905740
13/01/16 07:07:43 INFO mapred.JobClient: FILE_BYTES_WRITTEN=14330761040
13/01/16 07:07:43 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=752814339
13/01/16 07:07:43 INFO mapred.JobClient: Map-Reduce Framework
13/01/16 07:07:43 INFO mapred.JobClient: Reduce input groups=39698527
13/01/16 07:07:43 INFO mapred.JobClient: Combine output records=508662829
13/01/16 07:07:43 INFO mapred.JobClient: Map input records=279422018
13/01/16 07:07:43 INFO mapred.JobClient: Reduce shuffle bytes=2647359503
13/01/16 07:07:43 INFO mapred.JobClient: Reduce output records=39698527
13/01/16 07:07:43 INFO mapred.JobClient: Spilled Records=828280813
13/01/16 07:07:43 INFO mapred.JobClient: Map output bytes=24932976267
13/01/16 07:07:43 INFO mapred.JobClient: Combine input records=2813475352
13/01/16 07:07:43 INFO mapred.JobClient: Map output records=2376465967
13/01/16 07:07:43 INFO mapred.JobClient: Reduce input records=71653444
```
Summary of the counters in the job
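A little arithmetic on these counters shows what data locality and the combiner achieve. The snippet below only copies numbers from the output above. Two caveats apply. First, the launched map tasks may include backup attempts, so the locality percentages are approximate. Second, the combiner can run more than once on the same data (for example, again when spill files are merged). That is why its input count exceeds the map output count.

```scala
object CounterRatios {
  // Counter values copied from the job output above.
  val launchedMaps  = 249L
  val dataLocalMaps = 203L
  val rackLocalMaps = 17L
  val combineInput  = 2813475352L
  val combineOutput = 508662829L

  // Percentage of `part` in `whole`.
  def pct(part: Long, whole: Long): Double = 100.0 * part / whole

  val dataLocalPct   = pct(dataLocalMaps, launchedMaps)  // about 81.5
  val rackLocalPct   = pct(rackLocalMaps, launchedMaps)  // about 6.8
  val combineKeptPct = pct(combineOutput, combineInput)  // about 18.1
}
```

Roughly four out of five map tasks read their split from the local disk. The combiner cuts the records it processes to about 18% of its input.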
Master in MapReduce
- Resource management
  - Maintains the current resource usage of each Worker (CPU, RAM, used and free disk space, etc.)
  - Periodically checks for worker failures
- Task scheduling
  - “Moving computation is cheaper than moving data.”
  - Map and reduce tasks are assigned to idle Workers.
  - Tasks on failed workers are rescheduled.
  - When the job is close to completion, the Master launches backup tasks.
- Counters
  - provide interactive job progress
  - store the occurrences of various events
  - are helpful for performance tuning
Data-oriented Map scheduling
[Figure: the input is divided into splits 1-5, and each split is replicated on several of six Workers in two racks (Rack 1: Workers 1-3; Rack 2: Workers 4-6), each rack behind its own Switch. Each map task is launched on a worker that holds a copy of its split: map 1 on Worker 3, map 2 on Worker 4, map 3 on Worker 1, map 4 on Worker 2, and map 5 on Worker 5.]
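The preference order in the figure can be sketched as follows. A map task runs on a worker that stores its split if possible, otherwise on a worker in the same rack, and otherwise anywhere. This is a simplified, hypothetical model. `Worker`, `pickSplit`, and the `replicas` map are illustrative names, and a real scheduler has more to consider (for example, free task slots).

```scala
object LocalitySketch {
  final case class Worker(id: Int, rack: Int)

  sealed trait Locality
  case object DataLocal extends Locality // split stored on the idle worker
  case object RackLocal extends Locality // split stored in the same rack
  case object NonLocal  extends Locality // split must cross racks

  // Choose a pending split for an idle worker. Prefer a data-local split,
  // then a rack-local one, then any split. `replicas` (hypothetical) maps
  // each split id to the workers that hold a copy of it.
  def pickSplit(idle: Worker,
                pending: List[Int],
                replicas: Map[Int, Set[Worker]]): Option[(Int, Locality)] = {
    def holders(split: Int): Set[Worker] = replicas.getOrElse(split, Set.empty[Worker])
    val dataLocal: Option[(Int, Locality)] =
      pending.find(s => holders(s).contains(idle)).map(s => (s, DataLocal))
    val rackLocal: Option[(Int, Locality)] =
      pending.find(s => holders(s).exists(_.rack == idle.rack)).map(s => (s, RackLocal))
    val anySplit: Option[(Int, Locality)] =
      pending.headOption.map(s => (s, NonLocal))
    dataLocal.orElse(rackLocal).orElse(anySplit)
  }
}
```

Scanning for a local split before a rack-local one follows the slogan “Moving computation is cheaper than moving data”: only the last fallback sends a whole split across the inter-rack switch.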
Data flow in MapReduce jobs
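[Figure: on the map side, a Mapper reads a local, rack-local, or non-local split from GFS and buffers its output in an in-memory circular buffer. It spills that output to disk (optionally through a Combiner) and merges the spills on disk. On the reduce side, a Reducer fetches its partition from this and other Mappers into intermediate files on disk and writes its output to GFS. The Mapper's other partitions go to other Reducers.]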
Map internals
- The map phase reads the task's input split from GFS, parses it into records (key/value pairs), and applies the map function to each record.
- After the map function has been applied to every record, the commit phase registers the final output with the Master, which tells the reducers where the map output is located.
Reduce internals
- The shuffle phase fetches the reduce task's input data.
- The sort phase groups records with the same key together.
- The reduce phase applies the user-defined reduce function to each key and its corresponding list of values.
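The three reduce-side phases can be sketched for one reduce task as below. `runReduceTask` is a hypothetical helper that receives the map outputs already partitioned for this reducer. A real implementation merges the already-sorted map outputs, often from disk, rather than sorting everything in memory.

```scala
object ReduceSideSketch {
  def runReduceTask[K: Ordering, V, R](mapOutputs: Seq[Seq[(K, V)]])(
      reduceFn: (K, Seq[V]) => R): Vector[R] = {
    // Shuffle phase: gather this reducer's records from every map task.
    val fetched = mapOutputs.flatten.toVector
    // Sort phase: order records by key so that equal keys become adjacent.
    val sorted = fetched.sortBy(_._1)
    // Reduce phase: walk runs of equal keys and call reduceFn once per key.
    val out = Vector.newBuilder[R]
    var i = 0
    while (i < sorted.length) {
      val key = sorted(i)._1
      var j = i
      while (j < sorted.length && sorted(j)._1 == key) j += 1
      out += reduceFn(key, sorted.slice(i, j).map(_._2))
      i = j
    }
    out.result()
  }
}
```

For example, `ReduceSideSketch.runReduceTask(Seq(Seq(("b", 1), ("a", 1)), Seq(("a", 1))))((k, vs) => (k, vs.sum))` returns `Vector(("a", 2), ("b", 1))`.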
Backup Tasks
There are barriers in a MapReduce job:
- No reduce function executes until all maps finish.
- The job cannot complete until all reduces finish.

The execution time of a job is therefore severely lengthened if a single task is blocked or runs slowly.
When the job is close to completion, the Master schedules backup (speculative) executions of the remaining unfinished tasks. A task is marked complete as soon as either the original or the backup execution finishes.
In the original MapReduce paper, a sort job took 44% longer when backup tasks were disabled.
[Figure: Map tasks, then Reduce tasks, then “Job complete”; the reduces cannot start until every map has finished.]
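The idea behind backup tasks can be sketched with Scala futures: run the same deterministic task twice and take whichever attempt finishes first. The delays, names, and timeout below are made up for illustration. Hadoop's speculative execution decides which tasks to duplicate based on their observed progress, which this sketch does not model.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object BackupTaskSketch {
  // One attempt of a deterministic task (here: the sum of squares of its input).
  // Every attempt computes the same answer, so running a duplicate is safe.
  def attempt(input: Seq[Int], delayMillis: Long): Future[Int] = Future {
    blocking(Thread.sleep(delayMillis)) // stands in for a slow or fast machine
    input.map(x => x * x).sum
  }

  // Launch the original attempt (a straggler) and a backup attempt, and use
  // the result of whichever completes first. The losing future is not
  // cancelled here; a real framework would kill that attempt.
  def runWithBackup(input: Seq[Int]): Int = {
    val original = attempt(input, delayMillis = 500)
    val backup   = attempt(input, delayMillis = 10)
    Await.result(Future.firstCompletedOf(Seq(original, backup)), 5.seconds)
  }
}
```

Determinism is what makes this safe: whichever attempt wins, the job sees the same output, so the Master only needs to record one of them.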
Virtues and defects of MR
Virtues:
- Towards large-scale data
- Programming friendly
- Implicit parallelism
- Data locality
- Fault/crash tolerance
- Scalability
- Open source with a good ecosystem[1]

Defects:
- Bad for iterative ML algorithms
- Not sure

[1] http://docs.hortonworks.com/CURRENT/index.htm#About_Hortonworks_Data_Platform/Understanding_Hadoop_Ecosystem.htm
Network traffic in MapReduce
1. Map may read its split from a remote ChunkServer.
2. Reduce copies the output of Map.
3. Reduce writes its output to GFS.
Disk R/W in MapReduce
1. The ChunkServer reads a local block when a remote split is fetched.
2. Map spills intermediate results to disk.
3. Reduce writes the copied partition to local disk.
4. Reduce writes the result output to the local ChunkServer.
5. Reduce writes the result output to a remote ChunkServer.
Iterative MapReduce
Performing graph algorithms using MapReduce.
Motivation of Spark
- Iterative algorithms (machine learning, graphs)
- Interactive data mining tools (R, Excel, Python)
Programming Model
- Fine-grained: the outputs of every iteration are distributed and stored to stable storage.
- Coarse-grained: only the transformations used to build a dataset (i.e., its lineage) are logged.

Resilient distributed datasets (RDDs)
- Immutable, partitioned collections of objects
- Created through parallel transformations (map, filter, groupBy, join, …) on data in stable storage
- Can be cached for efficient reuse

Actions on RDDs: count, reduce, collect, save, …
Spark Operations
- Transformations (define a new RDD): map, filter, sample, groupByKey, reduceByKey, sortByKey, flatMap, union, join, cogroup, cross, mapValues
- Actions (return a result to the driver program): collect, reduce, count, save, lookupKey
Example: Log Mining
Load error messages from a log into memory, then interactively search for various patterns.
```scala
val lines = spark.textFile("hdfs://...")          // base RDD
val errors = lines.filter(_.startsWith("ERROR"))  // transformed RDD
val messages = errors.map(_.split('\t')(2))
val cachedMsgs = messages.cache()

cachedMsgs.filter(_.contains("foo")).count()      // action
cachedMsgs.filter(_.contains("bar")).count()
// ...
```
[Figure: the Driver sends tasks to three Workers. Each Worker reads one block of the file (Blocks 1-3), keeps its partition in memory (Caches 1-3), and sends results back to the Driver.]

Result: full-text search of Wikipedia in < 1 sec (vs. 20 sec for on-disk data).
Result: scaled to 1 TB of data in 5-7 sec (vs. 170 sec for on-disk data).
RDD Fault Tolerance
RDDs maintain lineage information that can be used to reconstruct lost partitions. For example:
```scala
val messages = textFile(...).filter(_.startsWith("ERROR"))
                            .map(_.split('\t')(2))
```
[Figure: HDFS File → filter (func = _.startsWith(...)) → Filtered RDD → map (func = _.split(...)) → Mapped RDD.]
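How lineage lets a lost partition be rebuilt can be shown with a toy model. This is not Spark's RDD class. `Dataset`, `StableSource`, `Transformed`, and `Cached` are hypothetical names. Each transformed dataset stores only its parent and the function it applies. When a cached partition disappears, it is recomputed from stable storage by replaying that chain.

```scala
object LineageSketch {
  // A dataset knows how to compute any one of its partitions.
  sealed trait Dataset {
    def compute(partition: Int): Seq[String]
  }

  // Data in stable storage, e.g. the blocks of an HDFS file.
  final case class StableSource(blocks: Vector[Seq[String]]) extends Dataset {
    def compute(partition: Int): Seq[String] = blocks(partition)
  }

  // A transformation records only its parent and the function it applies;
  // the chain of such records is the dataset's lineage.
  final case class Transformed(parent: Dataset, f: Seq[String] => Seq[String])
      extends Dataset {
    def compute(partition: Int): Seq[String] = f(parent.compute(partition))
  }

  // An in-memory cache whose partitions can be lost (e.g. when a worker dies).
  // A missing partition is rebuilt by re-running the lineage, not by
  // restoring a replica.
  final class Cached(lineage: Dataset) extends Dataset {
    private val store = scala.collection.mutable.Map.empty[Int, Seq[String]]
    def compute(partition: Int): Seq[String] =
      store.getOrElseUpdate(partition, lineage.compute(partition))
    def lose(partition: Int): Unit = store -= partition
    def isCached(partition: Int): Boolean = store.contains(partition)
  }
}
```

Mirroring the slide, `Transformed(Transformed(file, _.filter(_.startsWith("ERROR"))), _.map(_.split('\t')(2)))` describes `messages` without storing any of its data. Wrapping it in `Cached` keeps the computed partitions in memory, and after `lose(0)` the next `compute(0)` simply replays the filter and map on block 0.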
Example: Logistic Regression
Goal: find the best line separating two sets of points.
[Figure: two clusters of points labeled + and –, a random initial line, and the target separating line.]
Example: Logistic Regression
```scala
val data = spark.textFile(...).map(readPoint).cache()
var w = Vector.random(D)
for (i <- 1 to ITERATIONS) {
  val gradient = data.map(p =>
    (1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x
  ).reduce(_ + _)
  w -= gradient
}
println("Final w: " + w)
```
`cache()` keeps the variable `data` in memory across iterations.
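For readers without a cluster, the same gradient-descent loop can be run on a local collection. This is a single-machine sketch under stated assumptions. `Point`, `dot`, and `train` are hypothetical stand-ins for the slide's `readPoint`, `Vector`, and `dot`. The data is an in-memory `Seq` instead of a cached RDD. `w` starts at zero rather than `Vector.random(D)`, so that runs are reproducible.

```scala
object LogisticRegressionSketch {
  // A labelled point: feature vector x and label y (+1.0 or -1.0).
  final case class Point(x: Array[Double], y: Double)

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => a(i) * b(i)).sum

  // Gradient descent with the same per-point term as the Spark example,
  //   (1 / (1 + exp(-y * (w . x))) - 1) * y * x,
  // summed over all points and subtracted from w (implicit step size 1).
  def train(data: Seq[Point], iterations: Int): Array[Double] = {
    val d = data.head.x.length
    var w = Array.fill(d)(0.0) // zero start instead of Vector.random(D)
    for (_ <- 1 to iterations) {
      val gradient = data
        .map { p =>
          val scale = (1.0 / (1.0 + math.exp(-p.y * dot(w, p.x))) - 1.0) * p.y
          p.x.map(_ * scale)
        }
        .reduce((a, b) => a.zip(b).map { case (u, v) => u + v })
      w = w.zip(gradient).map { case (wi, gi) => wi - gi }
    }
    w
  }
}
```

In the Spark version, every iteration re-reads `data`. Because `data` is cached, only the first iteration pays for loading and parsing the input.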
Logistic Regression Performance
[Chart: running time (s) vs. number of iterations (1, 5, 10, 20, 30) for Hadoop and Spark. Hadoop: 127 s per iteration. Spark: 174 s for the first iteration, 6 s for each further iteration.]
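A rough estimate of total running time follows from the per-iteration figures above. It assumes each Hadoop iteration costs 127 s and each Spark iteration after the first costs 6 s; the chart's actual values may differ slightly.

$$
\begin{aligned}
T_{\text{Hadoop}}(n) &\approx 127\,n\ \text{s}, & T_{\text{Hadoop}}(30) &\approx 3810\ \text{s},\\
T_{\text{Spark}}(n) &\approx 174 + 6\,(n-1)\ \text{s}, & T_{\text{Spark}}(30) &\approx 174 + 174 = 348\ \text{s}.
\end{aligned}
$$

At 30 iterations that is roughly an 11× speedup (3810 / 348 ≈ 10.9). The gap comes from the iterations after the first, which read `data` from memory.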
Spark Programming Interface (e.g., PageRank)
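This is a single-machine Scala sketch of PageRank, not the slide's own code. It follows the structure of Spark's classic PageRank example:
1. Join each page's links with its current rank.
2. Send rank / (number of out-links) to every neighbour.
3. Sum the contributions per page (the `reduceByKey` step).
4. Set each rank to 0.15 + 0.85 × sum.

Plain collections replace RDDs, and `pageRank` is a hypothetical name. Unlike a `reduceByKey`-based version, this sketch keeps pages that receive no contributions, at rank 0.15.

```scala
object PageRankSketch {
  // links: page -> pages it links to. Every page is assumed to have at least
  // one outgoing link (dangling pages are not handled in this sketch).
  def pageRank(links: Map[String, Seq[String]], iterations: Int): Map[String, Double] = {
    var ranks: Map[String, Double] = links.map { case (page, _) => (page, 1.0) }
    for (_ <- 1 to iterations) {
      // "join" links with ranks, then send rank / outDegree to each neighbour
      val contribs: Seq[(String, Double)] = links.toSeq.flatMap { case (page, outs) =>
        outs.map(dest => (dest, ranks(page) / outs.size))
      }
      // "reduceByKey": sum the contributions each page receives
      val sums: Map[String, Double] =
        contribs.groupBy(_._1).map { case (page, cs) => (page, cs.map(_._2).sum) }
      // damping: 0.15 + 0.85 * (sum of contributions)
      ranks = links.map { case (page, _) => (page, 0.15 + 0.85 * sums.getOrElse(page, 0.0)) }
    }
    ranks
  }
}
```

Every iteration reuses the same `links` table. In Spark it would therefore be a natural candidate for `cache()`, just like `data` in the logistic regression example.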
Representing RDDs
Spark Scheduler
- Dryad-like DAGs
- Pipelines functions within a stage
- Cache-aware work reuse & locality
- Partitioning-aware to avoid shuffles

(In the figure, a marked box denotes a cached data partition.)
Behavior with Not Enough RAM

| % of working set in memory | Iteration time (s) |
|---|---|
| Cache disabled | 68.8 |
| 25% | 58.1 |
| 50% | 40.7 |
| 75% | 29.7 |
| Fully cached | 11.5 |
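Fault Recovery Results

| Iteration | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Iteration time (s) | 119 | 57 | 56 | 58 | 58 | 81 | 57 | 59 | 57 | 59 |

(The chart also shows a “No Failure” series for comparison.)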
Conclusion
Both MapReduce and Spark are excellent big data systems: they are scalable, fault-tolerant, and programming friendly.
In particular, Spark provides a more efficient approach for iterative computing jobs.
Questions? Thanks!