CSCE 678 - MapReduce & Hadoop (courses.cse.tamu.edu/.../csce678/s19/slides/mapreduce.pdf)
TRANSCRIPT
MapReduce
CSCE678
Parallel Computing is Hard (1/2)
• Data partitioning must be done by developers
• Which data need to be processed together?
• Synchronization can get pretty complex
• One node waits for data to be produced by other nodes
• Threading (on each node) causes race conditions
[Diagram: multiple nodes coordinating over a shared data pool]
Parallel Computing is Hard (2/2)
• Wasteful data movement between nodes
• How to schedule the best route for sending the data?
• Scaling computation to more nodes
• Requires repartitioning the data during computation
• What if a node fails? Do we rerun everything that ran on that node?
MapReduce (1/3)
• A programming model that enables:
• Automatic parallelization and distribution of workloads
• Data-movement scheduling and optimization
• Scaling out computation to more commodity servers without affecting running jobs
• Fault tolerance: handling machine failures
MapReduce (2/3)
• Many data problems can be decomposed into Map and Reduce operations
[Diagram: data splits 1-3 are read by map tasks on separate nodes (Map: extract useful information from each data record); map outputs are written locally, then sorted and remotely read by reduce tasks on other nodes (Reduce: aggregate or filter multiple records), which write output files 1 and 2.]
MapReduce (3/3)
• Map: (k1, v1) → list(k2, v2)
  For each key-value pair, generate a list of key-value outputs
• Shuffle: collect all map outputs with the same key
• Reduce: (k2, list(v2)) → list(v2)
  Aggregate the grouped values into reduce outputs
Basic Example: Word Count
• Problem: counting the occurrences of each word
• Map: (k1, v1) → list(k2, v2)
  For each word in each value, emit (word, 1)
• Reduce: (k2, list(v2)) → list(v2)
  For each key, emit (key, sum of all values)

Input:
(A.txt, “Hello This Is Hello Michael”)
(B.txt, “Michael Hello This”)

Map outputs:
(Hello, 1), (This, 1), (Is, 1), (Hello, 1), (Michael, 1)
(Michael, 1), (Hello, 1), (This, 1)

Grouped reduce inputs:
(Hello, [1, 1, 1]), (This, [1, 1]), (Is, [1]), (Michael, [1, 1])

Reduce outputs:
Hello: 3, This: 2, Is: 1, Michael: 2
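The word-count dataflow above can be sketched as a single-process simulation. The function names (`run_mapreduce`, `map_word_count`, `reduce_word_count`) are illustrative, not Hadoop APIs; a real job distributes the three phases across nodes.

```python
from collections import defaultdict

def map_word_count(key, value):
    """Map: (filename, text) -> list of (word, 1) pairs."""
    return [(word, 1) for word in value.split()]

def reduce_word_count(key, values):
    """Reduce: (word, [1, 1, ...]) -> total count."""
    return sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Minimal single-process simulation of the MapReduce dataflow."""
    # Map phase: apply map_fn to every input record.
    intermediate = []
    for k, v in inputs:
        intermediate.extend(map_fn(k, v))
    # Shuffle phase: group all intermediate values by key.
    groups = defaultdict(list)
    for k, v in intermediate:
        groups[k].append(v)
    # Reduce phase: aggregate each group.
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

inputs = [("A.txt", "Hello This Is Hello Michael"),
          ("B.txt", "Michael Hello This")]
print(run_mapreduce(inputs, map_word_count, reduce_word_count))
# {'Hello': 3, 'This': 2, 'Is': 1, 'Michael': 2}
```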
Basic Example: Word Count
[Figure: word count on an excerpt from Moby-Dick (“It will be seen that this mere painstaking burrower and grub-worm of a poor devil of a Sub-Sub appears to have gone through the long Vaticans and street stalls of the earth ...”). The map stage produces unsorted, unaggregated pairs: (it, 1), (will, 1), (be, 1), (seen, 1), ... The reduce stage produces sorted, aggregated counts: a: 2, aback: 2, abaft: 2, abandon: 3, abandoned: 7, ...]
Basic Example: File Grep
• Problem: searching for lines containing the word “Michael”
• Map: (k1, v1) → list(k2, v2)
  For each value containing “Michael”, emit (value, 1)
• Reduce: (k2, list(v2)) → list(v2)
  Emit each key-value pair from the input

Input:
(line 1, “Hello This Is Michael”)
(line 2, “Hello Again”)
(line 3, “Michael Hello”)

Map outputs (passed through unchanged by the identity reduce):
(“Hello This Is Michael”, 1)
(“Michael Hello”, 1)
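A minimal sketch of the grep map and reduce functions above; the names `map_grep` and `reduce_grep` are illustrative, and the search term is hard-coded for brevity.

```python
def map_grep(key, value):
    # Map: emit the line itself (with count 1) only when it
    # contains the search term; otherwise emit nothing.
    return [(value, 1)] if "Michael" in value else []

def reduce_grep(key, values):
    # Identity reduce: pass the matching line through unchanged.
    return key

lines = [("line 1", "Hello This Is Michael"),
         ("line 2", "Hello Again"),
         ("line 3", "Michael Hello")]
matches = [kv for k, v in lines for kv in map_grep(k, v)]
print([reduce_grep(k, [v]) for k, v in matches])
# ['Hello This Is Michael', 'Michael Hello']
```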
Apache Hadoop
• The most widely used open-source MapReduce implementation
• Contains a cluster resource manager and task scheduler (YARN) and a storage manager (HDFS)
• The basis of almost all MapReduce cloud offerings:
• Amazon Elastic MapReduce
• Azure HDInsight
• Google Cloud Dataproc
Data Partitioning & Movement
• HDFS (Hadoop Distributed File System)
• Partitions input files into multiple splits (shards)
• Replicates splits (shards) across nodes
[Diagram: input files are partitioned into splits 1..M; each split is replicated on multiple nodes, e.g., splits 1 and 3 on one node, splits 2 and 4 on another.]
Data Partitioning & Movement
• Move data to operations ➔ expensive network I/O
• Move operations to data ➔ cost-effective
[Diagram: map tasks 1-4 are scheduled onto the nodes that already hold the corresponding splits. More data replicas = more nodes for scheduling map tasks.]
Scheduling
• The Hadoop master forks multiple workers across nodes
• Each worker is a single thread
• Each idle worker can be assigned as:
• Mapper: each works on a data split
• Reducer: each works on a partition of the map outputs
[Diagram: the master node's scheduler remote-forks workers onto slave nodes; workers are assigned as mappers or reducers.]
Dealing with Stragglers
• Stragglers are workers that run unusually long
• Example: a machine with a bad disk can see its read throughput drop from 30 MB/s to 1 MB/s
• Backup tasks:
• Spawn backups of incomplete tasks when the whole computation is close to completion
• If a backup task finishes first, kill the original task
Fault Tolerance
• The Hadoop master pings each node periodically
• Recovery from a node failure:
• Both map and reduce are deterministic
• Re-execute any tasks whose outputs have not yet been synced to HDFS
• Can recover from cluster failures or network outages
• Master failure:
• If the Hadoop master fails, the whole system must abort
• Hadoop 2.0: high availability with two masters
Partitioner
• Decides which reducer processes each map output
• Default partitioner: (k, v) → Hash(k) mod #reducers
• Same key ➔ always processed by the same reducer
• Users can customize the partitioner
• To change how map outputs are grouped for reducers. Ex: dates as keys ➔ group by month
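A minimal sketch of the date-grouping idea above. The function names and the "YYYY-MM-DD" key format are assumptions for illustration; in Hadoop this logic would live in a custom Java `Partitioner` class.

```python
def default_partition(key, num_reducers):
    # Default partitioner: Hash(k) mod #reducers, so the same
    # key always goes to the same reducer.
    return hash(key) % num_reducers

def month_partition(date_key, num_reducers):
    # Custom partitioner for "YYYY-MM-DD" keys: partition on the
    # year-month prefix, so all dates in the same month land on
    # the same reducer.
    year_month = date_key[:7]          # e.g. "2019-03"
    return hash(year_month) % num_reducers

# Two dates in March 2019 are routed to the same reducer.
print(month_partition("2019-03-05", 4) == month_partition("2019-03-28", 4))
# True
```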
Shuffling & Sorting
• After partitioning, map outputs are sorted by key

Map outputs:
(Hello, 2), (This, 1), (Is, 1), (Michael, 1)
(Michael, 1), (Hello, 1), (This, 1)

After partitioning and sorting:
Reducer 1: (Hello, 2), (Hello, 1), (This, 1), (This, 1)
Reducer 2: (Is, 1), (Michael, 1), (Michael, 1)

Reduce inputs:
Reducer 1: (Hello, [2, 1]), (This, [1, 1])
Reducer 2: (Is, [1]), (Michael, [1, 1])
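The shuffle above can be simulated in a few lines. A toy deterministic partitioner (first letter mod #reducers) stands in for the real hash partitioner so the grouping matches the example.

```python
from itertools import groupby
from operator import itemgetter

def toy_partition(key, num_reducers):
    # Toy deterministic stand-in for Hash(k) mod #reducers.
    return ord(key[0]) % num_reducers

# Map outputs from two mappers (the first already combined Hello -> 2).
map_outputs = [("Hello", 2), ("This", 1), ("Is", 1), ("Michael", 1),
               ("Michael", 1), ("Hello", 1), ("This", 1)]

num_reducers = 2
# Partition each pair, then sort every reducer's input by key.
partitions = [[] for _ in range(num_reducers)]
for k, v in map_outputs:
    partitions[toy_partition(k, num_reducers)].append((k, v))
for p in partitions:
    p.sort(key=itemgetter(0))

# Group the sorted pairs into (key, [values]) reduce inputs.
reduce_inputs = [[(k, [v for _, v in grp])
                  for k, grp in groupby(p, key=itemgetter(0))]
                 for p in partitions]
print(reduce_inputs)
# [[('Hello', [2, 1]), ('This', [1, 1])], [('Is', [1]), ('Michael', [1, 1])]]
```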
Advanced Example: TeraSort
• Problem: how to sort terabytes of data
• Map: (k, v) → (k, v)
  Default (no-op) mapper
• Partition: reducer index = k / ((Max - Min) / #Reducers)
  Partitions keys into contiguous ranges
• Reduce: (k, [v]) → (k, [v])
  Default (no-op) reducer; each reducer's input arrives sorted

Example with two reducers over keys in [0, 20):
Map outputs: k = 15, 4, 10 and k = 7, 18, 3
Range 0 <= k < 10 ➔ reducer 1 outputs k = 3, 4, 7
Range 10 <= k < 20 ➔ reducer 2 outputs k = 10, 15, 18
Concatenating the reducer outputs yields the fully sorted keys.
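A simplified numeric sketch of the range partitioner. The uniform-width formula here is the slides' idealization; real TeraSort samples the input to pick split points, and the function names are assumptions.

```python
def range_partition(key, k_min, k_max, num_reducers):
    # Assign keys to contiguous ranges so that concatenating the
    # reducers' (individually sorted) outputs is globally sorted.
    width = (k_max - k_min) / num_reducers
    index = int((key - k_min) / width)
    return min(index, num_reducers - 1)   # clamp key == k_max into the last range

keys = [15, 4, 10, 7, 18, 3]
buckets = [[] for _ in range(2)]
for k in keys:
    buckets[range_partition(k, 0, 20, 2)].append(k)
for b in buckets:          # each reducer sorts its own range
    b.sort()
print(buckets)
# [[3, 4, 7], [10, 15, 18]]
```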
TeraSort Performance
• TeraGen + TeraSort + TeraValidate (O’Malley 2008)
• 10 billion key-value pairs
• 910 machines, each with 4 dual-core Xeon CPUs and 8 GB RAM
• 1800 mappers and 1800 reducers
[Figure: task timeline; all reducers completed within 209 seconds.]
Lessons from MapReduce
• A programming model designed with load distribution in mind
• Good at processing key-value data
• Easily scales out computation to nearly 1000 machines
• Used at Google for computing PageRank
• Problems:
• Batch-oriented: a job can take too long to finish
• Reducers have to wait for mappers
• Not a good fit for relational data (e.g., SQL queries)