CSCE 678 - MapReduce & Hadoop (courses.cse.tamu.edu/.../csce678/s19/slides/mapreduce.pdf)
TRANSCRIPT
MapReduce
CSCE678
Parallel Computing is Hard (1/2)
• Data partitioning must be done by developers
• Which data need to be processed together?
• Synchronization can get pretty complex
• One node waits for data to be produced by other nodes
• Threading (on each node) causes race conditions
[Diagram: multiple nodes coordinating over a shared data pool]
Parallel Computing is Hard (2/2)
• Wasteful data movement between nodes
• How to schedule the best route for sending the data?
• Scaling computation to more nodes
• Requires repartitioning the data during computation
• What if a node fails? Do we rerun everything that ran on that node?
MapReduce (1/3)
• A programming model that enables:
• Automatic parallelization and distribution of workloads
• Data-movement scheduling and optimization
• Scaling out computation to more commodity servers without affecting running jobs
• Fault tolerance: handling machine failures
MapReduce (2/3)
• Many data problems can be decomposed into Map and Reduce operations
[Diagram: data splits 1-3 are read by map tasks on separate nodes (Map: extract useful information from each data record); map outputs are written locally, then sorted and remotely read by reduce tasks on other nodes (Reduce: aggregate or filter multiple records), which write output files 1 and 2.]
MapReduce (3/3)
• Map: (k1, v1) → list(k2, v2)
  For each key-value pair, generate a list of key-value outputs
• Shuffle: collect all map outputs with the same key
• Reduce: (k2, list(v2)) → list(v2)
  Aggregate the grouped values into reduce outputs
Basic Example: Word Count
• Problem: counting the occurrences of each word
• Map: (k1, v1) → list(k2, v2)
  For each word in each value, emit (word, 1)
• Reduce: (k2, list(v2)) → list(v2)
  For each key, emit (key, sum of all values)

Input:
(A.txt, “Hello This Is Hello Michael”)
(B.txt, “Michael Hello This”)

Map outputs:
(Hello, 1), (This, 1), (Is, 1), (Hello, 1), (Michael, 1)
(Michael, 1), (Hello, 1), (This, 1)

Grouped reduce inputs:
(Hello, [1, 1, 1]), (This, [1, 1]), (Is, [1]), (Michael, [1, 1])

Reduce outputs:
Hello: 3, This: 2, Is: 1, Michael: 2
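The word-count dataflow above can be sketched as a single-process simulation. The function names (`run_mapreduce`, `map_word_count`, `reduce_word_count`) are illustrative, not Hadoop APIs; a real job distributes the three phases across nodes.

```python
from collections import defaultdict

def map_word_count(key, value):
    """Map: (filename, text) -> list of (word, 1) pairs."""
    return [(word, 1) for word in value.split()]

def reduce_word_count(key, values):
    """Reduce: (word, [1, 1, ...]) -> total count."""
    return sum(values)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Minimal single-process simulation of the MapReduce dataflow."""
    # Map phase: apply map_fn to every input record.
    intermediate = []
    for k, v in inputs:
        intermediate.extend(map_fn(k, v))
    # Shuffle phase: group all intermediate values by key.
    groups = defaultdict(list)
    for k, v in intermediate:
        groups[k].append(v)
    # Reduce phase: aggregate each group.
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

inputs = [("A.txt", "Hello This Is Hello Michael"),
          ("B.txt", "Michael Hello This")]
print(run_mapreduce(inputs, map_word_count, reduce_word_count))
# {'Hello': 3, 'This': 2, 'Is': 1, 'Michael': 2}
```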
Basic Example: Word Count
[Figure: word count on an excerpt from Moby-Dick (“It will be seen that this mere painstaking burrower and grub-worm of a poor devil of a Sub-Sub appears to have gone through the long Vaticans and street stalls of the earth ...”). The map stage produces unsorted, unaggregated pairs: (it, 1), (will, 1), (be, 1), (seen, 1), ... The reduce stage produces sorted, aggregated counts: a: 2, aback: 2, abaft: 2, abandon: 3, abandoned: 7, ...]
Basic Example: File Grep
• Problem: searching for lines containing the word “Michael”
• Map: (k1, v1) → list(k2, v2)
  For each value containing “Michael”, emit (value, 1)
• Reduce: (k2, list(v2)) → list(v2)
  Emit each key-value pair from the input

Input:
(line 1, “Hello This Is Michael”)
(line 2, “Hello Again”)
(line 3, “Michael Hello”)

Map outputs (passed through unchanged by the identity reduce):
(“Hello This Is Michael”, 1)
(“Michael Hello”, 1)
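A minimal sketch of the grep map and reduce functions above; the names `map_grep` and `reduce_grep` are illustrative, and the search term is hard-coded for brevity.

```python
def map_grep(key, value):
    # Map: emit the line itself (with count 1) only when it
    # contains the search term; otherwise emit nothing.
    return [(value, 1)] if "Michael" in value else []

def reduce_grep(key, values):
    # Identity reduce: pass the matching line through unchanged.
    return key

lines = [("line 1", "Hello This Is Michael"),
         ("line 2", "Hello Again"),
         ("line 3", "Michael Hello")]
matches = [kv for k, v in lines for kv in map_grep(k, v)]
print([reduce_grep(k, [v]) for k, v in matches])
# ['Hello This Is Michael', 'Michael Hello']
```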
Apache Hadoop
• The most widely used open-source MapReduce implementation
• Contains a cluster resource manager and task scheduler (YARN) and a storage manager (HDFS)
• The basis of almost all MapReduce cloud offerings:
• Amazon Elastic MapReduce
• Azure HDInsight
• Google Cloud Dataproc
Data Partitioning & Movement
• HDFS (Hadoop Distributed File System)
• Partitions input files into multiple splits (shards)
• Replicates splits (shards) across nodes
[Diagram: input files are partitioned into splits 1..M; each split is replicated on multiple nodes, e.g., splits 1 and 3 on one node, splits 2 and 4 on another.]
Data Partitioning & Movement
• Move data to operations ➔ expensive network I/O
• Move operations to data ➔ cost-effective
[Diagram: map tasks 1-4 are scheduled onto the nodes that already hold the corresponding splits. More data replicas = more nodes for scheduling map tasks.]
Scheduling
• The Hadoop master forks multiple workers across nodes
• Each worker is a single thread
• Each idle worker can be assigned as:
• Mapper: each works on a data split
• Reducer: each works on a partition of the map outputs
[Diagram: the master node's scheduler remote-forks workers onto slave nodes; workers are assigned as mappers or reducers.]
Dealing with Stragglers
• Stragglers are workers that run unusually long
• Example: a machine with a bad disk can see its read throughput drop from 30 MB/s to 1 MB/s
• Backup tasks:
• Spawn backups of incomplete tasks when the whole computation is close to completion
• If a backup task finishes first, kill the original task
Fault Tolerance
• The Hadoop master pings each node periodically
• Recovery from a node failure:
• Both map and reduce are deterministic
• Re-execute any tasks whose outputs have not yet been synced to HDFS
• Can recover from cluster failures or network outages
• Master failure:
• If the Hadoop master fails, the whole system must abort
• Hadoop 2.0: high availability with two masters
Partitioner
• Decides which reducer processes each map output
• Default partitioner: (k, v) → Hash(k) mod #reducers
• Same key ➔ always processed by the same reducer
• Users can customize the partitioner
• To change how map outputs are grouped for reducers. Ex: dates as keys ➔ group by month
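A minimal sketch of the date-grouping idea above. The function names and the "YYYY-MM-DD" key format are assumptions for illustration; in Hadoop this logic would live in a custom Java `Partitioner` class.

```python
def default_partition(key, num_reducers):
    # Default partitioner: Hash(k) mod #reducers, so the same
    # key always goes to the same reducer.
    return hash(key) % num_reducers

def month_partition(date_key, num_reducers):
    # Custom partitioner for "YYYY-MM-DD" keys: partition on the
    # year-month prefix, so all dates in the same month land on
    # the same reducer.
    year_month = date_key[:7]          # e.g. "2019-03"
    return hash(year_month) % num_reducers

# Two dates in March 2019 are routed to the same reducer.
print(month_partition("2019-03-05", 4) == month_partition("2019-03-28", 4))
# True
```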
Shuffling & Sorting
• After partitioning, map outputs are sorted by key

Map outputs:
(Hello, 2), (This, 1), (Is, 1), (Michael, 1)
(Michael, 1), (Hello, 1), (This, 1)

After partitioning and sorting:
Reducer 1: (Hello, 2), (Hello, 1), (This, 1), (This, 1)
Reducer 2: (Is, 1), (Michael, 1), (Michael, 1)

Reduce inputs:
Reducer 1: (Hello, [2, 1]), (This, [1, 1])
Reducer 2: (Is, [1]), (Michael, [1, 1])
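The shuffle above can be simulated in a few lines. A toy deterministic partitioner (first letter mod #reducers) stands in for the real hash partitioner so the grouping matches the example.

```python
from itertools import groupby
from operator import itemgetter

def toy_partition(key, num_reducers):
    # Toy deterministic stand-in for Hash(k) mod #reducers.
    return ord(key[0]) % num_reducers

# Map outputs from two mappers (the first already combined Hello -> 2).
map_outputs = [("Hello", 2), ("This", 1), ("Is", 1), ("Michael", 1),
               ("Michael", 1), ("Hello", 1), ("This", 1)]

num_reducers = 2
# Partition each pair, then sort every reducer's input by key.
partitions = [[] for _ in range(num_reducers)]
for k, v in map_outputs:
    partitions[toy_partition(k, num_reducers)].append((k, v))
for p in partitions:
    p.sort(key=itemgetter(0))

# Group the sorted pairs into (key, [values]) reduce inputs.
reduce_inputs = [[(k, [v for _, v in grp])
                  for k, grp in groupby(p, key=itemgetter(0))]
                 for p in partitions]
print(reduce_inputs)
# [[('Hello', [2, 1]), ('This', [1, 1])], [('Is', [1]), ('Michael', [1, 1])]]
```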
Advanced Example: TeraSort
• Problem: how to sort terabytes of data
• Map: (k, v) → (k, v)
  Default (no-op) mapper
• Partition: reducer index = k / ((Max - Min) / #Reducers)
  Partitions keys into contiguous ranges
• Reduce: (k, [v]) → (k, [v])
  Default (no-op) reducer; each reducer's input arrives sorted

Example with two reducers over keys in [0, 20):
Map outputs: k = 15, 4, 10 and k = 7, 18, 3
Range 0 <= k < 10 ➔ reducer 1 outputs k = 3, 4, 7
Range 10 <= k < 20 ➔ reducer 2 outputs k = 10, 15, 18
Concatenating the reducer outputs yields the fully sorted keys.
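A simplified numeric sketch of the range partitioner. The uniform-width formula here is the slides' idealization; real TeraSort samples the input to pick split points, and the function names are assumptions.

```python
def range_partition(key, k_min, k_max, num_reducers):
    # Assign keys to contiguous ranges so that concatenating the
    # reducers' (individually sorted) outputs is globally sorted.
    width = (k_max - k_min) / num_reducers
    index = int((key - k_min) / width)
    return min(index, num_reducers - 1)   # clamp key == k_max into the last range

keys = [15, 4, 10, 7, 18, 3]
buckets = [[] for _ in range(2)]
for k in keys:
    buckets[range_partition(k, 0, 20, 2)].append(k)
for b in buckets:          # each reducer sorts its own range
    b.sort()
print(buckets)
# [[3, 4, 7], [10, 15, 18]]
```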
TeraSort Performance
• TeraGen + TeraSort + TeraValidate (O’Malley 2008)
• 10 billion key-value pairs
• 910 machines, each with 4 dual-core Xeon CPUs and 8 GB RAM
• 1800 mappers and 1800 reducers
[Figure: task timeline; all reducers completed within 209 seconds.]
Lessons from MapReduce
• A programming model designed with load distribution in mind
• Good at processing key-value data
• Easily scales out computation to nearly 1000 machines
• Used at Google for computing PageRank
• Problems:
• Batch-oriented: a job can take too long to finish
• Reducers have to wait for mappers
• Not a good fit for relational data (e.g., SQL queries)