introduction to map/reduce -...
TRANSCRIPT
![Page 1: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/1.jpg)
Introduction to Map/Reduce
Examples and Principles
![Page 2: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/2.jpg)
Recall the framework:
D1 map()
• User defines <key,value>, mapper, and reducer
reduce() Output
![Page 3: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/3.jpg)
Recall the framework:
D1
D2
Dn
map()
map()
map()
• Hadoop handles the logistics
reduce()
reduce()
O1
Om
![Page 4: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/4.jpg)
Hadoop Rule of Thumb
• 1 mapper per data split (typically)
![Page 5: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/5.jpg)
Hadoop Rule of Thumb
• 1 mapper per data split (typically)
• 1 reducer per computer core (best parallelism)
![Page 6: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/6.jpg)
Hadoop Rule of Thumb
• 1 mapper per data split (typically)
• 1 reducer per computer core (best parallelism)
Processing Time
Number Output Files
![Page 7: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/7.jpg)
Wordcount Strategy
• Let <word, 1> be the <key,value> • Simple mapper & reducer • Hadoop did the hard work of shuffling &
grouping
![Page 8: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/8.jpg)
Good key-value properties • Simple • Enables reducers to get correct output
Shuffling & Grouping
Key-Value simplicity
![Page 9: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/9.jpg)
Good Task Decomposition:
Mappers: simple and separable Reducers: easy consolidation
![Page 10: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/10.jpg)
Example: Trending Wordcount
![Page 11: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/11.jpg)
Trending Wordcount • Twitter Data: date, message, location,
… [other metadata]
![Page 12: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/12.jpg)
Trending Wordcount • Twitter Data: date, message, location,
… [other metadata]
Task 1 Get word count by day Task 2 Get total word count
![Page 13: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/13.jpg)
Trending Wordcount
Task 1: get word count by day
![Page 14: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/14.jpg)
Trending Wordcount
Task 1: get word count by day Design: Use composite key Map/Reduce: <date word,count>
![Page 15: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/15.jpg)
Trending Wordcount
Task 2: get total word count
![Page 16: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/16.jpg)
Trending Wordcount
Task 2: get total word count Easy way: re-use previous wordcount
![Page 17: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/17.jpg)
Trending Wordcount
Task 2: get total word count Alternatively: use Task 1 output (it’s partially aggregated)
![Page 18: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/18.jpg)
Cascading Map/Reduce
D1 map() reduce() Output
Task 1:
E1 map() reduce() Output
Task 2:
Task 3 …
![Page 19: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/19.jpg)
Example: Joining Data
![Page 20: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/20.jpg)
Joining Data • Task: combine datasets by key
– A standard data management function
![Page 21: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/21.jpg)
Joining Data • Task: combine datasets by key
– A standard data management function – In pseudo SQL
Select * from table A, table B, where A.key=B.key
![Page 22: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/22.jpg)
Joining Data • Task: combine datasets by key
– A standard data management function – In pseudo SQL
Select * from table A, table B, where A.key=B.key
– Joins can be inner, left or right outer
![Page 23: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/23.jpg)
Joining Data • Task: given two wordcount datasets …
![Page 24: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/24.jpg)
Joining Data • Task: given two wordcount datasets …
able , 5
actor , 18 burger , 25 . . .
File A: <word, total-count>
![Page 25: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/25.jpg)
Joining Data • Task: given two wordcount datasets …
able , 5
actor , 18 burger , 25 . . .
File A: <word, total-count> File B: <date word, day-count> Jan-16 able , 2 Feb-22 actor , 15 May-03 actor , 3 Jul-4 burger, 20 . . .
![Page 26: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/26.jpg)
able , 5 actor , 18 burger , 25 . . .
Joining Data • Task: combine by word
Jan-16 able , 2 Feb-22 actor , 15 May-03 actor , 3 Jul-04 burger, 20 . . .
File A: <word, total-count> File B: <date word, day-count>
![Page 27: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/27.jpg)
Joining Data • Result wanted:
able Jan-16, 2 5 actor Feb-22, 15 18 actor May-03, 1 18 burger Jul-04, 20 25 . . .
File AjoinB: <word date, day-count total-count >
![Page 28: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/28.jpg)
Joining Data • Recall that data is split in parts
actor 18
Feb-22 actor 15
May-03 actor 1
Apr-15 actor 2 How to gather the right pieces?
![Page 29: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/29.jpg)
Key-Value & Task Decomposition • Main design consideration: Join depends on word (e.g. Select * where A.word=B.word)
![Page 30: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/30.jpg)
Key-Value & Task Decomposition • For the join:
– Let <key> = word – Let <value> = other info
<word, … >
![Page 31: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/31.jpg)
Key-Value & Task Decomposition
• Note: able , 5
actor , 18 . . .
File A: <word, total-count> File B: <date word, day-count> Jan-16 able , 2 Feb-22 actor , 15 . . .
![Page 32: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/32.jpg)
Key-Value & Task Decomposition
• Note: able , 5
actor , 18 . . .
File A: <word, total-count> File B: <date word, day-count> Jan-16 able , 2 Feb-22 actor , 15 . . .
word already the key
![Page 33: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/33.jpg)
Key-Value & Task Decomposition
• Note: able , 5
actor , 18 . . .
File A: <word, total-count> File B: <date word, day-count> Jan-16 able , 2 Feb-22 actor , 15 . . .
date needs to be filtered out
![Page 34: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/34.jpg)
Key-Value & Task Decomposition
• Note: able , 5
actor , 18 . . .
File A: <word, total-count> File B: <date word, day-count> Jan-16 able , 2 Feb-22 actor , 15 . . .
date needs to be filtered out Where should date info go?
![Page 35: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/35.jpg)
Key-Value & Task Decomposition <word, date day-count total-count > put date into value field
![Page 36: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/36.jpg)
Task Decomposition • Now data sets are:
File B_new: <word, date count> File A: <word, total-count> able , Jan 16 2 actor , Feb-22 15 actor , May-03 3 burger , Jul-04 20 . . .
able , 5 actor , 18 burger , 25 . . .
![Page 37: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/37.jpg)
Task Decomposition • How will Hadoop shuffle & group these?
File B_new: <word, date day-count> File A: <word, total-count> able , Jan-16 2 actor , Feb-22 15 actor , May-03 3 burger , Jul-04 20 . . .
able , 5 actor , 18 burger , 25 . . .
![Page 38: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/38.jpg)
Task Decomposition • How will Hadoop shuffle & group these?
actor , Feb-22 15 actor , May-03 3
actor , 18
Let’s focus on 1 key:
![Page 39: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/39.jpg)
Task Decomposition • Hadoop gathers the data for a join
actor , Feb-22 15 actor , May-03 3
actor , 18
actor , Feb-22 15 actor , 18 actor , May-03 3
![Page 40: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/40.jpg)
Task Decomposition • Reducer now has all the data for same
word grouped together
actor , 18 actor , Feb-22 15 actor , May-03 3
A number or date indicates file source
![Page 41: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/41.jpg)
Task Decomposition • Reducer can now join the data and put
date back into key
actor , 18 actor , Feb-22 15 actor , May-03 3
Feb-22 actor, 15 18 May-03 actor, 3 18
![Page 42: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/42.jpg)
Example: Vector Multiplication
![Page 43: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/43.jpg)
Vector Multiplication • Task: multiply 2 arrays of N numbers
– A basic mathematical operation – Let’s assume N is very large
![Page 44: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/44.jpg)
Vector Multiplication
X A B = (𝟓 x 𝟐.𝟕) # 1st of A & B + (𝟒 x 1.9) # 2nd of A & B + (–𝟑.𝟐 x –1.3) # 3rd … . . . + (–𝟐 x 𝟏) # Nth of A & B
5 4 -3.2 . . . -2
2.7 1.9 -1.3 . . . 1
• Task: multiply 2 arrays of N numbers
![Page 45: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/45.jpg)
Vector Multiplication
• Recall: – data partitioned in HDFS
5 4
-3.2 .
. -2
2.7
1.9 -1.3 .
. 1
. . . . . .
A B
![Page 46: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/46.jpg)
Vector Multiplication • Main design consideration: need elements with same index together Let <key, value> = <index, number>
![Page 47: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/47.jpg)
Vector Multiplication
• Problem: array partitions don’t have an index
5 4
-3.2 .
. -2
2.7
1.9 -1.3 .
. 1
. . . . . .
A B
![Page 48: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/48.jpg)
Vector Multiplication
• Problem: array partitions don’t have an index
5 4
-3.2 .
. -2
2.7
1.9 -1.3 .
. 1
. . . . . .
A B
no index!
![Page 49: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/49.jpg)
Vector Multiplication
5 4
-3.2 .
. -2
2.7
1.9 -1.3 .
. 1
. . . . . .
A B
Environment Information
![Page 50: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/50.jpg)
Vector Multiplication
5 4
-3.2 .
. -2
2.7
1.9 -1.3 .
. 1
. . . . . .
A B
Environment Information
map() os.getenv ('map_input_file')
Side effect
map() can access info
info outside map/reduce <key, value>
![Page 51: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/51.jpg)
Vector Multiplication • Let’s assume:
– each line already has <index, number>
B A 1, 5 2, 4
3, -3.2 .
. N, -2
1, 2.7
2, 1.9 3,-1.3 .
. N, 1
. . . . . .
![Page 52: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/52.jpg)
Vector Multiplication • Let’s assume:
– each line already has <index, number>
B A 1, 5 2, 4
3, -3.2 .
. N, -2
1, 2.7
2, 1.9 3,-1.3 .
. N, 1
. . . . . . Note: mapper only needs to pass data (identity function)
![Page 53: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/53.jpg)
Vector Multiplication
shuffle & group indices
B A
1, 5 2, 4
3, -3.2 .
. N, -2
1, 2.7
2, 1.9 3,-1.3 .
. N, 1
. . . . . .
<index, num> 1, 5
1, 2.7 3, -1.3 3, -3.2
2, 1.9 2, 4 …
<index, num> <index, num>
![Page 54: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/54.jpg)
Vector Multiplication What should reducers do?
1, 5 1, 2.7 3, -1.3 3, -3.2
2, 1.9 2, 4
<index, num>
. . .
A,B grouped
![Page 55: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/55.jpg)
Vector Multiplication Reducer: -get pairs of <index, number>
1, 5 1, 2.7 3, -1.3 3, -3.2
2, 1.9 2, 4
<index, num>
A,B grouped
. . .
![Page 56: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/56.jpg)
Vector Multiplication Reducer: -get pairs of <index, number> -multiply & add
1, 5 1, 2.7 3, -1.3 3, -3.2
2, 1.9 2, 4
<index, num>
A,B grouped
17.66
7.6
subtotals
. . .
+
![Page 57: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/57.jpg)
Vector Multiplication Reducer: -get pairs of <index, number> -multiply & add (Still need get total sum, but should be largely reduced)
1, 5 1, 2.7 3, -1.3 3, -3.2
2, 1.9 2, 4
<index, num>
A,B grouped
17.66
7.6
subtotals
. . .
+
![Page 58: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/58.jpg)
Computational Costs
In Vector Multiplication
![Page 59: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/59.jpg)
Computational Costs
• For Vector Multiplication – How many <index, number> are
output from map()?
![Page 60: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/60.jpg)
Computational Costs
• For Vector Multiplication – How many <index, number> are
output from map()? – How many <index> groups have to
be shuffled?
![Page 61: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/61.jpg)
Computational Costs
• How many <index, number> are output ? A B
1, 5 2, 4 3, -3.2 . . . N, -2
1, 2.7 2, 1.9 3, -1.3 . . . N, 1
For: 2 Vectors with N indices each Then: 2N <index, number> are output from map()
![Page 62: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/62.jpg)
Computational Costs • How many <index> groups have to be shuffled?
For: 2N indices and N pairs Then: N groups are shuffled to reducers
1, 5 1, 2.7 3, -1.3 3, -3.2
2, 1.9 2, 4 …
<index, num>
A,B grouped
![Page 63: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/63.jpg)
Computational Costs • Can we reduce shuffling?
![Page 64: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/64.jpg)
Computational Costs • Can we reduce shuffling?
• Try: ‘combine’ map indices in mapper
(works better for Wordcount)
![Page 65: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/65.jpg)
Computational Costs • Can we reduce shuffling?
• Or Try: use index ranges of length R
![Page 66: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/66.jpg)
Computational Costs • Index Ranges: let R=10 & bin the array indices
1 2 3 4 … 10 11 12 …..19 20 21 …. (N-9) …. N N keys
![Page 67: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/67.jpg)
Computational Costs • Index Ranges: let R=10 & bin the array indices
1 2 3 4 … 10 11 12 …..19 20 21 …. (N-9) …. N
1 2 3 . . . N/R
place in bins
![Page 68: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/68.jpg)
Computational Costs • For example, let R=10, and bin the array indices
1 2 3 4 … 10 11 12 …..19 20 21 …. (N-9) …. N
1 2 3 . . . N/R
N keys are now N/R = N/10 keys
![Page 69: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/69.jpg)
Computational Costs • For example, let R=10, and bin the array indices
1 2 3 4 … 10 11 12 …..19 20 21 …. (N-9) …. N
1 2 3 . . . N/R
N keys are now N/R=N/10 keys <key,value> is now <index bin, original-index number>
![Page 70: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/70.jpg)
Computational Costs • Now shuffling costs depend on N/R groups
If: R=1 Then: N/R=N groups (same as before) If: R>1 Then: N/R<N (less shuffling to do)
![Page 71: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/71.jpg)
• Trade-offs:
Computational Costs
If: size of (N/R) ↑ Then: shuffle costs ↑
![Page 72: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/72.jpg)
• Trade-offs:
Computational Costs
If: size of (N/R) ↑ Then: shuffle costs ↑ But: reducer complexity ↓
![Page 73: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/73.jpg)
• Trade-offs:
Computational Costs
If: size of (N/R) ↑ Then: shuffle costs ↑ But: reducer complexity ↓
-you control R (specific tradeoffs depend on data and hardware)
![Page 74: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/74.jpg)
Vector to Matrices • Matrix multiplication needs row-index and
col-index in the keys
• Matrix multiplication more pertinent to data analytic topics
![Page 75: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/75.jpg)
Summary
And Looking Beyond
![Page 76: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/76.jpg)
Task Decomposition
• mappers are separate and independent • mappers work on data parts
![Page 77: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/77.jpg)
Summary of Map/Reduce Design Considerations
• <key, value> must enable correct output • Let Hadoop do the hard work • Trade-offs
Hadoop
Programming
![Page 78: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/78.jpg)
Summary of Map/Reduce Design Considerations
• Common mappers: – Filter (subset data) – Identity (just pass data) – Splitter (as for counting)
![Page 79: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/79.jpg)
Summary of Map/Reduce Design Considerations
• Composite <keys>
![Page 80: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/80.jpg)
Summary of Map/Reduce Design Considerations
• Composite <keys> • Extra info in <values>
![Page 81: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/81.jpg)
Summary of Map/Reduce Design Considerations
• Composite <keys> • Extra info in <values> • Cascade Map/Reduce jobs
![Page 82: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/82.jpg)
Summary of Map/Reduce Design Considerations
• Composite <keys> • Extra info in <values> • Cascade Map/Reduce jobs • Bin keys into ranges
![Page 83: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/83.jpg)
Summary of Map/Reduce Design Considerations
• Composite <keys> • Extra info in <values> • Cascade Map/Reduce jobs • Bin keys into ranges • Aggregate map output when possible
(combiner option)
![Page 84: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/84.jpg)
Potential Limitations Map/Reduce
• Must fit <key, value> paradigm • Map/Reduce data not persistent • Requires programming/debugging • Not interactive
![Page 85: Introduction to Map/Reduce - support.pdnsoft.comsupport.pdnsoft.com/...Meeting/...Nezarat_MapReduce_Example-ver1… · Introduction to Map/Reduce Examples and Principles . Recall](https://reader035.vdocuments.net/reader035/viewer/2022071011/5fc9e63d42449c547531b698/html5/thumbnails/85.jpg)
Beyond Map/Reduce
• Data access tools (Pig, HIVE) – SQL like syntax
• Interactivity & Persistency (Spark)