Lecture 2 – MapReduce: Theory and Implementation
CSE 490h – Introduction to Distributed Computing, Winter 2008
Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
Last Class
- How do I process lots of data?
  - Distribute the work
- Can I distribute the work?
  - Maybe… if it’s not dependent on other tasks
  - Example: Fibonacci (each term depends on the two before it).
Last Class
- What problems can occur?
  - Large tasks
  - Unpredictable bugs
  - Machine failure
- How do we solve / avoid these?
  - Break up into small chunks?
  - Restart tasks?
  - Use known working solutions
MapReduce
- Concept from functional programming
- Implemented by Google
- Applied to a large number of problems
Functional Programming Review
Java:
    int fooA(String[] list) {
        return bar1(list) + bar2(list);
    }
    int fooB(String[] list) {
        return bar2(list) + bar1(list);
    }
Do they give the same result?
Functional Programming Review
Functional Programming:
    fun fooA(l: int list) =
      bar1(l) + bar2(l)
    fun fooB(l: int list) =
      bar2(l) + bar1(l)
Do they give the same result?
Functional Programming Review
Operations do not modify data structures: They always create new ones
Original data still exists in unmodified form
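A small sketch, in Python rather than the Java or SML of the slides, of why the answer to "Do they give the same result?" depends on side effects. The four helper functions are hypothetical stand-ins for bar1 and bar2: the impure pair mutates the list it is given, the pure pair builds new values and leaves the original alone.

```python
def bar1_impure(lst):
    """Appends to the caller's list (a side effect), then returns its length."""
    lst.append(0)
    return len(lst)

def bar2_impure(lst):
    """Depends on the current length, so it can observe bar1_impure's mutation."""
    return 10 * len(lst)

def bar1_pure(lst):
    """Builds a new list instead of modifying lst."""
    return len(lst + [0])

def bar2_pure(lst):
    return 10 * len(lst)

def foo_a(bar1, bar2, lst):
    return bar1(lst) + bar2(lst)

def foo_b(bar1, bar2, lst):
    return bar2(lst) + bar1(lst)

# Python evaluates the left operand of + first, so call order differs
# between foo_a and foo_b.
impure_a = foo_a(bar1_impure, bar2_impure, ["x", "y"])
impure_b = foo_b(bar1_impure, bar2_impure, ["x", "y"])
pure_a = foo_a(bar1_pure, bar2_pure, ["x", "y"])
pure_b = foo_b(bar1_pure, bar2_pure, ["x", "y"])
```

With the impure helpers the two orderings disagree; with the pure helpers they cannot, because neither call can see the other's work.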
Functional Updates Do Not Modify Structures
    fun foo(x, lst) =
      let val lst' = rev lst
      in rev (x :: lst') end
    foo: 'a * 'a list -> 'a list
The foo() function above reverses a list, adds a new element to the front, and reverses the result again, which has the effect of appending the item to the end.
But it never modifies lst!
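The same idea as a Python sketch. The name append_functional is made up for this illustration; slicing with [::-1] and list concatenation both return new lists, so the input is never changed.

```python
def append_functional(x, lst):
    """Mirror of foo: reverse, put x at the front, reverse again."""
    reversed_copy = lst[::-1]            # new list; lst is untouched
    return ([x] + reversed_copy)[::-1]   # another new list

original = [1, 2, 3]
extended = append_functional(4, original)
```

After the call, original still holds its three elements and extended is a separate list with the new item at the end.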
Functions Can Be Used As Arguments
    fun DoDouble(f, x) = f (f x)
It does not matter what f does to its argument; DoDouble() will do it twice.
What is the type of this function?
    x: 'a
    f: 'a -> 'a
    DoDouble: ('a -> 'a) * 'a -> 'a
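A Python sketch of the same higher-order function, with type hints standing in for the SML type. The name do_double and the two sample lambdas are illustrative only.

```python
from typing import Callable, TypeVar

A = TypeVar("A")

def do_double(f: Callable[[A], A], x: A) -> A:
    """Apply f twice; f must return the same type it accepts."""
    return f(f(x))

plus_three_twice = do_double(lambda n: n + 3, 10)
shouted = do_double(lambda s: s + "!", "hi")
```

The constraint that f maps a type to itself is what lets the result of the first call be fed back into f.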
map (Functional Programming)
Creates a new list by applying f to each element of the input list; returns output in order.
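[Figure: f applied separately to each element of the input list, giving the corresponding element of the output list]
map f lst: ('a->'b) -> ('a list) -> ('b list)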
map Implementation
This implementation moves left-to-right across the list, mapping elements one at a time
… But does it need to?
    fun map f [] = []
      | map f (x::xs) = (f x) :: (map f xs)
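To make the "does it need to?" question concrete, here is a Python sketch with two hypothetical versions of map: a direct translation of the recursive definition, and one that deliberately visits the elements in the opposite order.

```python
def map_left_to_right(f, lst):
    """Translation of the recursive map: head first, then the tail."""
    if not lst:
        return []
    return [f(lst[0])] + map_left_to_right(f, lst[1:])

def map_right_to_left(f, lst):
    """Applies f to the last element first, but keeps output positions."""
    out = [None] * len(lst)
    for i in range(len(lst) - 1, -1, -1):
        out[i] = f(lst[i])
    return out

def square(n):
    return n * n

forward = map_left_to_right(square, [1, 2, 3])
backward = map_right_to_left(square, [1, 2, 3])
```

For a function without side effects, such as square, both traversal orders give the same output list.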
Implicit Parallelism In map
In a purely functional setting, elements of a list being computed by map cannot see the effects of the computations on other elements
If f can be applied to the elements of the list in any order without changing the result, we can reorder or parallelize execution
This is the “secret” that MapReduce exploits
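A minimal sketch of that idea in Python, assuming a pure function and using the standard library's ThreadPoolExecutor. The names square and parallel_map are illustrative; the thread pool is here only to show that evaluation order does not affect the result, not to claim any speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    """Pure function: the result depends only on n."""
    return n * n

def parallel_map(f, items, workers=4):
    """Apply f to every item on a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(f, items))

data = list(range(10))
sequential = [square(n) for n in data]
parallel = parallel_map(square, data)
```

Because no call to square can observe another, the pool is free to run the calls in whatever order the scheduler picks.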
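Fold
Moves across a list, applying f to each element plus an accumulator. f returns the next accumulator value, which is combined with the next element of the list.
[Figure: the initial accumulator value enters a chain of f applications, one per list element; the last application yields the returned value]
fold f x0 lst: ('a*'b->'b)->'b->('a list)->'b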
fold left vs. fold right
- Order of list elements can be significant
- Fold left moves left-to-right across the list
- Fold right moves from right-to-left
SML Implementation:
    fun foldl f a [] = a
      | foldl f a (x::xs) = foldl f (f(x, a)) xs
    fun foldr f a [] = a
      | foldr f a (x::xs) = f(x, (foldr f a xs))
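A Python sketch of the two folds, written iteratively but keeping the slides' argument order f(x, a). The cons helper is an assumption chosen because it makes the difference in direction visible.

```python
def foldl(f, acc, lst):
    """Left fold: consume elements first to last, threading the accumulator."""
    for x in lst:
        acc = f(x, acc)
    return acc

def foldr(f, acc, lst):
    """Right fold: consume elements last to first."""
    for x in reversed(lst):
        acc = f(x, acc)
    return acc

def cons(x, acc):
    """Put x in front of the accumulated list, returning a new list."""
    return [x] + acc

left = foldl(cons, [], [1, 2, 3])
right = foldr(cons, [], [1, 2, 3])
```

Folding cons from the left reverses the list, while folding from the right rebuilds it in its original order. With a function like addition, where order does not matter, both folds agree.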
Example
fun foo(l: int list) = sum(l) + mul(l) + length(l)
How can we implement this?
Example (Solved)
fun foo(l: int list) = sum(l) + mul(l) + length(l)
    fun sum(lst) = foldl (fn (x,a)=>x+a) 0 lst
    fun mul(lst) = foldl (fn (x,a)=>x*a) 1 lst
    fun length(lst) = foldl (fn (x,a)=>1+a) 0 lst
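For comparison, a sketch of the same three folds using Python's functools.reduce. Note that reduce passes the accumulator first and the element second, the reverse of the (x, a) order in the SML lambdas; the names total and product are chosen to avoid shadowing Python built-ins.

```python
from functools import reduce

def total(lst):
    return reduce(lambda a, x: x + a, lst, 0)

def product(lst):
    return reduce(lambda a, x: x * a, lst, 1)

def length(lst):
    # The element itself is ignored; each one adds 1 to the accumulator.
    return reduce(lambda a, _x: 1 + a, lst, 0)

def foo(lst):
    return total(lst) + product(lst) + length(lst)

answer = foo([1, 2, 3, 4])
```

Each helper differs only in the combining function and the initial accumulator value, which is the point of expressing them as folds.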
Google MapReduce
- Input Handling
- Map function
- Partition Function
- Compare Function
- Reduce Function
- Output Writer
Input Handling
- Divides up data into bite-size chunks
- Starts up tasks
- Assigns tasks to idle workers
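A toy sketch of these steps in Python. Both function names are hypothetical, and the round-robin assignment is a simplifying assumption: a real scheduler hands out tasks as workers become idle rather than in a fixed rotation.

```python
def split_input(records, chunk_size):
    """Cut the input into consecutive chunks of at most chunk_size records."""
    return [records[i:i + chunk_size]
            for i in range(0, len(records), chunk_size)]

def assign_tasks(chunks, idle_workers):
    """Pair each chunk with a worker, cycling when chunks outnumber workers."""
    return [(idle_workers[i % len(idle_workers)], chunk)
            for i, chunk in enumerate(chunks)]

chunks = split_input(list(range(7)), 3)
tasks = assign_tasks(chunks, ["worker-a", "worker-b"])
```

Each (worker, chunk) pair corresponds to one map task in this simplified picture.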
Map
- Input: Key, Value pair
- Output: Key, Value pairs
- Example: Annual Rainfall Per City
Map (Example)
Example: Annual Rainfall Per City
    map(String key, String value):
      // key: date
      // value: weather info
      foreach (City c in value)
        EmitIntermediate(c, c.rainfall)
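A runnable Python sketch of a map function for the rainfall example. The input layout is an assumption: each value is taken to be a list of (city, rainfall_mm) tuples for one date, and the sample numbers are made up.

```python
def map_rainfall(date, readings):
    """Map step: one (date, readings) pair in, many (city, rainfall) pairs out.

    readings is assumed to be a list of (city, rainfall_mm) tuples.
    The date key is not needed for the output, so it is ignored.
    """
    for city, rainfall_mm in readings:
        yield (city, rainfall_mm)

pairs = list(map_rainfall("2008-01-07", [("Seattle", 4), ("Spokane", 0)]))
```

The generator plays the role of EmitIntermediate: every yield is one intermediate key/value pair.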
Partition Function
- Allocates map output to particular reduces
- Input: key, number of reduces
- Output: Index of desired reduce
- Typical: hash(key) % numberOfReduces
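A sketch of the typical hash-and-modulo partitioner in Python. CRC32 from the standard zlib module is used here as a stand-in hash, because Python's built-in hash() for strings is randomized per process by default and so would not give the same index on every machine; the choice of CRC32 is an assumption, not something the slides specify.

```python
import zlib

def partition(key, number_of_reduces):
    """Return the index of the reduce task responsible for key."""
    return zlib.crc32(key.encode("utf-8")) % number_of_reduces

seattle_first = partition("Seattle", 4)
seattle_again = partition("Seattle", 4)
```

What matters is that every occurrence of a key lands on the same reduce task, so all of its values end up in one place.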
Comparison
- Sorts input for each reduce
- Example: Annual rainfall per city
  - Sorts rainfall data for each city
  - Seattle: {0, 0, 0, 1, 4, 7, 10, …}
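A small Python sketch of this step as the slide describes it: collect the intermediate values for each key and sort them. The function name group_and_sort and the sample pairs are made up, and Python's default ordering stands in for a user-supplied compare function.

```python
from collections import defaultdict

def group_and_sort(pairs):
    """Collect values per key, then sort each key's values in ascending order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sorted(values) for key, values in grouped.items()}

pairs = [("Seattle", 7), ("Spokane", 1), ("Seattle", 0), ("Seattle", 4)]
by_city = group_and_sort(pairs)
```

The result maps each city to its sorted readings, ready to be handed to a reduce call.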
Reduce
- Input: Key, Sorted list of values
- Output: Single value
- Example: Annual rainfall per city
Reduce (Example)
Example: Annual rainfall per city
    reduce(String key, Iterator values):
      // key: city
      // values: rainfall
      sum = 0, count = 0
      for each (v in values)
        sum += v
        count += 1
      Emit(sum / count)  // mean of the city's rainfall values
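The reduce pseudocode translated into a runnable Python sketch. It keeps the slide's sum / count computation, which yields the mean of the values; the function name and the sample readings are assumptions for illustration.

```python
def reduce_rainfall(city, values):
    """Reduce step: fold all values for one city into a single number (the mean)."""
    total = 0
    count = 0
    for v in values:   # values is consumed once, like the slide's Iterator
        total += v
        count += 1
    return (city, total / count)

result = reduce_rainfall("Seattle", iter([0, 1, 4, 7]))
```

The framework only calls reduce for keys that have at least one value, so count is never zero here; returning total instead would give a yearly sum rather than a mean.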
Output
Writes the output to storage (GFS, etc)
[Figure: MapReduce data flow. Input key*value pairs from data stores 1 … n feed the map tasks; each map emits (key 1, values...), (key 2, values...), (key 3, values...). == Barrier == : aggregates intermediate values by output key. Each reduce task receives one key with its intermediate values and produces the final values for that key.]
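The whole flow can be sketched as a single-process toy in Python. Everything here is an assumption made for illustration: the function names, the CRC32-based partitioner, the made-up rainfall readings, and running the phases one after another in a loop instead of on separate machines.

```python
from collections import defaultdict
import zlib

def map_fn(date, readings):
    """Emit one (city, rainfall) pair per reading."""
    for city, rainfall_mm in readings:
        yield city, rainfall_mm

def partition(key, num_reduces):
    return zlib.crc32(key.encode("utf-8")) % num_reduces

def reduce_fn(city, values):
    """Mean of the city's readings."""
    return city, sum(values) / len(values)

def run_mapreduce(inputs, num_reduces=2):
    # Map phase: every input pair is processed independently of the others.
    buckets = [defaultdict(list) for _ in range(num_reduces)]
    for date, readings in inputs:
        for key, value in map_fn(date, readings):
            buckets[partition(key, num_reduces)][key].append(value)
    # Barrier: reduce starts only once all map output is grouped by key.
    output = {}
    for bucket in buckets:
        for key in sorted(bucket):
            city, mean = reduce_fn(key, sorted(bucket[key]))
            output[city] = mean
    return output

inputs = [
    ("2008-01-01", [("Seattle", 4), ("Spokane", 0)]),
    ("2008-01-02", [("Seattle", 10), ("Spokane", 2)]),
]
result = run_mapreduce(inputs)
```

Each bucket corresponds to one reduce task; the dictionary returned at the end plays the part of the output writer.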
MapReduce for Google Local
- Intersections
- Rendering Tiles
- Finding nearest gas stations