a homomorphism-based framework for systematic parallel programming with mapreduce
TRANSCRIPT
![Page 1: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/1.jpg)
A Homomorphism-based Framework forSystematic Parallel Programming with MapReduce
Yu Liu1, Zhenjiang Hu2
1 The Graduate University for Advanced Studies,Tokyo, [email protected]
2 National Institute of Informatics,Tokyo, Japan2 [email protected]
Mar. 10th, 2011
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 2: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/2.jpg)
Background
MapReduce
Google’s MapReduce is a popular parallel-distributed programmingmodel, for processing large data sets. It has been the de factostandard for large scale data analysis.
Concepts from functional programming languages
Automatic parallel processing, fault tolerance
Good scalability
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 3: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/3.jpg)
MapReduce
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 4: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/4.jpg)
MapReduce
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 5: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/5.jpg)
MapReduce
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 6: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/6.jpg)
MapReduce
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 7: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/7.jpg)
MapReduce
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 8: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/8.jpg)
Programming with MapReduce
A user has to
design a D&C algorithm that fits MapReduce paradigm
map this algorithm to MapReduce.
Difficulties of programming with MapReduce
How to resolve the constrains on computing order.
How to resolve the data dependency.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 9: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/9.jpg)
Programming with MapReduce
A user has to
design a D&C algorithm that fits MapReduce paradigm
map this algorithm to MapReduce.
Difficulties of programming with MapReduce
How to resolve the constrains on computing order.
How to resolve the data dependency.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 10: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/10.jpg)
Example
The Maximum Prefix Sum problem
mps [3,−1, 4,−1,−5, 9, 2,−6, 5,−10] = 11
A sequential program for MPS in O(n) time
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 11: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/11.jpg)
Example
The Maximum Prefix Sum problem
mps [3,−1, 4,−1,−5, 9, 2,−6, 5,−10] = 11
Hard to compute MPS with MapReduce
Computation has order.
MPS of sub-lists cannot be conquered directly.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 12: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/12.jpg)
Questions
Is there a systematic way to resolving such problems withMapReduce ?
How to handle the problems with district order ?
How to systematically design the divide-and-conqueralgorithm ?
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 13: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/13.jpg)
Questions
Is there a systematic way to resolving such problems withMapReduce ?
How to handle the problems with district order ?
How to systematically design the divide-and-conqueralgorithm ?
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 14: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/14.jpg)
Motivation and objective
We propose a systematic approach to automatically generate fullyparallelized and scalable MapReduce programs.A new framework which provides algorithmic programminginterfaces has been implemented.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 15: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/15.jpg)
A systematic approach for programming with MapReduce
Firstly, derive a function h.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 16: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/16.jpg)
A systematic approach for programming with MapReduce
Then write a inverse function h◦.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 17: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/17.jpg)
A systematic approach for programming with MapReduce
D&C algorithm can be gotten.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 18: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/18.jpg)
A systematic approach for programming with MapReduce
Map it to MapReduce paradigm.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 19: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/19.jpg)
A systematic approach for programming with MapReduce
Parallelization is in a black box.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 20: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/20.jpg)
A systematic approach for programming with MapReduce
Implemented by multi-phases MapReduce processing.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 21: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/21.jpg)
Conditions of this f function
Theorem
If there exists a binary operator � such that
f (xs ++ ys) = f xs � f ysthen such � can be defined as :x � y = f (f ◦x ++ f ◦x)
where ++ islistconcatenation.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 22: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/22.jpg)
Iff a function can be defined both rightwards and leftwards, such �exists. We can derive a divide-and-conquer algorithm like this:
Divide-and-conquer
f (xs ++ ys) = f (f ◦ (f xs) ++ f ◦ (f ys))
Such functions are so called: homomorphisms.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 23: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/23.jpg)
Programming Interface
Fold and unfold
fold :: [α]→ βunfold :: β → [α].
The implementation in Java
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 24: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/24.jpg)
A function which computes MPS and its right inverse can bewritten as followings:
fold xs = mps 4 sum xsunfold (m, s) = [m, s −m]
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 25: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/25.jpg)
The computation inside framework
Use fold and unfold functions doing the computation:
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 26: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/26.jpg)
Autonomous intermediate data
Each record of the intermediate data has the information ofposition, thus the distribution of data is indifferent.
< id , val > → << parId , id >, val >
By taking use of sorting and grouping mechanism of MapReduceframework, lists can be reconstructed when necessary.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 27: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/27.jpg)
A formal definitation
homMR
homMR :: (α→ β)→ (β → β → β)→ {(ID, α)} → βhomMR f (⊕) = getValue ◦MapReduce mapper2 reducer2
◦MapReduce mapper1 reducer1where
mapper1 :: (ID, α)→ [((PID, ID), α)]mapper1 (i , a) = [(pid , i), a))]
where pid = makePid ireducer1 :: ((PID, ID), [α])→ ((PID, ID), β)reducer1 ((pid , j), ias) = ((pid , j), hom f (⊕) ias)
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 28: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/28.jpg)
continued
mapper2 :: ((PID, ID), β)→ ((PID, ID), β)mapper2 ((pid , j), b) = ((c0, j), b)
where c0 is a predefined constant pidreducer2 :: ((PID, ID), [β])→ ((PID, ID), β)reducer2 ((c0, k), jbs) = ((c0, k), hom f (⊕) jbs)
getValue :: ((PID, ID), β)→ βgetValue ((c0, k), c) = c
Where, hom f (⊕) denotes a sequential version of ([f ,⊕]).
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 29: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/29.jpg)
Actual user-program for MPS
http://screwdriver.googlecode.com
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 30: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/30.jpg)
Performance evaluation
Environment: hardware
We configured clusters with 2, 4, 8, and 16 nodes. Eachcomputing/data node has two Xeon CPUs (Nocona, single-core,2.8 GHz), 2 GB memory. The nodes are connected with GigabitEthernet.
Environment: software
Linux2.6.26 ,Hadoop 0.21.0 +HDFS
Hadoop configuration: heap size= 1024MB
maximum mapper per node: 2
maximum reducer per node: 1
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 31: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/31.jpg)
Test cases
We implemented several programs for three problems on ourframework and Hadoop:
1 the maximum-prefix-sum problem.
MPS-lh is implemented using our framework’ API.MPS-mr is implemented by Hadoop API.
2 parallel sum of 64-bit integers
SUM-lh is implemented by our framework’ API.SUM-mr is implemented by Hadoop API.
3 VAR-lh computes the variance of 32-bit floating-pointnumbers;
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 32: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/32.jpg)
Test cases
Test data
100 million 64-bit integers (2.87 GB) for MPS, SUM.100 million 32-bit floating-point numbers (2.76 GB) for VAR.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 33: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/33.jpg)
Performance
The experiment results are summarized :
With 16 nodes speedup of all cases are more than 7.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 34: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/34.jpg)
Performance
Time curves:
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 35: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/35.jpg)
Concluding remarks
In this research:
Introduced a systematic way of parallel programming onMapReduce.
Developed a framework on top of Hadoop.
Algorithmic programming interfaces let user can focus on thealgebraic properties of problem.Details of MapReduce are hidden.Achieved good scalability and parallelism.Automatic optimization can be equipped.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 36: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/36.jpg)
Concluding remarks
In this research:
Introduced a systematic way of parallel programming onMapReduce.
Developed a framework on top of Hadoop.
Algorithmic programming interfaces let user can focus on thealgebraic properties of problem.Details of MapReduce are hidden.Achieved good scalability and parallelism.Automatic optimization can be equipped.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 37: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/37.jpg)
Concluding remarks
In this research:
Introduced a systematic way of parallel programming onMapReduce.
Developed a framework on top of Hadoop.
Algorithmic programming interfaces let user can focus on thealgebraic properties of problem.
Details of MapReduce are hidden.Achieved good scalability and parallelism.Automatic optimization can be equipped.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 38: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/38.jpg)
Concluding remarks
In this research:
Introduced a systematic way of parallel programming onMapReduce.
Developed a framework on top of Hadoop.
Algorithmic programming interfaces let user can focus on thealgebraic properties of problem.Details of MapReduce are hidden.
Achieved good scalability and parallelism.Automatic optimization can be equipped.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 39: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/39.jpg)
Concluding remarks
In this research:
Introduced a systematic way of parallel programming onMapReduce.
Developed a framework on top of Hadoop.
Algorithmic programming interfaces let user can focus on thealgebraic properties of problem.Details of MapReduce are hidden.Achieved good scalability and parallelism.
Automatic optimization can be equipped.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 40: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/40.jpg)
Concluding remarks
In this research:
Introduced a systematic way of parallel programming onMapReduce.
Developed a framework on top of Hadoop.
Algorithmic programming interfaces let user can focus on thealgebraic properties of problem.Details of MapReduce are hidden.Achieved good scalability and parallelism.Automatic optimization can be equipped.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 41: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/41.jpg)
Future work
Decrease the system overhead and do more optimization.
Extend to more complex data structure such as tree andgraph.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 42: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/42.jpg)
Related work
Parallel programming with list homomorphisms (M.Cole 95)
The Third Homomorphism Theorem(J.Gibbons 96).
Systematic extraction and implementation ofdivide-and-conquer parallelism (Gorlatch PLILP96).
Automatic inversion generates divide-and-conquer parallelprograms(Morita et.al., PLDI07).
The third homomorphism theorem on trees: downward &upward lead to divide-and-conquer (Morihata, POPL09)
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 43: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/43.jpg)
Thank you very much.Questions?
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 44: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/44.jpg)
List Homomorphism
Function h is said to be a list homomorphism
If there are a function f and an associative operator � such thatfor any list x and list y
h [a] = f ah (x ++ y) = h(x)� h(y).
Where ++ is the list concatenation.
Instance of a list homomorphism
sum [a] = asum (x ++ y) = sum x + sum y .
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 45: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/45.jpg)
Theorem (The Third Homomorphism Theorem (Gibbons,96) )
Let h be a given function and and � be binary operators. If thefollowing two equations hold for any element a and list y
h ([a] ++ y) = a h yh (y ++ [a]) = h y � a
then the function h is a homomorphism.
In fact, for a function h, if we have one of its right inverse h◦ thatsatisfies h ◦ h◦ ◦ h = h, then we can obtain the list-homomorphicdefinition as follows.
h = ([f ,�]) wheref a = h [a]l � r = h (h◦ l ++ h◦ r)
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 46: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/46.jpg)
MapReduce programs can be automatically obtained bytwo sequential functions
homomorphism ([f ,⊕])
f :: a→ b⊕ :: b → b → b(a⊕ b)⊕ c = a⊕ (b ⊕ c).
fold and unfold, that compose leftwards and rightwards functions
fold([a] ++ x) = fold([a] ++ unfold(fold(x)))fold(x ++ [a]) = fold(unfold(fold(x)) ++ [a]).
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 47: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/47.jpg)
Currently, Screwdriver provides two kinds of programminginterfaces:
Programming interface corresponding to definition of listhomomorphism;
Programming interface corresponding to the 3rdhomomorphism theorem.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 48: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/48.jpg)
Basic Homomorphism-Programming Interface
Two functions which define an homomorphism
filter :: a→ bplus :: b → b → b.
The implementation in Java
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 49: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/49.jpg)
Programming Interface based on the 3rd homomorphismtheorem
A function and its right inverse
fold :: [a]→ bunfold :: b → [a].
The implementation in Java
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 50: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/50.jpg)
The implementation of Screwdriver : list representation
To implement our programming interface with Hadoop, we need toconsider how to represent lists in a distributed manner.
Input data: index-value pairs
We use integer as the index’s type, the list [a, b, c , d , e] isrepresented by {(3, d), (1, b), (2, c), (0, a), (4, e)}.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 51: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/51.jpg)
Partition of input list
The pid(partition-id) of type PID is the index of a partial list. Theframework produces a same pid for the records which will begrouped together. These records have continues id .
Intermediate data: nested pairs ((pid , id), val)
Suppose the above list was divided to two parts and in differentnodes, then they are represented as{((0, 1), b), ((0, 2), c), ((0, 0), a)} and {((1, 3), d), ((1, 4), e)}.
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 52: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/52.jpg)
Grouping and sorting of intermediate data
We defined two functions: the comparatorG and comparatorS asfollows:
comparatorG (pid1, id1) (pid2, id2) = if pid1 == pid2
then 0else − 1
comparatorS (pid1, id1) (pid2, id2) = if id1 > id2then 1else − 1
for grouping intermediate records with same pid and sorting themby id .
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce
![Page 53: A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce](https://reader031.vdocuments.net/reader031/viewer/2022030402/589bcece1a28ab92618b5425/html5/thumbnails/53.jpg)
Data partition
1 In MAP task,
intermediate records with same pid are grouped together andsorted by id .a partitioner dispatches the groups to different reducers.
2 In REDUCE task, reducers apply merge-sort on all groupswith same pid
Yu Liu1, Zhenjiang Hu2 A Homomorphism-based Framework for Systematic Parallel Programming with MapReduce