MapReduce with Parallelizable Reduce - DIMACS
TRANSCRIPT
MapReduce with Parallelizable Reduce
S. Muthu Muthukrishnan
Some Premises

- At a deliberately high level, we know the MapReduce system.
  - Parallel. Map and Reduce functions. Used when data is large. Changing system.
- There is a nice PRAM theory of parallel algorithms.
  - NC, prefix sums, list ranking, and more.
- Goal: Develop a useful theory of MapReduce algorithms.
  - An algorithmicist's role. Interesting problems, algorithms. Bridge from the other side.
Thoughts Circa 2006

- Prefix sum in $O(1)$ rounds.
  - Problem: $A[1, \ldots, n] \Rightarrow PA[1, \ldots, n]$ where $PA[i] = \sum_{j \le i} A[j]$.
  - Solution:
    - Assign $A[i\sqrt{n} + 1, \ldots, (i+1)\sqrt{n}]$ to key $i$.
    - Solve the problem on $B[1, \ldots, \sqrt{n}]$ with one processor, where $B[i] = \sum_{j = i\sqrt{n}+1}^{(i+1)\sqrt{n}} A[j]$. Doable?
    - Solve the problem for key $i$ given $PB[i-1]$. Doable?
- List ranking in $O(1)$ rounds?
- Some graph algorithms in $O(1)$ rounds recently.
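The two-round idea above can be sketched in Python (a minimal illustration, assuming $\sqrt{n}$ divides $n$; the function and key names are mine, not from the talk):

```python
import math

def prefix_sums_two_rounds(A):
    """Two-round MapReduce-style prefix sums: split A into sqrt(n) blocks,
    total each block to get B, prefix-sum B on one processor, then finish
    each block independently given its offset PB[i-1]."""
    n = len(A)
    k = math.isqrt(n)               # block size ~ sqrt(n); assume k*k == n
    blocks = [A[i*k:(i+1)*k] for i in range(k)]

    # Round 1 (one reducer per key i): block totals B[i].
    B = [sum(b) for b in blocks]

    # Single processor: prefix sums PB over the k block totals.
    PB, acc = [], 0
    for t in B:
        acc += t
        PB.append(acc)

    # Round 2 (one reducer per key i): local prefix sums plus the offset.
    PA = []
    for i, b in enumerate(blocks):
        offset = PB[i-1] if i > 0 else 0
        for v in b:
            offset += v
            PA.append(offset)
    return PA

print(prefix_sums_two_rounds([1, 2, 3, 4]))  # [1, 3, 6, 10]
```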
SIROCCO Challenge

- Problem: Given a graph $G = (V, E)$, count the number of triangles.¹
- Solution:
  - For each edge $(u, v)$, generate a tuple $(u, v, 0)$.
  - For each vertex $v$ and for each pair of neighbors $x, z$ of $v$, generate a tuple $(x, z, 1)$.
  - Presence of both a 0-tuple and a 1-tuple for an edge indicates a triangle.
- Solution: The number of triangles is $\sum_i \lambda_i^3 / 6$, where the $\lambda_i$ are the eigenvalues of the adjacency matrix $A$ of $G$ in sorted order.
  - $A^3_{ii}$ counts the closed walks of length 3 at $i$, i.e., twice the number of triangles involving $i$.
  - The trace is 6 times the number of triangles.
  - If $\lambda$ is an eigenvalue of $A$, i.e., $Ax = \lambda x$, then $\lambda^3$ is an eigenvalue of $A^3$.
  - In practice, computing the top few eigenvalues suffices.

¹For example, see "Fast Counting of Triangles in Large Real Networks without Counting: Algorithms and Laws", ICDM 2008, by C. Tsourakakis.
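The tuple method can be sketched in Python (my illustration: the grouping dictionary stands in for a MapReduce shuffle, and since each triangle is matched once per edge, the match count is divided by 3):

```python
from collections import defaultdict
from itertools import combinations

def count_triangles(edges):
    """Emit (u, v, 0) for each edge and (x, z, 1) for each pair of
    neighbors x, z of some vertex ("wedge"); an edge key that receives
    both tag-0 and tag-1 tuples closes a triangle."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # "Map" + shuffle: group tuples by normalized edge key.
    groups = defaultdict(lambda: [0, 0])   # key -> [#0-tuples, #1-tuples]
    for u, v in edges:
        groups[tuple(sorted((u, v)))][0] += 1
    for v, nbrs in adj.items():
        for x, z in combinations(sorted(nbrs), 2):
            groups[(x, z)][1] += 1

    # "Reduce": each triangle is detected once per each of its 3 edges.
    matches = sum(ones for zeros, ones in groups.values() if zeros > 0)
    return matches // 3

print(count_triangles([(1, 2), (2, 3), (1, 3), (3, 4)]))  # 1
```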
Eigenvalue Estimation

$A$ is an $n \times n$ real-valued matrix.

- Lanczos method.
- Sketches: $Ar$ for a pseudorandom $n \times d$ matrix $r$, $d \ll n$. Will an $O(nd)$ sketch fit into one machine?
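Not Lanczos itself, but a minimal power-iteration sketch (my illustration, not from the talk) of how a top-eigenvalue estimate feeds the $\sum_i \lambda_i^3/6$ triangle formula:

```python
def power_iteration(A, iters=500):
    """Crude top-eigenvalue estimate for a symmetric matrix A (list of
    lists) -- a stand-in for Lanczos; assumes the dominant eigenvalue
    is positive."""
    n = len(A)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w) or 1.0   # ~|lambda_1| at convergence
        v = [x / lam for x in w]
    return lam

# Adjacency matrix of K4 (eigenvalues 3, -1, -1, -1; exact count (27-3)/6 = 4).
A = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 1, 1, 0]]
lam1 = power_iteration(A)
print(round(lam1, 3))            # ~3.0
print(round(lam1 ** 3 / 6, 2))   # ~4.5: the top eigenvalue alone already approximates 4
```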
Special Case

Motivation: Logs processing.

```
x = input_record;
x_squared = x * x;
aggregator: table sum;
emit aggregator <- x_squared;
```

MUD algorithm $m = (\Phi, \oplus, \eta)$.
- Local function $\Phi: \Sigma \to Q$ maps an input item to a message.
- Aggregator $\oplus: Q \times Q \to Q$ maps two messages to a single message.
- Post-processing operator $\eta: Q \to \Sigma$ produces the final output $\eta(m_T(x))$, where $m_T(x)$ is the result of applying $\oplus$ over a binary tree $T$ with the messages $\Phi(x_i)$ at its leaves.
- Computes a function $f$ if $\eta(m_T(x)) = f(x)$ for all trees $T$.
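The logs example above, written as a MUD triple in Python (a minimal sketch; the helper `run_mud` and its left-to-right evaluation order are my choices):

```python
from functools import reduce

# The sum-of-squares logs job as a MUD triple (phi, oplus, eta).
phi   = lambda x: x * x        # local map: input item -> message
oplus = lambda q1, q2: q1 + q2 # aggregator: merge two messages
eta   = lambda q: q            # post-processing on the final state

def run_mud(phi, oplus, eta, items):
    """Evaluate the MUD triple left-to-right; because oplus here is
    associative and commutative, any computation tree T agrees."""
    return eta(reduce(oplus, map(phi, items)))

print(run_mud(phi, oplus, eta, [1, 2, 3]))  # 14
```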
MUD Examples

$\Phi(x) = \langle x, x \rangle$

$\oplus(\langle a_1, b_1 \rangle, \langle a_2, b_2 \rangle) = \langle \min(a_1, a_2), \max(b_1, b_2) \rangle$

$\eta(\langle a, b \rangle) = b - a$

Figure: MUD algorithm for computing the total span (max minus min) of the input.
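The total-span triple transcribed directly into Python, checking that two different computation trees agree (the particular merge orderings are my illustration):

```python
# The total-span MUD triple from the figure.
phi   = lambda x: (x, x)
oplus = lambda a, b: (min(a[0], b[0]), max(a[1], b[1]))
eta   = lambda q: q[1] - q[0]

msgs = [phi(x) for x in [5, 1, 9, 3]]

# Left-deep tree vs. balanced tree: same answer, since oplus is
# commutative and associative.
left     = eta(oplus(oplus(oplus(msgs[0], msgs[1]), msgs[2]), msgs[3]))
balanced = eta(oplus(oplus(msgs[0], msgs[1]), oplus(msgs[2], msgs[3])))
print(left, balanced)  # 8 8
```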
MUD Examples

$\Phi(x) = \langle x, h(x), 1 \rangle$

$$\oplus(\langle a_1, h(a_1), c_1 \rangle, \langle a_2, h(a_2), c_2 \rangle) =
\begin{cases}
\langle a_i, h(a_i), c_i \rangle & \text{if } h(a_i) < h(a_j) \\
\langle a_1, h(a_1), c_1 + c_2 \rangle & \text{otherwise}
\end{cases}$$

$\eta(\langle a, b, c \rangle) = a$ if $c = 1$

Figure: MUD algorithm for computing a uniform random sample of the unique items in a set. Here $h$ is an approximate min-wise hash function.
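A sketch of the sampling triple in Python, with a fixed linear hash as a stand-in for an approximate min-wise hash (my simplification: $\eta$ here just reports the min-hash item, without the $c = 1$ filter):

```python
from functools import reduce

# Illustrative stand-in for an approximate min-wise hash.
h = lambda x: (2654435761 * x + 17) % 2147483647

phi = lambda x: (x, h(x), 1)

def oplus(t1, t2):
    # Keep whichever message carries the smaller hash; on a tie (the
    # same item seen in both halves), add the counts.
    if t1[1] < t2[1]:
        return t1
    if t2[1] < t1[1]:
        return t2
    return (t1[0], t1[1], t1[2] + t2[2])

eta = lambda t: t[0]

sample = eta(reduce(oplus, map(phi, [4, 8, 15, 16, 23, 42])))
print(sample)  # the item with the smallest h-value among the inputs
```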
Streaming

- A streaming algorithm $s = (\sigma, \eta)$.
- Operator $\sigma: Q \times \Sigma \to Q$.
- $\eta: Q \to \Sigma$ converts the final state to the output.
- On input $x \in \Sigma^n$, the streaming algorithm computes $f(x) = \eta(s^0(x))$, where $0$ is the starting state, and $s^q(x) = \sigma(\sigma(\ldots \sigma(\sigma(q, x_1), x_2), \ldots, x_{k-1}), x_k)$.
- Communication complexity is $\log |Q|$.
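The definitions above as a minimal Python sketch, instantiated for the total-span function (the instantiation and names are my choices):

```python
# Streaming algorithm s = (sigma, eta): fold sigma over the stream from
# a start state, then apply eta to the final state.
START = None

def sigma(q, x):
    # Transition operator Q x Sigma -> Q: track (min, max) seen so far.
    if q is None:
        return (x, x)
    lo, hi = q
    return (min(lo, x), max(hi, x))

def eta(q):
    return q[1] - q[0]

def run_stream(sigma, eta, start, xs):
    q = start
    for x in xs:          # computes s^start(xs), one symbol at a time
        q = sigma(q, x)
    return eta(q)

print(run_stream(sigma, eta, START, [5, 1, 9, 3]))  # 8
```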
MUD vs Streaming

- For a MUD algorithm $m = (\Phi, \oplus, \eta)$, there is a streaming algorithm $s = (\sigma, \eta)$ of the same complexity with the same output, obtained by setting $\sigma(q, x) = \oplus(q, \Phi(x))$.
- Central question: Can MUD simulate streaming?
- Count the occurrences of the first odd number on the stream. (Order-dependent, so no MUD algorithm computes it.)
- Symmetric problems? The symmetric index problem:

  $S = (a, 1, x_1, p), (a, 2, x_2, p), \ldots, (a, n, x_n, p),$
  $(b, 1, y_1, q), (b, 2, y_2, q), \ldots, (b, n, y_n, q).$

  Additionally, we have $x_q = y_p$. Compute the function $f(S) = x_q$.
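The first bullet as code: wrapping a MUD triple into a streaming operator via $\sigma(q, x) = \oplus(q, \Phi(x))$ (the sum-of-squares instantiation is illustrative; its start state 0 works because $\oplus$ has identity 0):

```python
# A MUD triple for the sum of squares.
phi   = lambda x: x * x
oplus = lambda q1, q2: q1 + q2
eta   = lambda q: q

def mud_to_streaming(phi, oplus):
    """sigma(q, x) = oplus(q, phi(x)): the streaming operator induced
    by a MUD algorithm."""
    return lambda q, x: oplus(q, phi(x))

sigma = mud_to_streaming(phi, oplus)

q = 0
for x in [1, 2, 3]:
    q = sigma(q, x)
print(eta(q))  # 14
```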
MUD vs Streaming

For any symmetric function $f: \Sigma^n \to \Sigma$ computed by a $g(n)$-space, $c(n)$-communication streaming algorithm $(\sigma, \eta)$, with $g(n) = \Omega(\log n)$ and $c(n) = \Omega(\log n)$, there exists an $O(c(n))$-communication, $O(g^2(n))$-space MUD algorithm $(\Phi, \oplus, \eta)$ that also computes $f$.
MUD vs Streaming: 2 Parties

- $x_A$ and $x_B$ are partitions of the input sequence $x$, sent to Alice and Bob.
- Alice runs the streaming algorithm on her input sequence to produce the state $q_A = s^0(x_A)$, and sends this to Carol. Similarly, Bob sends $q_B = s^0(x_B)$ to Carol.
- Carol receives the states $q_A$ and $q_B$, which contain the sizes $n_A$ and $n_B$ of the input sequences $x_A$ and $x_B$, and needs to calculate $f(x) = \eta(s^0(x_A \,\|\, x_B))$.
2 Parties Communication

- Carol finds sequences $x'_A$ and $x'_B$ of lengths $n_A$ and $n_B$ such that $q_A = s^0(x'_A)$ and $q_B = s^0(x'_B)$.
- Carol then outputs $\eta(s^0(x'_A \cdot x'_B))$:

$$\begin{aligned}
\eta(s^0(x'_A \cdot x'_B)) &= \eta(s^0(x_A \cdot x'_B)) \\
&= \eta(s^0(x'_B \cdot x_A)) \\
&= \eta(s^0(x_B \cdot x_A)) \\
&= \eta(s^0(x_A \cdot x_B)) \\
&= f(x_A \cdot x_B) \\
&= f(x).
\end{aligned}$$
Space Efficient 2 Party Communication

- Non-deterministic simulation:
  - First, guess the symbols of $x'_A$ one at a time, simulating the streaming algorithm $s^0(x'_A)$ on the guess. If after $n_A$ guessed symbols we have $s^0(x'_A) \ne q_A$, reject this branch. Then, guess the symbols of $x'_B$, simulating (in parallel) $s^0(x'_B)$ and $s^{q_A}(x'_B)$. If after $n_B$ steps we have $s^0(x'_B) \ne q_B$, reject this branch; otherwise, output $q_C = s^{q_A}(x'_B)$.
- This procedure is a non-deterministic, $O(g(n))$-space algorithm for computing a valid $q_C$.
- By Savitch's theorem, there is a deterministic, $O(g^2(n))$-space algorithm.
- Simulation time is superpolynomial.
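A toy, deterministic stand-in for Carol's step, using exhaustive search where the slide uses nondeterministic guessing plus Savitch's theorem (the alphabet, the span function, and the input sizes are my illustration):

```python
from itertools import product

SIGMA = range(5)   # toy alphabet

def s(q, xs):
    """s^q(xs) for the total-span streaming algorithm: fold the
    (min, max) transition over xs starting from state q."""
    for x in xs:
        q = (x, x) if q is None else (min(q[0], x), max(q[1], x))
    return q

def carol(qA, qB, nA, nB):
    """Find any x'_A, x'_B consistent with the received states, then
    resume the streaming run from q_A over x'_B to get q_C."""
    for xA2 in product(SIGMA, repeat=nA):
        if s(None, xA2) == qA:
            for xB2 in product(SIGMA, repeat=nB):
                if s(None, xB2) == qB:
                    return s(qA, xB2)          # q_C = s^{q_A}(x'_B)

xA, xB = [3, 1], [4, 2]
qC = carol(s(None, xA), s(None, xB), len(xA), len(xB))
print(qC[1] - qC[0])  # 3, the span of the full input [3, 1, 4, 2]
```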
Proof

- Finish the proof for an arbitrary computation tree inductively.
- Extends to streaming algorithms for approximating $f$ that work by computing some other function $g$ exactly over the stream; for example, sketch-based algorithms that maintain $c_i = \langle x, v_i \rangle$, where $x$ is the input vector and the $v_i$ are fixed vectors chosen with public randomness.
- Doesn't extend to randomized algorithms with private randomness, partial functions, etc.
Multiple Keys

- Any $N$-processor, $M$-memory, $T$-time EREW-PRAM algorithm, with a $\log(N+M)$-bit word in every memory location, can be simulated by an $O(T)$-round, $(N+M)$-key MUD algorithm with communication complexity $O(\log(N+M))$ bits per key.
- In particular, any problem in the class NC has a polylog$(n)$-round, poly$(n)$-key MUD algorithm with communication complexity $O(\log n)$ bits per key.
Concluding Remarks

- Jon Feldman, S. Muthukrishnan, Anastasios Sidiropoulos, Clifford Stein, Zoya Svitkina: On Distributing Symmetric Streaming Computations. SODA 2008: 710-719.