grade school again: a parallel perspective
![Page 1: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/1.jpg)
Grade School Again: A Parallel Perspective
Great Theoretical Ideas In Computer Science
Steven Rudich CS 15-251 Spring 2004
Lecture 17 Mar 16, 2004 Carnegie Mellon University
![Page 2: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/2.jpg)
Plus/Minus Binary (Extended Binary)
Base 2: Each digit can be -1, 0, or 1.
Example: 1 -1 -1 = 4 - 2 - 1 = 1
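The example above can be checked mechanically. The following is a minimal sketch; the function name `eb_value` and the choice to list digits most significant first are assumptions made for illustration, not something taken from the slides.

```python
def eb_value(digits):
    """Value of a plus/minus binary numeral, most significant digit first."""
    value = 0
    for d in digits:
        if d not in (-1, 0, 1):
            raise ValueError(f"not a plus/minus binary digit: {d}")
        value = 2 * value + d  # shift one place left, then add the new digit
    return value


print(eb_value([1, -1, -1]))  # 4 - 2 - 1 = 1
```

Note that a number can have several plus/minus binary spellings: `[0, 0, 1]` also evaluates to 1.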
![Page 3: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/3.jpg)
One weight for each power of 3. Left = “negative”. Right = “positive”.
![Page 4: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/4.jpg)
How to add 2 n-bit numbers.
[Figure: two n-bit numbers written one above the other, ready to be added column by column.]
![Page 5: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/5.jpg)
How to add 2 n-bit numbers.
[Figure: the addition in progress, with the first column completed.]
![Page 6: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/6.jpg)
How to add 2 n-bit numbers.
[Figure: the addition in progress, with one more column completed.]
![Page 7: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/7.jpg)
How to add 2 n-bit numbers.
[Figure: the addition in progress, with one more column completed.]
![Page 8: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/8.jpg)
How to add 2 n-bit numbers.
[Figure: the addition in progress, with one more column completed.]
![Page 9: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/9.jpg)
How to add 2 n-bit numbers.
[Figure: the addition with every column completed.]
![Page 10: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/10.jpg)
How to add 2 n-bit numbers.
[Figure: the completed addition.]
Let k be the maximum time that it takes you to do one column.
Time < kn, which is proportional to n.
![Page 11: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/11.jpg)
The time grows linearly with input size.
[Figure: plot of time against the number of bits in the numbers; the line is kn.]
![Page 12: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/12.jpg)
If n people agree to help you add two n bit numbers, it is not obvious that they can finish faster than if you had done it yourself.
![Page 13: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/13.jpg)
Is it possible to add two n bit numbers in less than
linear parallel-time?
Darn those carries.
![Page 14: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/14.jpg)
Fast parallel addition is not obvious in usual binary. But it is amazingly direct in Extended Binary!
![Page 15: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/15.jpg)
Extended binary means base 2
allowing digits to be from {-1, 0, 1}. We can call each
digit a “trit”.
![Page 16: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/16.jpg)
n people can add two n-trit plus/minus binary numbers in constant time!
![Page 17: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/17.jpg)
An Addition Party to Add 1 1 0 -1 to -1 1 1 -1
![Page 18: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/18.jpg)
An Addition Party
[Figure: the two numbers, 1 1 0 -1 and -1 1 1 -1, written one above the other.]
Invite n people to add two n-trit numbers. Assign one person to each trit position.
![Page 19: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/19.jpg)
An Addition Party
[Figure: each position now holds the sum of its two input trits; some positions hold 2 or -2.]
Each person should add the two input trits in their possession.
Problem: 2 and -2 are not allowed in the final answer.
![Page 20: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/20.jpg)
Pass Left
[Figure: positions holding 1 or 2 pass a 1 to the left.]
If you have a 1 or a 2, subtract 2 from yourself and pass a 1 to the left. (Nobody keeps more than 0.) Add in anything that is given to you from the right. (Nobody has more than a 1.)
![Page 21: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/21.jpg)
After passing left
[Figure: the trits after passing left.]
There will never again be any 2s, as everyone had at most 0 and received at most 1 more.
![Page 22: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/22.jpg)
Passing left again
[Figure: positions holding -1 or -2 pass a -1 to the left.]
If you have a -1 or -2, add 2 to yourself and pass a -1 to the left. (Nobody keeps less than 0.)
![Page 23: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/23.jpg)
After passing left again
[Figure: the trits after passing left again.]
No -2s anymore either. Everyone kept at least 0 and received nothing worse than a -1, so every position now holds -1, 0, or 1.
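The two rounds of passing left can be sketched in a few lines. This is an illustrative sketch under stated assumptions: trits are listed most significant first, one extra person is seated at the far left to catch anything passed out of the top position, and the helper names `pass_left` and `addition_party` are made up for this example. Each list comprehension stands in for all n people acting at the same moment.

```python
def eb_value(trits):
    """Value of a plus/minus binary numeral, most significant trit first."""
    value = 0
    for t in trits:
        value = 2 * value + t
    return value


def pass_left(held, should_pass, amount):
    """One round. Anyone for whom should_pass(value) holds subtracts 2*amount
    from themselves and hands `amount` to the neighbour on their left."""
    kept = [v - 2 * amount if should_pass(v) else v for v in held]
    # received[i] is whatever position i + 1 (its right neighbour) passes over.
    received = [amount if should_pass(v) else 0 for v in held[1:]] + [0]
    return [k + r for k, r in zip(kept, received)]


def addition_party(a, b):
    """Add two equal-length trit lists in three constant-time rounds."""
    held = [0] + [x + y for x, y in zip(a, b)]      # everyone adds their two trits
    held = pass_left(held, lambda v: v >= 1, 1)     # get rid of the 2s
    held = pass_left(held, lambda v: v <= -1, -1)   # get rid of the -2s
    return held


result = addition_party([1, 1, 0, -1], [-1, 1, 1, -1])
print(result, eb_value(result))  # [0, 1, 0, 0, 0] 8, and indeed 11 + (-3) = 8
```

The number of rounds does not depend on n, which is the whole point: no carry ever has to travel more than one position per round.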
![Page 24: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/24.jpg)
[Figure: the two inputs and the final answer shown together.]
![Page 25: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/25.jpg)
Caution: Parties and Algorithms Do not Mix
![Page 26: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/26.jpg)
Is there a fast parallel way to convert an Extended Binary number into a standard binary number?
![Page 27: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/27.jpg)
Not obvious:
Sub-linear addition in standard Binary.
Sub-linear EB to Binary
![Page 28: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/28.jpg)
Let’s reexamine grade school addition from the view of a computer circuit.
![Page 29: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/29.jpg)
Grade School Addition
![Page 30: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/30.jpg)
Grade School Addition
c5 c4 c3 c2 c1 (carries)
a4 a3 a2 a1 a0
+ b4 b3 b2 b1 b0
s1 (sum bit for column 1)
![Page 31: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/31.jpg)
Ripple-carry adder
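[Figure: a full-adder cell takes inputs ai, bi and carry-in ci, and produces sum bit si and carry-out ci+1. Chaining n cells gives the ripple-carry adder: a 0 is fed in as the carry to the cell for a0, b0; each cell hands its carry to the next; the cell for an-1, bn-1 outputs sn-1 and the final carry cn.]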
![Page 32: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/32.jpg)
Logical representation of binary: 0 = false, 1 = true
s1 = (a1 XOR b1) XOR c1
c2 = (a1 AND b1) OR (a1 AND c1) OR (b1 AND c1)
[Figure: the addition layout beside a full-adder cell with inputs ai, bi, ci and outputs si, ci+1.]
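The two formulas describe one cell; chaining the cells gives the ripple-carry adder. A minimal sketch follows, with the function names and the choice to pass numbers as Python integers plus a bit width `n` assumed for illustration.

```python
def full_adder(a, b, c):
    """One cell: sum bit and carry-out from input bits a, b and carry-in c."""
    s = (a ^ b) ^ c
    c_out = (a & b) | (a & c) | (b & c)
    return s, c_out


def ripple_carry_add(a, b, n):
    """Add two n-bit numbers one column at a time, right to left."""
    carry = 0
    total = 0
    for i in range(n):  # each step must wait for the carry from the step before
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total | (carry << n)  # the last carry becomes bit n of the answer


print(ripple_carry_add(11, 7, 4))  # 18
```

The loop makes the sequential dependency visible: cell i cannot finish until cell i-1 has produced its carry.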
![Page 33: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/33.jpg)
[Figure: gate-level full adder. Two XOR gates compute si from ai, bi, ci. Three AND gates, combined by OR gates, compute ci+1.]
![Page 34: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/34.jpg)
Ripple-carry adder
How long to add two n bit numbers?
Propagation time through the circuit will be Θ(n).
![Page 35: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/35.jpg)
Circuits compute things in parallel. We can think of the propagation delay as PARALLEL TIME.
![Page 36: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/36.jpg)
Is it possible to add two n bit numbers in less than
linear parallel-time?
![Page 37: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/37.jpg)
I suppose the EB addition
algorithm could be helpful somehow.
![Page 38: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/38.jpg)
Plus/minus binary means
base 2 allowing
digits to be from {-1, 0, 1}. We can
call each digit a “trit”.
![Page 39: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/39.jpg)
n people can add two n-trit plus/minus binary numbers in constant time!
![Page 40: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/40.jpg)
[Figure: the addition party example again.]
![Page 41: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/41.jpg)
Can we still do addition quickly in the standard representation?
![Page 42: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/42.jpg)
Yes, but first a neat idea…
Instead of adding two numbers to make one
number, let’s think about adding 3 numbers to make
2 numbers.
![Page 43: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/43.jpg)
Carry-Save Addition
The sum of three numbers can be converted into the sum of 2 numbers in constant parallel time!
[Figure: three numbers stacked for addition.]
![Page 44: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/44.jpg)
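Carry-Save Addition
The sum of three numbers can be converted into the sum of 2 numbers in constant parallel time!
[Figure: three numbers in, two numbers out. One output is the bitwise XOR of the three inputs; the other is the row of carries, one per column.]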
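A carry-save step can be sketched with bitwise operations on whole integers, where each bit position plays the role of one independent column. The function name `carry_save` is an assumption for this example.

```python
def carry_save(a, b, c):
    """Turn a + b + c into x + y, using only per-column work (no carry chain)."""
    xor = a ^ b ^ c                                # sum bit of every column
    carries = ((a & b) | (a & c) | (b & c)) << 1   # carry of every column, moved one place left
    return xor, carries


x, y = carry_save(11, 7, 5)
print(x, y, x + y)  # 9 14 23
```

Every column is handled by its own full adder and no column waits for another, which is why the step takes constant parallel time.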
![Page 45: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/45.jpg)
Cool!
So if we represent x as a+b, and y as c+d, then we can add x and y using two of these (this is basically the same as that plus/minus binary thing).
(a+b+c)+d = (e+f)+d = g+h
![Page 46: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/46.jpg)
Even in standard representation, this is
really useful for multiplication.
![Page 47: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/47.jpg)
Grade School Multiplication
[Figure: grade-school multiplication of two binary numbers, showing the rows of shifted partial products and their sum.]
![Page 48: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/48.jpg)
Grade School Multiplication
[Figure: the n shifted partial products, labeled a1, a2, a3, …, an.]
We need to add n 2n-bit numbers: a1, a2, a3, …, an
![Page 49: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/49.jpg)
A tree of carry-save adders
[Figure: the inputs a1 … an feed a tree of carry-save adders. Each adder turns three numbers into two, level after level, until only two numbers remain.]
Add the last two.
![Page 50: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/50.jpg)
A tree of carry-save adders
$T(n) \approx \log_{3/2}(n) + [\text{last step}]$
[Figure: the same tree of carry-save adders, followed by one ordinary addition of the last two numbers.]
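The tree can be simulated level by level to see where the log base 3/2 comes from. This is a sketch under assumptions: the grouping rule (take triples from the front, pass leftovers through) and the names `reduce_to_two` and `multiply` are choices made for this example, and the final addition uses Python's built-in `+` in place of a fast adder.

```python
def carry_save(a, b, c):
    """Three numbers in, two numbers out, same total."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def reduce_to_two(numbers):
    """Apply levels of carry-save adders until at most two numbers remain.
    Returns the remaining numbers and the number of levels used."""
    nums = list(numbers)
    levels = 0
    while len(nums) > 2:
        nxt = []
        i = 0
        while i + 3 <= len(nums):   # every triple is handled in parallel
            nxt.extend(carry_save(nums[i], nums[i + 1], nums[i + 2]))
            i += 3
        nxt.extend(nums[i:])        # zero, one, or two leftovers pass through
        nums = nxt
        levels += 1
    return nums, levels


def multiply(a, b, n):
    """Multiply two n-bit numbers: n partial products, then the tree."""
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    last_two, levels = reduce_to_two(partials)
    return sum(last_two), levels    # "add the last two"


print(multiply(11, 7, 4))  # (77, 2)
```

Each level shrinks the count of numbers by roughly a factor of 3/2, so the number of levels grows like the logarithm of n to base 3/2.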
![Page 51: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/51.jpg)
So let’s go back to the problem of adding two numbers.
In particular, if we can add two numbers in O(log n) parallel time, then we can multiply in
O(log n) parallel time too!
![Page 52: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/52.jpg)
If we knew the carries, it would be very easy to do fast parallel addition.
![Page 53: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/53.jpg)
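What do we know about the carry-out before we know the carry-in?
[Figure: a full-adder cell with inputs a, b, cin and outputs s, cout.]

| a | b | Cout |
|---|---|------|
| 0 | 0 | 0 |
| 0 | 1 | Cin |
| 1 | 0 | Cin |
| 1 | 1 | 1 |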
![Page 54: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/54.jpg)
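What do we know about the carry-out before we know the carry-in?
[Figure: a full-adder cell with inputs a, b, cin and outputs s, cout.]

| a | b | Cout |
|---|---|------|
| 0 | 0 | 0 |
| 0 | 1 | ← |
| 1 | 0 | ← |
| 1 | 1 | 1 |

Here ← means "copy the carry that arrives from the right".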
![Page 55: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/55.jpg)
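Hey, this is just a function of a and b. We can do this in parallel.

| a | b | Cout |
|---|---|------|
| 0 | 0 | 0 |
| 0 | 1 | ← |
| 1 | 0 | ← |
| 1 | 1 | 1 |

(← means "copy the carry that arrives from the right".)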
![Page 56: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/56.jpg)
Idea #1: do this calculation first.
This takes just one step!
![Page 57: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/57.jpg)
Idea #1: do this calculation first.
Also, once we have the carries, it only takes one step more: si = (ai XOR bi) XOR ci
![Page 58: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/58.jpg)
So, everything boils down to: can we find a
fast parallel way to convert each position to its final 0/1 value?
Called the “parallel prefix problem”
![Page 59: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/59.jpg)
So, we need to do this quickly….
![Page 60: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/60.jpg)
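Idea #2:
[Figure: a row of per-position values, each 0, 1, or ← (where ← means "copy the carry from the right"), and the carries they produce.]
Can think of the carries as all partial results in x_{n-1} ⊗ (… ⊗ (x_2 ⊗ (x_1 ⊗ x_0)))
for the operator ⊗:
← ⊗ x = x
1 ⊗ x = 1
0 ⊗ x = 0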
![Page 61: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/61.jpg)
Idea #2 (cont):
And, the operator is associative: (x ⊗ y) ⊗ z = x ⊗ (y ⊗ z).
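The operator is small enough to check exhaustively. In the sketch below the string `"<-"` stands for the "copy the carry from the right" value, and the names `P`, `combine`, `status`, and `carries` are assumptions introduced for illustration. The left argument of `combine` is the more significant position.

```python
P = "<-"  # "copy the carry that arrives from the right" (propagate)


def combine(left, right):
    """The carry operator: a 0 or 1 on the left wins; P defers to the right."""
    return right if left == P else left


def status(a, b):
    """What a column knows about its carry-out from its own two bits alone."""
    return a if a == b else P   # 0,0 -> 0   1,1 -> 1   otherwise P


def carries(a_bits, b_bits):
    """Carry into every position, least significant first, by a plain scan."""
    c = [0]                      # nothing is carried into position 0
    for a, b in zip(a_bits, b_bits):
        c.append(combine(status(a, b), c[-1]))
    return c


values = (0, 1, P)
print(all(combine(combine(x, y), z) == combine(x, combine(y, z))
          for x in values for y in values for z in values))  # True
print(carries([1, 1, 0, 1], [1, 1, 1, 0]))  # 11 + 7: [0, 1, 1, 1, 1]
```

The scan in `carries` is still sequential. Associativity is what allows the same partial results to be regrouped and computed in logarithmic depth.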
![Page 62: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/62.jpg)
Just using the fact that we have an Associative, Binary Operator
Binary Operator: an operation that takes two objects and returns a third.
• A ⊗ B = C
Associative:
• (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C)
![Page 63: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/63.jpg)
Examples
• Addition on the integers
• Min(a,b)
• Max(a,b)
• Left(a,b) = a
• Right(a,b) = b
• Boolean AND
• Boolean OR
![Page 64: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/64.jpg)
In what we are about to do “+”
will mean an arbitrary binary
associative operator.
![Page 65: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/65.jpg)
Prefix Sum Problem
Input: $X_{n-1}, X_{n-2}, \ldots, X_1, X_0$
Output: $Y_{n-1}, Y_{n-2}, \ldots, Y_1, Y_0$
where
$Y_0 = X_0$
$Y_1 = X_0 + X_1$
$Y_2 = X_0 + X_1 + X_2$
$Y_3 = X_0 + X_1 + X_2 + X_3$
...
$Y_{n-1} = X_0 + X_1 + X_2 + X_3 + \cdots + X_{n-1}$
![Page 66: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/66.jpg)
Prefix Sum example
Input: 6, 9, 2, 3, 4, 7
Output: 31, 25, 16, 14, 11, 7
where
$Y_0 = X_0$
$Y_1 = X_0 + X_1$
$Y_2 = X_0 + X_1 + X_2$
$Y_3 = X_0 + X_1 + X_2 + X_3$
...
$Y_{n-1} = X_0 + X_1 + X_2 + X_3 + \cdots + X_{n-1}$
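The example can be reproduced with the standard library. This sketch assumes the input is listed the way the slide lists it, highest index first, and the function name `prefix_sums` is made up for the example.

```python
from itertools import accumulate
import operator


def prefix_sums(xs_high_to_low, op=operator.add):
    """Prefix sums for a list written X_{n-1}, ..., X_1, X_0.
    The output is in the same order: Y_{n-1}, ..., Y_1, Y_0."""
    low_to_high = reversed(xs_high_to_low)     # X_0, X_1, ...
    ys = list(accumulate(low_to_high, op))     # Y_i = op(Y_{i-1}, X_i)
    return ys[::-1]


print(prefix_sums([6, 9, 2, 3, 4, 7]))  # [31, 25, 16, 14, 11, 7]
```

Any associative operator can be passed as `op`, for example `min` or `max`. This version is sequential; it only pins down what the parallel circuit has to compute.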
![Page 67: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/67.jpg)
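Example circuitry (n = 4)
[Figure: one separate circuit per output. y0 is X0 with no gates; y1 adds X1 and X0 with one + gate; y2 combines X2, X1, X0 with two + gates; y3 combines X3, X2, X1, X0 with three + gates.]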
![Page 68: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/68.jpg)
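Divide, conquer, and glue for computing $y_{n-1}$
[Figure: a sum on the n/2 items $X_{n-1} \ldots X_{n/2}$ and a sum on the n/2 items $X_{n/2-1} \ldots X_1, X_0$ feed one final + gate, which outputs $y_{n-1}$.]
$T(1) = 0$
$T(n) = T(n/2) + 1$
$T(n) = \log_2 n$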
![Page 69: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/69.jpg)
Modern computers do something
slightly different. This algorithm is
fast, but how many components
does it use?
![Page 70: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/70.jpg)
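Size of Circuit (number of gates)
[Figure: a sum on the n/2 items $X_{n-1} \ldots X_{n/2}$ and a sum on the n/2 items $X_{n/2-1} \ldots X_1, X_0$ feed one final + gate, which outputs $y_{n-1}$.]
$S(1) = 0$
$S(n) = S(n/2) + S(n/2) + 1$
$S(n) = n - 1$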
![Page 71: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/71.jpg)
Sum of Sizes
[Figure: the separate circuits for y0, y1, y2, y3, using 0, 1, 2, and 3 gates.]
$S(n) = 0 + 1 + 2 + 3 + \cdots + (n-1) = n(n-1)/2$
![Page 72: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/72.jpg)
Recursive Algorithm
n items (n = power of 2)
If n = 1, Y0 = X0;
[Figure: adjacent inputs are paired up and one + gate adds each pair: (X1, X0), (X3, X2), (X5, X4), …, (Xn-3, Xn-4), (Xn-1, Xn-2).]
![Page 73: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/73.jpg)
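Recursive Algorithm
n items (n = power of 2)
If n = 1, Y0 = X0;
[Figure: adjacent inputs are paired up and one + gate adds each pair: (X1, X0), (X3, X2), (X5, X4), …, (Xn-3, Xn-4), (Xn-1, Xn-2). The n/2 pair sums feed a prefix sum on n/2 items.]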
![Page 74: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/74.jpg)
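Recursive Algorithm
n items (n = power of 2)
If n = 1, Y0 = X0;
[Figure: adjacent inputs are paired up and one + gate adds each pair: (X1, X0), (X3, X2), (X5, X4), …, (Xn-3, Xn-4), (Xn-1, Xn-2). The n/2 pair sums feed a prefix sum on n/2 items, whose outputs are Y1, Y3, …, Yn-1. A final row of + gates adds each remaining input to the output just below it, giving Y2, Y4, …, Yn-2. Y0 is X0 itself.]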
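The recursive construction can be written out and instrumented. The sketch below is an illustration under assumptions: inputs are listed X0 first, each value carries the depth of the gate that produced it, and the names `parallel_prefix` and `_prefix` are invented here. It counts + gates and reports the deepest output.

```python
import operator


def _prefix(items, op):
    """items: list of (value, depth) pairs, X_0 first. Returns (outputs, gates)."""
    n = len(items)
    if n == 1:
        return items, 0

    def gate(u, v):  # one + gate; its output is one level deeper than its inputs
        return op(u[0], v[0]), max(u[1], v[1]) + 1

    # Row 1: n/2 gates add adjacent pairs.
    pairs = [gate(items[2 * k], items[2 * k + 1]) for k in range(n // 2)]
    # Recursion: prefix sum on the n/2 pair sums gives Y_1, Y_3, ..., Y_{n-1}.
    odd, sub_gates = _prefix(pairs, op)
    out = [items[0]]                                   # Y_0 = X_0
    for k in range(n // 2):
        if k > 0:                                      # last row: (n/2) - 1 gates
            out.append(gate(odd[k - 1], items[2 * k])) # Y_{2k} = Y_{2k-1} + X_{2k}
        out.append(odd[k])                             # Y_{2k+1}
    return out, sub_gates + n // 2 + (n // 2 - 1)


def parallel_prefix(xs, op=operator.add):
    """Prefix sums of xs = [X_0, ..., X_{n-1}], n a power of 2.
    Returns (ys, number_of_gates, depth_of_deepest_output)."""
    n = len(xs)
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    out, gates = _prefix([(x, 0) for x in xs], op)
    return [v for v, _ in out], gates, max(d for _, d in out)


print(parallel_prefix([7, 4, 3, 2, 9, 6, 1, 5]))
```

Counting gates this way reproduces the recurrence of n/2 gates for the pairing row, (n/2)-1 for the last row, plus the recursive part. The measured depth never exceeds what the recurrence T(n) = T(n/2) + 2 allows, and can be smaller because some outputs leave the recursion without passing through the last row.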
![Page 75: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/75.jpg)
Parallel time complexity
$T(1) = 0$; $T(2) = 1$; $T(n) = T(n/2) + 2$
$T(n) = 2\log_2(n) - 1$
[Figure: the recursive circuit annotated with its delays: 1 for the pairing row, T(n/2) for the prefix sum on n/2 items, and 1 for the final row.]
![Page 76: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/76.jpg)
Size
$S(1) = 0$; $S(n) = S(n/2) + n - 1$
$S(n) = 2n - \log_2 n - 2$
[Figure: the recursive circuit annotated with its gate counts: n/2 for the pairing row, S(n/2) for the prefix sum on n/2 items, and (n/2)-1 for the final row.]
![Page 77: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/77.jpg)
Putting it all together: Carry Look-Ahead Addition
To add two n-bit numbers: a and b
• 1 step to compute x values (each 0, 1, or ←, meaning "copy the carry-in")
• $2\log_2 n - 1$ steps to compute carries c
• 1 step to compute c XOR (a XOR b)
$2\log_2 n + 1$ steps total
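The three stages can be sketched end to end. Everything named here (`P`, `combine`, `prefix`, `cla_add`) is an assumption for illustration, the string `"<-"` stands for the "copy the carry-in" value, and `n` is taken to be a power of 2. The code runs sequentially; it mirrors the structure of the circuit rather than its speed.

```python
P = "<-"  # "copy the carry-in" (propagate)


def combine(hi, lo):
    """Carry operator: the more significant value decides unless it is P."""
    return lo if hi == P else hi


def prefix(xs, op):
    """Y_i = X_0 + ... + X_i for an associative op, by the pair-up recursion.
    xs is listed X_0 first and its length is a power of 2."""
    n = len(xs)
    if n == 1:
        return list(xs)
    odd = prefix([op(xs[2 * k], xs[2 * k + 1]) for k in range(n // 2)], op)
    out = [xs[0]]
    for k in range(n // 2):
        if k > 0:
            out.append(op(odd[k - 1], xs[2 * k]))
        out.append(odd[k])
    return out


def cla_add(a, b, n):
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    # Stage 1: every column computes its x value from its own two bits.
    x = [ai if ai == bi else P for ai, bi in zip(a_bits, b_bits)]
    # Stage 2: prefix computation. op(lo, hi) puts the higher position on the left.
    y = prefix(x, lambda lo, hi: combine(hi, lo))
    carry_out = [combine(yi, 0) for yi in y]   # the carry into column 0 is 0
    carry_in = [0] + carry_out[:-1]
    # Stage 3: every column computes its sum bit.
    s = [(ai ^ bi) ^ ci for ai, bi, ci in zip(a_bits, b_bits, carry_in)]
    return sum(bit << i for i, bit in enumerate(s)) + (carry_out[-1] << n)


print(cla_add(200, 100, 8))  # 300
```

Stages 1 and 3 are one gate deep for every column at once, so the depth of the whole adder is set by the prefix computation in stage 2.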
![Page 78: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/78.jpg)
Putting it all together: multiplication
$T(n) \approx \log_{3/2}(n) + 2\log_2(2n) + 1$
[Figure: the tree of carry-save adders, with carry look-ahead addition as the last step.]
![Page 79: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/79.jpg)
For a 64-bit word that works out to a parallel time of 22 for multiplication,
and 13 for addition.
![Page 80: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/80.jpg)
Addition requires linear time sequentially, but has a practical $2\log_2 n + 1$ parallel algorithm.
![Page 81: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/81.jpg)
Multiplication (which is a lot harder sequentially) also has an O(log n) time parallel algorithm.
![Page 82: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/82.jpg)
And this is how addition works on commercial chips…

| Processor | n | $2\log_2 n + 1$ |
|---|---|---|
| 80186 | 16 | 9 |
| Pentium | 32 | 11 |
| Alpha | 64 | 13 |
![Page 83: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/83.jpg)
In order to handle integer addition/subtraction we use 2’s complement representation, e.g., -44 =

| -64 | 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 1 | 0 | 0 |
![Page 84: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/84.jpg)
Addition of two numbers works the same way (assuming no overflow).

| -64 | 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 1 | 0 | 0 |
![Page 85: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/85.jpg)
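To negate a number, flip each of its bits and add 1.

| | -64 | 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|---|---|
| x = -44 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| flip(x) = 43 | 0 | 1 | 0 | 1 | 0 | 1 | 1 |
| flip(x) + 1 = 44 | 0 | 1 | 0 | 1 | 1 | 0 | 0 |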
![Page 86: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/86.jpg)
To negate a number, flip each of its bits and add 1.
x + flip(x) = -1.
So, -x = flip(x) + 1.

| -64 | 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 1 |
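The identity can be checked on the 7-bit example used above. This is a small sketch; the helper names and the default width of 7 bits are assumptions chosen to match the slides' tables.

```python
BITS = 7  # weights -64, 32, 16, 8, 4, 2, 1


def to_pattern(x, bits=BITS):
    """Two's complement bit pattern of x, as a non-negative integer."""
    return x & ((1 << bits) - 1)


def from_pattern(p, bits=BITS):
    """Signed value of a bit pattern: the top bit has negative weight."""
    top = (p >> (bits - 1)) & 1
    return p - (top << bits)


def flip(p, bits=BITS):
    return p ^ ((1 << bits) - 1)


def negate(p, bits=BITS):
    return (flip(p, bits) + 1) & ((1 << bits) - 1)


p = 0b1010100
print(from_pattern(p))                      # -44
print(format(flip(p), "07b"))               # 0101011
print(format(negate(p), "07b"))             # 0101100
print(from_pattern(to_pattern(p + flip(p))))  # -1
```

The one value this does not work for is -64, the most negative pattern, whose negation does not fit in 7 bits.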
![Page 87: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/87.jpg)
Most computers use two’s complement representation to perform integer addition and subtraction.
![Page 88: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/88.jpg)
Grade School Division
[Figure: grade-school long division.]
n bits of precision: n subtractions costing $2\log_2 n + 1$ each
$\Theta(n \log n)$
![Page 89: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/89.jpg)
Let’s see if we can reduce to O(n) by being clever about
it.
![Page 90: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/90.jpg)
Idea: internally, allow ourselves to “go negative” using trits so
we can do constant-time subtraction.
Then convert back at the end.
(technically, called “extended binary”)
![Page 91: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/91.jpg)
SRT division algorithm
[Figure: 11101101 (237) divided by 1011 (11). Ordinary long division gives 21 r 6. The SRT version works in extended binary, adding or subtracting the divisor at each step, and gives 22 r -5.]
Rule: Each bit of quotient is determined by comparing first bit of divisor with first bit of dividend. Easy!
1 addition per bit.
Convert to standard representation by subtracting negative bits from positive.
Time for n bits of precision in result: $3n + 2\log_2(n) + 1$
![Page 92: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/92.jpg)
Intel Pentium division error
• The Pentium uses essentially the same algorithm, but computes more than one bit of the result in each step. Several leading bits of the divisor and quotient are examined at each step, and the difference is looked up in a table.
• The table had several bad entries.
• Ultimately Intel offered to replace any defective chip, estimating their loss at $475 million.
![Page 93: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/93.jpg)
If I have millions of processors, how much of a speed-up might I get over a single processor?
![Page 94: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/94.jpg)
Brent’s Law
At best, p processors will give you a factor of p speedup over the time
it takes on a single processor.
![Page 95: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/95.jpg)
The traditional GCD algorithm will take linear time to operate
on two n bit numbers. Can it be done faster
in parallel?
![Page 96: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/96.jpg)
If $n^2$ people agree to help you compute the GCD of two n bit numbers, it is not obvious that they can finish faster than if you had done it yourself.
![Page 97: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/97.jpg)
Is GCD
inherently
sequential?
![Page 98: Grade School Again: A Parallel Perspective](https://reader036.vdocuments.net/reader036/viewer/2022070412/56814963550346895db6b78e/html5/thumbnails/98.jpg)
No one knows.