Algorithm Analysis - Max Alekseyev


    8/13/2019 Algorithm Analysis - Max Alekseyev

    CSCE750: Analysis of Algorithms

    Lecture 6

    Max Alekseyev

    University of South Carolina

    September 10, 2013


    Outline

    Divide-and-Conquer

    Fast Integer Multiplication

    Fast Matrix Multiplication


    Fast Integer Multiplication

    Let b, c ≥ 0 be integers, represented in binary, with n bits each.

    Let us assume that n is large, so that b and c cannot be added, subtracted, or multiplied in constant time.

    We imagine that b and c are both represented as arrays of n bits: b = b_{n−1} ⋯ b_1 b_0 and c = c_{n−1} ⋯ c_1 c_0, where the b_i and c_i are individual bits (leading 0s are allowed). Thus,

    $b = b_0 \cdot 2^0 + b_1 \cdot 2^1 + \cdots + b_{n-1} \cdot 2^{n-1} = \sum_{i=0}^{n-1} b_i 2^i,$

    $c = c_0 \cdot 2^0 + c_1 \cdot 2^1 + \cdots + c_{n-1} \cdot 2^{n-1} = \sum_{i=0}^{n-1} c_i 2^i.$


    Addition

    The usual sequential binary add-with-carry algorithm that we all learned in school takes time Θ(n), since we spend a constant amount of time at each column, from right to left. The sum is representable by n + 1 bits (at most).

    Q: Can we do better?

    This algorithm is clearly asymptotically optimal, since to produce the correct sum we must at least examine each bit of b and of c.
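    As a concrete sketch (not from the slides; `add_bits` is a hypothetical name), the add-with-carry algorithm on bit arrays, stored least-significant bit first, might look like this:

```python
def add_bits(b, c):
    """Ripple-carry addition of two binary numbers given as lists of bits,
    least-significant bit first; one column at a time, Theta(n) total."""
    n = max(len(b), len(c))
    result, carry = [], 0
    for i in range(n):
        s = (b[i] if i < len(b) else 0) + (c[i] if i < len(c) else 0) + carry
        result.append(s & 1)   # the bit written in this column
        carry = s >> 1         # the carry into the next column
    result.append(carry)       # the sum fits in n + 1 bits
    return result
```

    For example, 3 + 1: `add_bits([1, 1], [1, 0])` returns `[0, 0, 1]`, i.e. binary 100.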


    Subtraction

    Similar to addition, the usual subtract-and-borrow algorithm takes Θ(n) time, which is clearly asymptotically optimal. The result can be represented by at most n bits.


    Multiplication

    If we multiply b and c using the naive grade-school algorithm, then it takes quadratic (i.e., Θ(n²)) time. Essentially, this algorithm is tantamount to expanding the product bc according to the expressions above:

    $bc = \left(\sum_{i=0}^{n-1} b_i 2^i\right)\left(\sum_{j=0}^{n-1} c_j 2^j\right) = \sum_{i,j} b_i c_j 2^{i+j},$

    then adding everything up term by term. There are n² terms.
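    A direct sketch of this Θ(n²) expansion (with a hypothetical helper name `naive_multiply`; bits are extracted by shifting):

```python
def naive_multiply(b, c):
    """Grade-school multiplication: sum b_i * c_j * 2^(i+j) over all n^2 bit pairs."""
    n = max(b.bit_length(), c.bit_length())
    total = 0
    for i in range(n):
        for j in range(n):
            bit_b = (b >> i) & 1   # b_i
            bit_c = (c >> j) & 1   # c_j
            total += (bit_b & bit_c) << (i + j)
    return total
```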

    Q: Can we do better?


    Multiplying with Divide-and-Conquer

    If n = 1, then the multiplication is trivial, so assume that n > 1. Let us further assume for simplicity that n is even. In fact, we can assume that n is a power of 2. If it is not, pad each number with leading 0s to the next power of 2; at worst this just doubles the input size.


    Multiplying with Divide-and-Conquer

    Let m = n/2. Split b and c up into their m least and m most significant bits. Let b_ℓ and b_h be the numbers given by the low m bits and the high m bits of b, respectively. Similarly, let c_ℓ and c_h be the low and high halves of c. Thus, 0 ≤ b_ℓ, b_h, c_ℓ, c_h < 2^m.


    Multiplying with Divide-and-Conquer

    We then have

    $bc = (b_\ell + b_h 2^m)(c_\ell + c_h 2^m) = b_\ell c_\ell + (b_\ell c_h + b_h c_\ell) 2^m + b_h c_h 2^n.$

    This suggests that we can compute bc with four recursive multiplications of pairs of m-bit numbers b_ℓ c_ℓ, b_ℓ c_h, b_h c_ℓ, and b_h c_h, as well as Θ(n) time spent doing other things, namely, some additions and multiplications by powers of two (the latter amounts to an arithmetic shift of the bits, which can be done in linear time).

    The time for this divide-and-conquer multiplication algorithm thus satisfies the recurrence

    $T(n) = 4\,T(m) + \Theta(n) = 4\,T(n/2) + \Theta(n).$

    The Master Theorem (Case 1) then gives T(n) = Θ(n²), which is asymptotically no better than the naive algorithm.


    Better Approach

    Another way to compute

    $bc = (b_\ell + b_h 2^m)(c_\ell + c_h 2^m) = b_\ell c_\ell + (b_\ell c_h + b_h c_\ell) 2^m + b_h c_h 2^n.$

    Split b and c up into their low and high halves as above, but then recursively compute only three products:

    x = b_ℓ c_ℓ,
    y = b_h c_h,
    z = (b_ℓ + b_h)(c_ℓ + c_h).

    Now you should verify for yourself that

    $bc = x + (z - y - x)\,2^m + y\,2^n,$

    which the algorithm then computes.
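    A minimal Python sketch of this three-multiplication scheme (Karatsuba's algorithm), assuming nonnegative integers; the function name and the small-input base case are our choices, not the slides':

```python
def karatsuba(b, c):
    """Multiply nonnegative integers b and c using three recursive products."""
    if b < 2 or c < 2:                       # base case: a 0- or 1-bit factor
        return b * c
    n = max(b.bit_length(), c.bit_length())
    m = n // 2
    b_h, b_l = b >> m, b & ((1 << m) - 1)    # high and low halves of b
    c_h, c_l = c >> m, c & ((1 << m) - 1)    # high and low halves of c
    x = karatsuba(b_l, c_l)                  # x = b_l * c_l
    y = karatsuba(b_h, c_h)                  # y = b_h * c_h
    z = karatsuba(b_l + b_h, c_l + c_h)      # z = (b_l + b_h)(c_l + c_h)
    # bc = x + (z - y - x) 2^m + y 2^(2m); shifts play the role of the powers of 2
    return x + ((z - y - x) << m) + (y << (2 * m))
```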


    Running Time Analysis

    How much time does this take? Besides the recursive calls, there is a linear time's worth of overhead: additions, subtractions, and arithmetic shifts. There are three recursive calls, computing x, y, and z. The numbers x and y are products of two m-bit integers each, and z is the product of (at most) two (m + 1)-bit integers. Thus the running time satisfies

    $T(n) = 2\,T(n/2) + T(n/2 + 1) + \Theta(n).$


    Running Time Analysis

    It can be shown that the +1 doesn't affect the result, so the recurrence is effectively

    $T(n) = 3\,T(n/2) + \Theta(n),$

    which yields T(n) = Θ(n^{lg 3}) by the Master Theorem. Since lg 3 ≈ 1.585 < 2, this is asymptotically faster than the naive quadratic algorithm.
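    Spelling out the Master Theorem step for this recurrence (Case 1, with a = 3 and b = 2):

```latex
T(n) = 3\,T(n/2) + \Theta(n), \qquad n^{\log_b a} = n^{\log_2 3} = n^{\lg 3}.
% f(n) = \Theta(n) = O(n^{\lg 3 - \epsilon}) for any 0 < \epsilon \le \lg 3 - 1 \approx 0.585,
% so Case 1 of the Master Theorem applies:
T(n) = \Theta\!\left(n^{\lg 3}\right) \approx \Theta\!\left(n^{1.585}\right).
```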


    A Bit of History

    This approach dates back at least to Gauss, who discovered (using the same trick) that multiplying two complex numbers together could be done with only three real multiplications instead of the more naive four.

    The same idea has been applied to long integer multiplication byKaratsuba and to matrix multiplication by Strassen.


    Matrix Multiplication

    Given two n × n matrices A = (a_{ij}) and B = (b_{ij}), their product is defined as follows:

    $A \cdot B = (c_{ij}), \quad \text{where } c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$

    Therefore, to compute the matrix product, we need to compute n² matrix entries. A naive approach takes n multiplications and n − 1 additions for each entry.


    Naive Matrix Multiplication Pseudocode

    Matrix-Multiply(A, B)
    1. n = A.rows
    2. let C be a new n × n matrix
    3. for i = 1 to n
    4.     for j = 1 to n
    5.         c_ij = 0
    6.         for k = 1 to n
    7.             c_ij = c_ij + a_ik · b_kj
    8. return C

    Running time is Θ(n³).
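    The pseudocode above translates almost line for line into Python (0-indexed here instead of 1-indexed; matrices as nested lists):

```python
def matrix_multiply(A, B):
    """Naive Theta(n^3) product of two n x n matrices (nested lists)."""
    n = len(A)                       # n = A.rows
    C = [[0] * n for _ in range(n)]  # let C be a new n x n matrix
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C
```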


    Can we do better?

    Is Θ(n³) the best we can do? Can we multiply matrices in o(n³) time?

    It seems like any algorithm to multiply matrices must take Ω(n³) time:

    Must compute n² entries.
    Each entry is the sum of n terms.

    But with Strassen's method, we can multiply matrices in o(n³) time: Strassen's algorithm runs in Θ(n^{lg 7}) time, and lg 7 ≈ 2.80 < 3.


    Divide-and-Conquer Multiplication Algorithm

    For simplicity assume that n is a power of 2. To compute the product of matrices, we subdivide each of the matrices into four n/2 × n/2 submatrices so that the equation C = A · B takes the form:

    $\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \cdot \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}.$

    This matrix equation corresponds to the following four equations on the submatrices:

    C11 = A11·B11 + A12·B21
    C12 = A11·B12 + A12·B22
    C21 = A21·B11 + A22·B21
    C22 = A21·B12 + A22·B22


    Divide-and-Conquer Multiplication Pseudocode

    Matrix-Multiply-Recursive(A, B)
    1. n = A.rows
    2. let C be a new n × n matrix
    3. if n == 1
    4.     c11 = a11 · b11
    5. else partition each of A, B, C into four submatrices
    6.     C11 = Matrix-Multiply-Recursive(A11, B11) + Matrix-Multiply-Recursive(A12, B21)
    7.     C12 = Matrix-Multiply-Recursive(A11, B12) + Matrix-Multiply-Recursive(A12, B22)
    8.     C21 = Matrix-Multiply-Recursive(A21, B11) + Matrix-Multiply-Recursive(A22, B21)
    9.     C22 = Matrix-Multiply-Recursive(A21, B12) + Matrix-Multiply-Recursive(A22, B22)
    10. return C
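    A Python sketch of Matrix-Multiply-Recursive, assuming n is a power of 2. For simplicity this version copies submatrices, which costs Θ(n²) per call rather than Θ(1) index calculation, but that does not change the asymptotics:

```python
def mm_recursive(A, B):
    """Divide-and-conquer matrix product with eight recursive calls."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]      # c11 = a11 * b11
    m = n // 2
    def quad(M):                           # partition M into four m x m blocks
        return ([r[:m] for r in M[:m]], [r[m:] for r in M[:m]],
                [r[:m] for r in M[m:]], [r[m:] for r in M[m:]])
    def add(X, Y):                         # entrywise matrix sum
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    A11, A12, A21, A22 = quad(A)
    B11, B12, B21, B22 = quad(B)
    C11 = add(mm_recursive(A11, B11), mm_recursive(A12, B21))
    C12 = add(mm_recursive(A11, B12), mm_recursive(A12, B22))
    C21 = add(mm_recursive(A21, B11), mm_recursive(A22, B21))
    C22 = add(mm_recursive(A21, B12), mm_recursive(A22, B22))
    # stitch the four quadrants back into one n x n matrix
    return [l + r for l, r in zip(C11, C12)] + [l + r for l, r in zip(C21, C22)]
```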



    Divide-and-Conquer Multiplication Running Time

    Using index calculation, we can execute Step 5 in Θ(1) time (in contrast to the Θ(n²) that would be required if we created submatrices and copied their entries). However, that does not make a difference asymptotically.

    The running time T(n) for Matrix-Multiply-Recursive on n × n matrices satisfies the recurrence

    $T(n) = \Theta(1) + 8\,T(n/2) + \Theta(n^2) = 8\,T(n/2) + \Theta(n^2)$

    with T(1) = Θ(1). By the Master Theorem, T(n) = Θ(n³), which is unfortunately not faster than the naive method Matrix-Multiply.



    Divide-and-Conquer Multiplication Drawback

    Each time we split the matrix sizes in half, but we do not actually reduce the total amount of work.

    Assume that naive matrix multiplication takes c·n³ time. Then computing each product of submatrices takes c·(n/2)³ = c·n³/8, and we need eight such products, resulting in a total time of 8 · (c·n³/8) = c·n³ (plus overhead), which is no better than simply doing the multiplication in the naive way.

    In contrast, let us consider Merge-Sort with the running-time recurrence T(n) = 2T(n/2) + Θ(n). Even if we did naive quadratic (that is, of time c·n²) sorting for each of the subproblems, the total time would be 2 · c·(n/2)² = c·n²/2 (plus overhead of Θ(n)), which is faster than naive sorting of the whole problem by a factor of 2. This tells us that divide-and-conquer sorting may be more efficient than naive sorting (and it is indeed so, as the Master Theorem proves).


    Strassen's Method

    The idea behind Strassen's method is to reduce the number of multiplications at each recursive call from eight to seven. That makes the recursion tree slightly less bushy.


    Strassen's Method

    Strassen's method has four steps:

    1. Divide the input matrices A and B into submatrices as before, using index calculations, in Θ(1) time.

    2. Create ten n/2 × n/2 matrices S1, S2, ..., S10, each equal to the sum or difference of two submatrices created in Step 1. This step takes Θ(n²) time.

    3. Using the submatrices created in Steps 1 and 2, compute seven products P1, P2, ..., P7, each of size n/2 × n/2.

    4. Compute the submatrices of C by adding or subtracting various combinations of the matrices Pi. This step takes Θ(n²) time.

    The running time for Strassen's method satisfies the recurrence

    $T(n) = \begin{cases} \Theta(1) & \text{if } n = 1, \\ 7\,T(n/2) + \Theta(n^2) & \text{if } n > 1. \end{cases}$

    By the Master Theorem, T(n) = Θ(n^{lg 7}).
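    A Python sketch of Strassen's method (n a power of 2, matrices as nested lists). The slides do not spell out the ten sums S_i or the seven products P_i; the particular combinations below are the standard textbook (CLRS) choice, with each S_i folded into the recursive call that uses it:

```python
def mat_add(X, Y):
    """Entrywise sum of two equal-size matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_sub(X, Y):
    """Entrywise difference of two equal-size matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def quadrants(M):
    """Partition M into four (n/2) x (n/2) submatrices."""
    m = len(M) // 2
    return ([r[:m] for r in M[:m]], [r[m:] for r in M[:m]],
            [r[:m] for r in M[m:]], [r[m:] for r in M[m:]])

def strassen(A, B):
    """Multiply two n x n matrices with seven recursive products per call."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = quadrants(A)
    B11, B12, B21, B22 = quadrants(B)
    P1 = strassen(A11, mat_sub(B12, B22))                # A11 * (B12 - B22)
    P2 = strassen(mat_add(A11, A12), B22)                # (A11 + A12) * B22
    P3 = strassen(mat_add(A21, A22), B11)                # (A21 + A22) * B11
    P4 = strassen(A22, mat_sub(B21, B11))                # A22 * (B21 - B11)
    P5 = strassen(mat_add(A11, A22), mat_add(B11, B22))  # (A11+A22)(B11+B22)
    P6 = strassen(mat_sub(A12, A22), mat_add(B21, B22))  # (A12-A22)(B21+B22)
    P7 = strassen(mat_sub(A11, A21), mat_add(B11, B12))  # (A11-A21)(B11+B12)
    C11 = mat_add(mat_sub(mat_add(P5, P4), P2), P6)
    C12 = mat_add(P1, P2)
    C21 = mat_add(P3, P4)
    C22 = mat_sub(mat_sub(mat_add(P5, P1), P3), P7)
    return [l + r for l, r in zip(C11, C12)] + [l + r for l, r in zip(C21, C22)]
```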


    Notes

    Strassen's algorithm was the first to beat Θ(n³) time, but it is not the asymptotically fastest known. A method by Coppersmith and Winograd runs in O(n^{2.376}) time.

    Practical issues against Strassen's algorithm:

    Higher constant factor than the obvious Θ(n³)-time method.
    Not good for sparse matrices.
    Not numerically stable: larger errors accumulate than in the naive method.
    Submatrices consume space, especially if copying.

    Various researchers have tried to find the crossover point, where Strassen's algorithm runs faster than the naive Θ(n³)-time method. Theoretical analyses (that ignore caches and hardware pipelines) have produced crossover points as low as n = 8, and practical experiments have found crossover points as low as n = 400.
