
  • Slide 1/42

    Dynamic Programming

    Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

    To accompany the text "Introduction to Parallel Computing", Addison Wesley, 2003

  • Slide 2/42

    Topic Overview

    Overview of Serial Dynamic Programming

    Serial Monadic DP Formulations

    Nonserial Monadic DP Formulations

    Serial Polyadic DP Formulations

    Nonserial Polyadic DP Formulations

  • Slide 3/42

    Overview of Serial Dynamic Programming

    Dynamic programming (DP) is used to solve a wide variety of discrete optimization problems such as scheduling, string editing, packaging, and inventory management.

    Break problems into subproblems and combine their solutions into solutions to larger problems.

    In contrast to divide-and-conquer, there may be relationships across subproblems.

  • Slide 4/42

    Dynamic Programming: Example

    Consider the problem of finding a shortest path between a pair of vertices in an acyclic graph.

    An edge connecting node i to node j has cost c(i,j).

    The graph contains n nodes numbered 0, 1, …, n-1, and has an edge from node i to node j only if i < j. Node 0 is the source and node n-1 is the destination.

    Let f(x) be the cost of the shortest path from node 0 to node x.
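
    As a concrete illustration, here is a minimal serial Python sketch of this recurrence; the graph encoding and the edge costs in the example are our own illustrative assumptions, not from the slides.

    import math

    def shortest_path_dag(n, cost):
        """cost[(i, j)] holds the edge cost c(i, j); edges go from i to j only if i < j.
        Returns f, where f[x] is the cost of the shortest path from node 0 to node x."""
        f = [math.inf] * n
        f[0] = 0  # reaching the source costs nothing
        for x in range(1, n):
            # f(x) = min over predecessors j of f(j) + c(j, x)
            f[x] = min((f[j] + cost[(j, x)] for j in range(x) if (j, x) in cost),
                       default=math.inf)
        return f

    # A small acyclic graph on 5 nodes (hypothetical costs)
    cost = {(0, 1): 2, (0, 2): 5, (1, 2): 1, (1, 3): 4, (2, 4): 3, (3, 4): 1}
    print(shortest_path_dag(5, cost)[4])  # -> 6, the shortest 0 -> 4 cost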

  • Slide 5/42

    Dynamic Programming: Example

    A graph for which the shortest path between nodes 0 and 4 is to be computed.

  • Slide 6/42

    Dynamic Programming

    The solution to a DP problem is typically expressed as a minimum (or maximum) of possible alternate solutions.

    If r represents the cost of a solution composed of subproblems $x_1, x_2, \ldots, x_l$, then r can be written as

    $$r = g(f(x_1), f(x_2), \ldots, f(x_l))$$

    Here, g is the composition function.

    If the optimal solution to each problem is determined by composing optimal solutions to the subproblems and selecting the minimum (or maximum), the formulation is said to be a DP formulation.

  • Slide 7/42

    Dynamic Programming: Example

    The computation and composition of subproblem solutions to solve problem f(x8).

  • Slide 8/42

    Dynamic Programming

    The recursive DP equation is also called the functional equation or optimization equation.

    In the equation for the shortest path problem the composition function is f(j) + c(j,x). This contains a single recursive term, f(j). Such a formulation is called monadic.

    If the RHS has multiple recursive terms, the DP formulation is called polyadic.

  • Slide 9/42

    Dynamic Programming

    The dependencies between subproblems can be expressed as a graph.

    If the graph can be levelized (i.e., solutions to problems at a level depend only on solutions to problems at the previous level), the formulation is called serial; otherwise it is called nonserial.

    Based on these two criteria, we can classify DP formulations into four categories: serial-monadic, serial-polyadic, nonserial-monadic, and nonserial-polyadic.

    This classification is useful since it identifies concurrency and dependencies that guide parallel formulations.

  • Slide 10/42

    Serial Monadic DP Formulations

    It is difficult to derive canonical parallel formulations for the entire class of formulations.

    For this reason, we select two representative examples: the shortest-path problem for a multistage graph and the 0/1 knapsack problem.

    We derive parallel formulations for these problems and identify common principles guiding design within the class.

  • Slide 11/42

    Shortest-Path Problem

    Special class of shortest-path problem where the graph is a weighted multistage graph of r + 1 levels.

    Each level is assumed to have n nodes, and every node at level i is connected to every node at level i + 1.

    Levels zero and r contain only one node each: the source and destination nodes, respectively.

    The objective of this problem is to find the shortest path from S to R.

  • Slide 12/42

    Shortest-Path Problem

    An example of a serial monadic DP formulation for finding the shortest path in a graph whose nodes can be organized into levels.

  • Slide 13/42

    Shortest-Path Problem

    The i-th node at level l in the graph is labeled $v_i^l$, and the cost of an edge connecting $v_i^l$ to node $v_j^{l+1}$ is labeled $c_{i,j}^l$.

    The cost of reaching the goal node R from any node $v_i^l$ is represented by $C_i^l$.

    If there are n nodes at level l, the vector $[C_0^l, C_1^l, \ldots, C_{n-1}^l]^T$ is referred to as $C^l$. Note that $C^0 = [C_0^0]$.

    We have:

    $$C_i^l = \min\{\, c_{i,j}^l + C_j^{l+1} \mid j \text{ is a node at level } l+1 \,\}$$

  • Slide 14/42

    Shortest-Path Problem

    Since all nodes $v_j^{r-1}$ have only one edge connecting them to the goal node R at level r, the cost $C_j^{r-1}$ is equal to $c_{j,R}^{r-1}$.

    We have:

    $$C_j^{r-1} = c_{j,R}^{r-1}$$

    Notice that this problem is serial and monadic.

  • Slide 15/42

    Shortest-Path Problem

    The cost of reaching the goal node R from any node at level l ($0 < l < r-1$) is

    $$C_i^l = \min_{j} \left\{\, c_{i,j}^l + C_j^{l+1} \,\right\}$$

  • Slide 16/42

    Shortest-Path Problem

    We can express the solution to the problem as a modified sequence of matrix-vector products.

    Replacing the addition operation by minimization and the multiplication operation by addition, the preceding set of equations becomes:

    $$C^l = M^{l,l+1} \times C^{l+1}$$

    where $C^l$ and $C^{l+1}$ are $n \times 1$ vectors representing the cost of reaching the goal node from each node at levels l and l+1.

  • Slide 17/42

    Shortest-Path Problem

    Matrix $M^{l,l+1}$ is an $n \times n$ matrix in which entry (i, j) stores the cost of the edge connecting node i at level l to node j at level l+1.

    The shortest-path problem has thus been formulated as a sequence of r matrix-vector products.
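
    To make the modified product concrete, the following minimal Python sketch performs one (min, +) matrix-vector step $C^l = M^{l,l+1} \times C^{l+1}$; the function name and the example costs are our own illustrative assumptions.

    import math

    def min_plus_matvec(M, C_next):
        """One step of the modified matrix-vector product:
        C[i] = min over j of (M[i][j] + C_next[j]),
        i.e., ordinary (+, *) replaced by (min, +)."""
        n = len(C_next)
        return [min(M[i][j] + C_next[j] for j in range(n)) for i in range(len(M))]

    # Edge costs between two interior levels with n = 2 nodes each
    # (hypothetical values; math.inf marks a missing edge).
    M = [[1, 4],
         [math.inf, 2]]
    C_next = [3, 5]  # cost of reaching R from each node at level l+1
    print(min_plus_matvec(M, C_next))  # -> [4, 7]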

  • Slide 18/42

    Parallel Shortest-Path

    We can parallelize this algorithm using the parallel algorithms for the matrix-vector product.

    $\Theta(n)$ processing elements can compute each vector $C^l$ in time $\Theta(n)$ and solve the entire problem in time $\Theta(rn)$.

    In many instances of this problem, the matrix M may be sparse. For such problems, it is highly desirable to use sparse matrix techniques.

  • Slide 19/42

    0/1 Knapsack Problem

    We are given a knapsack of capacity c and a set of n objects numbered 1, 2, …, n. Each object i has weight $w_i$ and profit $p_i$.

    Let $v = [v_1, v_2, \ldots, v_n]$ be a solution vector in which $v_i = 0$ if object i is not in the knapsack, and $v_i = 1$ if it is.

    The goal is to find a subset of objects to put into the knapsack so that

    $$\sum_{i=1}^{n} w_i v_i \le c$$

    (that is, the objects fit into the knapsack) and

    $$\sum_{i=1}^{n} p_i v_i$$

    is maximized (that is, the profit is maximized).

  • Slide 20/42

    0/1 Knapsack Problem

    The naive method is to consider all $2^n$ possible subsets of the n objects and choose the one that fits into the knapsack and maximizes the profit.

    Let F[i,x] be the maximum profit for a knapsack of capacity x using only objects {1, 2, …, i}. The DP formulation is:

    $$F[i,x] = \begin{cases} 0 & \text{if } i = 0 \text{ or } x = 0 \\ F[i-1, x] & \text{if } w_i > x \\ \max\{\, F[i-1, x],\; F[i-1, x - w_i] + p_i \,\} & \text{otherwise} \end{cases}$$

  • Slide 21/42

    0/1 Knapsack Problem

    Construct a table F of size n x c in row-major order.

    Filling an entry in a row requires two entries from the previous row: one from the same column and one from the column offset by the weight of the object corresponding to the row.

    Computing each entry takes constant time; the sequential run time of this algorithm is $\Theta(nc)$.

    The formulation is serial-monadic.
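
    A minimal serial Python sketch of this row-major table fill (0-based indexing and the function name are our own choices):

    def knapsack_01(weights, profits, c):
        """F[i][x] is the best profit using objects 1..i with capacity x;
        each row depends only on the previous row, as described above."""
        n = len(weights)
        F = [[0] * (c + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            w, p = weights[i - 1], profits[i - 1]
            for x in range(c + 1):
                if w > x:
                    F[i][x] = F[i - 1][x]  # object i cannot fit
                else:
                    # leave object i out, or take it and give up w units of capacity
                    F[i][x] = max(F[i - 1][x], F[i - 1][x - w] + p)
        return F[n][c]

    print(knapsack_01([2, 3, 4], [3, 4, 5], 5))  # -> 7 (take objects 1 and 2)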

  • Slide 22/42

    0/1 Knapsack Problem

    Computing entries of table F for the 0/1 knapsack problem. The computation of entry F[i,j] requires communication with processing elements containing entries F[i-1,j] and $F[i-1, j-w_i]$.

  • Slide 23/42

    0/1 Knapsack Problem

    Using c processors in a PRAM, we can derive a simple parallel algorithm that runs in O(n) time by partitioning the columns across processors.

    On a distributed-memory machine, in the j-th iteration, for computing F[j,r] at processing element $P_{r-1}$, F[j-1,r] is available locally but $F[j-1, r-w_j]$ must be fetched.

    The communication operation is a circular shift, and its time is given by $(t_s + t_w)\log c$. The total time per iteration is therefore $t_c + (t_s + t_w)\log c$.

    Across all n iterations (rows), the parallel time is O(n log c). Note that this is not cost-optimal.

  • Slide 24/42

    0/1 Knapsack Problem

    Using p processing elements, each processing element computes c/p elements of the table in each iteration.

    The corresponding shift operation takes time $(2t_s + t_w c/p)$, since the data block may be partitioned across two processors, but the total volume of data is c/p.

    The corresponding parallel time is $n(t_c c/p + 2t_s + t_w c/p)$, or O(nc/p), which is cost-optimal.

    Note, however, that there is an upper bound on the efficiency of this formulation, as the derivation below shows.
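
    To see where that bound comes from, here is a short derivation under the stated cost model (the algebraic simplification is ours, added for completeness):

    $$E = \frac{T_S}{p\,T_P} = \frac{n c\, t_c}{p \cdot n \left( t_c \frac{c}{p} + 2t_s + t_w \frac{c}{p} \right)} = \frac{t_c}{t_c + t_w + 2 t_s\, p / c}$$

    Even as $p/c \to 0$, the efficiency never exceeds $t_c / (t_c + t_w) < 1$.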

  • Slide 25/42

    Nonserial Monadic DP Formulations: Longest-Common-Subsequence

    Given a sequence $A = \langle a_1, a_2, \ldots, a_n \rangle$, a subsequence of A can be formed by deleting some entries from A.

    Given two sequences $A = \langle a_1, a_2, \ldots, a_n \rangle$ and $B = \langle b_1, b_2, \ldots, b_m \rangle$, find the longest sequence that is a subsequence of both A and B.

    If $A = \langle c, a, d, b, r, z \rangle$ and $B = \langle a, s, b, z \rangle$, the longest common subsequence of A and B is $\langle a, b, z \rangle$.

  • Slide 26/42

    Longest-Common-Subsequence Problem

    Let F[i,j] denote the length of the longest common subsequence of the first i elements of A and the first j elements of B. The objective of the LCS problem is to find F[n,m].

    We can write:

    $$F[i,j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0 \\ F[i-1, j-1] + 1 & \text{if } i,j > 0 \text{ and } a_i = b_j \\ \max\{\, F[i, j-1],\; F[i-1, j] \,\} & \text{if } i,j > 0 \text{ and } a_i \ne b_j \end{cases}$$
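
    A minimal serial Python sketch of this recurrence, run on the amino-acid example from a later slide (the function name is our own):

    def lcs_length(A, B):
        """F[i][j] = length of the LCS of the first i elements of A
        and the first j elements of B."""
        n, m = len(A), len(B)
        F = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if A[i - 1] == B[j - 1]:
                    F[i][j] = F[i - 1][j - 1] + 1  # extend a common subsequence
                else:
                    F[i][j] = max(F[i][j - 1], F[i - 1][j])
        return F[n][m]

    print(lcs_length("HEAGAWGHEE", "PAWHEAE"))  # -> 5, the length of AWHEE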

  • Slide 27/42

    Longest-Common-Subsequence Problem

    The algorithm computes the two-dimensional F table in a row- or column-major fashion. The complexity is $\Theta(nm)$.

    Treating nodes along a diagonal as belonging to one level, each node depends on two subproblems at the preceding level and one subproblem two levels prior. This DP formulation is therefore nonserial monadic.

  • Slide 28/42

    Longest-Common-Subsequence Problem

    (a) Computing entries of table F for the longest-common-subsequence problem. Computation proceeds along the dotted diagonal lines. (b) Mapping elements of the table to processing elements.

  • Slide 29/42

    Longest-Common-Subsequence: Example

    Consider the LCS of two amino-acid sequences H E A G A W G H E E and P A W H E A E. For the interested reader, the names of the corresponding amino acids are A: Alanine, E: Glutamic acid, G: Glycine, H: Histidine, P: Proline, and W: Tryptophan.

    The F table for computing the LCS of the sequences. The LCS is A W H E E.

  • Slide 30/42

    Parallel Longest-Common-Subsequence

    Table entries are computed in a diagonal sweep from the top-left to the bottom-right corner.

    Using n processors in a PRAM, each entry in a diagonal can be computed in constant time.

    For two sequences of length n, there are 2n-1 diagonals.

    The parallel run time is $\Theta(n)$ and the algorithm is cost-optimal.
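
    A serial Python emulation of this diagonal (wavefront) order; all entries of one diagonal are mutually independent, so a parallel version could compute them simultaneously. The emulation is our illustration, not the book's code.

    def lcs_by_diagonals(A, B):
        """Same LCS recurrence as before, filled one anti-diagonal at a time."""
        n, m = len(A), len(B)
        F = [[0] * (m + 1) for _ in range(n + 1)]
        for d in range(2, n + m + 1):  # diagonal d holds the cells with i + j = d
            # every (i, j) on this diagonal depends only on diagonals d-1 and d-2
            for i in range(max(1, d - m), min(n, d - 1) + 1):
                j = d - i
                if A[i - 1] == B[j - 1]:
                    F[i][j] = F[i - 1][j - 1] + 1
                else:
                    F[i][j] = max(F[i][j - 1], F[i - 1][j])
        return F[n][m]

    print(lcs_by_diagonals("HEAGAWGHEE", "PAWHEAE"))  # -> 5, matching lcs_length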

  • Slide 31/42

    Parallel Longest-Common-Subsequence

    Consider a (logical) linear array of processors. Processing element $P_i$ is responsible for the (i+1)-th column of the table.

    To compute F[i,j], processing element $P_{j-1}$ may need either F[i-1,j-1] or F[i,j-1] from the processing element to its left. This communication takes time $t_s + t_w$. The computation itself takes constant time $t_c$. We have:

    $$T_P = (2n - 1)(t_c + t_s + t_w)$$

    Note that this formulation is cost-optimal; however, its efficiency is upper-bounded by 0.5!

    Can you think of how to fix this?
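
    Where the 0.5 bound comes from (our derivation, assuming two sequences of equal length n under the stated cost model):

    $$E = \frac{T_S}{p\,T_P} = \frac{n^2 t_c}{n\,(2n-1)(t_c + t_s + t_w)} \approx \frac{t_c}{2(t_c + t_s + t_w)} \le \frac{1}{2}$$

    Intuitively, each processor performs n units of work over 2n-1 diagonal steps, so even with free communication ($t_s = t_w = 0$) the efficiency cannot exceed 0.5.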

  • Slide 32/42

    Serial Polyadic DP Formulation: Floyd's All-Pairs Shortest Path

    Given a weighted graph G(V,E), Floyd's algorithm determines the cost $d_{i,j}$ of the shortest path between each pair of nodes in V.

    Let $d_{i,j}^k$ be the minimum cost of a path from node i to node j, using only nodes $v_0, v_1, \ldots, v_{k-1}$.

    We have:

    $$d_{i,j}^0 = c(v_i, v_j), \qquad d_{i,j}^k = \min\left\{\, d_{i,j}^{k-1},\; d_{i,k-1}^{k-1} + d_{k-1,j}^{k-1} \,\right\} \quad (0 < k \le n)$$

    Each iteration requires time $\Theta(n^2)$ and the overall run time of the sequential algorithm is $\Theta(n^3)$.
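
    A minimal serial Python sketch of this recurrence; it updates the distance matrix in place, which is safe because the entries in row k and column k do not change during iteration k (a standard observation, not stated on the slide).

    import math

    def floyd_all_pairs(cost):
        """cost[i][j] = edge weight c(v_i, v_j); math.inf if absent, 0 on the diagonal.
        Returns d with d[i][j] = cost of the shortest path from v_i to v_j."""
        n = len(cost)
        d = [row[:] for row in cost]  # d^0 is just the edge costs
        for k in range(n):            # admit one more intermediate node per iteration
            for i in range(n):
                for j in range(n):
                    d[i][j] = min(d[i][j], d[i][k] + d[k][j])
        return d

    inf = math.inf
    cost = [[0, 3, inf],
            [inf, 0, 1],
            [2, inf, 0]]  # hypothetical 3-node graph
    print(floyd_all_pairs(cost)[0][2])  # -> 4, via v0 -> v1 -> v2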

  • Slide 33/42

    Serial Polyadic DP Formulation: Floyd's All-Pairs Shortest Path

    A PRAM formulation of this algorithm uses $n^2$ processors in a logical 2D mesh. Processor $P_{i,j}$ computes the value of $d_{i,j}^k$ for k = 1, 2, …, n in constant time.

    The parallel run time is $\Theta(n)$ and the formulation is cost-optimal.

    The algorithm can easily be adapted to practical architectures, as discussed in our treatment of Graph Algorithms.

  • Slide 34/42

    Nonserial Polyadic DP Formulation: Optimal Matrix-Parenthesization Problem

    When multiplying a sequence of matrices, the order of multiplication significantly impacts operation count.

    Let C[i,j] be the optimal cost of multiplying the matrices $A_i, \ldots, A_j$.

    The chain of matrices can be expressed as a product of two smaller chains, $A_i, A_{i+1}, \ldots, A_k$ and $A_{k+1}, \ldots, A_j$.

    The chain $A_i, A_{i+1}, \ldots, A_k$ results in a matrix of dimensions $r_{i-1} \times r_k$, and the chain $A_{k+1}, \ldots, A_j$ results in a matrix of dimensions $r_k \times r_j$.

    The cost of multiplying these two resulting matrices is $r_{i-1} r_k r_j$.

  • Slide 35/42

    Optimal Matrix-Parenthesization Problem

    We have:

    $$C[i,j] = \begin{cases} 0 & \text{if } i = j \\ \min_{i \le k < j} \{\, C[i,k] + C[k+1,j] + r_{i-1} r_k r_j \,\} & \text{if } i < j \end{cases}$$

  • Slide 36/42

    Optimal Matrix-Parenthesization Problem

    A nonserial polyadic DP formulation for finding an optimal matrix parenthesization for a chain of four matrices. A square node represents the optimal cost of multiplying a matrix chain. A circle node represents a possible parenthesization.

  • Slide 37/42

    Optimal Matrix-Parenthesization Problem

    The goal of finding C[1,n] is accomplished in a bottom-up fashion.

    Visualize this by thinking of filling in the C table diagonally. Entries in diagonal l correspond to the cost of multiplying matrix chains of length l+1.

    The value of C[i,j] is computed as $\min\{\, C[i,k] + C[k+1,j] + r_{i-1} r_k r_j \,\}$, where k can take values from i to j-1.

    Computing C[i,j] requires that we evaluate (j-i) terms and select their minimum.

    The computation of each term takes time $t_c$, and the computation of C[i,j] takes time $(j-i)t_c$. Each entry in diagonal l can therefore be computed in time $l t_c$. (A sketch of the diagonal fill follows.)
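
    A serial Python sketch of this diagonal fill, with 1-based chain indices laid over 0-based Python lists (the function name and example dimensions are our own):

    def matrix_chain_cost(r):
        """r = [r0, r1, ..., rn]: matrix A_i has dimensions r[i-1] x r[i].
        C[i][j] = optimal scalar-multiplication cost for the chain A_i .. A_j."""
        n = len(r) - 1
        C = [[0] * (n + 1) for _ in range(n + 1)]
        for l in range(1, n):              # diagonal l: chains of length l + 1
            for i in range(1, n - l + 1):
                j = i + l
                # evaluate the (j - i) split points and keep the cheapest
                C[i][j] = min(C[i][k] + C[k + 1][j] + r[i - 1] * r[k] * r[j]
                              for k in range(i, j))
        return C[1][n]

    # A1: 10x20, A2: 20x5, A3: 5x30
    print(matrix_chain_cost([10, 20, 5, 30]))  # -> 2500, i.e., (A1 A2) A3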

  • Slide 38/42

    Optimal Matrix-Parenthesization Problem

    The algorithm computes (n-1) chains of length two; this takes time $(n-1)t_c$. Computing the (n-2) chains of length three takes time $2(n-2)t_c$, since each entry on that diagonal costs $2t_c$. In the final step, the algorithm computes one chain of length n in time $(n-1)t_c$.

    It follows that the serial time is $\Theta(n^3)$.
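
    The $\Theta(n^3)$ claim can be verified by summing the per-diagonal costs directly (a standard manipulation, added here for completeness); diagonal l contains (n-l) entries, each costing $l\,t_c$:

    $$\sum_{l=1}^{n-1} l\,(n-l)\,t_c = t_c\left( n \sum_{l=1}^{n-1} l - \sum_{l=1}^{n-1} l^2 \right) = t_c\left( \frac{n^2(n-1)}{2} - \frac{(n-1)n(2n-1)}{6} \right) = \frac{n^3 - n}{6}\,t_c = \Theta(n^3)$$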

  • Slide 39/42

    Optimal Matrix-Parenthesization Problem

    The diagonal order of computation for the optimal matrix-parenthesization problem.

  • Slide 40/42

    Parallel Optimal Matrix-Parenthesization Problem

    Consider a logical ring of processors. In step l, each processor computes a single element belonging to the l-th diagonal.

    On computing its assigned element of table C, each processor sends its value to all other processors using an all-to-all broadcast.

    The next value can then be computed locally.

    The total time required to compute the entries along diagonal l is $l t_c + t_s \log n + t_w (n-1)$. The corresponding parallel time is given by:

    $$T_P = \sum_{l=1}^{n-1} \left( l\,t_c + t_s \log n + t_w (n-1) \right) = \frac{n(n-1)}{2}\,t_c + (n-1)\,t_s \log n + (n-1)^2\,t_w = \Theta(n^2)$$

  • Slide 41/42

    Parallel Optimal Matrix-Parenthesization Problem

    When using p(

  • Slide 42/42

    Discussion of Parallel Dynamic Programming Algorithms

    By representing the computation as a graph, we identify three sources of parallelism: parallelism within nodes, parallelism across nodes at a level, and pipelining nodes across multiple levels. The first two are available in serial formulations and the third in nonserial formulations.

    Data locality is critical for performance. Different DP formulations, by the very nature of the problem instance, have different degrees of locality.