
  • Slide 1/42

    Dynamic Programming

    Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

    To accompany the text "Introduction to Parallel Computing", Addison Wesley, 2003

  • Slide 2/42

    Topic Overview

    Overview of Serial Dynamic Programming

    Serial Monadic DP Formulations

    Nonserial Monadic DP Formulations

    Serial Polyadic DP Formulations

    Nonserial Polyadic DP Formulations

  • Slide 3/42

    Overview of Serial Dynamic Programming

    Dynamic programming (DP) is used to solve a wide variety of discrete optimization problems such as scheduling, string editing, packaging, and inventory management.

    Break problems into subproblems and combine their solutions into solutions to larger problems.

    In contrast to divide-and-conquer, there may be relationships across subproblems.

  • Slide 4/42

    Dynamic Programming: Example

    Consider the problem of finding a shortest path between a pair of vertices in an acyclic graph.

    An edge connecting node i to node j has cost c(i,j).

    The graph contains n nodes numbered 0, 1, …, n-1, and has an edge from node i to node j only if i < j. Node 0 is the source and node n-1 is the destination.

    Let f(x) be the cost of the shortest path from node 0 to node x.
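
    As a concrete illustration, here is a minimal serial Python sketch of this recurrence; the graph encoding and the edge costs in the example are our own illustrative assumptions, not from the slides.

    import math

    def shortest_path_dag(n, cost):
        """cost[(i, j)] holds the edge cost c(i, j); edges go from i to j only if i < j.
        Returns f, where f[x] is the cost of the shortest path from node 0 to node x."""
        f = [math.inf] * n
        f[0] = 0  # reaching the source costs nothing
        for x in range(1, n):
            # f(x) = min over predecessors j of f(j) + c(j, x)
            f[x] = min((f[j] + cost[(j, x)] for j in range(x) if (j, x) in cost),
                       default=math.inf)
        return f

    # A small acyclic graph on 5 nodes (hypothetical costs)
    cost = {(0, 1): 2, (0, 2): 5, (1, 2): 1, (1, 3): 4, (2, 4): 3, (3, 4): 1}
    print(shortest_path_dag(5, cost)[4])  # -> 6, the shortest 0 -> 4 cost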

  • Slide 5/42

    Dynamic Programming: Example

    A graph for which the shortest path between nodes 0 and 4 is to be computed.

  • Slide 6/42

    Dynamic Programming

    The solution to a DP problem is typically expressed as a minimum (or maximum) of possible alternate solutions.

    If r represents the cost of a solution composed of subproblems $x_1, x_2, \ldots, x_l$, then r can be written as

    $$r = g(f(x_1), f(x_2), \ldots, f(x_l))$$

    Here, g is the composition function.

    If the optimal solution to each problem is determined by composing optimal solutions to the subproblems and selecting the minimum (or maximum), the formulation is said to be a DP formulation.

  • Slide 7/42

    Dynamic Programming: Example

    The computation and composition of subproblem solutions to solve problem f(x8).

  • Slide 8/42

    Dynamic Programming

    The recursive DP equation is also called the functional equation or optimization equation.

    In the equation for the shortest path problem the composition function is f(j) + c(j,x). This contains a single recursive term, f(j). Such a formulation is called monadic.

    If the RHS has multiple recursive terms, the DP formulation is called polyadic.

  • Slide 9/42

    Dynamic Programming

    The dependencies between subproblems can be expressed as a graph.

    If the graph can be levelized (i.e., solutions to problems at a level depend only on solutions to problems at the previous level), the formulation is called serial; otherwise it is called nonserial.

    Based on these two criteria, we can classify DP formulations into four categories: serial-monadic, serial-polyadic, nonserial-monadic, and nonserial-polyadic.

    This classification is useful since it identifies concurrency and dependencies that guide parallel formulations.

  • Slide 10/42

    Serial Monadic DP Formulations

    It is difficult to derive canonical parallel formulations for the entire class of formulations.

    For this reason, we select two representative examples: the shortest-path problem for a multistage graph and the 0/1 knapsack problem.

    We derive parallel formulations for these problems and identify common principles guiding design within the class.

  • Slide 11/42

    Shortest-Path Problem

    Special class of shortest-path problem where the graph is a weighted multistage graph of r + 1 levels.

    Each level is assumed to have n nodes, and every node at level i is connected to every node at level i + 1.

    Levels zero and r contain only one node each: the source and destination nodes, respectively.

    The objective of this problem is to find the shortest path from S to R.

  • Slide 12/42

    Shortest-Path Problem

    An example of a serial monadic DP formulation for finding the shortest path in a graph whose nodes can be organized into levels.

  • Slide 13/42

    Shortest-Path Problem

    The i-th node at level l in the graph is labeled $v_i^l$, and the cost of an edge connecting $v_i^l$ to node $v_j^{l+1}$ is labeled $c_{i,j}^l$.

    The cost of reaching the goal node R from any node $v_i^l$ is represented by $C_i^l$.

    If there are n nodes at level l, the vector $[C_0^l, C_1^l, \ldots, C_{n-1}^l]^T$ is referred to as $C^l$. Note that $C^0 = [C_0^0]$.

    We have:

    $$C_i^l = \min\{\, c_{i,j}^l + C_j^{l+1} \mid j \text{ is a node at level } l+1 \,\}$$

  • Slide 14/42

    Shortest-Path Problem

    Since all nodes $v_j^{r-1}$ have only one edge connecting them to the goal node R at level r, the cost $C_j^{r-1}$ is equal to $c_{j,R}^{r-1}$.

    We have:

    $$C_j^{r-1} = c_{j,R}^{r-1}$$

    Notice that this problem is serial and monadic.

  • Slide 15/42

    Shortest-Path Problem

    The cost of reaching the goal node R from any node at level l ($0 < l < r-1$) is

    $$C_i^l = \min_{j} \left\{\, c_{i,j}^l + C_j^{l+1} \,\right\}$$

  • Slide 16/42

    Shortest-Path Problem

    We can express the solution to the problem as a modified sequence of matrix-vector products.

    Replacing the addition operation by minimization and the multiplication operation by addition, the preceding set of equations becomes:

    $$C^l = M^{l,l+1} \times C^{l+1}$$

    where $C^l$ and $C^{l+1}$ are $n \times 1$ vectors representing the cost of reaching the goal node from each node at levels l and l+1.

  • Slide 17/42

    Shortest-Path Problem

    Matrix $M^{l,l+1}$ is an $n \times n$ matrix in which entry (i, j) stores the cost of the edge connecting node i at level l to node j at level l+1.

    The shortest-path problem has thus been formulated as a sequence of r matrix-vector products.
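
    To make the modified product concrete, the following minimal Python sketch performs one (min, +) matrix-vector step $C^l = M^{l,l+1} \times C^{l+1}$; the function name and the example costs are our own illustrative assumptions.

    import math

    def min_plus_matvec(M, C_next):
        """One step of the modified matrix-vector product:
        C[i] = min over j of (M[i][j] + C_next[j]),
        i.e., ordinary (+, *) replaced by (min, +)."""
        n = len(C_next)
        return [min(M[i][j] + C_next[j] for j in range(n)) for i in range(len(M))]

    # Edge costs between two interior levels with n = 2 nodes each
    # (hypothetical values; math.inf marks a missing edge).
    M = [[1, 4],
         [math.inf, 2]]
    C_next = [3, 5]  # cost of reaching R from each node at level l+1
    print(min_plus_matvec(M, C_next))  # -> [4, 7]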

  • Slide 18/42

    Parallel Shortest-Path

    We can parallelize this algorithm using the parallel algorithms for the matrix-vector product.

    $\Theta(n)$ processing elements can compute each vector $C^l$ in time $\Theta(n)$ and solve the entire problem in time $\Theta(rn)$.

    In many instances of this problem, the matrix M may be sparse. For such problems, it is highly desirable to use sparse matrix techniques.

  • Slide 19/42

    0/1 Knapsack Problem

    We are given a knapsack of capacity c and a set of n objects numbered 1, 2, …, n. Each object i has weight $w_i$ and profit $p_i$.

    Let $v = [v_1, v_2, \ldots, v_n]$ be a solution vector in which $v_i = 0$ if object i is not in the knapsack, and $v_i = 1$ if it is.

    The goal is to find a subset of objects to put into the knapsack so that

    $$\sum_{i=1}^{n} w_i v_i \le c$$

    (that is, the objects fit into the knapsack) and

    $$\sum_{i=1}^{n} p_i v_i$$

    is maximized (that is, the profit is maximized).

  • Slide 20/42

    0/1 Knapsack Problem

    The naive method is to consider all $2^n$ possible subsets of the n objects and choose the one that fits into the knapsack and maximizes the profit.

    Let F[i,x] be the maximum profit for a knapsack of capacity x using only objects {1, 2, …, i}. The DP formulation is:

    $$F[i,x] = \begin{cases} 0 & \text{if } i = 0 \text{ or } x = 0 \\ F[i-1, x] & \text{if } w_i > x \\ \max\{\, F[i-1, x],\; F[i-1, x - w_i] + p_i \,\} & \text{otherwise} \end{cases}$$

  • Slide 21/42

    0/1 Knapsack Problem

    Construct a table F of size n x c in row-major order.

    Filling an entry in a row requires two entries from the previous row: one from the same column and one from the column offset by the weight of the object corresponding to the row.

    Computing each entry takes constant time; the sequential run time of this algorithm is $\Theta(nc)$.

    The formulation is serial-monadic.
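
    A minimal serial Python sketch of this row-major table fill (0-based indexing and the function name are our own choices):

    def knapsack_01(weights, profits, c):
        """F[i][x] is the best profit using objects 1..i with capacity x;
        each row depends only on the previous row, as described above."""
        n = len(weights)
        F = [[0] * (c + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            w, p = weights[i - 1], profits[i - 1]
            for x in range(c + 1):
                if w > x:
                    F[i][x] = F[i - 1][x]  # object i cannot fit
                else:
                    # leave object i out, or take it and give up w units of capacity
                    F[i][x] = max(F[i - 1][x], F[i - 1][x - w] + p)
        return F[n][c]

    print(knapsack_01([2, 3, 4], [3, 4, 5], 5))  # -> 7 (take objects 1 and 2)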

  • Slide 22/42

    0/1 Knapsack Problem

    Computing entries of table F for the 0/1 knapsack problem. The computation of entry F[i,j] requires communication with processing elements containing entries F[i-1,j] and $F[i-1, j-w_i]$.

  • Slide 23/42

    0/1 Knapsack Problem

    Using c processors in a PRAM, we can derive a simple parallel algorithm that runs in O(n) time by partitioning the columns across processors.

    On a distributed-memory machine, in the j-th iteration, for computing F[j,r] at processing element $P_{r-1}$, F[j-1,r] is available locally but $F[j-1, r-w_j]$ must be fetched.

    The communication operation is a circular shift, and its time is given by $(t_s + t_w)\log c$. The total time per iteration is therefore $t_c + (t_s + t_w)\log c$.

    Across all n iterations (rows), the parallel time is O(n log c). Note that this is not cost-optimal.

  • Slide 24/42

    0/1 Knapsack Problem

    Using p processing elements, each processing element computes c/p elements of the table in each iteration.

    The corresponding shift operation takes time $(2t_s + t_w c/p)$, since the data block may be partitioned across two processors, but the total volume of data is c/p.

    The corresponding parallel time is $n(t_c c/p + 2t_s + t_w c/p)$, or O(nc/p), which is cost-optimal.

    Note, however, that there is an upper bound on the efficiency of this formulation, as the derivation below shows.
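
    To see where that bound comes from, here is a short derivation under the stated cost model (the algebraic simplification is ours, added for completeness):

    $$E = \frac{T_S}{p\,T_P} = \frac{n c\, t_c}{p \cdot n \left( t_c \frac{c}{p} + 2t_s + t_w \frac{c}{p} \right)} = \frac{t_c}{t_c + t_w + 2 t_s\, p / c}$$

    Even as $p/c \to 0$, the efficiency never exceeds $t_c / (t_c + t_w) < 1$.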

  • Slide 25/42

    Nonserial Monadic DP Formulations: Longest-Common-Subsequence

    Given a sequence $A = \langle a_1, a_2, \ldots, a_n \rangle$, a subsequence of A can be formed by deleting some entries from A.

    Given two sequences $A = \langle a_1, a_2, \ldots, a_n \rangle$ and $B = \langle b_1, b_2, \ldots, b_m \rangle$, find the longest sequence that is a subsequence of both A and B.

    If $A = \langle c, a, d, b, r, z \rangle$ and $B = \langle a, s, b, z \rangle$, the longest common subsequence of A and B is $\langle a, b, z \rangle$.

  • Slide 26/42

    Longest-Common-Subsequence Problem

    Let F[i,j] denote the length of the longest common subsequence of the first i elements of A and the first j elements of B. The objective of the LCS problem is to find F[n,m].

    We can write:

    $$F[i,j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0 \\ F[i-1, j-1] + 1 & \text{if } i,j > 0 \text{ and } a_i = b_j \\ \max\{\, F[i, j-1],\; F[i-1, j] \,\} & \text{if } i,j > 0 \text{ and } a_i \ne b_j \end{cases}$$
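
    A minimal serial Python sketch of this recurrence, run on the amino-acid example from a later slide (the function name is our own):

    def lcs_length(A, B):
        """F[i][j] = length of the LCS of the first i elements of A
        and the first j elements of B."""
        n, m = len(A), len(B)
        F = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if A[i - 1] == B[j - 1]:
                    F[i][j] = F[i - 1][j - 1] + 1  # extend a common subsequence
                else:
                    F[i][j] = max(F[i][j - 1], F[i - 1][j])
        return F[n][m]

    print(lcs_length("HEAGAWGHEE", "PAWHEAE"))  # -> 5, the length of AWHEE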

  • Slide 27/42

    Longest-Common-Subsequence Problem

    The algorithm computes the two-dimensional F table in a row- or column-major fashion. The complexity is $\Theta(nm)$.

    Treating nodes along a diagonal as belonging to one level, each node depends on two subproblems at the preceding level and one subproblem two levels prior. This DP formulation is therefore nonserial monadic.

  • Slide 28/42

    Longest-Common-Subsequence Problem

    (a) Computing entries of table F for the longest-common-subsequence problem. Computation proceeds along the dotted diagonal lines. (b) Mapping elements of the table to processing elements.

  • Slide 29/42

    Longest-Common-Subsequence: Example

    Consider the LCS of two amino-acid sequences H E A G A W G H E E and P A W H E A E. For the interested reader, the names of the corresponding amino acids are A: Alanine, E: Glutamic acid, G: Glycine, H: Histidine, P: Proline, and W: Tryptophan.

    The F table for computing the LCS of the sequences. The LCS is A W H E E.

  • Slide 30/42

    Parallel Longest-Common-Subsequence

    Table entries are computed in a diagonal sweep from the top-left to the bottom-right corner.

    Using n processors in a PRAM, each entry in a diagonal can be computed in constant time.

    For two sequences of length n, there are 2n-1 diagonals.

    The parallel run time is $\Theta(n)$ and the algorithm is cost-optimal.
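
    A serial Python emulation of this diagonal (wavefront) order; all entries of one diagonal are mutually independent, so a parallel version could compute them simultaneously. The emulation is our illustration, not the book's code.

    def lcs_by_diagonals(A, B):
        """Same LCS recurrence as before, filled one anti-diagonal at a time."""
        n, m = len(A), len(B)
        F = [[0] * (m + 1) for _ in range(n + 1)]
        for d in range(2, n + m + 1):  # diagonal d holds the cells with i + j = d
            # every (i, j) on this diagonal depends only on diagonals d-1 and d-2
            for i in range(max(1, d - m), min(n, d - 1) + 1):
                j = d - i
                if A[i - 1] == B[j - 1]:
                    F[i][j] = F[i - 1][j - 1] + 1
                else:
                    F[i][j] = max(F[i][j - 1], F[i - 1][j])
        return F[n][m]

    print(lcs_by_diagonals("HEAGAWGHEE", "PAWHEAE"))  # -> 5, matching lcs_length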

  • Slide 31/42

    Parallel Longest-Common-Subsequence

    Consider a (logical) linear array of processors. Processing element $P_i$ is responsible for the (i+1)-th column of the table.

    To compute F[i,j], processing element $P_{j-1}$ may need either F[i-1,j-1] or F[i,j-1] from the processing element to its left. This communication takes time $t_s + t_w$. The computation itself takes constant time $t_c$. We have:

    $$T_P = (2n - 1)(t_c + t_s + t_w)$$

    Note that this formulation is cost-optimal; however, its efficiency is upper-bounded by 0.5!

    Can you think of how to fix this?
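
    Where the 0.5 bound comes from (our derivation, assuming two sequences of equal length n under the stated cost model):

    $$E = \frac{T_S}{p\,T_P} = \frac{n^2 t_c}{n\,(2n-1)(t_c + t_s + t_w)} \approx \frac{t_c}{2(t_c + t_s + t_w)} \le \frac{1}{2}$$

    Intuitively, each processor performs n units of work over 2n-1 diagonal steps, so even with free communication ($t_s = t_w = 0$) the efficiency cannot exceed 0.5.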

  • Slide 32/42

    Serial Polyadic DP Formulation: Floyd's All-Pairs Shortest Path

    Given a weighted graph G(V,E), Floyd's algorithm determines the cost $d_{i,j}$ of the shortest path between each pair of nodes in V.

    Let $d_{i,j}^k$ be the minimum cost of a path from node i to node j, using only nodes $v_0, v_1, \ldots, v_{k-1}$.

    We have:

    $$d_{i,j}^0 = c(v_i, v_j), \qquad d_{i,j}^k = \min\left\{\, d_{i,j}^{k-1},\; d_{i,k-1}^{k-1} + d_{k-1,j}^{k-1} \,\right\} \quad (0 < k \le n)$$

    Each iteration requires time $\Theta(n^2)$ and the overall run time of the sequential algorithm is $\Theta(n^3)$.
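
    A minimal serial Python sketch of this recurrence; it updates the distance matrix in place, which is safe because the entries in row k and column k do not change during iteration k (a standard observation, not stated on the slide).

    import math

    def floyd_all_pairs(cost):
        """cost[i][j] = edge weight c(v_i, v_j); math.inf if absent, 0 on the diagonal.
        Returns d with d[i][j] = cost of the shortest path from v_i to v_j."""
        n = len(cost)
        d = [row[:] for row in cost]  # d^0 is just the edge costs
        for k in range(n):            # admit one more intermediate node per iteration
            for i in range(n):
                for j in range(n):
                    d[i][j] = min(d[i][j], d[i][k] + d[k][j])
        return d

    inf = math.inf
    cost = [[0, 3, inf],
            [inf, 0, 1],
            [2, inf, 0]]  # hypothetical 3-node graph
    print(floyd_all_pairs(cost)[0][2])  # -> 4, via v0 -> v1 -> v2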

  • Slide 33/42

    Serial Polyadic DP Formulation: Floyd's All-Pairs Shortest Path

    A PRAM formulation of this algorithm uses $n^2$ processors in a logical 2D mesh. Processor $P_{i,j}$ computes the value of $d_{i,j}^k$ for k = 1, 2, …, n in constant time.

    The parallel run time is $\Theta(n)$ and the formulation is cost-optimal.

    The algorithm can easily be adapted to practical architectures, as discussed in our treatment of Graph Algorithms.

  • Slide 34/42

    Nonserial Polyadic DP Formulation: Optimal Matrix-Parenthesization Problem

    When multiplying a sequence of matrices, the order of multiplication significantly impacts operation count.

    Let C[i,j] be the optimal cost of multiplying the matrices $A_i, \ldots, A_j$.

    The chain of matrices can be expressed as a product of two smaller chains, $A_i, A_{i+1}, \ldots, A_k$ and $A_{k+1}, \ldots, A_j$.

    The chain $A_i, A_{i+1}, \ldots, A_k$ results in a matrix of dimensions $r_{i-1} \times r_k$, and the chain $A_{k+1}, \ldots, A_j$ results in a matrix of dimensions $r_k \times r_j$.

    The cost of multiplying these two resulting matrices is $r_{i-1} r_k r_j$.

  • Slide 35/42

    Optimal Matrix-Parenthesization Problem

    We have:

    $$C[i,j] = \begin{cases} 0 & \text{if } i = j \\ \min_{i \le k < j} \{\, C[i,k] + C[k+1,j] + r_{i-1} r_k r_j \,\} & \text{if } i < j \end{cases}$$

  • Slide 36/42

    Optimal Matrix-Parenthesization Problem

    A nonserial polyadic DP formulation for finding an optimal matrix parenthesization for a chain of four matrices. A square node represents the optimal cost of multiplying a matrix chain. A circle node represents a possible parenthesization.

  • Slide 37/42

    Optimal Matrix-Parenthesization Problem

    The goal of finding C[1,n] is accomplished in a bottom-up fashion.

    Visualize this by thinking of filling in the C table diagonally. Entries in diagonal l correspond to the cost of multiplying matrix chains of length l+1.

    The value of C[i,j] is computed as $\min\{\, C[i,k] + C[k+1,j] + r_{i-1} r_k r_j \,\}$, where k can take values from i to j-1.

    Computing C[i,j] requires that we evaluate (j-i) terms and select their minimum.

    The computation of each term takes time $t_c$, and the computation of C[i,j] takes time $(j-i)t_c$. Each entry in diagonal l can therefore be computed in time $l t_c$. (A sketch of the diagonal fill follows.)
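
    A serial Python sketch of this diagonal fill, with 1-based chain indices laid over 0-based Python lists (the function name and example dimensions are our own):

    def matrix_chain_cost(r):
        """r = [r0, r1, ..., rn]: matrix A_i has dimensions r[i-1] x r[i].
        C[i][j] = optimal scalar-multiplication cost for the chain A_i .. A_j."""
        n = len(r) - 1
        C = [[0] * (n + 1) for _ in range(n + 1)]
        for l in range(1, n):              # diagonal l: chains of length l + 1
            for i in range(1, n - l + 1):
                j = i + l
                # evaluate the (j - i) split points and keep the cheapest
                C[i][j] = min(C[i][k] + C[k + 1][j] + r[i - 1] * r[k] * r[j]
                              for k in range(i, j))
        return C[1][n]

    # A1: 10x20, A2: 20x5, A3: 5x30
    print(matrix_chain_cost([10, 20, 5, 30]))  # -> 2500, i.e., (A1 A2) A3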

  • Slide 38/42

    Optimal Matrix-Parenthesization Problem

    The algorithm computes (n-1) chains of length two; this takes time $(n-1)t_c$. Computing the (n-2) chains of length three takes time $2(n-2)t_c$, since each entry on that diagonal costs $2t_c$. In the final step, the algorithm computes one chain of length n in time $(n-1)t_c$.

    It follows that the serial time is $\Theta(n^3)$.
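
    The $\Theta(n^3)$ claim can be verified by summing the per-diagonal costs directly (a standard manipulation, added here for completeness); diagonal l contains (n-l) entries, each costing $l\,t_c$:

    $$\sum_{l=1}^{n-1} l\,(n-l)\,t_c = t_c\left( n \sum_{l=1}^{n-1} l - \sum_{l=1}^{n-1} l^2 \right) = t_c\left( \frac{n^2(n-1)}{2} - \frac{(n-1)n(2n-1)}{6} \right) = \frac{n^3 - n}{6}\,t_c = \Theta(n^3)$$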

  • Slide 39/42

    Optimal Matrix-Parenthesization Problem

    The diagonal order of computation for the optimal matrix-parenthesization problem.

  • Slide 40/42

    Parallel Optimal Matrix-Parenthesization Problem

    Consider a logical ring of processors. In step l, each processor computes a single element belonging to the l-th diagonal.

    On computing its assigned element of table C, each processor sends its value to all other processors using an all-to-all broadcast.

    The next value can then be computed locally.

    The total time required to compute the entries along diagonal l is $l t_c + t_s \log n + t_w (n-1)$. The corresponding parallel time is given by:

    $$T_P = \sum_{l=1}^{n-1} \left( l\,t_c + t_s \log n + t_w (n-1) \right) = \frac{n(n-1)}{2}\,t_c + (n-1)\,t_s \log n + (n-1)^2\,t_w = \Theta(n^2)$$

  • Slide 41/42

    Parallel Optimal Matrix-Parenthesization Problem

    When using p(

  • Slide 42/42

    Discussion of Parallel Dynamic Programming Algorithms

    By representing the computation as a graph, we identify three sources of parallelism: parallelism within nodes, parallelism across nodes at a level, and pipelining nodes across multiple levels. The first two are available in serial formulations and the third in nonserial formulations.

    Data locality is critical for performance. Different DP formulations, by the very nature of the problem instance, have different degrees of locality.