Algorithmic Techniques in VLSI CAD Shantanu Dutt University of Illinois at Chicago


Post on 21-Oct-2015

Page 1: Algorithms VLSI CAD Final f07

Algorithmic Techniques in VLSI CAD

Shantanu Dutt

University of Illinois at Chicago

Page 2: Algorithms VLSI CAD Final f07

Algorithms in VLSI CAD

• Divide & Conquer (D&C) [e.g., merge-sort, partition-driven placement]

• Reduce & Conquer (R&C) [e.g., multilevel techniques such as the hMetis partitioner]

• Dynamic programming [e.g., ordering a chain of matrix multiplications, optimal buffer insertion]

• Mathematical programming: linear, quadratic, 0/1 integer programming [e.g., floorplanning, global placement]

Page 3: Algorithms VLSI CAD Final f07

Algorithms in VLSI CAD (contd)

• Search methods:

– Depth-first search (DFS): mainly used to find any solution when cost is not an issue [e.g., FPGA detailed routing, where cost is generally determined at the global routing phase]

– Breadth-first search (BFS): mainly used to find a soln. at min. distance from the root of the search tree [e.g., maze routing when cost = dist. from root]

– Best-first search (BeFS): used to find optimal solutions w/ any cost function. It can be applied when a provable lower bound on the cost can be determined for each branching choice from the “current partial soln node” [e.g., TSP, global routing]

• Iterative Improvement: deterministic, stochastic

Page 4: Algorithms VLSI CAD Final f07

Divide & Conquer

• Determine if the problem can be solved in a hierarchical or divide-&-conquer (D&C) manner:

– D&C approach: See if the problem can be “broken up” into 2 or more smaller subproblems whose solns. can be “stitched up” to give a soln. to the parent prob.

– Do this recursively for each large subprob. until the subprobs. are small enough for an “easy” solution technique (which could be exhaustive!)

– If the subprobs. are of a similar kind to the root prob., then the breakup and stitching will also be similar

[Figure: D&C tree. Root problem A splits into subprobs. A1 and A2, which in turn split into A1,1, A1,2 and A2,1, A2,2. The solns. to A1 and A2 are stitched up to form the complete soln. to A. Do recursively until the subprob. size is s.t. TT-based design is doable.]
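Merge-sort, the D&C example named in the overview, can be sketched as follows. This is a minimal illustration and not code from the slides; the function names `merge` and `merge_sort` are chosen here. The split is the “break-up” step and `merge` is the “stitch-up” step.

```python
def merge(left, right):
    """Stitch-up step: combine two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])   # at most one of these two tails is non-empty
    out.extend(right[j:])
    return out


def merge_sort(items):
    """Break the problem in two, solve each half recursively, then stitch up."""
    if len(items) <= 1:    # subproblem small enough for a trivial solution
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


print(merge_sort([5, 2, 9, 1, 5, 6]))
```

Both subproblems are of the same kind as the root problem, so the same break-up and stitch-up are reused at every level.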

Page 5: Algorithms VLSI CAD Final f07

Reduce-&-Conquer

Reduce problem size (coarsening)

Solve

Uncoarsen and refine solution

• Examples: Multilevel graph/hypergraph partitioning (e.g., hMetis), multilevel routing

Page 6: Algorithms VLSI CAD Final f07

Dynamic Programming (DP)

• Stitch-up function f: optimal soln. of root = f(optimal solns. of subproblems) = f(opt(A1), opt(A2), opt(A3), opt(A4))

• The above property means that every time we optimally solve a subproblem, we can store/record the soln. and reuse it every time it is part of the formulation of a higher-level problem

[Figure: root problem A with subproblems A1, A2, A3, A4, combined by the stitch-up function]

Page 7: Algorithms VLSI CAD Final f07

Dynamic Programming (contd)

• Matrix multiplication example: find the most computationally efficient way to perform the series of matrix mults. $M = M_1 \times M_2 \times \cdots \times M_n$, where $M_i$ is of size $r_i \times c_i$ w/ $r_i = c_{i-1}$ for $i > 1$.

• DP formulation: $\mathrm{opt\_seq}(M) = \mathrm{opt\_seq}(M(1,n))$ (by defn) $= \min_{1 \le i \le n-1} \{ \mathrm{opt\_seq}(M(1,i)) + \mathrm{opt\_seq}(M(i+1,n)) + r_1 c_i c_n \}$

• Correctness rests on the property that the optimal ways of multiplying M1 x … x Mi and Mi+1 x … x Mn will be used in the “min” stitch-up function to determine the optimal soln. for M

• Thus if the optimal soln. involves a “cut” at Mr, then opt_seq(M(1,r)) & opt_seq(M(r+1,n)) will be part of opt_seq(M)

• Perform computation bottom-up (smallest sequences first)

• Complexity: each subseq. M(j,k) that appears in the above computation is solved exactly once (irrespective of how many times it appears).

• Time to solve M(j,k), 1 <= j < k <= n, not counting the time to solve its subproblems (which is accounted for when those subproblems are themselves solved), is l - 1, where l = k - j + 1 is the length of the seq. (the min of l - 1 different options is computed).

• # of different M(j,k)’s of length l is n - l + 1, 2 <= l <= n.

• Total complexity = $\sum_{l=2}^{n} (l-1)(n-l+1) = O(n^3)$ (as opposed to exponential time, say $O(2^n)$, using exhaustive search)
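The bottom-up computation can be sketched in Python as below. This is an illustration under assumed conventions, not code from the slides: the list `dims` holds r1 followed by c1, ..., cn, so matrix Mi is `dims[i-1]` x `dims[i]`, and the names `matrix_chain_cost` and `parenthesize` are chosen here.

```python
def matrix_chain_cost(dims):
    """Return (min scalar multiplications, split table) for M1 x ... x Mn.

    Matrix Mi has size dims[i-1] x dims[i], for i = 1..n.
    """
    n = len(dims) - 1
    cost = [[0] * (n + 1) for _ in range(n + 1)]   # cost[j][k] = opt_seq(M(j,k))
    split = [[0] * (n + 1) for _ in range(n + 1)]  # best cut position per M(j,k)
    for length in range(2, n + 1):                 # smallest sequences first
        for j in range(1, n - length + 2):         # n - length + 1 subsequences
            k = j + length - 1
            cost[j][k] = float("inf")
            for i in range(j, k):                  # length - 1 candidate cuts
                c = cost[j][i] + cost[i + 1][k] + dims[j - 1] * dims[i] * dims[k]
                if c < cost[j][k]:
                    cost[j][k] = c
                    split[j][k] = i
    return cost[1][n], split


def parenthesize(split, j, k):
    """Rebuild the optimal multiplication order from the recorded cuts."""
    if j == k:
        return f"M{j}"
    i = split[j][k]
    return f"({parenthesize(split, j, i)} x {parenthesize(split, i + 1, k)})"


best, split = matrix_chain_cost([10, 30, 5, 60])
print(best, parenthesize(split, 1, 3))
```

The three nested loops mirror the complexity count: `length` ranges over l, `j` over the n - l + 1 subsequences of that length, and `i` over the l - 1 cuts.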


Page 8: Algorithms VLSI CAD Final f07

A DP Example: Simple Buffer Insertion Problem

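Given: Source and sink locations, sink capacitances and RATs (required arrival times), a buffer type, source delay rules, unit wire resistance and capacitance

[Figure: source s0 driving four sinks with required arrival times RAT1 to RAT4; a buffer symbol is shown]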

Courtesy: Chuck Alpert, IBM

Page 9: Algorithms VLSI CAD Final f07

Simple Buffer Insertion Problem (contd)

Find: Buffer locations and a routing tree such that slack/RAT at the source is maximized

[Figure: routing tree from source s0 to the sinks with RAT1 to RAT4]

$q(s_0) = \min_{1 \le i \le 4} \{ RAT(s_i) - delay(s_0, s_i) \}$

Courtesy: Chuck Alpert, IBM

Page 10: Algorithms VLSI CAD Final f07

Slack/RAT Example

Case 1: sinks with (RAT = 400, delay = 600) and (RAT = 500, delay = 350): Slack/RAT = -200

Case 2: sinks with (RAT = 400, delay = 300) and (RAT = 500, delay = 400): Slack/RAT = +100

Courtesy: Chuck Alpert, IBM
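The two slack values on this slide follow from taking the minimum of RAT - delay over the sinks. A small check, with a function name (`slack`) chosen here for illustration:

```python
def slack(sinks):
    """Slack at the source: min over sinks of (RAT - delay from the source)."""
    return min(rat - delay for rat, delay in sinks)


print(slack([(400, 600), (500, 350)]))  # first case on the slide
print(slack([(400, 300), (500, 400)]))  # second case on the slide
```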

Page 11: Algorithms VLSI CAD Final f07

Elmore Delay

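$Delay(A \to C) = R_1 (C_1 + C_2) + R_2 C_2$

[Figure: RC chain A, B, C with resistance R1 between A and B, resistance R2 between B and C, capacitance C1 at B and capacitance C2 at C]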

Courtesy: Chuck Alpert, IBM
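For a chain of RC segments, the Elmore delay charges each resistance with all the capacitance downstream of it. The sketch below assumes a simple unbranched chain and uses a function name (`elmore_delay`) chosen here; it is an illustration, not code from the slides.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay from the driver end of an RC chain to its far end.

    resistances[k] is the k-th segment resistance, and capacitances[k] is
    the capacitance at the node that segment k feeds.
    """
    delay = 0.0
    downstream = 0.0
    # Walk from the far end back to the driver, accumulating downstream cap.
    for r, c in zip(reversed(resistances), reversed(capacitances)):
        downstream += c
        delay += r * downstream
    return delay


# Two segments: R1 = 2, R2 = 3, C1 = 4, C2 = 5 (arbitrary illustrative values)
print(elmore_delay([2, 3], [4, 5]))
```

With two segments this reduces to R1 (C1 + C2) + R2 C2.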

Page 12: Algorithms VLSI CAD Final f07

DP Example: Van Ginneken Buffer Insertion Algorithm [ISCAS’90]

• Associate each leaf node/sink with two metrics (Ct, Tt)

– Downstream loading capacitance (Ct) and RAT (Tt)

– DP-based alg. propagates potential solutions bottom-up [Van Ginneken, 90]

• Add a wire (Cw, Rw) above a soln. (Cn, Tn) to get (Ct, Tt):

$C_t = C_n + C_w$

$T_t = T_n - R_w L_n - \frac{1}{2} R_w C_w$

• Add a buffer above a soln. (Cn, Tn) to get (Ct, Tt):

$C_t = C_b$

$T_t = T_n - T_b - R_b L_n$

• Merge two solutions: For each pair of soln. vectors Zn=(Cn,Tn), Zm=(Cm,Tm) in the 2 subtrees, create a soln. vector Zt=(Ct,Tt) where

$C_t = C_n + C_m$

$T_t = \min(T_n, T_m)$

Note: Take Ln = Cn

Courtesy: UCLA

Page 13: Algorithms VLSI CAD Final f07

DP Example (contd)

• Add a wire to each merged solution Zt (same cap. & delay change formulation as before)

• Add a buffer to each Zt

• Delete all dominated solutions Zd: Zd=(Cd, Td) is dominated if there exists another soln. Zr=(Cr, Tr) s.t. Cd >= Cr and Td <= Tr (i.e., Zd is no better than Zr in either metric)

• The remaining soln vectors are all “optimal” solns for this subtree and one of them will be part of the optimal solution at the root/driver of the net---this is the DP feature of this algorithm

[Figure: routing tree from source s0 to the sinks with RAT1 to RAT4]
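The four operations used by the algorithm (add a wire, add a buffer, merge, delete dominated solutions) can be sketched as below. This is an illustration, not the original implementation: candidates are (C, T) tuples, the load Ln is taken equal to Cn as noted on the previous slide, and the function and parameter names (`add_wire`, `add_buffer`, `merge`, `prune`, `r_w`, `c_w`, `c_b`, `t_b`, `r_b`) are chosen here.

```python
def add_wire(sol, r_w, c_w):
    """Ct = Cn + Cw;  Tt = Tn - Rw*Cn - 0.5*Rw*Cw."""
    c_n, t_n = sol
    return (c_n + c_w, t_n - r_w * c_n - 0.5 * r_w * c_w)


def add_buffer(sol, c_b, t_b, r_b):
    """Ct = Cb;  Tt = Tn - Tb - Rb*Cn."""
    c_n, t_n = sol
    return (c_b, t_n - t_b - r_b * c_n)


def merge(sols_n, sols_m):
    """Combine every pair of candidates from two subtrees."""
    return [(c_n + c_m, min(t_n, t_m))
            for c_n, t_n in sols_n for c_m, t_m in sols_m]


def prune(sols):
    """Delete dominated candidates (some other one has C <= and T >=)."""
    kept = []
    best_t = float("-inf")
    # Increasing C; for equal C the larger T comes first.
    for c, t in sorted(set(sols), key=lambda z: (z[0], -z[1])):
        if t > best_t:          # strictly better T than every cheaper-C candidate
            kept.append((c, t))
            best_t = t
    return kept


print(prune([(45, 50), (5, 0), (20, 100), (5, 70)]))
```

After pruning, the surviving candidates have strictly increasing C and strictly increasing T, so each one is a genuine trade-off between load and timing.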

Page 14: Algorithms VLSI CAD Final f07

Van Ginneken Example

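[Figure: candidate solutions (C, T) propagated from the sink toward the source]

Sink: (20,400)

Wire (C=10, d=150): (20,400) gives (30,250)

Buffer (C=5, d=30): (30,250) gives (5,220)

Candidates: (30,250), (5,220)

Wire (C=15): (30,250) gives (45,50) w/ d=200; (5,220) gives (20,100) w/ d=120

Buffer (C=5): (45,50) gives (5,0) w/ d=50; (20,100) gives (5,70) w/ d=30

Candidates: (45,50), (5,0), (20,100), (5,70)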

Courtesy: Chuck Alpert, IBM

Page 15: Algorithms VLSI CAD Final f07

Van Ginneken Example Cont’d

Candidates: (45,50), (5,0), (20,100), (5,70)

(5,0) is inferior to (5,70). (45,50) is inferior to (20,100).

Remaining candidates: (20,100), (5,70)

Wire (C=10): (20,100) gives (30,10); (5,70) gives (15,-10)

Pick the solution with the largest slack, then follow the arrows back to get the solution

Courtesy: Chuck Alpert, IBM

Page 16: Algorithms VLSI CAD Final f07

Mathematical Programming

Linear programming (LP)
E.g., Obj: Min $2x_1 - x_2 + x_3$ w/ constraints $x_1 + x_2 \le a$, $x_1 - x_3 \le b$
-- solvable in polynomial time

Quadratic programming (QP)
E.g., Min $x_1^2 - x_2 x_3$ w/ linear constraints
-- solvable in polynomial (cubic) time w/ equality constraints

Others, when some vars are integers:

Mixed integer linear prog (ILP) -- NP-hard

Mixed integer quad. prog (IQP) -- NP-hard

Others, when some vars are in {0,1}:

Mixed 0/1 integer linear prog (0/1 ILP) -- NP-hard

Mixed 0/1 integer quad. prog (0/1 IQP) -- NP-hard

Page 17: Algorithms VLSI CAD Final f07

0/1 ILP/IQP Examples

• Generally useful for “assignment” problems, where objects {O1, ..., On} are assigned to bins {B1, ..., Bm}

• 0/1 variable $x_{i,j} = 1$ if object Oi is assigned to bin Bj

• Min-cut bi-partitioning for graphs G(V,E) can be modeled as a 0/1 IQP

[Figure: graph split into V1 and V2, with nodes ui and uj on opposite sides of the cut]

➢ $x_{i,1} = 1$ => ui in V1, else ui in V2

➢ Edge (ui, uj) is in the cutset if $x_{i,1}(1 - x_{j,1}) + (1 - x_{i,1}) x_{j,1} = 1$

➢ Objective function: Min $\sum_{(u_i, u_j) \in E} c(i,j) \, [\, x_{i,1}(1 - x_{j,1}) + (1 - x_{i,1}) x_{j,1} \,]$

➢ Constraint: $\sum_i w(u_i) \, x_{i,1} \le$ max-size
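To make the 0/1 IQP concrete, the sketch below evaluates the quadratic objective directly and finds the best assignment by brute force on a tiny graph. It only illustrates the formulation and is not a practical solver. The names `cut_cost` and `min_cut_bipartition` are chosen here, and the lower bound `min_size` is an added assumption (the slide states only the upper bound) so that the trivial all-zero assignment is excluded.

```python
from itertools import product


def cut_cost(x, edges):
    """IQP objective: sum of c(i,j) * [x_i (1 - x_j) + (1 - x_i) x_j] over edges."""
    return sum(c * (x[i] * (1 - x[j]) + (1 - x[i]) * x[j]) for i, j, c in edges)


def min_cut_bipartition(n, edges, weights, min_size, max_size):
    """Try every 0/1 vector x (x[i] = 1 means node i is in V1)."""
    best = None
    for x in product((0, 1), repeat=n):
        size_v1 = sum(w * xi for w, xi in zip(weights, x))
        if not (min_size <= size_v1 <= max_size):   # size constraint on V1
            continue
        cost = cut_cost(x, edges)
        if best is None or cost < best[0]:
            best = (cost, x)
    return best


# Path graph 0 - 1 - 2 - 3 with a cheap middle edge (illustrative data)
edges = [(0, 1, 3), (1, 2, 1), (2, 3, 3)]
print(min_cut_bipartition(4, edges, [1, 1, 1, 1], 2, 2))
```

Each bracketed term is 1 exactly when the two endpoints get different x values, so the sum counts the weight of cut edges.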

Page 18: Algorithms VLSI CAD Final f07

Search Techniques

[Figure: example graph on nodes A to G, shown three times: the graph itself, the graph with DFS labels 1 to 6, and the graph with BFS labels 1 to 7]

dfs(v)  /* for basic graph visit or for soln finding when nodes are partial solns */
  v.mark = 1;
  for each (v,u) in E
    if (u.mark != 1) then dfs(u)

Algorithm Depth_First_Search
  for each v in V
    v.mark = 0;
  for each v in V
    if v.mark = 0 then
      if G has partial soln nodes then dfs(v);
      else soln_dfs(v);

soln_dfs(v)  /* used when nodes are basic elts of the problem and not partial soln nodes */
  v.mark = 1;
  if path to v is a soln, then return(1);
  for each (v,u) in E
    if (u.mark != 1) then
      soln_found = soln_dfs(u)
      if (soln_found = 1) then return(soln_found)
  end for;
  v.mark = 0;  /* can visit v again to form another soln on a different path */
  return(0)
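A runnable counterpart of soln_dfs might look as follows. It is a sketch under assumptions not in the slides: the graph is an adjacency-list dict, the solution test is passed in as a predicate `is_solution`, and membership in the current path plays the role of v.mark.

```python
def soln_dfs(graph, v, is_solution, path=None):
    """Return the first path starting at v that satisfies is_solution, else None."""
    path = (path or []) + [v]          # extending the path "marks" v
    if is_solution(path):
        return path
    for u in graph.get(v, []):
        if u not in path:              # same role as (u.mark != 1)
            found = soln_dfs(graph, u, is_solution, path)
            if found is not None:
                return found
    return None                        # returning drops v from the path ("unmark")


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"], "F": []}
print(soln_dfs(graph, "A", lambda p: p[-1] == "F"))
```

Because a node is only excluded while it is on the current path, it can be revisited on a different path, as the comment in the pseudocode requires.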

Page 19: Algorithms VLSI CAD Final f07

Search Techniques: Exhaustive DFS

[Figure: example graph on nodes A to G with DFS labels 1 to 6]

optimal_soln_dfs(v)  /* used when nodes are basic elts of the problem and not partial soln nodes */
begin
  v.mark = 1;
  if path to v is a soln, then
  begin
    if cost < best_cost then
    begin
      best_soln = soln; best_cost = cost;
    endif
    v.mark = 0;
    return;
  endif
  for each (v,u) in E
    if (u.mark != 1) then optimal_soln_dfs(u)
  end for;
  v.mark = 0;  /* can visit v again to form another soln on a different path */
end

Algorithm Depth_First_Search
  for each v in V
    v.mark = 0;
  best_cost = infinity;
  optimal_soln_dfs(root);
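The exhaustive variant can be sketched in Python for one concrete problem: the cheapest simple path from a start node to a goal node. The problem choice, the weighted-dict graph representation, and the names below are assumptions made for illustration.

```python
import math


def optimal_soln_dfs(graph, start, goal):
    """Enumerate every simple path start -> goal and keep the cheapest one."""
    best = {"cost": math.inf, "path": None}

    def visit(v, path, cost):
        path.append(v)                           # v.mark = 1
        if v == goal:                            # path to v is a soln
            if cost < best["cost"]:
                best["cost"], best["path"] = cost, list(path)
        else:
            for u, w in graph.get(v, {}).items():
                if u not in path:                # u.mark != 1
                    visit(u, path, cost + w)
        path.pop()                               # v.mark = 0, allow reuse elsewhere

    visit(start, [], 0)
    return best["cost"], best["path"]


graph = {"A": {"B": 1, "C": 5}, "B": {"C": 1, "D": 7}, "C": {"D": 2}, "D": {}}
print(optimal_soln_dfs(graph, "A", "D"))
```

Unlike soln_dfs, the search does not stop at the first solution; it records the best one seen and keeps backtracking until all paths are explored.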

Page 20: Algorithms VLSI CAD Final f07

Best-First Search

[Figure: search tree annotated with node costs (10; 12, 15, 19; 18; 17, 18; 16) and expansion order (1), (2), (3)]

BeFS(root)
begin
  open = {root}  /* open is the list of generated but not yet expanded nodes (partial solns) */
  best_soln_cost = infinity;
  while open != nullset do
  begin
    curr = first(open);  /* take the min-cost node and remove it from open */
    if curr is a soln then return(curr)  /* curr is an optimal soln */
    else children = Expand_&_est_cost(curr);  /* generate all children of curr & estimate their costs; cost(u) should be a lower bound of the cost of the best soln reachable from u */
    for each child in children do
    begin
      if child is a soln then
        delete all nodes w in open s.t. cost(w) >= cost(child);
      endif
      store child in open in increasing order of cost;
    endfor
  endwhile
end  /* BeFS */

Expand_&_est_cost(Y)
begin
  children = nullset;
  for each basic elt x of the problem “reachable” from Y that can be part of the current partial soln Y do
  begin
    if x not in Y and Y U {x} is feasible then
    begin
      child = Y U {x};
      path_cost(child) = path_cost(Y) + cost(Y, x)  /* cost(Y,x) is the cost of reaching x from Y */
      est(child) = lower bound on the remaining cost of the best soln reachable from child;
      cost(child) = path_cost(child) + est(child);
      children = children U {child};
    endif
  endfor
  return(children);
end  /* Expand_&_est_cost(Y) */

Page 21: Algorithms VLSI CAD Final f07

Best-First Search

[Figure: search tree annotated with node costs (10; 12, 15, 19; 18; 17, 18; 16) and expansion order (1), (2), (3)]

Proof of optimality when cost is a LB

• The current set of nodes in “open” represents a complete front of generated nodes, i.e., the rest of the nodes in the search space are descendants of “open”

• Assuming the basic cost (cost of adding an elt to a partial soln to construct another partial soln that is closer to the soln) is non-negative, the cost is monotonic, i.e., cost of child >= cost of parent

• If the first node curr in “open” is a soln, then cost(curr) <= cost(w) for each w in “open”

• The cost of any node in the search space that is not in “open” and not yet generated is >= the cost of its ancestor in “open” and thus >= cost(curr). Thus curr is the optimal (min-cost) soln

Page 22: Algorithms VLSI CAD Final f07

Search techs for a TSP example

[Figure: TSP graph on cities A to F with weighted edges]

[Figure: exhaustive search using DFS (w/ backtrack) for finding an optimal solution. The search tree is rooted at A; the solution nodes shown (tours returning to A) have costs 27, 31 and 33.]

Page 23: Algorithms VLSI CAD Final f07

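Search techs for a TSP example (contd)

[Figure: the same TSP graph on cities A to F, and the BeFS search tree for finding an optimal solution. Nodes are labeled path cost + lower-bound estimate (e.g., 5+15 = 20, 8+16, 11+14, 14+9, 21+6, 22+9, 23+8); pruned nodes are marked X; the solution node has cost 27.]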

• Lower-bound cost estimate: MST({unvisited cities} U {current city} U {start city})

• It is a LB because the set of candidate structures (spanning trees) is a superset of the set of reqd soln structures (paths that complete the cycle)

• min_cost(set S) <= min_cost(set S’) if S is a superset of S’

MST for node (A, E, F) = MST{F,A,B,C,D}; cost = 16

Path cost for (A,E,F) = 8
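BeFS with the MST lower bound can be sketched as below. This is an illustration, not the slides' code: it assumes a symmetric distance matrix `dist` indexed by city number, uses a binary heap for “open”, and does not reproduce the slide's graph, whose edge weights are not recoverable from the transcript. The names `mst_cost` and `befs_tsp` are chosen here.

```python
import heapq


def mst_cost(nodes, dist):
    """Prim's algorithm on the complete graph induced by `nodes`."""
    nodes = list(nodes)
    if len(nodes) <= 1:
        return 0
    in_tree = {nodes[0]}
    total = 0
    while len(in_tree) < len(nodes):
        w, v = min((dist[a][b], b)
                   for a in in_tree for b in nodes if b not in in_tree)
        in_tree.add(v)
        total += w
    return total


def befs_tsp(dist, start=0):
    """Best-first search; heap entries are (path cost + LB, path cost, path)."""
    n = len(dist)
    open_list = [(mst_cost(range(n), dist), 0, (start,))]
    while open_list:
        _, path_cost, path = heapq.heappop(open_list)
        if len(path) == n + 1:                 # first complete tour popped is optimal
            return path_cost, path
        cur = path[-1]
        if len(path) == n:                     # all cities visited: close the cycle
            total = path_cost + dist[cur][start]
            heapq.heappush(open_list, (total, total, path + (start,)))
            continue
        unvisited = [c for c in range(n) if c not in path]
        # LB set for each child: its unvisited cities, the child itself, the start.
        lower_bound = mst_cost(set(unvisited) | {start}, dist)
        for nxt in unvisited:
            new_cost = path_cost + dist[cur][nxt]
            heapq.heappush(open_list,
                           (new_cost + lower_bound, new_cost, path + (nxt,)))
    return None


dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
print(befs_tsp(dist))
```

The remaining part of any tour from a partial path is a path through the unvisited cities back to the start, which is itself a spanning tree of that city set, so the MST cost can never exceed it.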

Page 24: Algorithms VLSI CAD Final f07

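BeFS for 0/1 ILP Solution

X = {x1, …, xm} are 0/1 vars

[Figure: search tree in which each node solves the LP relaxation with some vars fixed]

• Branch on x2:
– x2=0: Solve LP w/ x2=0; Cost=cost(LP)=C1
– x2=1: Solve LP w/ x2=1; Cost=cost(LP)=C2

• Branch on x4 (below x2=1):
– x4=0: Solve LP w/ x2=1, x4=0; Cost=cost(LP)=C3
– x4=1: Solve LP w/ x2=1, x4=1; Cost=cost(LP)=C4

• Branch on x5 (below x2=1, x4=1):
– x5=0: Solve LP w/ x2=1, x4=1, x5=0; Cost=cost(LP)=C5 (optimal soln)
– x5=1: Solve LP w/ x2=1, x4=1, x5=1; Cost=cost(LP)=C6

Cost relations: C5 < C3 < C1 < C6; C2 < C1; C4 < C3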

Page 25: Algorithms VLSI CAD Final f07

Iterative Improvement Techniques

Iterative improvement:

• Deterministic greedy

– Locally/immediately greedy: make the move that is immediately (locally) best, until no further impr. (e.g., FM)

– Non-locally greedy: make the move that is best according to some non-immediate (non-local) metric (e.g., probability-based lookahead as in PROP), until no further impr.

• Stochastic (non-greedy): make a combination of deterministic greedy moves and probabilistic moves that cause a deterioration (these can help to jump out of local minima), until the stopping criteria are satisfied

– Stopping criteria could be an upper bound on the total # of moves or iterations
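The difference between the two families can be sketched on a toy 1-D cost landscape. Everything in this example is an assumption made for illustration: the cost table, the +/-1 neighborhood, and the annealing-style acceptance rule exp(-delta/temp) used as one possible stochastic scheme. It is not FM or PROP.

```python
import math
import random


def greedy_descent(cost, start, neighbors):
    """Deterministic, locally greedy: take the best neighbor until none improves."""
    cur = start
    while True:
        best = min(neighbors(cur), key=cost)
        if cost(best) >= cost(cur):
            return cur                    # local minimum reached
        cur = best


def stochastic_descent(cost, start, neighbors,
                       max_moves=2000, temp=5.0, cooling=0.995, seed=0):
    """Accept improving moves always, worsening moves with prob. exp(-delta/temp)."""
    rng = random.Random(seed)
    cur = best = start
    for _ in range(max_moves):            # stopping criterion: bound on # of moves
        cand = rng.choice(neighbors(cur))
        delta = cost(cand) - cost(cur)
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur = cand
        if cost(cur) < cost(best):
            best = cur                    # remember the best soln seen so far
        temp = max(temp * cooling, 1e-9)
    return best


costs = [9, 7, 5, 6, 8, 9, 7, 4, 2, 1, 3, 6]   # local min at 2, global min at 9


def cost(i):
    return costs[i]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(costs)]


print(greedy_descent(cost, 0, neighbors))
print(stochastic_descent(cost, 0, neighbors))
```

Starting from position 0, the greedy version stops at the first local minimum it reaches, while the stochastic version can climb over the hump between the two minima because it sometimes accepts a worse move.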
