

Page 1: Yilin  Zhang and  David Z. Pan ECE, Univ. of Texas at  Austin

Timing-Driven, Over-the-Block Rectilinear Steiner Tree Construction with Pre-Buffering and Slew Constraints

Yilin Zhang and David Z. Pan

ECE, Univ. of Texas at Austin

ISPD 2014

Page 2

Outline

Background & Motivation

TOB-RSMT

› Problem Formulation

› TOB-RSMT Algorithms

Experimental Results

Conclusion

Page 3

History of VLSI RSMTs

Wirelength-driven: BOI, BI1S, RV-based RST, FLUTE and GeoSteiner

Obstacle-avoiding RSMT (OA-RSMT) › [Chow+, VLSI14] [Liu+, DAC12] [Li+, ICCAD08]

Over-the-block RSMT (OB-RSMT), proposed since 2012 › [Huang+, ICCAD12] [Zhang+, ICCAD12]

Minimum delay routing tree (MDRT): BA-Tree, etc.

RAT-driven RSMT: C-Tree, etc.

Page 4

Limitations of Previous Timing-Driven RSTs

Nodes are clustered bottom-up › e.g., BA-Tree and C-Tree

Clustering distance metric › spatial distance and slack

Accurate slack is hard to find during clustering: › some segments are not fixed yet › no segment has been buffered yet

Page 5

Limitations in Dealing with Blocks

Completely neglecting blocks causes slew problems › no buffer is allowed over a block

Obstacle avoiding › more congestion outside the blocks › detours mean more WL and worse timing

(Figure: detours)

Page 6

Post-buffering Topology Tuning is Necessary

Buffering plays a big role in delay reduction

› Shielding effect; linear delay on long wires

› But buffers are always placed after wiring

Changing the topology after buffering is fruitful!

(Figure: D_SB unchanged; D_SA decreased; D_b2)

Page 7

Our Contributions

Use pre-buffering to find practical slack for each node in the graph

Use over-the-block routing resources to improve WL, buffering cost and timing

Apply post-buffering tuning to improve timing on critical paths with little extra cost

Page 8

Outline

Background & Motivation

TOB-RSMT

› Problem Formulation

› TOB-RSMT Algorithms

Experimental Results

Conclusion

Page 9

Problem Formulation

N = {s0, s1, s2, ..., sn}: source s0 and n sinks

B = {b1, b2, ..., bm}: non-overlapping rectilinear blocks in the two-dimensional space R

Find a buffered tree T(V, E) that connects all pins in N and optimizes WNS with the lowest buffering cost

› V is the set of nodes

› E is the set of horizontal and vertical edges

The slew at every point in T must meet the slew constraint › slew-mode buffering [Hu+, TCAD07]

No buffers are allowed over the blocks
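The two hard requirements above (no buffer over a block, WNS as the objective) can be made concrete with a small sketch. It assumes, for simplicity, that every block is an axis-aligned rectangle (the formulation allows general rectilinear blocks) and that points on a block boundary count as outside; the names Block, buffer_location_legal and wns_tns are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Axis-aligned rectangular block (assumption: real blocks may be rectilinear polygons)."""
    x_lo: float
    y_lo: float
    x_hi: float
    y_hi: float

    def strictly_contains(self, x, y):
        # Boundary points are treated as outside, so a buffer may sit on a block edge.
        return self.x_lo < x < self.x_hi and self.y_lo < y < self.y_hi


def buffer_location_legal(x, y, blocks):
    """A buffer location is legal only if no block covers it."""
    return not any(b.strictly_contains(x, y) for b in blocks)


def wns_tns(slacks):
    """Worst and total negative slack over the sink slacks (0 if no sink is negative)."""
    negatives = [s for s in slacks if s < 0]
    return min(negatives, default=0.0), sum(negatives, 0.0)
```

For example, with one block spanning (0, 0) to (10, 10), a buffer at (5, 5) is rejected while one at (10, 5) on the boundary is accepted, and sink slacks of -30, 5 and -10 ps give WNS = -30 ps and TNS = -40 ps.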

Page 10

Timing Models

Elmore Delay

Slew

› PERI model + Bakoglu's metric

» (4% error [Kashyap+, ISPD03] [Bakoglu+, 90])
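The slide's equations were not preserved, so here is a minimal sketch of the standard forms of these models, assuming wires are discretized into lumped RC segments. Elmore delay sums, along the source-to-node path, each resistance times the capacitance downstream of it. Bakoglu's metric estimates the 10-90% step slew as ln(9) times the Elmore delay, and the PERI model combines it with the input slew as a root sum of squares. The function names and the array-based tree layout are illustrative assumptions.

```python
import math


def elmore_delays(parent, res, cap, driver_res=0.0):
    """Elmore delay from the driver to every node of a lumped RC tree.

    parent[i]: parent of node i; node 0 is the root (parent[0] == -1) and
               parent[i] < i for all other nodes (topological order).
    res[i]:    resistance of the segment from parent[i] to i (res[0] unused).
    cap[i]:    capacitance lumped at node i.
    With kOhm and fF as units, the delays come out in ps.
    """
    n = len(parent)
    downstream = [float(c) for c in cap]
    # Accumulate downstream capacitance from the leaves toward the root.
    for i in range(n - 1, 0, -1):
        downstream[parent[i]] += downstream[i]
    delay = [0.0] * n
    delay[0] = driver_res * downstream[0]  # the driver sees the total load
    for i in range(1, n):
        # Each segment adds its resistance times all capacitance below it.
        delay[i] = delay[parent[i]] + res[i] * downstream[i]
    return delay


def peri_slew(input_slew, elmore_delay):
    """Bakoglu step slew ln(9) * Elmore, combined with the input slew via PERI."""
    step_slew = math.log(9.0) * elmore_delay
    return math.sqrt(input_slew ** 2 + step_slew ** 2)
```

For a three-node chain with a 0.5 kOhm driver, segment resistances 1 and 2 kOhm and node capacitances 1 and 3 fF, elmore_delays([-1, 0, 1], [0.0, 1.0, 2.0], [0.0, 1.0, 3.0], 0.5) yields [2.0, 6.0, 12.0] ps.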

Page 11

Overall Algorithm

Input: N & B

› Initial timing-driven RST with pre-buffering

› Find all over-the-block slew violations and fix them

› Buffering

› Tune the topology according to buffering information

› Buffering

Output: return buffered T

Page 12

Initial Tree Generation with Pre-Buffering

Iterative method › until it converges or oscillates among several states

Feed the real (buffered) delay back to each node to find its slack (criticality)

› Critical sinks identified before topology construction are the truly critical ones

› Practical slack on each node
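The stopping rule of this iterative method can be sketched as a generic loop that detects either a fixed point or a cycle of states. Here step is a hypothetical stand-in for one round of "build the timing-driven tree with the current slacks, pre-buffer it, and feed the buffered delays back"; representing the state as a hashable tuple (e.g. per-sink criticality flags) is also an assumption.

```python
def iterate_until_stable(initial_state, step, max_iters=50):
    """Apply `step` until a state repeats.

    Returns (state, status, iterations, cycle). A repeat of the previous state
    means convergence; a repeat of an older state means the process oscillates
    among the states in `cycle`. States must be hashable.
    """
    history = [initial_state]
    first_seen = {initial_state: 0}
    state = initial_state
    for it in range(1, max_iters + 1):
        state = step(state)
        if state in first_seen:
            cycle = history[first_seen[state]:]
            status = "converged" if len(cycle) == 1 else "oscillating"
            return state, status, it, cycle
        first_seen[state] = it
        history.append(state)
    return state, "max_iters", max_iters, []
```

When the status is "oscillating", a caller could evaluate every state in cycle and keep the best one; the max_iters guard is an extra safety net not mentioned on the slide.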

Page 13

Initial Tree with Pre-Buffering Flow


[Lin+, TCAD11]

Page 14

Initial Tree with Pre-Buffering Example

Simple model without buffering suggests D is critical

However, with buffering, D is not critical

Now, D is inserted farther from the source, with less WL

Page 15

Buffering-Aware Over-the-Block TD-RST

TD-RST needs over-the-block routing

› Better WL, buffer resources and timing

› Replace obstacle-avoiding detours with shorter over-the-block connections

(Figure labels: 150ps, 100ps, 120ps, 110ps)

Page 16

Difference from WL-Driven BOB-RSMT

(Figure panels: Original; WL driven; WL+slack)

Move non-critical paths to save slew

Protect critical paths for timing

Page 17

Slew Constraints in Buffering-Aware TD-RST

The hard problem with over-the-block routing is slew

Each topology confines a set of inside trees

Use a hypothetical buffer to check whether buffering is feasible
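As a rough illustration of the hypothetical-buffer check, the sketch below simplifies an inside tree to a single wire of length L ending in one load, driven by a hypothetical buffer placed where the route enters the block (no buffer can go further inside). It reuses the Elmore + Bakoglu + PERI slew model from the timing-model slide and solves for the longest wire that still meets the slew limit. The single-wire simplification, the function names, and the per-unit-length r and c values in the usage note are all assumptions.

```python
import math

LN9 = math.log(9.0)  # Bakoglu: 10-90% step slew = ln(9) * Elmore delay


def wire_end_slew(length, r, c, r_buf, c_load, input_slew=0.0):
    """Slew at the far end of an unbuffered distributed-RC wire driven by a buffer."""
    elmore = r_buf * (c * length + c_load) + r * length * (c * length / 2.0 + c_load)
    return math.hypot(input_slew, LN9 * elmore)  # PERI combination with input slew


def max_unbuffered_length(r, c, r_buf, c_load, slew_max, input_slew=0.0):
    """Longest wire the hypothetical buffer can drive within slew_max, or None."""
    if input_slew >= slew_max:
        return None
    budget = math.sqrt(slew_max ** 2 - input_slew ** 2) / LN9  # allowed Elmore delay
    # Elmore(L) = a*L^2 + b*L + r_buf*c_load must not exceed the budget.
    a = r * c / 2.0
    b = r_buf * c + r * c_load
    k = r_buf * c_load - budget
    if k > 0:
        return None  # even a zero-length wire violates the constraint
    return (-b + math.sqrt(b * b - 4.0 * a * k)) / (2.0 * a)
```

An inside tree whose wire from the block boundary to its sink is longer than max_unbuffered_length(...) cannot be fixed by buffering and that topology would be rejected. For instance, one could call max_unbuffered_length(0.0008, 0.2, 0.45, 3.8, 70.0) with units kOhm/um, fF/um, kOhm, fF and ps; the 450 ohms, 3.8 fF and 70ps figures come from the experimental setup, while r and c are made-up placeholders since the ITRS values are not listed.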

Page 18

Optimization Primitives

Three optimization primitives

› Parallel sliding

› Perpendicular sliding

› EP merging [Zhang, ICCAD12]

Page 19

Formulation of Buffering-Aware TD-RST

The formulation considers slack and WL together

› W_ij, C, d^t_EPi, where d^t_EPi is the delay increase for every sink downstream of EPi

› Terms: increase of TNS; increase of WL
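The exact objective was shown as an image and is not recoverable, so the following is only a hypothetical sketch of how "increase of TNS" and "increase of WL" can be scored together for one candidate move. It assumes every sink downstream of the moved point EPi sees the same delay increase d, and that the two terms are combined with a single user-chosen weight; neither the weighting nor the function names come from the paper.

```python
def tns_increase(downstream_slacks, delay_increase):
    """Growth of |TNS| when every sink below the moved point gets `delay_increase` more delay.

    Only the negative part of each slack counts, so a sink that stays positive
    contributes nothing; a negative delay_increase yields a (negative) improvement.
    """
    return sum(min(0.0, s) - min(0.0, s - delay_increase) for s in downstream_slacks)


def move_cost(downstream_slacks, delay_increase, wl_increase, wl_weight):
    """Hypothetical combined objective: TNS penalty plus weighted wirelength change."""
    return tns_increase(downstream_slacks, delay_increase) + wl_weight * wl_increase
```

For downstream slacks of 10 ps and -5 ps and a 20 ps delay increase, the TNS term is 10 + 20 = 30 ps: the first sink becomes 10 ps negative and the second gets 20 ps worse. If the move also saves 50 units of WL with weight 0.1, the total cost is 25.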

Page 20

Buffer-location-based Tuning Benefits

Tuning the topology after buffering pays off!

Buffering resources are costly

Improving timing without adding buffers is tempting

› With only a small WL increase

We propose a way to post-tune the topology based on buffer location information

Page 21

Saturated/Un-saturated Buffers

Some buffers are "saturated" and some are "unsaturated"

› Saturated: the slew reaches the maximum

› Unsaturated: the slew does not reach the maximum

Page 22

Buffer-location-based Tuning Study

An unsaturated buffer is an opportunity

(Figure: WL increases; delay to A improves)

Page 23

Buffer-location-based Tuning Condition

Δslew = slew_max - slew_cur

Lmax is the maximum allowed relocation distance

› If the buffer input capacitance is neglected, Lmax =

› If the buffer input capacitance is considered, Lmax =
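The two Lmax expressions were not preserved, so this sketch derives a bound under its own stated model rather than reproducing the paper's formulas. It assumes that attaching L units of extra wire (plus, optionally, a relocated pin with input capacitance c_in) to a buffer's output raises its output slew by about ln(9) * r_buf * (c_wire * L + c_in), with the wire's own resistance neglected. A buffer with no slew margin is classified as saturated and offers no room to relocate. All names are hypothetical.

```python
import math

LN9 = math.log(9.0)  # Bakoglu: slew = ln(9) * Elmore delay


def slew_margin(slew_max, slew_cur):
    """Δslew = slew_max - slew_cur for one buffer."""
    return slew_max - slew_cur


def is_saturated(slew_max, slew_cur, tol=1e-9):
    """Saturated: the buffer's slew already reaches the maximum."""
    return slew_margin(slew_max, slew_cur) <= tol


def l_max(slew_max, slew_cur, r_buf, c_wire, c_in=0.0):
    """Hypothetical first-order relocation bound (not the paper's formula).

    Extra slew ~ ln(9) * r_buf * (c_wire * L + c_in); pass c_in=0 to neglect
    the input capacitance of the relocated buffer.
    """
    spare_cap = slew_margin(slew_max, slew_cur) / (LN9 * r_buf) - c_in
    if spare_cap <= 0.0:
        return 0.0  # saturated, or the input cap alone uses up the margin
    return spare_cap / c_wire
```

Under this model, counting c_in always shrinks Lmax, which matches the slide's distinction between the two cases; the true expressions may differ.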

Page 24

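Buffer-location-based Tuning Flow

Input: buffered T

Sort all sinks according to slack

For each negative-slack sink n:

› n = n.parent

› If n is at the source, continue with the next sink

› Otherwise, if the Lmax constraint is satisfied, apply tuning; if not, go up again (n = n.parent)

Buffering

Return buffered T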
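The sink loop of this flow can be sketched as a walk up the tree from each negative-slack sink. The tree is assumed to be given as a parent map, and satisfies_lmax and tune are hypothetical callbacks standing in for the Lmax check and the actual topology change, whose details the slides do not give. The final re-buffering step is outside this sketch.

```python
def tune_negative_slack_sinks(sinks, slack, parent, source, satisfies_lmax, tune):
    """Walk up from each negative-slack sink until a node passes the Lmax check.

    sinks: sink ids; slack: sink -> slack; parent: node -> parent node.
    satisfies_lmax(sink, node) and tune(sink, node) are hypothetical callbacks.
    Returns the (sink, node) pairs that were tuned.
    """
    tuned = []
    # Most critical (most negative slack) sinks first.
    for sink in sorted((s for s in sinks if slack[s] < 0), key=lambda s: slack[s]):
        node = parent[sink]
        while node != source:  # reaching the source: give up on this sink
            if satisfies_lmax(sink, node):
                tune(sink, node)
                tuned.append((sink, node))
                break
            node = parent[node]
    return tuned
```

Processing sinks from the most negative slack upward lets the most critical paths claim unsaturated buffers first, which matches the "sort all sinks according to slack" step.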

Page 25

Outline

Background & Motivation

TOB-RSMT

› Problem Formulation

› TOB-RSMT Algorithms

Experimental Results

Conclusion

Page 26

Experimental Setups

Implemented in C++

Intel Core 3.0GHz Linux machine with 32GB memory

Gurobi Optimizer 5.10 for mathematical optimization

RC01-RC12 are the benchmarks [Feng+, ISPD06]

Two sizes of buffers: 450 ohms and 850 ohms, 3.8 fF and 1.9 fF

Interconnect RC from ITRS; slew constraint of 70ps

Page 27

Experimental Setups

SD-OARST is the baseline [Lin+, TCAD11]

TOB-RST-1 is OA-RST with pre-buffering

TOB-RST-2 is over-the-block with pre-buffering

TOB-RST is over-the-block with pre-buffering and post-buffering tuning

Page 28

Experimental Results

TOB-RST-1 vs. SD-OARST: › similar WL (buffering cost) › pre-buffering benefits the slack

TOB-RST-2 vs. TOB-RST-1: › WNS improved by 179ps on average › buffering cost and WL reduced by 6% and 5%, respectively

TOB-RST vs. TOB-RST-2: › WNS improved by 70ps on average, with less than 1% more WL

Page 29

Experimental Results

Page 30

Outline

Background & Motivation

TOB-RSMT

› Problem Formulation

› TOB-RSMT Algorithms

Experimental Results

Conclusion

Page 31

Conclusion

Timing-driven over-the-block rectilinear Steiner minimum tree

Use pre-buffering to find practical slack for each node

Use over-the-block routing resources to improve WL, buffering cost and timing

Apply post-buffering tuning to improve timing on critical paths with little extra cost

Significantly improves WNS on all benchmarks, with 2% less WL and 4% less buffering cost than SD-OARST

Page 32

Acknowledgment

This work is supported in part by Oracle. Thanks to Dr. Salim Chowdhury, Dr. Rajendran Panda and Dr. Akshay Sharma from Oracle.

Thank you! Questions?