Physical Synthesis Comes of Age
Chuck Alpert, IBM Corp.
Chris Chu, Iowa State University
Paul Villarrubia, IBM Corp.
2
Physical Synthesis Family Tree
Roles of layout as a parent:
• Clean up the mess created by physical synthesis (implement the netlist generated by physical synthesis)
• Provide guidance to physical synthesis so that it will do things right
Is layout mature enough to serve the role? Is there still room for layout to grow?
[Figure: family tree relating Synthesis, Layout, and Physical Synthesis]
3
New Requirements of Placement
1. Super fast
   • 4 to 8 million objects now
   • Provide quick feedback to physical synthesis to refine the netlist
2. Stable in handling incremental placement
   • Physical synthesis constantly makes changes to the netlist
3. Flexible objective function
   • Timing, power, routability
4. Handle mixed-size modules
   • Hierarchical design and use of IP blocks are common
4
Placement As a Baby
Simulated annealing based placement
• Popularized by TimberWolf [DAC-86]
Greedy algorithm:
• You only have 1 chance.
• If you get stuck, I will terminate you!
Simulated annealing:
• OK to make mistakes. Keep trying!
• Evaluation/feedback is important.
Strength:
• Good quality for small designs
• Easy to consider different objective functions
• Handles incremental changes well
Weakness:
• Very slow – crawling
• Non-trivial to handle modules of different sizes
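The contrast between the greedy rule and simulated annealing can be made concrete with a small sketch. This is not TimberWolf; it is a minimal swap-based annealer under assumed settings (the names `hpwl` and `anneal`, the cooling schedule, and the use of half-perimeter wirelength as the cost are all illustrative choices, not taken from the slides).

```python
import math
import random

def hpwl(nets, pos):
    """Total half-perimeter wirelength of all nets for cell positions pos."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal(nets, pos, t_start=5.0, t_end=0.01, cooling=0.95,
           moves_per_t=200, seed=0):
    """Swap-based simulated annealing; returns (best placement, best cost)."""
    rng = random.Random(seed)
    pos = dict(pos)                      # work on a copy
    cells = sorted(pos)
    cost = hpwl(nets, pos)
    best, best_cost = dict(pos), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            a, b = rng.sample(cells, 2)
            pos[a], pos[b] = pos[b], pos[a]      # propose: swap two cells
            new_cost = hpwl(nets, pos)
            delta = new_cost - cost
            # A greedy algorithm would accept only delta <= 0. Annealing also
            # accepts uphill moves with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]  # reject: undo the swap
        t *= cooling                             # cool down
    return best, best_cost
```

Recomputing the full cost after every swap keeps the sketch short, and it also hints at why the approach is slow on large designs: the number of evaluated moves grows quickly with design size. Swapping the cost function for a timing or congestion estimate requires no other change, which matches the "easy to consider different objective functions" strength.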
5
Placement As a Kid
Min-cut placement (or partitioning-based placement)
• An old idea [Breuer, DAC-77]
• Capo [DAC-00] leverages the breakthrough in partitioning from multi-level techniques (e.g., hMetis [DAC-97], MLFM [DAC-97])
• Dragon [ICCAD-00] combines hierarchical partitioning with annealing
Strength:
• Efficient and scalable
• Very good wirelength, but can we do better?
Weakness:
• More difficult to handle other objectives
• Not stable in handling incremental changes
• Not good at white space management
[Figure labels: Circuit, Placement Region]
6
White Space in Min-Cut Placement
[Figure: two placements of adaptec2. Capo (Min-Cut): HPWL=9955. APlace (Analytical): HPWL=8715.]
Courtesy: IBM
7
Placement Maturing
Analytical placement
• Used by 4 of the top 5 placers in the ISPD-05 Placement Contest and the top 5 placers in the ISPD-06 Placement Contest
Strength:
• Fastest and scalable
• Best wirelength
• Robust framework to incorporate different objectives and constraints
• Stable in handling incremental changes
• Good at white space management
Why does analytical placement work so well?
• It can see the big picture
Why was it not popular in the past?
• Hard to spread modules evenly in the placement region
8
Attempt Still Relying on Partitioning
Gordian: Global Optimization and Rectangle Dissection [TCAD-91]
• Artificial center-of-mass constraints disturb the globally optimal solution too drastically
[Figure label: Centers of mass]
9
Another Partitioning-based Spreading
Quadratic optimization with quadrisection [Vygen, DAC-97]
Courtesy: IBM
10
Spreading by Density-based Force
Kraftwerk [DAC-98]
Quadratic wirelength minimization:
$$\min_p \; f(p) = \tfrac{1}{2}\, p^T C p + d^T p + \text{const} \quad\Longrightarrow\quad C p + d = 0$$
Spread cells by additional forces:
$$C p + d + f = 0$$
• Density-based force $f$ pushes cells away from dense regions to sparse regions:
$$f(r) = \frac{k}{2\pi} \iint D(r')\, \frac{r - r'}{|r - r'|^2} \, d^2 r'$$
Great idea:
• Spread cells smoothly
• Very good wirelength
But not too fast:
• Constant force, hard to control convergence
• Density-based force expensive to compute
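To make the linear system behind quadratic placement concrete, here is a minimal one-dimensional sketch. It assumes the cost is half the sum of weighted squared two-pin net lengths, builds C and d from movable-to-movable nets and movable-to-pad nets, and solves C p + d + f = 0 with a small dense solver. The helper names (`solve`, `build_system`, `place`) are illustrative, and a real placer would use a sparse iterative solver instead.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def build_system(n_cells, nets, pads):
    """Build C and d for cost = 1/2 * sum of w * (length)^2.

    nets: (i, j, w) connections between movable cells i and j.
    pads: (i, q, w) connections from movable cell i to a fixed pin at q.
    """
    c = [[0.0] * n_cells for _ in range(n_cells)]
    d = [0.0] * n_cells
    for i, j, w in nets:
        c[i][i] += w
        c[j][j] += w
        c[i][j] -= w
        c[j][i] -= w
    for i, q, w in pads:
        c[i][i] += w
        d[i] -= w * q
    return c, d

def place(c, d, force=None):
    """Solve C p + d + f = 0 for p; f defaults to zero (pure wirelength)."""
    force = force or [0.0] * len(d)
    return solve(c, [-(di + fi) for di, fi in zip(d, force)])

# Three cells in a chain between pads at 0 and 4, all weights 1.
c, d = build_system(3, nets=[(0, 1, 1.0), (1, 2, 1.0)],
                    pads=[(0, 0.0, 1.0), (2, 4.0, 1.0)])
positions = place(c, d)                       # approximately [1, 2, 3]
shifted = place(c, d, force=[0.0, 0.0, 1.0])  # extra force on the last cell
```

With this sign convention a positive entry of f moves the cell toward smaller coordinates, since the gradient of the wirelength cost must equal -f at the solution.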
11
Dramatic Speedup
FastPlace [ISPD-04]
    repeat
        Solve quadratic program to minimize wirelength
        Spread the cells
    until cell distribution is roughly even
    Reduce wirelength by iterative heuristic
Hybrid Net Model
• Speeds up solving of the QP
Cell Shifting
• Simple technique to compute spreading force
• Fast convergence due to the use of pseudo-nets [Hu et al., ISPD-02] instead of constant force
Iterative Local Refinement
• More efficient than using QP to refine the solution
• Minimizes wirelength based on a linear objective
12
Linearization of Quadratic Wirelength
New Kraftwerk [ICCAD-06]
BoundingBox net model for multi-pin nets:
• Need to know the outermost pins of a net
• Accurately models HPWL
• Faster and uses less memory than the clique model
Two fundamental components of spreading force:
• Hold force – constant force
• Move force – enforced by a pseudo-net to a fixed point
[Figure: BoundingBox net model versus Clique net model]
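A short sketch can show why a bounding-box style net model needs the outermost pins and how it can reproduce HPWL exactly. The slides do not give the connection weights; the weighting below, 2 / ((P - 1) * length) with a factor 1/2 in the quadratic cost, is an assumption chosen because it makes the quadratic cost equal the half-perimeter length at the current placement. The function names are illustrative.

```python
def bounding_box_connections(xs):
    """Two-pin connections (i, j, weight) for one net in one dimension.

    Every pin is connected to the two outermost pins, giving 2P - 3
    connections instead of the P(P - 1)/2 of a clique.
    """
    p = len(xs)
    if p < 2:
        return []
    lo = min(range(p), key=lambda i: xs[i])   # leftmost pin
    hi = max(range(p), key=lambda i: xs[i])   # rightmost pin
    pairs = [(lo, hi)]
    for i in range(p):
        if i not in (lo, hi):
            pairs += [(i, lo), (i, hi)]
    conns = []
    for i, j in pairs:
        length = max(abs(xs[i] - xs[j]), 1e-9)  # guard against coincident pins
        conns.append((i, j, 2.0 / ((p - 1) * length)))  # assumed weighting
    return conns

def quadratic_cost(xs, conns):
    """1/2 * sum of w * (x_i - x_j)^2 over the given connections."""
    return 0.5 * sum(w * (xs[i] - xs[j]) ** 2 for i, j, w in conns)

xs = [0.0, 3.0, 1.0, 7.0]
conns = bounding_box_connections(xs)
cost = quadratic_cost(xs, conns)   # matches max(xs) - min(xs) up to rounding
```

Because the weights depend on the current pin positions, they have to be recomputed whenever the placement changes, which is the linearization step the slide title refers to.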
13
Relaxation Rather than Linearization
RQL [DAC-07]
• Adds force vector modulation to the FastPlace framework
• Currently fastest and best wirelength
[Figure: spreading force magnitude plotted against module index]
• Rank modules based on the spreading force magnitude
• Nullify the spreading force for the top 5-10% of modules
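The ranking-and-nullifying step described on this slide can be sketched in a few lines. This is only an illustration of that one step, not the RQL implementation; the function name `modulate_forces`, the tuple representation of forces, and the default fraction are assumptions.

```python
import math

def modulate_forces(forces, nullify_fraction=0.05):
    """Zero the spreading force of the modules with the largest magnitude.

    forces: list of (fx, fy) spreading forces, one per module.
    nullify_fraction: share of modules whose force is dropped (0.05 to 0.10
    corresponds to the 5-10% range mentioned on the slide).
    """
    n_null = round(len(forces) * nullify_fraction)
    # Rank module indices by force magnitude, largest first.
    ranked = sorted(range(len(forces)),
                    key=lambda i: math.hypot(*forces[i]), reverse=True)
    out = list(forces)
    for i in ranked[:n_null]:
        out[i] = (0.0, 0.0)
    return out
```

The intuition is that modules with the very largest spreading forces are the ones that would be moved far from their wirelength-optimal positions, so relaxing their forces limits the wirelength damage of a spreading step.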
14
An Alternative Analytical Approach
APlace [ISPD-04], mPL5 [ISPD-05], NTUPlace3 [ICCAD-06]
• Log-sum-exponential function to approximate HPWL [Naylor et al., US Patent 2001]
• Density constraint is directly formulated into the objective function
• Very competitive wirelength and runtime
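                   | APlace                  | NTUP3                   | mPL6                    | RQL
Wirelength Model   | Log-sum-exponential     | Log-sum-exponential     | Log-sum-exponential     | Quadratic
Spreading Force    | Density potential based | Density potential based | Density potential based | Fixed-point based
                   | (bell-shaped)           | (bell-shaped)           | (Poisson smoothed)      |
Objective Function | Non-linear & non-convex | Non-linear & non-convex | Non-linear & non-convex | Quadratic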
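$$\mathrm{lse}(x_1, \ldots, x_n) = \gamma \ln \sum_{i=1}^{n} e^{x_i/\gamma} \approx \max(x_1, \ldots, x_n)$$
where $\gamma > 0$ is the smoothing parameter.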
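A small numeric sketch may help show how the log-sum-exponential function stands in for the max (and therefore for HPWL). The smoothing parameter is called `gamma` here as an assumed name, and `smooth_hpwl_1d` is a hypothetical helper that applies the identity min(x) = -max(-x); neither name comes from the cited placers.

```python
import math

def lse(values, gamma):
    """gamma * ln(sum(exp(v / gamma))): a smooth upper bound on max(values)."""
    m = max(values)  # shift by the max so exp() cannot overflow
    return m + gamma * math.log(sum(math.exp((v - m) / gamma) for v in values))

def smooth_hpwl_1d(xs, gamma):
    """Smooth estimate of max(xs) - min(xs), using -min(x) = max(-x)."""
    return lse(xs, gamma) + lse([-x for x in xs], gamma)

xs = [0.0, 3.0, 1.0, 7.0]
tight = smooth_hpwl_1d(xs, gamma=0.1)   # very close to the exact span of 7
loose = smooth_hpwl_1d(xs, gamma=5.0)   # smoother, but overestimates more
```

A smaller `gamma` tracks HPWL more closely but makes the function closer to non-smooth, which is the usual trade-off when the estimate is fed to a gradient-based optimizer.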
15
Placement: Getting Old or Still Young?
• Better approach than the quadratic / analytical approach?
• Massive parallelism to speed up placement
• Better clustering techniques
• Macro placement / floorplanning
• True timing-driven placement
16
Sufficient Parental Guidance?
All physical synthesis gets from placement is distance info
• Physical synthesis has a distorted world view!
Wirelength estimation is inaccurate (especially for nets with high pin count)
Congestion estimation is inaccurate
Area estimation is inaccurate
• Without buffering and gate sizing
Timing estimation is very inaccurate
[Figure: three panels titled "Routing of a Bus", "A Simple Solution", and "Probabilistic Estimation", each showing pins S0-S3 and T0-T3. The last panel is annotated with a probabilistic usage based on the harmonic series 1 + 1/2 + 1/3 + 1/4.]
17
Routing-Driven Physical Synthesis
Need a more integrated approach
• Past: Placement-Driven Physical Synthesis
• Future: Routing-Driven Physical Synthesis
Main obstacle: Runtime
Two possibilities:
1. Construct Steiner trees to guide synthesis and placement
2. Perform global routing to guide synthesis and placement
18
Fast Steiner Tree Construction
FLUTE (Fast LookUp Table Estimation) [ICCAD 04, ISPD 05]
• An extremely fast and accurate rectilinear Steiner tree algorithm
• Very suitable for VLSI applications: optimal up to degree 9, very accurate up to degree 100
Over all 1.57 million nets in 18 IBM circuits [ISPD 98]:
[Figure: Error (%) versus Runtime (s) for RMST, RSTT, SPAN, BGA, BI1S, and FLUTE]
19
Is Steiner Tree Sufficient?
• Steiner trees do not consider detours due to routing congestion or buffering congestion
• Can we predict the impact of congestion on routing?
• There is no way for generic estimators to accurately estimate the congestion of arbitrary global routers!
        | Labyrinth(70%) | Labyrinth(50%)  | Chi Dispersion
Circuit | #cong          | #cong  | #match | #cong  | #match
ibm01   | 238            | 268    | 54     | 122    | 44
ibm02   | 368            | 390    | 89     | 46     | 7
ibm03   | 247            | 214    | 47     | 1      | 0
ibm04   | 588            | 596    | 261    | 273    | 161
ibm06   | 367            | 391    | 81     | 9      | 1
ibm07   | 568            | 643    | 162    | 122    | 55
ibm08   | 486            | 655    | 138    | 30     | 18
ibm09   | 377            | 399    | 69     | 12     | 3
ibm10   | 501            | 376    | 93     | 27     | 16
[Figure: "match" is the overlap between congestion by router 1 and congestion by router 2]
20
Traditional Global Routing
Simultaneous approach (e.g., ILP)
• Very slow
Sequential approach
• Net-by-net routing, rip-up and reroute
• Maze routing for a net: Lee’s, Dijkstra’s, A*-search algorithms
• Reasonably fast
• Reasonably good quality
• Is it good enough to handle the demand of physical synthesis?
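As a reminder of what maze routing for a single two-pin connection involves, here is a minimal sketch of Lee-style wave expansion, which is a breadth-first search on a routing grid. It assumes unit edge costs and a grid given as a list of strings where `#` marks a blocked cell; the function name `lee_route` and this grid encoding are illustrative. Dijkstra's algorithm or A* search would replace the queue with a priority queue to handle congestion-dependent costs.

```python
from collections import deque

def lee_route(grid, src, dst):
    """Shortest path from src to dst by breadth-first wave expansion.

    grid: list of equal-length strings; '#' cells are blocked.
    src, dst: (row, col) tuples. Returns the path as a list of cells,
    or None if dst cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}                 # visited cells and their predecessors
    frontier = deque([src])
    while frontier:
        cell = frontier.popleft()
        if cell == dst:                # backtrace from target to source
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None

grid = ["....",
        ".##.",
        "...."]
path = lee_route(grid, (1, 0), (1, 3))   # must detour around the blockage
```

The search may visit every grid cell for each connection, which is why heavy reliance on maze routing becomes a runtime concern when routing has to be repeated inside a physical synthesis loop.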
21
Progress in Global Routing
Pattern Routing [Kastner et al., ICCAD-00]
• L-shaped, Z-shaped routes
• Faster
Better cost functions for maze routing [Hadsell & Madden, DAC-03; Pan & Chu, ICCAD-06]
• Reduce overflow significantly
Congestion-driven Steiner tree construction [Pan & Chu, ICCAD-06]
• Much faster because of much less reliance on maze routing
Negotiated Congestion by PathFinder [FPGA-95]
• Used by BoxRouter [ICCAD-07], FGA [ICCAD-07], Archer [ICCAD-07]
• Excellent routing ability
• Very slow because it takes a long time to build congestion history
Wanted: Techniques that are both fast and high quality
22
What Should We Do Next?
Integration of global routing into placement
• An initial attempt: IPR [DAC-07]
• Integration of FastPlace, FastDP, FLUTE and FastRoute
• Significantly improves routability & wirelength in good runtime
Incorporate buffering and gate sizing into integrated placement & routing
• Much more accurate timing information
• Should also help congestion and placement density control
Integration with logic synthesis
In other words, we need:
• Better basic algorithms – placement, Steiner tree, global routing, buffering, gate sizing, etc.
• Clever ways of integration
It is an (EDA) family problem. Let’s work together!
Thank You