TRANSCRIPT
Topology-Aware Buffer Insertion and GPU-Based Massively Parallel
Rerouting for ECO Timing Optimization
Yen-Hung Lin, Yun-Jian Lo, Hian-Syun Tong, Wen-Hao Liu, Yih-Lang Li
Department of Computer Science, NCTU
ASPDAC 2012
Outline
• Introduction
• Preliminaries
• Problem formulation
• Proposed algorithms
• Experimental results
• Conclusions
Introduction
• Precise timing information for critical paths/sinks with delay violations is only available after the P&R stage.
– Re-designing is time consuming.
• Engineering change orders (ECO) can be used to fix timing violations after P&R.– Using spare cells with re-routing.
Introduction (cont.)
• Conventional timing ECO algorithms focus on improving the delay of one timing path at a time.
– [3] considered one two-pin net in the timing path but neglected the multi-pin net topology when selecting inserted buffers.
– [4] considered the positions of multiple pins of a net but did not consider the net topology of detailed routing paths.
• Optimizing only the delay of the critical sink by treating a multi-pin net as a two-pin net may degrade the delays of other sinks of the same net.
– This sequentially worsens other timing-violation paths.
The effect of topology
Introduction (cont.)
• Besides, detailed routing is time consuming.
– Greedily choosing the inserted buffer and its connections may fall into suboptimal solutions.
– Sequentially investigating each reconnection to the newly inserted buffer requires unacceptable detailed-rerouting runtime.
• Parallel routing can save runtime.
– GPUs provide high computing power at low cost.
Preliminaries
• ECO timing optimization
– Inserts one buffer between two gates on the timing path by breaking the original interconnection and rewiring the gates to the inserted buffer.
• Delay is estimated with the Elmore delay model.
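The Elmore model estimates each sink's delay as a sum, over every resistor on the driver-to-sink path, of that resistance times the total capacitance downstream of it. A minimal sketch of the model (the RC tree and all values below are illustrative assumptions, not from the slides):

```python
# Hedged sketch of the Elmore delay model on a small RC tree.
# All node names, resistances, and capacitances are hypothetical.

def elmore_delay(tree, caps, root):
    """tree: {node: [(child, resistance), ...]}; caps: {node: capacitance}.
    Returns {node: Elmore delay from root}."""
    # Total capacitance of the subtree rooted at n (post-order sum).
    def subtree_cap(n):
        return caps[n] + sum(subtree_cap(c) for c, _ in tree.get(n, []))

    delays = {root: 0.0}
    def walk(n):
        for child, r in tree.get(n, []):
            # Each resistor contributes r * (capacitance downstream of it).
            delays[child] = delays[n] + r * subtree_cap(child)
            walk(child)
    walk(root)
    return delays

# Driver drives node a; a branches to sinks b and c (assumed topology).
tree = {"drv": [("a", 2.0)], "a": [("b", 1.0), ("c", 3.0)]}
caps = {"drv": 0.0, "a": 1.0, "b": 2.0, "c": 1.0}
print(elmore_delay(tree, caps, "drv"))  # b: 10.0, c: 11.0
```

Note how the shared segment (drv to a) contributes to both sinks, which is why improving one sink by rewiring can worsen another.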
Problem formulation
• Given
– A routed design D, a buffer set B, a routed net set NALL, a routed net N belonging to NALL with an edge set E, a pin set P, and a violation-pin set VP.
• Objective
– Insert one buffer from B into N such that the topology of N is changed and the arrival times of the sinks in VP are minimized without creating new violated sinks.
Topology-Aware Buffer Insertion (BI) & Topology Restructuring
Proposed algorithms
• Buffering Pair scoring (BP)
• Edge Breaking and buffer connection (EB)
• Topology Restructuring (TR)
• Node Computing-based Massively Parallel Maze Routing (NCMPMR)
Buffering pair scoring
• We disregard buffering pairs (BPs) that may worsen the delay of some sinks.
– In other words, invalid BPs are ignored.
• For each valid BP, the Elmore delay model is then adopted to compute the delay difference for all sinks in VP.
• The wire length is estimated by the Manhattan distance.
Buffering pair scoring (cont.)
• One score term is the delay difference of VP after a buffering pair is selected for BI.
• A second term is the delay difference of VP after topology restructuring around the inserted buffer.
• Each violation sink is assigned a weight, and the score of each buffering pair combines the weighted delay differences.
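The filtering-then-scoring step described above can be sketched as follows. The exact score formula, weight function, and data layout are not legible in this transcript, so everything below is an assumption illustrating the idea: discard any pair that worsens a sink, then rank the survivors by weighted total improvement over VP.

```python
# Hedged sketch of buffering-pair scoring: invalid pairs (those that would
# worsen any violation sink) are discarded; valid pairs get a weighted sum
# of per-sink delay improvements. All names/values are hypothetical.

def score_buffering_pairs(pairs, violation_sinks, delay_diff, weight):
    """delay_diff(bp, sink): estimated delay change (negative = improvement)
    if buffering pair `bp` is applied; weight(sink): sink importance."""
    scores = {}
    for bp in pairs:
        diffs = {s: delay_diff(bp, s) for s in violation_sinks}
        if any(d > 0 for d in diffs.values()):  # worsens a sink: invalid BP
            continue
        # Larger score = larger weighted total improvement.
        scores[bp] = sum(weight(s) * -d for s, d in diffs.items())
    return scores

# Two candidate pairs and two violation sinks (illustrative values).
diffs = {
    ("g1", "g2"): {"s1": -3.0, "s2": -1.0},  # improves both sinks
    ("g3", "g4"): {"s1": -5.0, "s2": 0.5},   # worsens s2: filtered out
}
scores = score_buffering_pairs(
    diffs.keys(), ["s1", "s2"],
    delay_diff=lambda bp, s: diffs[bp][s],
    weight=lambda s: 1.0,
)
print(scores)  # only ("g1", "g2") survives, with score 4.0
```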
Edge breaking and buffer connection
Edge breaking and buffer connection (cont.)
• A buffering pair can improve the delays of all sinks in one of the two subnets created by edge breaking when the first inequality is satisfied.
• A buffering pair can improve the delays of all sinks in the other subnet when the second inequality is satisfied.
[Figure: five annotated delay components, including the delay of the inserted buffer (3) and the previous delay (5)]
Topology restructuring
• Since the net is partitioned by the inserted buffer, the topology can be changed for further delay improvement.
• Two observations:
– Reconnecting one subnet to buff with the shortest wires could improve the delays of all of its sinks.
– Separating the other subnet into two subnets and reconnecting them to buff with the shortest wires could improve the arrival times of all of their sinks.
The overall flow
Node Computing-based Massively Parallel Maze Routing (NCMPMR)
Routing cost w[node, iteration] over the iterations:

Node    Iter 0    Iter 1    Iter 2    Iter 3
A       0         0         0         0
B       Inf       1         1         1
C       Inf       5         5         4
D       Inf       Inf       3         3
NCMPMR flow
Speedup and preventing race condition
• Partition the routing graph into blocks for performance and scalability.
• Stagger adjacent blocks for better performance.
– 2.25x faster.
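The slides do not spell out the exact blocking and staggering scheme, but the general race-avoidance idea can be sketched as a two-phase (checkerboard) schedule: blocks in the same phase share no boundary cells, so they can be relaxed concurrently without conflicting updates. Everything below is an illustrative assumption:

```python
# Hedged sketch of race-free block scheduling for a gridded routing graph:
# group blocks into two checkerboard phases; within one phase no two
# blocks are edge-adjacent, so their cell updates cannot race.

def checkerboard_phases(grid_w, grid_h, block):
    """Return (even_phase, odd_phase) lists of block origins (bx, by)."""
    phases = ([], [])
    for bx in range(0, grid_w, block):
        for by in range(0, grid_h, block):
            parity = ((bx // block) + (by // block)) % 2
            phases[parity].append((bx, by))
    return phases

# An 8x8 grid cut into 4x4 blocks yields two phases of two blocks each;
# each phase could be launched as one batch of concurrent GPU blocks.
even, odd = checkerboard_phases(8, 8, 4)
print(even, odd)
```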
Experimental results
• Environment
– Platform 1: AMD Opteron 2.6GHz workstation with 16GB memory.
– Platform 2: Intel Xeon E5520 2.26GHz with 8GB memory and a single NVIDIA Tesla C1060 GPU.
• Implemented in C++.
• Benchmark: s35932 from the IWLS suite with 300 additional spare cells.
• Five nets, N1-N5, with various pin counts are selected in s35932 for demonstration.
Critical sink delay improvement
Analysis
• The following results are on platform 2.
Conclusions
• This work develops a topology-aware ECO timing optimization algorithm flow.
– BP, EB, TR.
– GPU-based rerouting.
• It improves WNS and TNS significantly, with a 7.72x average runtime speedup over conventional two-pin-net-based buffer insertion.