TRANSCRIPT
Topology-Aware Buffer Insertion and GPU-Based Massively Parallel
Rerouting for ECO Timing Optimization
Yen-Hung Lin, Yun-Jian Lo, Hian-Syun Tong, Wen-Hao Liu, Yih-Lang Li
Department of Computer Science, NCTU
ASPDAC 2012
Outline
• Introduction
• Preliminaries
• Problem formulation
• Proposed algorithms
• Experimental results
• Conclusions
Introduction
• Precise timing information for critical paths/sinks with delay violations is only available after the P&R stage.
– Re-designing is time consuming.
• Engineering change orders (ECO) can be used to fix timing violations after P&R.– Using spare cells with re-routing.
Introduction (cont.)
• Conventional timing ECO algorithms focus on improving the delay of one timing path at a time.
– [3] considered one two-pin net in the timing path but neglected the multi-pin net topology when selecting inserted buffers.
– [4] considered the positions of multiple pins of a net but did not consider the net topology of detailed routing paths.
• Optimizing only the delay of the critical sink by treating a multi-pin net as a two-pin net may degrade the delays of other sinks of the same net.
– This sequentially worsens other timing-violation paths.
The effect of topology
Introduction (cont.)
• Besides, detailed routing is time consuming.
– Greedily choosing the inserted buffer and its connections may fall into suboptimal solutions.
– Sequentially investigating each reconnection to the newly inserted buffer requires unacceptable detailed-rerouting runtime.
• Parallel routing can save runtime.
– GPUs provide high computing power at low cost.
Preliminaries
• ECO timing optimization
– Inserts one buffer between two gates on the timing path by breaking the original interconnection and rewiring the gates to the inserted buffer.
• Delay is estimated with the Elmore delay model.
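The Elmore model estimates each sink's delay as a sum, over every resistor on the driver-to-sink path, of that resistance times the total capacitance downstream of it. A minimal sketch of the model (the RC tree and all values below are illustrative assumptions, not from the slides):

```python
# Hedged sketch of the Elmore delay model on a small RC tree.
# All node names, resistances, and capacitances are hypothetical.

def elmore_delay(tree, caps, root):
    """tree: {node: [(child, resistance), ...]}; caps: {node: capacitance}.
    Returns {node: Elmore delay from root}."""
    # Total capacitance of the subtree rooted at n (post-order sum).
    def subtree_cap(n):
        return caps[n] + sum(subtree_cap(c) for c, _ in tree.get(n, []))

    delays = {root: 0.0}
    def walk(n):
        for child, r in tree.get(n, []):
            # Each resistor contributes r * (capacitance downstream of it).
            delays[child] = delays[n] + r * subtree_cap(child)
            walk(child)
    walk(root)
    return delays

# Driver drives node a; a branches to sinks b and c (assumed topology).
tree = {"drv": [("a", 2.0)], "a": [("b", 1.0), ("c", 3.0)]}
caps = {"drv": 0.0, "a": 1.0, "b": 2.0, "c": 1.0}
print(elmore_delay(tree, caps, "drv"))  # b: 10.0, c: 11.0
```

Note how the shared segment (drv to a) contributes to both sinks, which is why improving one sink by rewiring can worsen another.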
Problem formulation
• Given
– A routed design D, a buffer set B, a routed net set NALL, a routed net N belonging to NALL with an edge set E, a pin set P, and a violation-pin set VP.
• Objective
– Insert one buffer from B into N such that the topology of N is changed and the arrival times of the sinks in VP are minimized without creating new violated sinks.
Topology-Aware Buffer Insertion (BI) & Topology Restructuring
Proposed algorithms
• Buffering Pair scoring (BP)
• Edge Breaking and buffer connection (EB)
• Topology Restructuring (TR)
• Node Computing-based Massively Parallel Maze Routing (NCMPMR)
Buffering pair scoring
• We disregard buffering pairs (BPs) that may worsen the delay of some sinks.
– In other words, invalid BPs are ignored.
• For each valid BP, the Elmore delay model is then adopted to compute the delay difference for all sinks in VP.
• The wire length is estimated by the Manhattan distance.
Buffering pair scoring (cont.)
• One score term is the delay difference of VP after a buffering pair is selected for BI.
• A second term is the delay difference of VP after topology restructuring around the inserted buffer.
• Each violation sink is assigned a weight, and the score of each buffering pair combines the weighted delay differences.
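The filtering-then-scoring step described above can be sketched as follows. The exact score formula, weight function, and data layout are not legible in this transcript, so everything below is an assumption illustrating the idea: discard any pair that worsens a sink, then rank the survivors by weighted total improvement over VP.

```python
# Hedged sketch of buffering-pair scoring: invalid pairs (those that would
# worsen any violation sink) are discarded; valid pairs get a weighted sum
# of per-sink delay improvements. All names/values are hypothetical.

def score_buffering_pairs(pairs, violation_sinks, delay_diff, weight):
    """delay_diff(bp, sink): estimated delay change (negative = improvement)
    if buffering pair `bp` is applied; weight(sink): sink importance."""
    scores = {}
    for bp in pairs:
        diffs = {s: delay_diff(bp, s) for s in violation_sinks}
        if any(d > 0 for d in diffs.values()):  # worsens a sink: invalid BP
            continue
        # Larger score = larger weighted total improvement.
        scores[bp] = sum(weight(s) * -d for s, d in diffs.items())
    return scores

# Two candidate pairs and two violation sinks (illustrative values).
diffs = {
    ("g1", "g2"): {"s1": -3.0, "s2": -1.0},  # improves both sinks
    ("g3", "g4"): {"s1": -5.0, "s2": 0.5},   # worsens s2: filtered out
}
scores = score_buffering_pairs(
    diffs.keys(), ["s1", "s2"],
    delay_diff=lambda bp, s: diffs[bp][s],
    weight=lambda s: 1.0,
)
print(scores)  # only ("g1", "g2") survives, with score 4.0
```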
Edge breaking and buffer connection
Edge breaking and buffer connection (cont.)
• A buffering pair can improve the delays of all sinks in one of the two subnets created by edge breaking when the first inequality is satisfied.
• A buffering pair can improve the delays of all sinks in the other subnet when the second inequality is satisfied.
[Figure: five annotated delay components, including the delay of the inserted buffer (3) and the previous delay (5)]
Topology restructuring
• Since the net is partitioned by the inserted buffer, the topology can be changed for further delay improvement.
• Two observations:
– Reconnecting one subnet to buff with the shortest wires could improve the delays of all of its sinks.
– Separating the other subnet into two subnets and reconnecting them to buff with the shortest wires could improve the arrival times of all of their sinks.
The overall flow
Node Computing-based Massively Parallel Maze Routing (NCMPMR)
Routing cost w[node, iteration] over the iterations:

Node    Iter 0    Iter 1    Iter 2    Iter 3
A       0         0         0         0
B       Inf       1         1         1
C       Inf       5         5         4
D       Inf       Inf       3         3
NCMPMR flow
Speedup and preventing race condition
• Partition the routing graph into blocks for performance and scalability.
• Stagger adjacent blocks for better performance.
– 2.25x faster.
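The slides do not spell out the exact blocking and staggering scheme, but the general race-avoidance idea can be sketched as a two-phase (checkerboard) schedule: blocks in the same phase share no boundary cells, so they can be relaxed concurrently without conflicting updates. Everything below is an illustrative assumption:

```python
# Hedged sketch of race-free block scheduling for a gridded routing graph:
# group blocks into two checkerboard phases; within one phase no two
# blocks are edge-adjacent, so their cell updates cannot race.

def checkerboard_phases(grid_w, grid_h, block):
    """Return (even_phase, odd_phase) lists of block origins (bx, by)."""
    phases = ([], [])
    for bx in range(0, grid_w, block):
        for by in range(0, grid_h, block):
            parity = ((bx // block) + (by // block)) % 2
            phases[parity].append((bx, by))
    return phases

# An 8x8 grid cut into 4x4 blocks yields two phases of two blocks each;
# each phase could be launched as one batch of concurrent GPU blocks.
even, odd = checkerboard_phases(8, 8, 4)
print(even, odd)
```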
Experimental results
• Environment
– Platform 1: AMD Opteron 2.6GHz workstation with 16GB memory.
– Platform 2: Intel Xeon E5520 2.26GHz with 8GB memory and a single NVIDIA Tesla C1060 GPU.
• Implemented in C++.
• Benchmark: s35932 from the IWLS suite with 300 additional spare cells.
• Five nets, N1-N5, with various pin counts are selected in s35932 for demonstration.
Critical sink delay improvement
Analysis
• The following results are on platform 2.
Conclusions
• This work develops a topology-aware ECO timing optimization algorithm flow.
– BP, EB, TR.
– GPU-based rerouting.
• It improves WNS and TNS significantly, with a 7.72x average runtime speedup over conventional two-pin-net-based buffer insertion.