Topology-Aware Buffer Insertion and GPU-Based Massively Parallel Rerouting for ECO Timing Optimization
Yen-Hung Lin, Yun-Jian Lo, Hian-Syun Tong, Wen-Hao Liu, Yih-Lang Li
Department of Computer Science, NCTU
ASPDAC 2012


Page 1:

Topology-Aware Buffer Insertion and GPU-Based Massively Parallel Rerouting for ECO Timing Optimization

Yen-Hung Lin, Yun-Jian Lo, Hian-Syun Tong, Wen-Hao Liu, Yih-Lang Li

Department of Computer Science, NCTU

ASPDAC 2012

Page 2:

Outline

• Introduction
• Preliminaries
• Problem formulation
• Proposed algorithms
• Experimental results
• Conclusions

Page 3:

Introduction

• Precise timing information for critical paths/sinks with delay violations is only available after the P&R stage.
– Re-design is time-consuming.

• Engineering change orders (ECOs) can be used to fix timing violations after P&R.
– Using spare cells with rerouting.

Page 4:

Introduction (cont.)

• Conventional timing ECO algorithms focus on improving the delay of one timing path at a time.
– [3] considered one two-pin net in the timing path but neglected the multi-pin net topology when selecting inserted buffers.
– [4] considered the positions of multiple pins of a net but did not consider the net topology of detailed routing paths.
• Optimizing only the delay of the critical sink, by treating one multi-pin net as one two-pin net, may degrade the delays of other sinks of the same net.
– This sequentially worsens other timing-violation paths.

Page 5:

The effect of topology

Page 6:

Introduction (cont.)

• Besides, detailed routing is time-consuming.
– Greedily finding the inserted buffer and its connections may fall into a suboptimal solution.
– Sequentially investigating each reconnection to the newly inserted buffer requires unacceptable detailed-rerouting runtime.
• Parallel routing can save runtime.
– GPUs provide high computing power at low cost.

Page 7:

Preliminaries

• ECO timing optimization
– Inserts one buffer between two gates on the timing path by breaking the original interconnection and rewiring the gates to the inserted buffer.
• Delay is estimated with the Elmore delay model.
Page 8:

Page 9:

Problem formulation

• Given
– A routed design (D), a buffer set (B), a routed net set (NALL), a routed net (N) belonging to NALL with an edge set (E), a pin set (P), and a violation pin set (VP).
• Objective
– Insert one buffer from B into N such that the topology of N is changed and the arrival times of the sinks in VP are minimized without creating additional violated sinks.

Topology-Aware Buffer Insertion (BI) & Topology Restructuring

Page 10:

Proposed algorithms

• Buffering Pair scoring (BP).
• Edge Breaking and buffer connection (EB).
• Topology Restructuring (TR).
• Node Computing-based Massively Parallel Maze Routing (NCMPMR).

Page 11:

Buffering pair scoring

• We want to disregard buffering pairs (BPs) that may potentially worsen the delay of some sinks.
– In other words, invalid BPs are ignored.
• The Elmore delay model is then adopted to compute the delay difference for all sinks in VP if a BP is valid.
• The wire length is estimated by the Manhattan distance.

Page 12:

Buffering pair scoring (cont.)

• The delay difference of VP after one buffering pair is selected for BI.
• The delay difference of VP after topology restructuring around the inserted buffer.
• The weight of each violation sink.
• The score of each buffering pair.

Page 13:

Edge breaking and buffer connection

Page 14:

Edge breaking and buffer connection (cont.)

• A buffering pair can improve the delays of all sinks in the first sink set when the first inequality is satisfied.
• A buffering pair can improve the delays of all sinks in the second sink set when the second inequality is satisfied.

[Figure: the inequality terms are annotated 1-5: (1) changed delay, (2) delay, (3) delay of the buffer, (4) delay, (5) previous delay.]

Page 15:

Topology restructuring

• Since the net is partitioned by the inserted buffer, we can change the topology for further delay improvement.
• Two observations:
– Reconnecting to buff with the shortest wires can improve the delays of all sinks in that part of the net.
– Separating it into two subnets and reconnecting them to buff with the shortest wires can improve the arrival times of all sinks in that part of the net.
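A toy illustration of the first observation: reconnecting each pin behind the buffer with a shortest (Manhattan) wire can only shorten the wire driving it, so the RC load seen by every sink shrinks. (The buffer position, pin coordinates, and the original detoured wirelength below are made up for illustration.)

```python
def star_wirelength(buf, pins):
    # Total wirelength if every pin reconnects to the buffer via a
    # shortest Manhattan route (obstacles ignored in this sketch).
    return sum(abs(buf[0] - x) + abs(buf[1] - y) for x, y in pins)

buf = (0, 0)
pins = [(1, 2), (3, 0)]
shortest = star_wirelength(buf, pins)   # 3 + 3 = 6
detoured = 10                           # hypothetical original routed length
assert shortest <= detoured             # restructuring never lengthens wires
```

The actual reconnections are found by the detailed rerouter (NCMPMR), which respects obstacles; the Manhattan estimate is only used when scoring candidates.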

Page 16:

The overall flow

Page 17:

Node computing-based massively parallel maze routing

Routing cost w[node, iteration] at each iteration of the example:

Node   Iter 0   Iter 1   Iter 2   Iter 3
A      0        0        0        0
B      Inf      1        1        1
C      Inf      5        5        4
D      Inf      Inf      3        3
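The iteration table above is the behavior of a synchronous, node-parallel relaxation: in every iteration each node recomputes its cost from its neighbors simultaneously, which maps naturally onto one GPU thread per node. A minimal CPU sketch that reproduces the table (the edge costs are chosen to match this example; the paper's routing graph and cost terms are richer):

```python
INF = float("inf")

def node_parallel_maze(nodes, edges, source):
    """Synchronous relaxation: every node updates at once per iteration.

    edges: dict node -> list of (neighbor, cost); undirected here.
    Returns the history of cost maps, one per iteration, until convergence.
    """
    w = {v: (0 if v == source else INF) for v in nodes}
    history = [dict(w)]
    while True:
        # Every update reads only the previous iteration's costs
        # (Jacobi-style), so all nodes could update in parallel
        # without ordering hazards.
        new_w = {
            v: min([w[v]] + [w[u] + c for u, c in edges.get(v, [])])
            for v in nodes
        }
        if new_w == w:          # fixpoint: shortest costs reached
            break
        w = new_w
        history.append(dict(w))
    return history

# Edge costs chosen to reproduce the table: A-B:1, A-C:5, B-D:2, D-C:1
edges = {
    "A": [("B", 1), ("C", 5)],
    "B": [("A", 1), ("D", 2)],
    "C": [("A", 5), ("D", 1)],
    "D": [("B", 2), ("C", 1)],
}
hist = node_parallel_maze("ABCD", edges, "A")
# hist[3] == {"A": 0, "B": 1, "C": 4, "D": 3}
```

Note how C first settles at 5 (direct from A) and only improves to 4 in iteration 3, once D's cost has propagated: exactly the behavior shown in the table.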

Page 18:

NCMPMR flow

Page 19:

Speedup and preventing race condition

• Partition the routing graph into blocks for performance and scalability.
• Stagger adjacent blocks for better performance.
– 2.25x faster.

Page 20:

Experimental results

• Environment
– Platform 1: AMD Opteron 2.6 GHz workstation with 16 GB memory.
– Platform 2: Intel Xeon E5520 2.26 GHz with 8 GB memory and a single NVIDIA Tesla C1060 GPU.
• Implemented in C++.
• s35932 in the IWLS benchmark with 300 additional spare cells.
• Five nets, N1-N5, with various pin degrees are selected from s35932 for demonstration.

Page 21:

Critical sink delay improvement

Page 22:

Analysis

• The following results are on platform 2.

Page 23:

Conclusions

• This work develops a topology-aware ECO timing optimization algorithm flow.
– BP, EB, TR.
– GPU-based rerouting.
• It improves WNS and TNS significantly, with a 7.72x average runtime speedup over conventional two-pin-net-based buffer insertion.