interconnect driver design for long wires in fpgas edmund lee university of british columbia...

Post on 19-Jan-2018


Interconnect Driver Design for Long Wires in FPGAs

Edmund Lee
University of British Columbia
Electrical & Computer Engineering

MASc Thesis Presentation

2

Contributions
□ First attempt to combine repeater insertion with FPGA interconnect design
□ Produced 3 interconnect driver design methodologies for FPGAs
  ■ Lumped driver design
  ■ Distributed design
    • Elmore-based
    • HSPICE-based
□ Quantified significance of Early Turn Modeling and Fast Paths
□ Paper submitted to FPT 2006

3

Outline
□ Motivation and Background
□ Problem Description and Goals
□ Driver Design Approaches
  ■ Method 1: Elmore-based
  ■ Method 2: SPICE-based
□ CAD Modeling, VPR Results
□ Summary

4

Motivation
□ Deep submicron interconnect delay is increasing
□ Interconnect delay is a large component of FPGA delay
□ Only part of a wire is used in FPGAs
  ■ Critical sink locations unknown
  ■ Improve all midpoint delays

[Figure: a driver and MUX driving a wire; labels: Sink 1 to Sink 5, Early Sinks (Sink 1 to Sink 4)]

5

Problem Description
Given: Wire RC, total wire length
Find: Buffer sizes, buffer locations, # of buffers

[Figure: delay vs. distance travelled along wire (ticks at 0%, 30%, 50%, 100%); curves: PDP for lumped driver design, PDP for distributed driver design; annotation: improved midpoint delay]

6

Background

[Figure: FPGA array of CLBs with S and C blocks, vertical and horizontal channels, and length 2 tracks/wires; S block detail with track drivers and a horizontal track driver (MUX and driver); FPGA interconnect driver with mux across vertical and horizontal channels, showing the early turn interconnect and the fast interconnect through a fast MUX]

7

Method 1: Elmore-based Design
□ Provide circuit design solution
□ Elmore delay model
□ Multidimensional sweep
  ■ Determine optimal wirelengths and buffersizes
  ■ Fix B1 to minimum size

[Figure: 3 stage distributed design with buffers B1, B2, B3 and wire segments L1, L2, L3 making up total length L; labels: driversize sweep, wire distribution sweep]
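The multidimensional sweep above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis code: every device and wire constant is a hypothetical placeholder, the helper names (`stage_delay`, `total_delay`, `sweep`) are invented here, and the stage model is the textbook Elmore form (driver resistance charging all downstream capacitance, wire resistance charging half the wire capacitance plus the load).

```python
from itertools import product

# All device and wire constants are hypothetical placeholders,
# not values taken from the presentation.
R_BUF = 10_000.0   # ohms, output resistance of a minimum-size buffer
C_IN = 2e-15       # farads, input capacitance of a minimum-size buffer
C_SELF = 2e-15     # farads, output self-load of a minimum-size buffer
R_WIRE = 100.0     # ohms per mm of wire
C_WIRE = 200e-15   # farads per mm of wire


def stage_delay(size, length_mm, load_cap):
    """Elmore delay of one buffer driving a wire segment and a load."""
    r_drv = R_BUF / size
    r_w = R_WIRE * length_mm
    c_w = C_WIRE * length_mm
    # Driver resistance sees all downstream capacitance; the distributed
    # wire resistance sees half its own capacitance plus the load.
    return r_drv * (C_SELF * size + c_w + load_cap) + r_w * (c_w / 2 + load_cap)


def total_delay(sizes, lengths, final_load=C_IN):
    """Sum stage delays; each stage is loaded by the next buffer's input."""
    delay = 0.0
    for i, (size, length) in enumerate(zip(sizes, lengths)):
        load = C_IN * sizes[i + 1] if i + 1 < len(sizes) else final_load
        delay += stage_delay(size, length, load)
    return delay


def sweep(total_mm, size_choices, steps=20):
    """Sweep B2, B3 and the L1/L2/L3 split with B1 fixed to minimum size."""
    best = None
    for b2, b3 in product(size_choices, repeat=2):
        for i, j in product(range(steps + 1), repeat=2):
            if i + j > steps:
                continue  # the three segments must add up to the total length
            l1 = total_mm * i / steps
            l2 = total_mm * j / steps
            l3 = total_mm * (steps - i - j) / steps
            d = total_delay((1, b2, b3), (l1, l2, l3))
            if best is None or d < best[0]:
                best = (d, (1, b2, b3), (l1, l2, l3))
    return best


if __name__ == "__main__":
    delay, sizes, lengths = sweep(4.0, [1, 2, 4, 8, 16, 32])
    print(f"delay={delay * 1e12:.1f} ps sizes={sizes} lengths={lengths}")
```

The cost of the exhaustive sweep grows with every added stage, which is one way to read the "complexity related to number of stages" limitation noted for this method.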

8

Elmore-based Design Results

[Figure: optimal buffer configuration by wirelength (1mm, 2mm, 4mm, 8mm); wire split labels read 100%, 100%, 50%/50%, 45%/55%]
* Buffer 1 is fixed to minimum size

9

Elmore-based Design Results
□ Results
  ■ Distributed buffering is best with wires > 2mm
  ■ For all wirelengths, L1 = 0
  ■ Delay is tolerant to shifts in buffer placement
□ Limitations
  ■ Complexity related to number of stages
  ■ RC based Elmore approach
    • Difficult to model multiplexer circuits
    • Accuracy (delay and determining sizes)

10

Method 2: SPICE-Based Design
Divide, characterize and combine…
Characterization: design(wirelength) -> buffersizes and delays
Designs with best delay/mm: multiplexed (mux), distributed (distrib)

[Figure: FPGA mux stage (MUX, buffer B0, wire L0) followed by distributed stages (buffer B1, wire L1 each); number of stages N]

11

Buffer-Wire Pre-Characterization

[Plot: Delay vs. Buffersize for Wirelengths (mm); x-axis: Buffersize (10 to 70); y-axis: Delay (ps, 0 to 800); one curve per wirelength: 0.1mm, 0.5mm, 1.0mm, 1.5mm, 2.0mm, 2.5mm, 3.0mm, 3.5mm, 4.0mm]

[Plot: Best Buffersize Data; x-axis: Wirelengths (mm, 0 to 10); y-axis: Buffersizes (x minimum, 0 to 60); series: mux min-delay buffersizes, distrib 10%-buffersizes, distrib min-delay buffersizes; stage types: Distributed (distrib), Multiplexed (mux)]
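As a rough illustration of how a characterization table might be reduced to "best buffersize" data, the sketch below picks the minimum-delay buffer size and the smallest size whose delay stays within 10% of that minimum. Reading "10%-buffersizes" this way is an assumption, and `toy_delay_ps` is a made-up stand-in for HSPICE measurements; none of the numbers come from the slides.

```python
def characterize(wire_mm, sizes, delay_fn):
    """Tabulate delay for each candidate buffer size at one wirelength."""
    return {s: delay_fn(s, wire_mm) for s in sizes}


def min_delay_size(table):
    """Buffer size with the lowest delay in the table."""
    return min(table, key=table.get)


def relaxed_size(table, tolerance=0.10):
    """Smallest buffer size whose delay is within `tolerance` of the minimum."""
    limit = min(table.values()) * (1 + tolerance)
    return min(s for s, d in table.items() if d <= limit)


def toy_delay_ps(size, wire_mm):
    """Hypothetical stand-in for an HSPICE measurement, in picoseconds."""
    # Made-up shape: the wire-load term falls with buffer size while the
    # self-load term rises, so the curve has a shallow minimum.
    return 400.0 * wire_mm / size + 2.0 * size + 50.0 * wire_mm ** 2


if __name__ == "__main__":
    table = characterize(2.0, range(1, 71), toy_delay_ps)
    print(min_delay_size(table), relaxed_size(table))
```

With this toy model a much smaller buffer lands within the 10% band, which mirrors why a relaxed size can be attractive when area matters.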

12

Delay Concatenation
□ Sum delays of each stage together
  ■ Fast to compute
  ■ Accurate (within 4% of HSPICE)
□ Calculation can be embedded into VPR

[Figure: FPGA mux stage (MUX, buffer B0, wire L0) followed by distributed stages (buffer B1, wire L1 each); number of stages N]

Delay = Mux stage delay + Distributed stage delay x (N - 1)
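The concatenation formula is simple enough to state directly in code. The function name and the example stage delays below are hypothetical; only the formula itself comes from the slide.

```python
def concatenated_delay(mux_stage_ps, distrib_stage_ps, n_stages):
    """Delay = mux stage delay + distributed stage delay x (N - 1)."""
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    return mux_stage_ps + distrib_stage_ps * (n_stages - 1)


if __name__ == "__main__":
    # Hypothetical per-stage delays in picoseconds, for illustration only.
    print(concatenated_delay(250.0, 150.0, 3))
```

Because the per-stage delays come from pre-characterized data, evaluating a candidate design is a table lookup plus one multiply-add, which is what makes the calculation cheap enough to embed in a CAD tool.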

13

L0-Sweep
□ Remaining Unknown: L0 and L1
□ Length = L0 + L1*(N - 1)
□ Sweep L0 for a fixed N
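A minimal sketch of the L0 sweep, assuming stage delays are available as functions of segment length. The two `toy_*` models are hypothetical stand-ins for pre-characterized HSPICE data, and the function and parameter names are invented for this example.

```python
def l0_sweep(total_mm, n_stages, mux_delay, distrib_delay, step_mm=0.1):
    """Sweep L0 for a fixed N; L1 follows from Length = L0 + L1*(N - 1)."""
    if n_stages < 2:
        raise ValueError("need at least one distributed stage")
    results = []
    steps = round(total_mm / step_mm)
    for k in range(1, steps):  # skip the endpoints so both segments are nonzero
        l0 = k * step_mm
        l1 = (total_mm - l0) / (n_stages - 1)
        delay = mux_delay(l0) + (n_stages - 1) * distrib_delay(l1)
        results.append((l0, l1, delay))
    best = min(results, key=lambda r: r[2])
    return best, results


# Hypothetical stage-delay models in picoseconds, for illustration only.
def toy_mux_delay(length_mm):
    return 120.0 + 60.0 * length_mm + 40.0 * length_mm ** 2


def toy_distrib_delay(length_mm):
    return 60.0 + 50.0 * length_mm + 40.0 * length_mm ** 2


if __name__ == "__main__":
    best, _ = l0_sweep(4.0, 2, toy_mux_delay, toy_distrib_delay, step_mm=0.5)
    print(best)
```

Repeating the sweep for several values of N and keeping the best result per N gives one curve per stage count.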

14

L0-Sweep

[Plot: Delays for various sizes of the first segment (4mm wire); x-axis: L0 length (mm, 0 to 4); y-axis: Total Delay (ps, 600 to 950); series: N=2 (174ps/mm), N=3 (162ps/mm), N=4 (157ps/mm); inset: 2 stage (N=2) design with Mux segment L0 and Dist segment L1]

16

SPICE-Based Design Conclusions
□ Distributed designs improve over lumped designs on wires of 2mm and longer
□ Longer wires achieve lower delay/mm
  ■ In an FPGA: Multiplexing Interval

[Figure: Multiplexing Interval; CLBs and S blocks with fast MUXes, FPGA interconnect driver with mux, vertical and horizontal channels, early turn]

17

Multiplexing Interval

[Plot: Best Muxing Interval for FPGA Interconnect Wires (one 2:1 multiplexer every X mm); x-axis: Multiplexing Interval (mm, 0 to 10); y-axis: Best Delay (ps/mm, 0 to 300); labels: 1x/1x/180nm, ASIC delay]

18

What about Early Turns?
□ Path Delay Profiles show potential improvement of the proposed circuit designs

[Plot: Path delay profile for 2mm buffered wire; x-axis: location along wire (mm, 0 to 2); y-axis: delay (ps, 50 to 450); series: N=2, N=3, N=4, Lumped]

[Plot: Path delay profile for 3mm buffered wire; x-axis: location along wire (mm, 0 to 3); y-axis: delay (ps, 100 to 550); series: N=2, N=3, N=4, Lumped; the lumped driver curve is labeled]
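To make the idea of a path delay profile concrete, here is a rough Elmore-style sketch for the lumped-driver case only: the delay from the driver to a tap at position x along an unbuffered wire. It is an illustrative model under stated assumptions (no tap loading from early-turn multiplexers, hypothetical parameter names and values); the profiles in the slides come from circuit simulation, not from this formula.

```python
def lumped_profile(x_mm, length_mm, r_drv, c_self, r_wire, c_wire, c_load):
    """Elmore delay (seconds) from a lumped driver to a tap x_mm along the wire."""
    if not 0.0 <= x_mm <= length_mm:
        raise ValueError("tap must lie on the wire")
    # The driver resistance charges its self-load, the whole wire and the end load.
    driver_term = r_drv * (c_self + c_wire * length_mm + c_load)
    # Wire resistance up to the tap charges everything downstream of each point:
    # integral over u in [0, x] of r * (c * (L - u) + C_load).
    wire_term = r_wire * x_mm * (c_wire * (length_mm - x_mm / 2) + c_load)
    return driver_term + wire_term


if __name__ == "__main__":
    # Hypothetical values: 1 kohm driver, 100 ohm/mm and 200 fF/mm wire, 2mm long.
    for x in (0.0, 0.5, 1.0, 1.5, 2.0):
        d = lumped_profile(x, 2.0, 1000.0, 0.0, 100.0, 200e-15, 0.0)
        print(f"{x:.1f} mm: {d * 1e12:.0f} ps")
```

In this model even the earliest tap pays the full driver term, since the driver must charge the entire wire. That is the intuition behind wanting better midpoint delays from a distributed design.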

19

VPR Modifications
□ Assess the benefits of distributed buffering design on FPGAs
□ Early Turn Model
  ■ Can compute a path delay profile for VPR
□ Fast Path modeling

[Figure: a driver and MUX on a wire passing CLB1, CLB2, ..., CLBN-1, CLBN to the next driver and MUX; labels: Early Turns, Normal Turns, Straight Thru or Fast Path]

20

VPR Results

[Plot: Critical Path (ns, 0 to 90) on the MCNC Benchmarks; panel label: 3mm; benchmarks: alu4, apex2, apex4, bigkey, clma, des, diffeq, dsip, elliptic, ex1010, ex5p, frisc, misex3, pdc, s298, s38417, s38584.1, seq, spla, tseng; series: FPT04 (Prior FPT04 Design), Lumped (Lumped driver), ETM (Lumped + ETM), Distrib4 (Distributed), Distrib4 + Fast (Distributed + Fast)]

22

Summary
□ Developed interconnect driver design methodology for FPGAs
  ■ Identified that longer wires can improve delay efficiency in FPGAs
□ Results from VPR
  ■ Early turn modeling (5-10%)
  ■ Distributed buffers (2-3%)
  ■ Fast path (4-9%)
