Interconnect Optimizations
A scaling primer
• Ideal process scaling:
  – Device geometries shrink by $s$ (= 0.7x)
    • Device delay shrinks by $s$
  – Wire geometries shrink by $s$
    • R/µm: $\rho/(ws \cdot hs) = r/s^2$
    • Cc/µm: $(hs)\,\epsilon/(Ss) = C_c$
    • C/µm: similar
    • R/µm doubles, C/µm and Cc/µm unchanged
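[Figure: transistor (source S, gate G, drain D) and wire cross-section with width w, height h, length l, and spacing S, shown before and after scaling by s]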
Interconnect role
• Short (local) interconnect
  – Used to connect nearby cells
  – Minimize wire C, i.e., use short min-width wires
• Medium to long-distance (global) interconnect
  – Size wires to trade off area vs. delay
  – Increasing width: capacitance increases, resistance decreases
  – Need to find an acceptable tradeoff: the wire sizing problem
• “Fat” wires
  – Thicker cross-sections in higher metal layers
  – Useful for reducing delays for global wires
  – Inductance issues, sharing of limited resource
Cross-Section of A Chip
Block scaling
• Block area often stays the same
  – # cells, # nets double
  – Wiring histogram shape invariant
• Global interconnect lengths don’t shrink
• Local interconnect lengths shrink by $s$
Interconnect delay scaling
• Delay of a wire of length $l$: $\tau_{int} = (rl)(cl) = rcl^2$ (first order)
• Local interconnects: $\tau_{int} = (r/s^2)(c)(ls)^2 = rcl^2$
  – Local interconnect delay unchanged (compare to faster devices)
• Global interconnects: $\tau_{int} = (r/s^2)(c)(l)^2 = (rcl^2)/s^2$
  – Global interconnect delay doubles: unsustainable!
• Interconnect delay is increasingly dominant
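The scaling argument can be checked numerically. The sketch below is only an illustration of the first-order model $rcl^2$ with $s = 0.7$; the function name `wire_delay_factor` is hypothetical and not part of the slides.

```python
# Ideal-scaling sketch: s = 0.7 is the per-generation shrink factor.
S_FACTOR = 0.7


def wire_delay_factor(length_scale: float, s: float = S_FACTOR) -> float:
    """Factor by which the first-order delay r*c*l^2 changes in one generation.

    r per unit length grows by 1/s^2, c per unit length is unchanged,
    and the wire length is multiplied by `length_scale`.
    """
    r_factor = 1.0 / s**2
    c_factor = 1.0
    return r_factor * c_factor * length_scale**2


# Local wires shrink with the cells; global wires keep their length.
local_factor = wire_delay_factor(length_scale=S_FACTOR)
global_factor = wire_delay_factor(length_scale=1.0)

print(f"local wire delay factor:  {local_factor:.2f}")
print(f"global wire delay factor: {global_factor:.2f}")
```

The local factor comes out at about 1 (unchanged) and the global factor at about $1/s^2 \approx 2$, matching the two sub-bullets.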
Buffer Insertion For Delay Reduction
Analysis of Simple RC Circuit
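$$R\,i(t) + v(t) = v_T(t)$$
$$i(t) = \frac{d\,(C\,v(t))}{dt} = C\,\frac{dv(t)}{dt}$$
$$RC\,\frac{dv(t)}{dt} + v(t) = v_T(t)$$
$v(t)$: state variable; $v_T(t)$: input waveform
[Figure: voltage source $v_T(t)$ charging capacitor C through resistor R; $i(t)$ is the loop current, $v(t)$ the capacitor voltage]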
Analysis of Simple RC Circuit
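Step-input response:
$$RC\,\frac{dv(t)}{dt} + v(t) = v_0 u(t)$$
$$v(t) = K e^{-t/RC} + v_0 u(t)$$
match initial state:
$$v(0) = 0 \;\Rightarrow\; K + v_0 u(t) = 0 \;\Rightarrow\; K = -v_0$$
output response for step-input:
$$v(t) = v_0\,(1 - e^{-t/RC})\,u(t)$$
[Figure: step input $v_0 u(t)$ and output $v_0(1 - e^{-t/RC})u(t)$ rising toward $v_0$]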
Delays of Simple RC Circuit
• $v(t) = v_0(1 - e^{-t/RC})$: waveform under step input $v_0 u(t)$
• $v(t) = 0.5\,v_0 \Rightarrow t = 0.69RC$
  – i.e., delay = 0.69RC (50% delay)
• $v(t) = 0.1\,v_0 \Rightarrow t = 0.1RC$
• $v(t) = 0.9\,v_0 \Rightarrow t = 2.3RC$
  – i.e., rise time = 2.2RC (if defined as time from 10% to 90% of Vdd)
• Commonly used metric $T_D = RC$ (= Elmore delay)
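The threshold-crossing times above follow from inverting the step response, $t = -RC\,\ln(1 - v/v_0)$. The sketch below reproduces the numbers with RC normalized to 1; the helper name `crossing_time` is an assumption made for this example.

```python
import math


def crossing_time(fraction: float, rc: float = 1.0) -> float:
    """Time at which v(t) = v0 * (1 - exp(-t/RC)) reaches `fraction` of v0."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -rc * math.log(1.0 - fraction)


t50 = crossing_time(0.5)   # 50% delay, in units of RC
t10 = crossing_time(0.1)
t90 = crossing_time(0.9)
rise_time = t90 - t10      # 10%-to-90% rise time, in units of RC

print(f"t50 = {t50:.2f} RC, t10 = {t10:.2f} RC, t90 = {t90:.2f} RC")
print(f"rise time = {rise_time:.2f} RC")
```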
Elmore Delay
Elmore Delay
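• Driver is modeled as R
• Driver intrinsic gate delay t(B)
• Delay = $\sum_{\text{all } R_i} \; \sum_{\text{all } C_j \text{ downstream from } R_i} R_i \cdot C_j$
• Elmore delay at n1: $R(B)\cdot(C_1 + C_2)$
• Elmore delay at n2: $R(B)\cdot(C_1 + C_2) + R(w)\cdot C_2$
[Figure: buffer B with output resistance R(B) driving node n1 (capacitance C1), followed by wire resistance R(w) to node n2 (capacitance C2)]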
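The double sum can be evaluated mechanically for an RC chain. The following is a minimal sketch for a chain only (not a general tree); the function name `elmore_chain` and the component values are illustrative assumptions, not values from the slides.

```python
import math


def elmore_chain(resistances, capacitances):
    """Elmore delay at every node of an RC chain.

    resistances[i] connects node i-1 (or the source, for i = 0) to node i,
    and capacitances[i] is the grounded capacitance at node i.
    """
    delays = []
    total = 0.0
    for i, r_i in enumerate(resistances):
        # Every capacitor at node i or beyond is downstream of resistor i.
        downstream_cap = sum(capacitances[i:])
        total += r_i * downstream_cap
        delays.append(total)
    return delays


# Two-node example: R(B) -> n1 (C1) -> R(w) -> n2 (C2). Values are made up.
R_B, R_w = 100.0, 50.0        # ohms
C1, C2 = 10e-15, 20e-15       # farads
delay_n1, delay_n2 = elmore_chain([R_B, R_w], [C1, C2])

print(f"n1: {delay_n1:.3e} s, n2: {delay_n2:.3e} s")
```

Here `delay_n1` equals `R_B * (C1 + C2)` and `delay_n2` adds `R_w * C2`, as in the two bullets.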
Elmore Delay
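• For a uniform wire of length $x$ driving load $C$: delay $= rx\,(cx/2 + C)$
• No matter how the wire is lumped, the Elmore delay is the same
[Figure: wire of length x driving load C; unit wire capacitance c, unit wire resistance r]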
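A quick numerical check of the lumping claim, assuming each wire section is modeled as a π-section (half of the section capacitance at each end). The closed form used for comparison is $rx\,(cx/2 + C)$. The function name and the parameter values are hypothetical.

```python
import math


def elmore_uniform_wire(r, c, x, load, segments):
    """Elmore delay at the far end of a wire of length x, split into
    `segments` equal pi-sections, driving a lumped load capacitance."""
    r_seg = r * x / segments
    c_seg = c * x / segments
    delay = 0.0
    for i in range(segments):
        # Downstream of this section's resistor: half of its own
        # capacitance, every later section in full, and the load.
        later_sections = segments - 1 - i
        downstream_cap = c_seg / 2 + later_sections * c_seg + load
        delay += r_seg * downstream_cap
    return delay


# Illustrative values: ohm/um, F/um, um, F.
r, c, x, load = 0.1, 0.2e-15, 1000.0, 5e-15
closed_form = r * x * (c * x / 2 + load)
lumped = {n: elmore_uniform_wire(r, c, x, load, n) for n in (1, 2, 10, 100)}

for n, value in lumped.items():
    print(f"{n:3d} sections: {value:.4e} s (closed form {closed_form:.4e} s)")
```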
Delay for Buffer
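• Input capacitance C(b)
• Intrinsic buffer delay
• Driver resistance R
[Figure: buffer between nodes u and v, presenting input capacitance C(b) at u and driving load C at v through driver resistance R]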
Buffers Reduce Wire Delay
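$$t_{unbuf} = R\,(cx + C) + rx\,(cx/2 + C)$$
$$t_{buf} = 2R\,(cx/2 + C) + rx\,(cx/4 + C) + t_b$$
$$t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$$
[Figure: driver R and load C connected by a wire of length x, compared with the same wire split into two halves of length x/2 (each modeled as rx/2 with cx/4 at both ends) by a buffer; plot of ∆t versus x]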
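Setting $t_{buf} - t_{unbuf} = 0$ gives the wire length beyond which a single midpoint buffer pays off: $x = 2\sqrt{(RC + t_b)/(rc)}$. The sketch below evaluates the two expressions around that point; all numeric values are illustrative assumptions, and the buffer is assumed to have the same R and input capacitance C as the driver and load.

```python
import math


def t_unbuf(R, C, r, c, x):
    """Elmore delay of driver R, wire of length x, load C."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)


def t_buf(R, C, r, c, x, tb):
    """Same wire with one buffer (intrinsic delay tb) at the midpoint."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb


def break_even_length(R, C, r, c, tb):
    """Length at which RC + tb = r*c*x^2/4, i.e. buffering starts to help."""
    return 2 * math.sqrt((R * C + tb) / (r * c))


# Illustrative values: ohm, F, ohm/um, F/um, s.
R, C, r, c, tb = 1000.0, 5e-15, 0.1, 0.2e-15, 20e-12
x0 = break_even_length(R, C, r, c, tb)

for x in (0.5 * x0, x0, 2.0 * x0):
    gain = t_unbuf(R, C, r, c, x) - t_buf(R, C, r, c, x, tb)
    print(f"x = {x:8.1f} um: t_unbuf - t_buf = {gain:+.3e} s")
```

Below `x0` the buffer's own delay outweighs the saving; above it the quadratic term wins.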
Combinational Logic Delay
Combinational logic delay <= clock period
[Figure: combinational logic between two clocked registers, with a primary input on one side and a primary output on the other]
Buffered global interconnects: Intuition
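Interconnect delay $= r \cdot c \cdot l^2$
Now, interconnect delay $= \sum_i r \cdot c \cdot l_i^2 < r \cdot c \cdot l^2$ (where $l = \sum_j l_j$)
since $\sum_j (l_j^2) < (\sum_j l_j)^2$
(Of course, account for buffer delay also)
[Figure: wire of total length l divided by buffers into segments l1, l2, l3, ..., ln]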
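A small numeric illustration of the inequality, ignoring buffer delay as the intuition slide does. The function names and the values of r, c, and the segment lengths are assumptions chosen for the example.

```python
def unbuffered_wire_delay(r, c, length):
    """First-order delay r*c*l^2 of a single wire."""
    return r * c * length**2


def segmented_wire_delay(r, c, segment_lengths):
    """Sum of first-order delays of the segments (buffer delay ignored)."""
    return sum(r * c * seg**2 for seg in segment_lengths)


r, c = 0.1, 0.2e-15                          # ohm/um, F/um (illustrative)
segments = [1000.0, 1000.0, 1000.0, 1000.0]  # um
total_length = sum(segments)

whole = unbuffered_wire_delay(r, c, total_length)
split = segmented_wire_delay(r, c, segments)

print(f"unbuffered: {whole:.3e} s")
print(f"segmented:  {split:.3e} s (ratio {split / whole:.2f})")
```

With n equal segments the wire term drops by a factor of n, which is why the buffer delay that the sketch ignores eventually sets the limit.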
Optimal inter-buffer length
• First order (lumped parasitic, Elmore delay) analysis
• Assume N identical buffers with equal inter-buffer length l (L = Nl)
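$$T = N\left[R_d\,(C_g + cl) + rl\,(C_g + cl/2)\right] = L\left[\frac{rcl}{2} + (rC_g + R_d c) + \frac{1}{l}\,R_d C_g\right]$$
• For minimum delay, $dT/dl = 0$:
$$L\left[\frac{rc}{2} - \frac{R_d C_g}{l_{opt}^2}\right] = 0 \;\Rightarrow\; l_{opt} = \sqrt{\frac{2 R_d C_g}{rc}}$$
$R_d$ – on resistance of inverter; $C_g$ – gate input capacitance; $r, c$ – resistance, capacitance per micron
[Figure: line of total length L with identical buffers inserted every l]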
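The closed form for $l_{opt}$ can be sanity-checked against the delay expression directly. This is a sketch under the same first-order assumptions, treating the number of stages $N = L/l$ as continuous; the parameter values are illustrative and not taken from the slides.

```python
import math


def total_delay(L, l, Rd, Cg, r, c):
    """T = N * [Rd*(Cg + c*l) + r*l*(Cg + c*l/2)] with N = L / l."""
    n_stages = L / l
    return n_stages * (Rd * (Cg + c * l) + r * l * (Cg + c * l / 2))


def optimal_spacing(Rd, Cg, r, c):
    """l_opt = sqrt(2*Rd*Cg / (r*c)), from dT/dl = 0."""
    return math.sqrt(2 * Rd * Cg / (r * c))


# Illustrative values: ohm, F, ohm/um, F/um, um.
Rd, Cg, r, c, L = 1000.0, 5e-15, 0.1, 0.2e-15, 10000.0
l_opt = optimal_spacing(Rd, Cg, r, c)

for scale in (0.5, 0.9, 1.0, 1.1, 2.0):
    t = total_delay(L, scale * l_opt, Rd, Cg, r, c)
    print(f"l = {scale:.1f} * l_opt: T = {t:.4e} s")
```

Note that $l_{opt}$ does not depend on the total length L.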
Optimal interconnect delay
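• Substituting $l_{opt}$ back into the interconnect delay expression:
$$T_{opt} = L\left[\frac{rc\,l_{opt}}{2} + (rC_g + R_d c) + \frac{1}{l_{opt}}\,R_d C_g\right]$$
$$T_{opt} = L\left[\frac{rc}{2}\sqrt{\frac{2 R_d C_g}{rc}} + R_d C_g \sqrt{\frac{rc}{2 R_d C_g}} + rC_g + R_d c\right]$$
$$T_{opt} = L\left(\sqrt{2 R_d C_g rc} + rC_g + R_d c\right)$$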
Delay grows linearly with L (instead of quadratically)
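To make the linear-versus-quadratic contrast concrete, the sketch below evaluates $T_{opt} = L(\sqrt{2 R_d C_g rc} + rC_g + R_d c)$ next to the first-order unbuffered delay $rcL^2$ for a wire and for one twice as long. The parameter values are illustrative assumptions.

```python
import math


def t_opt(L, Rd, Cg, r, c):
    """Delay of an optimally buffered line of length L (first-order model)."""
    return L * (math.sqrt(2 * Rd * Cg * r * c) + r * Cg + Rd * c)


def t_unbuffered(L, r, c):
    """First-order delay r*c*L^2 of the same wire without buffers."""
    return r * c * L**2


# Illustrative values: ohm, F, ohm/um, F/um.
Rd, Cg, r, c = 1000.0, 5e-15, 0.1, 0.2e-15

for L in (10000.0, 20000.0):  # um
    print(f"L = {L:7.0f} um: buffered {t_opt(L, Rd, Cg, r, c):.3e} s, "
          f"unbuffered {t_unbuffered(L, r, c):.3e} s")
```

Doubling L doubles the buffered delay but quadruples the unbuffered one.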
Optimized interconnect delay scaling
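• Rewriting the optimal interconnect delay expression,
$$T_{opt} = L\left(\sqrt{2 R_d C_g rc} + rC_g + R_d c\right)$$
with $R_d = r_0/h$ and $C_g = c_0 h$, where $r_0, c_0$ are the unit buffer driving resistance and unit capacitance and $h$ is the buffer size:
$$T_{opt} = L\left(\sqrt{2 r_0 c_0 rc} + r c_0 h + \frac{r_0 c}{h}\right)$$
• With optimally sized buffers (using $dT/dh = 0$),
$$r c_0 - \frac{r_0 c}{h^2} = 0 \;\Rightarrow\; h = \sqrt{\frac{r_0 c}{r c_0}}$$
$$rC_g + R_d c = 2\sqrt{R_d C_g rc}$$
$$T_{opt} = (2 + \sqrt{2})\,L\sqrt{R_d C_g rc} \approx 3.4\,L\sqrt{R_d C_g rc}, \quad \text{i.e. } T_{opt} \lesssim 4L\sqrt{R_d C_g rc}$$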
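The sizing step can be checked numerically as well. The sketch below computes the delay per unit length as a function of buffer size h, using $R_d = r_0/h$ and $C_g = c_0 h$, and compares the minimum with $(2 + \sqrt{2})\sqrt{r_0 c_0 rc}$. The unit-buffer values $r_0$, $c_0$ are illustrative assumptions.

```python
import math


def delay_per_length(h, r0, c0, r, c):
    """T_opt / L for buffers of size h, with Rd = r0/h and Cg = c0*h."""
    return math.sqrt(2 * r0 * c0 * r * c) + r * c0 * h + r0 * c / h


def optimal_size(r0, c0, r, c):
    """h = sqrt(r0*c / (r*c0)), from dT/dh = 0."""
    return math.sqrt(r0 * c / (r * c0))


# Illustrative values: ohm, F, ohm/um, F/um.
r0, c0, r, c = 10000.0, 1e-15, 0.1, 0.2e-15
h_opt = optimal_size(r0, c0, r, c)
best = delay_per_length(h_opt, r0, c0, r, c)
closed_form = (2 + math.sqrt(2)) * math.sqrt(r0 * c0 * r * c)

print(f"h_opt = {h_opt:.1f}")
print(f"T_opt / L = {best:.4e} s/um (closed form {closed_form:.4e} s/um)")
```

At `h_opt` the two size-dependent terms are equal, which is where the factor 2 in the closed form comes from.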
Total buffer count
• Ever-increasing fractions of total cell count will be buffers
  – 70% in 32nm
[Figure: % cells used to buffer nets (axis 0 to 80) at 90nm, 65nm, 45nm, and 32nm; series: clk-buf, buf, tot-buf]
ITRS projections
[Figure: relative delay (log axis, 0.1 to 100) versus feature size (250, 180, 130, 90, 65, 45, 32 nm); curves: gate delay (fanout 4), local interconnect (M1,2), global interconnect with repeaters, global interconnect without repeaters]
Source: ITRS, 2003