revamping electronic design process to embrace interconnect dominance
DESCRIPTION
Revamping Electronic Design Process to Embrace Interconnect Dominance. Chung-Kuan Cheng CSE Department UC San Diego La Jolla, CA 92093-0404 [email protected]. Interconnect Dominance. Goals: Speed, Power, Cost Constraints: Area, Current, Skew Challenges: PVT Variations, Signal Integrity. - PowerPoint PPT PresentationTRANSCRIPT
1
Revamping Electronic Design Process to
Embrace Interconnect Dominance
Chung-Kuan ChengCSE DepartmentUC San DiegoLa Jolla, CA [email protected]
2
Interconnect Dominance
Goals: Speed, Power, Cost
Constraints: Area, Current, Skew
Challenges: PVT Variations, Signal Integrity
0
40
80
120
160
200
180 150 130 100 90 80 70 65 57 50Process Technology Node (nm)
Dela
y (p
s)
1mm Global I nterconnect with Scattering(source: I TRS Roadmap 2004)
FO4 I nverter Delay (Estimated by0.36*Ldraw)
1mm distortionless Transmission Line (Speedof Light)
3
Outlines
• Interconnect Technologies• Buffers, Pitches, Circuit Styles
• Geometrical Planning• Wire Orientations, Chip Shapes
• Interconnect Networks• Topologies, Wire Styles
• Power and Clock Distributions• Functional Modules
• Adders, Shifters
• Conclusion
4
Interconnect Technologies
• RC Wires• Wire Pitch, Width, Separation• Buffer Size, Buffer Interval
• Transmission Line• RLCG
5
Interconnect Technologies
2.5 0.31 0.08 1.31 1.5 1.5 0.13t h h ta
c s s s sC t
e e e es
0.2
0.35 0.651.53 0.98 0.01w sfr
c h hC h
e es
asC w
h
0.05 0.25
1.21.05 0.63 0.0632
t sfrs s h
C s te e
s h h
Coupling capacitance:
Self capacitance:
a fr a frc c s s
w
C C C CC
Total wire capacitance:
t
h
h
w
s
d
p/g plane
wire
w: wire widtht: wire thickness
s: wire spacingh: distance to p/g plane
d: wire pitch
Cc
Cs
Cc: coupling capacitance Cs: self capacitance
Cs
6
Interconnect Technologies
Year (On-Chip) 2005 2010 2015
rncn (ps)T40 0.870 0.400 0.180
rwcw (ps/mm)T80
metal 1
440 1792 5951
interval (um) 136 45.7 16.8
delay (ps/um) 0.120 0.164 0.200
,)1(2
ww
gn
cr
crfInt
SRC Roadmap 2005
wgwntrtr ccrrfllDelay ))1(22(/)(
7
Interconnect Technologies
ndelay bandwidth /bandwidth powerDesign metrics:
ndelay n ndelay power 2n ndelay powerObjective
functions:
For each pitch, For each objective function
Find wire width w, buffer size sinv and buffer interval linv
Calculate the metrics
8
Experimental Results – normalized delay
50100
150200
0
0.5
1
1.5x 10
-6
0
0.5
1
1.5
2x 10
-7
de
lay n(s
/m)
min-d
min-ddp
min-dp
1 2 3 4 5 6 7 8 9 10
x 10-7
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8x 10
-7
pitch(m)
dela
y n(s/m
)
min-dmin-ddpmin-dp180nm130nm100nm70nm
0 0.2 0.4 0.6 0.8 1 1.2
x 10-6
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2x 10
-7
pitch(m)
dela
y n(s/m
)
min-dmin-ddpmin-dp
Top left: OverviewTop right: at different pitches (solid lines: min-pitch; dash lines: saturating pitch)
Bottom right: at 70nm technology
9
Interconnect Tech. -- bandwidth
50100
150200
0
0.51
1.5x 10-6
1
2
3
4
5x 10
13
band
wid
th(b
its/s
)
min-d
min-ddp
min-dp
1.5 2 2.5 3 3.5 4 4.5
x 10-7
2.5
3
3.5
4
4.5
5x 10
13
pitch(m)
band
wid
th(b
its/s
)
min-dmin-ddpmin-dp180nm130nm100nm70nm
0 0.2 0.4 0.6 0.8 1 1.2
x 10-6
1
1.5
2
2.5
3
3.5
4
4.5
5x 10
13
pitch(m)
band
wid
th(b
its/s
)min-dmin-ddpmin-dp
Top left: OverviewTop right: at min-pitchesBottom right: at 70nm technology
10
Interconnect Tech. – bandwidth/power
50
100
150
200
00.5
11.5x 10
-6
0
1
2
3
4
5x 10
23
band
wid
th/p
ower
(m/J
s)
min-d
min-ddp
min-dp
1 2 3 4 5 6 7 8 9
x 10-7
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5x 10
23
pitch(m)
ba
nd
wid
th/p
ow
er(
bits
*m/J
s)
min-dmin-ddpmin-dp180nm130nm100nm70nm
0 0.2 0.4 0.6 0.8 1 1.2
x 10-6
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5x 10
23
pitch(m)
band
wid
th/p
ower
(bits
*m/J
s)min-dmin-ddpmin-dp
Top left: OverviewTop right: at different pitches (solid lines: min-pitch; dash lines: optimal pitch)
Bottom right: at 70nm technology
11
Interconnect Tech.: Transmission Line• Speed-of-the-light on-chip communication
• < 1/5 Delay of Traditional Wires
• Low Power Consumption• < 1/5 Power Consumption
• Robust against process variations• Short Latency• Insensitive to Feature Size
12
Interconnect Tech.: Transmission Line
RΔl LΔlGΔl CΔl
RΔl LΔl
GΔl CΔl …
i(z,t)
RΔl LΔl RΔl LΔl
Differential Transmission Line Surfliner
RΔl LΔlCΔl
RΔl LΔl
CΔl …
i(z,t)
RΔl LΔl RΔl LΔl
Serial resistance causes voltage loss.
Speed and attenuation are frequency dependent.
Shunt conductance compensates voltage loss: R/G = L/C.
Flat from DC Mode to Giga Hz
Telegraph Cable: O. Heaviside in 1887.
13
Theory (Telegrapher’s Equation)• Telegrapher’s equation:
),(),(),(
),(),(
),(
tzGVdt
tzdVC
dz
tzdIdt
tzdILtzRI
dz
tzdV
• Propagation Constant:
jCjGLjR ))((
• Wave Propagation: zjzeVzV 0)(
• Alpha and Beta corresponds to speed and phase velocity. Both are frequency dependant
14
Theory (Distortionless Line)
• Set G=RC/L• Frequency Independent speed and attenuation:
LCCLR ,//
• Characteristic impedance: (pure resistive)
CLZ /0 • Phase Velocity (Speed of light in the media)
cLCv /1• Attenuation:
zZ
R
ezA 0)(
15
Digital Signal Response
16
Interconnect Tech.: Transmission Line• Add shunt conductance between differential
wires
• Resistors realized by serpentine unsilicided poly, diffusion resistors, or high resistive metal
17
Geometrical Planning
• Wire Orientations• Manhattan, Hexagonal, Octagonal, Euclidean
• Die Shapes• Rectangle, Diamond, Hexagon, Octagon, Circle
18
Average Radius of Unit-Circle Area
lambda geo.
Shape
Man. Y-Arch X-Arch Euclid.
Square 1.329 1.122 1.070 1.017
Diamond 1.253 1.121 1.070 1.017
Hexagon 1.276 1.100 1.058 1.003
Octagon 1.272 1.104 1.054 1.001
Circle 1.273 1.103 1.055 1.000
19
Throughput : concurrent flow demand
lambda geo.
Shape
Man. Y-Arch X-Arch*
M: Square 1.000 1.225 1.346
M: Diamond 1.195
Y: Hexagon 1.315
X: Octagon* 1.420
*ratio of 0-90 planes and 45-135 planes is not fixed
20
Flow congestion map for uniform 90 Degree meshes
21
12 by 12 13 by 13
Congestion map of square chip using X-architecture
22
12 by 12 13 by 13
Y-architecture + Square Chip
23
Y Architecture + Hexagonal Chip
24
X-Architecture + Octagonal Chip
25
Manhattan Architecture + Diamond Chip
26
Routing Grids
(http://www.xinitiative.org/img/062102forum.pdf)
X-Architecture Y-Architecture
27
Interconnect Networks
• Optimized Interconnect Architecture• Data Bus, Control Signals
• Shared Interconnect• Packet Switching• Circuit Switching• RTL Level Partition
28
Interconnect Networks
Physical Implementation
On-chip transmission line for long distance communication
Well spaced RC wire with buffer insertion for local connections
29
Interconnect Networks
• Obj: Power, Latency
• Constraints:• Routing Area, Bandwidth
• Design Space:• Topology• Wire Styles, Switches
• Model:• Traffic Demand• Data Bus, Control Signals
30
Interconnect Networks: Design Flow
Power & Delay Lib
Topology Lib
Multi-commodity network flow (MCF)
formulation
Power Evaluation(MCF solver)
Latency Aware low power NoC topology
with wire style optimization
31
Power and Latency Tradeoffs for Optimal 8x8 Topologies
48
53
58
63
68
73
2. 3 2. 5 2. 7 2. 9 3. 1 3. 3Average Latency (ns)
Powe
r Co
nsum
ptio
n (W
)
area=3000um area=4000um area=5000umarea=6000um area=7000um area=8000umarea=9000um area=10000um area=11000um
32
Topology Selection (Latency, Power, BW)
48
58
68
78
88
98
2. 3 2. 8 3. 3 3. 8 4. 3
Average Latency (ns)
Powe
r Co
nsum
ptio
n (W
)
topo=opt i mal topo=mesh topo=torus topo=hypercube
33
(a) Optimal topology when area = 3000um
(b) Optimal topology when area = 7000um
(c) Optimal topology when area = 11000um
Topology Selection (Latency, Power, BW)
Optimal 8-node topologies vs area resources
34
Clock Distributions
35
Clock: Linear Variations Model
• Process variation model• Transistor length• Wire width• Linear variation model
• Power variation model• Supply voltage varies randomly (10%)
ykxkdd yx 0
36
Clock: RC Model
Input: an n level meshes and h-treesConstraint: routing area, parameter variationsObjective: skew
37
Simplified Circuit Model
Vs1Rs
C
Vs2Rs
C
R
1
2
u(t)
u(t-T)
38
Skew Expression
110 2ln CRt
)(2
1.
2
21.
1
21
V
VV
V
VVT
Assumptions:
1. T<<RsC
2. Rs /R <<RsC/T
Using first order
Taylor expansion ex=1+x,
)2ln2exp(
:function Skew
R
RTT s
39
Optimal Routing Resources Allocation
level-1
level-2
level-3
level-4
total area
40
Inductance Diminishes Shunt Effects
Vs1 Rs
C
Vs2 Rs
C
R
1
2
u(t)
u(t-T)
L
f(GHz) 0.5 1 1.5 2 3 3.5 4 5
skew(ps) 3.9 4.2 5.8 7.5 9.9 13 17 26
• 0.5um wide 1.2 cm long copper wire
• Input skew 20ps
41
Clock Distributions
Surfliner
42
Functional Modules (Data Path)
• Arithmetic
• Algorithms
• Logic
• Logic Styles
• Placement
43
Cyclic Shifter
S2
S1
S0
D7 D6 D5 D4 D3 D2 D1 D0
Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)
(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)
(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)
(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)
Delay: nlogn, Power: ¼ n2
44
Fan-out Splitting
Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0
D7 D6 D5 D4 D3 D2 D1 D0
S2
S1
S0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
(7,1) (6,1) (5,1) (4,1) (3,1) (2,1) (1,1) (0,1)
(7,2) (6,2) (5,2) (4,2) (3,2) (2,2) (1,2) (0,2)
(7,3) (6,3) (5,3) (4,3) (3,3) (2,3) (1,3) (0,3)
(7,0) (6,0) (5,0) (4,0) (3,0) (2,0) (1,0) (0,0)
Delay Comparison
50.7
102.7
218.7
484
45.376
128
224
0
100
200
300
400
500
600
8-bit 16-bit 32-bit 64-bit
Del
ay
MuxShifterDemuxShifter
τ
Impr. Ratio8-bit 10.7%16-bit 26.0%32-bit 41.5%64-bit 53.7%
Power Comparison
150.7484
1561.3
5220
146.9453.7
1397.6
4458.9
0
1000
2000
3000
4000
5000
6000
8-bit 16-bit 32-bit 64-bit
Tot
al S
witc
hed
Cap
acita
nce
MuxShifterDemuxShifter
Cg
Impr. Ratio8-bit 2.5%16-bit 6.3%32-bit 10.5%64-bit 14.6%
Delay: nPower: 3/16 n2
45
Cell Permutation• Fix the input/output stage, permute cells of intermediate
stages to further improve delay.• Formulate as an ILP problem and solve by CPLEX.
• Optimal solution in terms of delay• Delay/Power tradeoff
Additional Delay Reductionby Cell Permutation
13
29
61
125
818
38
78
0
20
40
60
80
100
120
140
8-bit 16-bit 32-bit 64-bit
Del
ay (
in u
nit
of c
olum
ns s
pann
ed)
w/o permuatation
w/ permutation
7 6 5 4 3 2 1 0
6 5 4 3 7 2 1 0
3 4 2 6 7 5 1 0
7 6 5 4 3 2 1 0
>> 1-bit
>> 2-bit
>> 4-bit
0 0 00 0 0 0 0
1 1 01 1 4 0 0 11130000
11141111
4 2 2 2 4 1 4 5 4 3 5 5 1 4 1 3
4 2 4 5 4 5 4 3
8 4 7 5 8 8 4 7 8 4 7 7 4 6 3 8
8 7 8 7 8 7 6 8
Optimal Timing solution for 8-bit
46
Conclusion
• Interconnect Technologies
• Geometrical Planning
• Interconnect Networks
• Power and Clock Distributions
• Functional Modules
47
Interconnect Technologies
mcr
crfl
ww
gn 242)1(2
41gw
wn
cr
crs
Example: w= 85nm, t= 145nm
Optimal interval
mmpsmfsccrrfllDelay wgwntrtr /194/194))1(22(/)(
Optimal buffer size
Optimal delay
rn= 10Kohm,cn=0.25fF,cg=2.34xcn=0.585fFrw=2ohm/um, cw=0.2fF/um
48
Experiments—Optimized Skew
s-mesh(s) m-mesh(s) ratio0.00 2.92E-11 2.92E-11 100.0%0.25 2.79E-11 2.60E-11 93.2%0.40 2.71E-11 2.45E-11 90.4%1.00 2.42E-11 1.98E-11 81.8%3.00 1.70E-11 1.24E-11 73.2%5.00 1.24E-11 8.72E-12 70.5%
skewtotal area
49
Robustness Against Supply Voltage Variations
ave worst ave worst0.00 2.10E-11 2.91E-11 2.10E-11 2.91E-111.00 8.38E-12 1.14E-11 8.26E-12 1.43E-112.00 2.71E-12 4.42E-12 6.18E-12 1.11E-113.00 1.89E-12 3.33E-12 4.83E-12 8.73E-124.00 1.45E-12 2.48E-12 3.88E-12 6.96E-125.00 1.16E-12 2.02E-12 3.18E-12 5.64E-12
total areamutli-level mesh single-level mesh