ece260b – cse241a winter 2005 timing analysis and correction
Post on 21-Jan-2016
45 Views
Preview:
DESCRIPTION
TRANSCRIPT
ECE 260B – CSE 241A Timing Analysis & Correction 1 http://vlsicad.ucsd.edu
ECE260B – CSE241AWinter 2005
Timing Analysis and Correction
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
ECE 260B – CSE 241A Timing Analysis & Correction 2 http://vlsicad.ucsd.edu
Timing Analysis Testing
Simulation Device modeling (BSIM) Transistor-level time domain analysis (SPICE) Frequency domain interconnect analysis (AWE,
PRIMA)
Static timing analysis Transistor-level (PathMill) Gate-level (PrimeTime)
ECE 260B – CSE 241A Timing Analysis & Correction 3 http://vlsicad.ucsd.edu
State is stored in registers (flip-flops or latches)Combinational logic computes next-state, outputs
from present-state, inputs
Sequential Machine
clk
Combinationallogic
clk
Combinationallogic
clk
Combinationallogic
Courtesy K. Keutzer et al. UCB
ECE 260B – CSE 241A Timing Analysis & Correction 4 http://vlsicad.ucsd.edu
Why Clocks?
Clocks provide the means to synchronize By allowing events to happen at known timing boundaries, we
can sequence these events
Greatly simplifies building of state machines
No need to worry about variable delay through combinational logic (CL)
All signals delayed until clock edge (clock imposes the worst case delay)
CombLogic
register
CombLogic
register
register
DataflowFSM
Courtesy K. Yang, UCLA
ECE 260B – CSE 241A Timing Analysis & Correction 5 http://vlsicad.ucsd.edu
Clock Cycle Time
Cycle time is determined by the delay through the CL Signal must arrive before the latching edge If too late, it waits until the next cycle
- Synchronization and sequential order becomes incorrect
Constraint: Tcycle > Tprop_delay_through_CL + Toverhead
Example: 3.0 GHz Pentium-4 Tcycle = 333ps
Can change circuit architecture to obtain smaller Tcycle
Courtesy K. Yang, UCLA
ECE 260B – CSE 241A Timing Analysis & Correction 6 http://vlsicad.ucsd.edu
Pipelining For dataflow:
Instead of a long critical path, split the critical path into chunks Insert registers to store intermediate results This allows 2 waves of data to coexist within the CL
Can we extend this ad infinitum? Overhead eventually limits the pipelining
- E.g., 1.5 to 2 gate delays for latch or FF Granularity limits as well
- Minimum time quantum: delay of a gate
register
register
register
register
register
tpd tpd1 tpd2
Tcycle > Tpd + ToverheadTcycle > max(tpd1, tpd2) + Toverhead
CL
A+B
CL
A+B
CL
A
CL
A
CL
B
CL
B
Courtesy K. Yang, UCLA
ECE 260B – CSE 241A Timing Analysis & Correction 7 http://vlsicad.ucsd.edu
Intel MPU FO4 INV Delays Per Clock Period
0.00
20.00
40.00
60.00
80.00
100.00
120.00
1982 1987 1993 1998 2004
Year
Nu
mb
er
of
FO
4 in
ve
rte
r d
ela
ys
386
486 DX2 DX4
Pentium
Pentium MMX
Pentium Pro
Pentium II
Celeron
Pentium III
Pentium 4
FO4 INV = inverter driving 4 identical inverters (no interconnect)
Half of frequency improvement has been from reduced logic stages, i.e., pipelining
ECE 260B – CSE 241A Timing Analysis & Correction 8 http://vlsicad.ucsd.edu
Let’s Revisit Cycle Time and Path Delay
Cycle time (T) cannot be smaller than longest path delay (Tmax)
Longest (critical) path delay is a function of:
Total gate, wire delays logic levels
clock
Q1 Q2
Tclock1 Tclock2
critical path, ~5 logic levels
Tclock1
data
cycle time
maxT T
Courtesy K. Keutzer et al. UCB
ECE 260B – CSE 241A Timing Analysis & Correction 9 http://vlsicad.ucsd.edu
Cycle Time - Setup Time
For FFs to correctly capture data, must be stable for:
Setup time (Tsetup) before clock arrives
clock
Q1 Q2
Tclock1 Tclock2
critical path, ~5 logic levels
Tclock1
data
setup time
max setupT T T
Courtesy K. Keutzer et al. UCB
ECE 260B – CSE 241A Timing Analysis & Correction 10 http://vlsicad.ucsd.edu
Cycle Time – Clock Skew
clock
Q1 Q2
Tclock1 Tclock2
Tclock1
Tclock2
Q2
data
clock skewQ2
10
If clock network has unbalanced delay – clock skew
Cycle time is also a function of clock skew (Tskew) max setup skewT T T T
critical path, ~5 logic levels
Courtesy K. Keutzer et al. UCB
ECE 260B – CSE 241A Timing Analysis & Correction 11 http://vlsicad.ucsd.edu
Cycle Time – Flip-Flop Delay (Clock to Q)
Cycle time is also a function of propagation delay of FF (Tclk-to-Q or Tc2q)
Tc2q : time from arrival of clock signal till change at FF output)
clock
Q1 Q2
Tclock1 Tclock2
Tclock1
Tclock2
Q2clock-to-Q
data
Q2
max setup skew clk to QT T T T T
critical path, ~5 logic levels
Courtesy K. Keutzer et al. UCB
ECE 260B – CSE 241A Timing Analysis & Correction 12 http://vlsicad.ucsd.edu
Min Path Delay - Hold Time
For FFs to correctly latch data, data must be stable during:
Hold time (Thold) after clock arrives
Determined by delay of shortest path in circuit (Tmin) and clock skew (Tskew)
clock
Q1 Q2
Tclock1 Tclock2
short path, ~3 logic levels
Tclock1
data
hold time
min hold skewT T T
Courtesy K. Keutzer et al. UCB
ECE 260B – CSE 241A Timing Analysis & Correction 13 http://vlsicad.ucsd.edu
Setup, Hold, Cycle Times
set-up time – D stablebefore clock
cycle time
Example of a single phase clock
hold time –D stable after clock
When signalmay change
Courtesy K. Keutzer et al. UCB
ECE 260B – CSE 241A Timing Analysis & Correction 14 http://vlsicad.ucsd.edu
Timing Constraints for Edge-Triggered FFs
Max(Tpd) < Tcycle – Tsetup – Tc2q – Tskew
Delay is too long for data to be captured
Min(Tpd) > Thold-Tc2q+Tskew
Delay is too short and data can race through, skipping a state
FlipF
lop
Tcycle
Comb
Logic
Comb
Logic
Courtesy K. Yang, UCLA
ECE 260B – CSE 241A Timing Analysis & Correction 15 http://vlsicad.ucsd.edu
Example of Tpdmax Violation
Suppose there is skew between the registers in a dataflow (regA after regB)
“i” gets its input values from regA at transition in Ck’
CL output “o” arrives after Ck transition due to skew
To correct this problem, can increase cycle time
i
o
regA
regB
Tpdmax
Ck’ Ck
Ck
Ck’
i o
Tskew
Too late!
Tpdmax
Comb
Logic
Comb
Logic
Courtesy K. Yang, UCLA
ECE 260B – CSE 241A Timing Analysis & Correction 16 http://vlsicad.ucsd.edu
Example of Tpdmin Violation: Race Through Suppose clock skew causes regA to be clocked before regB
“i” passes through the CL with little delay (tpdmin)
“o” arrives before the rising Ck’ causes the data to be latched
Cannot be fixed by changing frequency have rock instead of chip
i
oregA
regB
Tpdmin
Ck Ck’
Ck
Ck’
i o
Tskew
Too early!
Tpdmin
Comb
Logic
Comb
Logic
Courtesy K. Yang, UCLA
ECE 260B – CSE 241A Timing Analysis & Correction 17 http://vlsicad.ucsd.edu
Synchronous design = combinational logic + sequential elements
For each flip-flop:
Tmax+ Tsetup < Tcycle - Tskew
Tmin > Thold + Tskew
Summary: Timing Constraints
CLK
Thold Tsetup
Tcycle
DATA
CLK
DQ combinationallogic
FF FF
Tmax : longest data propagation path delay
Tmin : shortest data propagation path delay
ECE 260B – CSE 241A Timing Analysis & Correction 18 http://vlsicad.ucsd.edu
Partition the design
Clock network
Clock definition
Derived clock
Clock groups
Clock delay (skew) calculation
Timing constraints exist between clocks with a common divisor frequency
Data paths with timing constraints
Clock Identification
CLK1
DQ combinationallogic
FF FF
CLK2
/8 divider
CLK4
CLK3
ECE 260B – CSE 241A Timing Analysis & Correction 19 http://vlsicad.ucsd.edu
Data paths with timing constraints Starting from primary inputs/FF outputs Ending at primary outputs/FF inputs
Represented by a labeled directed graph G = <V,E> Timing node V ~ pin/primary input/output Timing edge E ~ gate/wire delay (Timing arc ~ gate delay)
Timing Graph
C
B
F
X
Y
U
0
0
1
A
.15
.20
.20
1
2
2
2
Z
V
2
A
C
B2
2
10
1
0
.20
.20X
YZ
U
V.15
F
Courtesy K. Keutzer et al. UCB
ECE 260B – CSE 241A Timing Analysis & Correction 20 http://vlsicad.ucsd.edu
Static analysis = vector-less worst case analysis Graph based path propagation No logics
Pre-characterized look-up tables for gate delays Min/max/rise/fall
Characterized interconnect delays On-the-fly delay calculation SDF (standard delay format) annotation
Characterization
X
Y
2
2
Z2
X
YZ
ECE 260B – CSE 241A Timing Analysis & Correction 21 http://vlsicad.ucsd.edu
Compute Longest Path
Compute longest path in a DAG G = <V,E,delay,Origin>
// delay is set of labels, Origin is the super-source of the DAG
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed, add w to W
}
Longest path(G){
Forward_prop(Origin) }
Origin
(Kirkpatrick 1966, IBM JRD)
Courtesy K. Keutzer et al. UCB
C
B
F
X
Y
U
0
0
1
A
.15
.20
.20
1
2
2
2
Z
V
2
ECE 260B – CSE 241A Timing Analysis & Correction 22 http://vlsicad.ucsd.edu
Compute Longest Path
Compute longest path in a DAG G = <V,E,delay,Origin>
// delay is set of labels, Origin is the super-source of the DAG
Forward-prop(W){
for each vertex v in W
for each edge <v,w> from v
Final-delay(w) = max(Final-delay(w), delay(v) + delay(w) + delay(<v,w>))
if all incoming edges of w have been traversed, add w to W
}
Longest path(G){
Forward_prop(Origin) }
Origin
(Kirkpatrick 1966, IBM JRD)
Courtesy K. Keutzer et al. UCB
C
B
F
X
Y
U
0
0
1
A
.15
.20
.20
1
2
2
2
Z
V
2
Dynamic programming
How to exclude a set of paths?
ECE 260B – CSE 241A Timing Analysis & Correction 23 http://vlsicad.ucsd.edu
Timing Analysis Terminology Actual arrival time (AAT): forward propagation
Required arrival time (RAT): backward propagation
Slack = RAT - AAT A measure of how much timing margin exists at each node Slack < 0 timing violation Can optimize a particular branch Can trade slack for power, area, robustness
Critical path
clock
ECE 260B – CSE 241A Timing Analysis & Correction 24 http://vlsicad.ucsd.edu
Static Timing Analysis Flow Read in
design (LEF/DEF) timing library (.lib) timing constraints (GCF) delay annotation (SDF)
Set up constraints Annotated delays IO path constraints Single cycle setup/hold
checks Timing exceptions
- False paths- Multi-cycle paths- Max delay constraints- Min delay constraints
Construct timing graph Partition clock domain
(form path groups) Ideal/propagated clock Case analysis
AAT propagation Levelization
Timing report End points with violations Path enumeration
ECE 260B – CSE 241A Timing Analysis & Correction 25 http://vlsicad.ucsd.edu
Timing Exceptions
False paths: topologically connected but logically impossible to enable
To enable a path Logically: non-controlling values (e.g., 0 for OR gates, 1 for AND
gates) at side inputs Temporally: earlier signal transitions at side inputs
clock
ECE 260B – CSE 241A Timing Analysis & Correction 26 http://vlsicad.ucsd.edu
False Path Representation
Abstracted graph
Set_false_path -from {…} –through {…} … -through {…} –to {…}
from to
through through
from tothrough through
ECE 260B – CSE 241A Timing Analysis & Correction 27 http://vlsicad.ucsd.edu
False Path Identification Tagged timing analysis
Arrival times with the same tag are compared to find worst case False path filtered
from tothrough through
clock
a
btag: 2
ctag: 3
d
b
ac
d
arr: 3tag: 3
arr: 2tag: 2
arr: 1tag: 0
ECE 260B – CSE 241A Timing Analysis & Correction 28 http://vlsicad.ucsd.edu
Handling Latch-Based Designs Latch: level enabling sequential element
Transparent signal propagation
CLK
Tborrow
transparent
DATA
CLK
Dcombinationallogic
Latch
combinationallogic
Q
Time borrowing Path delay of previous stage
– Tborrow
Path delay of current stage + Tborrow
ECE 260B – CSE 241A Timing Analysis & Correction 29 http://vlsicad.ucsd.edu
Counting Process Variation Off-chip variation: two paths on a chip cannot use two
different operating conditions (i.e., corners) at the same time for setup or hold analysis
Launchclock_latepath (max) + data_latepath (max) < captureclock_earlypath (max) + clock_period – setup
Launchclock_earlypath (min) + data_earlypath (min) > captureclock_latepath (min) + hold
On-chip variation: the software calculates the delay for one path based on maximum operating condition while calculating the delay for another path based on minimum operating condition for setup or hold checks
Statistical static timing analysis (SSTA) Continuous pdf (probability distribution functions) Or discrete corners
ECE 260B – CSE 241A Timing Analysis & Correction 30 http://vlsicad.ucsd.edu
Clock Re-convergence Pessimism Removal Common part of two clock propagation paths cannot
have two different path delays at the same time
Need to compute clock propagation delay from the branch point
min
CLK
DQ combinationallogic
FF FF
max
max
Common part
ECE 260B – CSE 241A Timing Analysis & Correction 31 http://vlsicad.ucsd.edu
Outline
Timing Analysis Timing Requirements Static Timing Analysis
Timing Correction
ECE 260B – CSE 241A Timing Analysis & Correction 32 http://vlsicad.ucsd.edu
Timing Correction
Driven by STA “Incremental performance analysis backplane”
Two goals Fix logic design rule violations Fix timing problems
DAC-2002, Physical Chip Implementation
ECE 260B – CSE 241A Timing Analysis & Correction 33 http://vlsicad.ucsd.edu
Logic Design Rules Constraints of
Fanout Slew rate Load cap
Reduce timing look-up table extrapolation error
Control signal integrity Transition degradation Crosstalk noise Supply voltage drop Device reliability
Approaches Resizing Buffering Cloning (copying cells)
ECE 260B – CSE 241A Timing Analysis & Correction 34 http://vlsicad.ucsd.edu
Timing Correction Approaches
Re-synthesis Local synthesis transforms
Timing-driven placement Critical net weighting
Timing-driven routing Net ordering Buffering Topology optimization
Post-route optimization (IPO) Re-routing Re-timing and useful clock skew Sizing Buffering
DAC-2002, Physical Chip Implementation
ECE 260B – CSE 241A Timing Analysis & Correction 35 http://vlsicad.ucsd.edu
Local Synthesis Transforms
Resize cells
Buffer or clone to reduce load on critical nets
Decompose large cells
Swap connections on commutative pins or among equivalent nets
Move critical signals forward
Pad early paths
Area recovery
DAC-2002, Physical Chip Implementation
ECE 260B – CSE 241A Timing Analysis & Correction 36 http://vlsicad.ucsd.edu
Transform Example
Delay = 4
…..
Double Inverter
Removal
…..
…..
Delay = 2
DAC-2002, Physical Chip Implementation
ECE 260B – CSE 241A Timing Analysis & Correction 37 http://vlsicad.ucsd.edu
Resizing
00.010.020.030.040.05
0 0.2 0.4 0.6 0.8 1
loadd
A B C
b
ad
e
f
0.2
0.2
0.3
?
b
aA
0.035
b
aC
0.026
DAC-2002, Physical Chip Implementation
ECE 260B – CSE 241A Timing Analysis & Correction 38 http://vlsicad.ucsd.edu
Cloning
00.010.020.030.040.05
0 0.2 0.4 0.6 0.8 1
load
d
A B C
b
a
d
e
f
g
h
0.2
0.2
0.2
0.2
0.2
?
b
a
d
e
f
g
h
A
B
DAC-2002, Physical Chip Implementation
ECE 260B – CSE 241A Timing Analysis & Correction 39 http://vlsicad.ucsd.edu
Buffering
00.010.020.030.040.05
0 0.2 0.4 0.6 0.8 1
load
d
A B C
b
a
d
e
f
g
h
0.2
0.2
0.2
0.2
0.2
? b
a
d
e
f
g
h0.1
0.2
0.2
0.2
0.2
B
B
0.2
DAC-2002, Physical Chip Implementation
ECE 260B – CSE 241A Timing Analysis & Correction 40 http://vlsicad.ucsd.edu
Redesign Fan-in Tree
a
c
d
b eArr(b)=3
Arr(c)=1
Arr(d)=0
Arr(a)=4
Arr(e)=61
1
1
c
d
e
Arr(e)=5
1
1b1
a
DAC-2002, Physical Chip Implementation
ECE 260B – CSE 241A Timing Analysis & Correction 41 http://vlsicad.ucsd.edu
Redesign Fan-out Tree
1
1
1
3
1
1
1
Longest Path = 5
1
1
1
3
1
2
Longest Path = 4Slowdown of buffer due to load
DAC-2002, Physical Chip Implementation
ECE 260B – CSE 241A Timing Analysis & Correction 42 http://vlsicad.ucsd.edu
Decomposition
DAC-2002, Physical Chip Implementation
ECE 260B – CSE 241A Timing Analysis & Correction 43 http://vlsicad.ucsd.edu
Swap Commutative Pins
2
c
ab
2
1
0 1
1
1
3
a
cb
2
1
0
1
1
2
1 5
Simple sorting on arrival times and delay works
DAC-2002, Physical Chip Implementation
ECE 260B – CSE 241A Timing Analysis & Correction 44 http://vlsicad.ucsd.edu
Logic Restructuring 1
• Nodes in critical section that fan out outside of critical section are duplicated
h
f
a
b
c
d
e
Late input signals
f
Collapsed node
b
c d
a e
e
h
Slides courtesy of Keutzer
ECE 260B – CSE 241A Timing Analysis & Correction 45 http://vlsicad.ucsd.edu
Logic Restructuring 2
f
Collapsed node
b c d
a e
Collapse critical section
Re-extract factor k
divisor
f
kd
cclose tooutput
b
a e
Place timing-critical nodes closer to output Make them pass through fewer gates After collapse, a divisor is selected such that substituting k into f places critical
signal c and d closer to output
Slides courtesy of Keutzer
ECE 260B – CSE 241A Timing Analysis & Correction 46 http://vlsicad.ucsd.edu
Summary of Local Synthesis Transforms
Variety of methods for delay optimization No single technique dominates
The one with more tricks wins? No!
Technology dependant (for gate delay) Differ with cell libraries
Methodology dependant (for wire delay) Need to predict placement and routing result Uncertainty!
Pros: large potential improvement
Cons: less predictable, more expensive
ECE 260B – CSE 241A Timing Analysis & Correction 47 http://vlsicad.ucsd.edu
Summary of Local Synthesis Transforms
Work smoothly in a physical synthesis flow Tight integration with placement and routing
Need a good framework for evaluating and processing different transforms
Accurate, fast timing engine with incremental analysis capability- don’t want to retime the whole design for each local transform
Simultaneous min and max delay analysis- How does fixing the setup violation affect the existing hold checks?
ECE 260B – CSE 241A Timing Analysis & Correction 48 http://vlsicad.ucsd.edu
Timing Correction Approaches
Re-Synthesis Local Transformation
Timing-Driven Placement
Timing-Driven Routing
Post-Route Optimization (IPO) Re-Routing Re-Timing and Useful Clock Skew Sizing Buffering
ECE 260B – CSE 241A Timing Analysis & Correction 49 http://vlsicad.ucsd.edu
Reducing Crosstalk Effect
Shielding Effective for short range capacitive coupling Not for long range inductive coupling
Net ordering (wire swizzling)
ECE 260B – CSE 241A Timing Analysis & Correction 50 http://vlsicad.ucsd.edu
Reducing Crosstalk Effect Shielding
Net ordering
Gate sizing A strong driver is less sensitive to crosstalk But more likely to project crosstalk to its neighbors
ECE 260B – CSE 241A Timing Analysis & Correction 51 http://vlsicad.ucsd.edu
Reducing Crosstalk Effect Shielding
Net ordering
Gate sizing
Buffering Partition interconnects Mutual canceling:
ECE 260B – CSE 241A Timing Analysis & Correction 52 http://vlsicad.ucsd.edu
Timing Correction Approaches
Re-Synthesis Local Transformation
Timing-Driven Placement
Timing-Driven Routing
Post-Route Optimization (IPO) Re-Routing Re-Timing and Useful Clock Skew Sizing Buffering
ECE 260B – CSE 241A Timing Analysis & Correction 53 http://vlsicad.ucsd.edu
Re-Timing
How would you meet the 10ns clock cycle time?
clock
FF
D Q
FF
D Q
FF
D Q
6 4 2 4 4
Cycle = 10
ECE 260B – CSE 241A Timing Analysis & Correction 54 http://vlsicad.ucsd.edu
Re-Timing
Re-order sequential elements and combinational logic
Did you see a problem here?
clock
FF
D Q
FF
D Q
FF
D Q
6 4 2 4 4
Cycle = 10
clock
FF
D Q
FF
D Q
FF
D Q
6 4 2 4 4
Cycle = 10
ECE 260B – CSE 241A Timing Analysis & Correction 55 http://vlsicad.ucsd.edu
Re-Timing
Re-order sequential elements and combinational logic
Need to predict placement and routing
clock
FF
D Q
FF
D Q
FF
D Q
6 4 2 4 4
Cycle = 10
clock
FF
D Q
FF
D Q
FF
D Q
6 4 2 4 4
Cycle = 10
ECE 260B – CSE 241A Timing Analysis & Correction 56 http://vlsicad.ucsd.edu
Useful Clock Skew
Equivalent to re-timing
Clock tree re-construction Insert delay cells Snaking Add dummy capacitive load
clock
FF
D Q
FF
D Q
FF
D Q
6 4 2 4 4
Cycle = 10
+2
ECE 260B – CSE 241A Timing Analysis & Correction 57 http://vlsicad.ucsd.edu
Timing Correction Approaches
Re-Synthesis Local Transformation
Timing-Driven Placement
Timing-Driven Routing
Post-Route Optimization (IPO) Re-Routing Re-Timing and Useful Clock Skew Sizing Buffering
ECE 260B – CSE 241A Timing Analysis & Correction 58 http://vlsicad.ucsd.edu
Driving Large Capacitances: Inverter As Buffer
In
A
1 U
U*A
CL = X * CinCin
•Slide courtesy of Mary Jane Irwin, PSU
Total propagation delay = tp(inv) + tp(buffer)
tp0 = delay of min-size inverter with single min-size inverter as fanout load
Minimize tp = U * tp0 + X/U * tp0
Uopt = sqrt(X) ; tp,opt = 2 tp0 * sqrt(X)
Use only if combined delay is less than unbuffered case
ECE 260B – CSE 241A Timing Analysis & Correction 59 http://vlsicad.ucsd.edu
Delay Reduction With Cascaded Buffers
CL = xCin = uN Cin
in out
CLCin C1 C2
1 u u2 uN-1
•Slide courtesy of Mary Jane Irwin, PSU
Cascade of buffers with increasing sizes (U = tapering factor) can reduce delay
If load is driven by a large transistor (which is driven by a smaller transistor) then its turn-on time dominates overall delay
Each buffer charges the input capacitance of the next buffer in the chain and speeds up charging, reducing total delay
Cascaded buffers are useful when Rint < Rtr
ECE 260B – CSE 241A Timing Analysis & Correction 60 http://vlsicad.ucsd.edu
tp as Function of U and X
1.0 3.0 5.0 7.0u
0.0
20.0
40.0
60.0u
/ln
(u)
x=10
x=100
x=1000
x=10,000
•Slide courtesy of Mary Jane Irwin, PSU
Total line delay as function of driver size, load capacitance
Question: Derive the optimum (min-delay) value of U.
ECE 260B – CSE 241A Timing Analysis & Correction 61 http://vlsicad.ucsd.edu
Reducing RC Delay With Repeaters
RC delay is quadratic in length must reduce length
Observation: 22 = 4 and 1+1 = 2 but 12 + 12 = 2
driver receiver
driver
receiver
L = 2 units
Repeater = strong driver (usually inverter or pair of inverters for non-inversion) that is placed along a long RC line to “break up” the line and reduce delay
ECE 260B – CSE 241A Timing Analysis & Correction 62 http://vlsicad.ucsd.edu
Repeaters vs. Cascaded Buffers
Repeaters are used to drive long RC lines Breaking up the quadratic dependence of delay on line length is
the goal Typically sized identically
Cascaded buffers are used to drive large capacitive loads, where there is no parasitic resistance
We put all buffers at the beginning of the load This would be pointless for a long RC wire since the wire RC
delay would be unaffected and would dominate the total delay
Optimum buffering for an uniform long interconnect Cascaded buffers at source and sink Identical sized and spaced repeaters in between
ECE 260B – CSE 241A Timing Analysis & Correction 63 http://vlsicad.ucsd.edu
Buffering a Tree for Timing Optimization
Van Ginneken’s dynamic programming
Bottom-up traversal Evaluate each sub-tree by a triple
<delay, cap, cost> Filter out sub-optimal solutions
Limitations Buffer insertion locations (explored by
edge segmenting) Buffer insertion constraints (e.g., legal
buffer locations) Routing detour Delay calculation accuracy (wire
delay, slew rate, etc.)
<delay, cap, cost>
<delay, cap, cost>
ECE 260B – CSE 241A Timing Analysis & Correction 64 http://vlsicad.ucsd.edu
Buffering a Tree for Load Cap Constraints
Greedy for a single line Bottom-up traversal Insert a buffer when load cap reaches
limit
Greedy for a fixed routing tree Bottom-up traversal For each edge, greedy insertion For each node, buffer the branch with
the largest cap
NP-hard for simultaneous buffering and routing construction
C1 C2 C3 C4
C1 < U, C2 < U, C3 < U, C4 < UC1 + C2 + C3 + C4 > U
ECE 260B – CSE 241A Timing Analysis & Correction 65 http://vlsicad.ucsd.edu
Timing-Driven Routing Tree Construction
Minimum wirelength (Steiner Minimum Tree)
Given a set of terminals S
Find an additional set of points A such that a spanning tree T over S A has minimum wirelength
May not be timing optimum Some sinks are more timing
critical than others Some sinks have larger
capacitive load Buffers?
S
T
ECE 260B – CSE 241A Timing Analysis & Correction 66 http://vlsicad.ucsd.edu
Timing-Driven Routing Tree Construction
Minimum wirelength (Steiner Minimum Tree)
Shortest Path Tree
AHHK Tree Cost(q) = k * path_length(p)
+ edge_length(p, q) k = 0 minimum wirelength k = 1 shortest path
Heuristics with sink timing criticality weights
S
T
ECE 260B – CSE 241A Timing Analysis & Correction 67 http://vlsicad.ucsd.edu
Timing-Driven Routing Tree Construction
Simultaneous routing tree construction and buffer insertion
Buffer station (legal buffer locations) Routing blockage
Dynamic programming P-Tree
Clustering (C-Tree) Timing criticality Geometric distance Signal polarity Try AHHK with different k
ECE 260B – CSE 241A Timing Analysis & Correction 68 http://vlsicad.ucsd.edu
Timing-Driven Routing Tree Topology Optimization
Chicken-egg dilemma (delay vs. routing)
Iterative greedy improvement (Q-Tree)
S
T
Delta Elmore delay
Buffer location
top related