CSE241 L3 ASICs.1 Kahng & Cichy, UCSD ©2003
CSE241 VLSI Digital Circuits
Winter 2003
Lecture 07: Timing II
Delay Calculation
Tables are indexed by output load (Cap, rows, pF) and input transition (Tr, columns, ns); entries are in ns.
Cell Fall
Cap\Tr   0.05   0.2    0.5
0.01     0.02   0.16   0.30
0.5      0.04   0.32   0.60
2.0      0.08   0.64   1.20
Cell Rise
Cap\Tr   0.05   0.2    0.5
0.01     0.03   0.18   0.33
0.5      0.06   0.36   0.66
2.0      0.09   0.72   1.32
Fall Transition
Cap\Tr   0.05   0.2    0.5
0.01     0.01   0.09   0.15
0.5      0.03   0.27   0.45
2.0      0.06   0.54   0.90
Example: load = 1.0 pF; input transition = 0.1 ns (fall lookups), 0.12 ns (rise lookup)
Fall delay = 0.178 ns
Rise delay = 0.261 ns
Fall transition = 0.147 ns
Rise transition = …
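The example values on this slide are consistent with bilinear interpolation between the four table entries that surround the (Cap, Tr) operating point. The sketch below reproduces them under that assumption; the helper names `_bracket` and `lookup` are illustrative and are not taken from any timing tool.

```python
from bisect import bisect_right

# Axes and entries copied from the slide (Cap in pF, Tr in ns, values in ns).
CAPS = [0.01, 0.5, 2.0]
TRS = [0.05, 0.2, 0.5]
CELL_FALL = [[0.02, 0.16, 0.30], [0.04, 0.32, 0.60], [0.08, 0.64, 1.20]]
CELL_RISE = [[0.03, 0.18, 0.33], [0.06, 0.36, 0.66], [0.09, 0.72, 1.32]]
FALL_TRANSITION = [[0.01, 0.09, 0.15], [0.03, 0.27, 0.45], [0.06, 0.54, 0.90]]


def _bracket(axis, x):
    """Index i of the segment axis[i]..axis[i+1] used for x (end segments are reused outside the table)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup(table, cap, tr):
    """Bilinear interpolation in a table with rows indexed by Cap and columns by Tr."""
    i, j = _bracket(CAPS, cap), _bracket(TRS, tr)
    u = (cap - CAPS[i]) / (CAPS[i + 1] - CAPS[i])
    v = (tr - TRS[j]) / (TRS[j + 1] - TRS[j])
    low_cap = table[i][j] + v * (table[i][j + 1] - table[i][j])
    high_cap = table[i + 1][j] + v * (table[i + 1][j + 1] - table[i + 1][j])
    return low_cap + u * (high_cap - low_cap)


fall_delay = lookup(CELL_FALL, cap=1.0, tr=0.1)
rise_delay = lookup(CELL_RISE, cap=1.0, tr=0.12)
fall_transition = lookup(FALL_TRANSITION, cap=1.0, tr=0.1)
print(f"{fall_delay:.3f} {rise_delay:.3f} {fall_transition:.3f}")  # 0.178 0.261 0.147
```

The fall lookups use the 0.1 ns input transition and the rise lookup uses 0.12 ns, which is the pairing that matches the slide's numbers.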
PVT (Process, Voltage, Temperature) Derating
Actual cell delay = Original delay x KPVT
PVT Derating: Example + Min/Typ/Max Triples
Proc_var (0.5:1.0:1.3)
Voltage (5.5:5.0:4.5)
Temperature (0:20:50)
KP = 0.80 : 1.00 : 1.30
KV = 0.93 : 1.00 : 1.08
KT = 0.80 : 1.07 : 1.35
KPVT = 0.60 : 1.07 : 1.90
Cell delay = 0.261 ns
Derated delay = 0.157 : 0.279 : 0.496 ns {min : typical : max}
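The triples above can be checked by multiplying the three factors corner by corner and then scaling the nominal delay. This is a minimal sketch; it assumes the combined factor is rounded to two decimals before it is applied, which is what reproduces the slide's derated delays. The function names are illustrative.

```python
# Derating factors from the slide, as (min, typ, max) triples.
K_P = (0.80, 1.00, 1.30)
K_V = (0.93, 1.00, 1.08)
K_T = (0.80, 1.07, 1.35)


def combine(*triples):
    """Multiply derating triples corner by corner, rounded to 2 decimals as on the slide."""
    combined = []
    for corner in zip(*triples):
        k = 1.0
        for factor in corner:
            k *= factor
        combined.append(round(k, 2))
    return tuple(combined)


def derate(delay_ns, k_pvt):
    """Scale a nominal delay by the combined factor of each corner."""
    return tuple(round(delay_ns * k, 3) for k in k_pvt)


k_pvt = combine(K_P, K_V, K_T)
derated = derate(0.261, k_pvt)
print(k_pvt)    # (0.6, 1.07, 1.9)
print(derated)  # (0.157, 0.279, 0.496)
```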
Conservatism of Gate Delay Modeling
True gate delay depends on input arrival time patterns
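STA will assume that only 1 input is switching
Will use worst slope among several inputs
[Figure: waveforms of A, B and F vs. time with tpd marked, for both inputs switching and for only A switching; gate schematics with inputs A, B, output F, load CL, supply Vdd]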
This Class + Logistics
Reading
- Smith, Chapters 15, 16
- http://vlsicad.ucsd.edu/Presentations/ICCAD00TUTORIAL/
- Possibly: Sarrafzadeh/Wong Chapters 2 - placement, 3 - routing, (4 – performance modeling)
Schedule
- MT will be take-home (and easy), BUT you lose 5% if you don’t show up on Thursday (attendance will be taken by Ben)
- Thursday: Surprise guest lecturer on floorplan / placement
HW #12: Suppose that you want to work on timing edges that are most critical according to some F(slack of the edge, #paths through the edge). How would you modify the STA calculation (longest path in a DAG) so that it also calculates the number of paths through each edge?
Slide courtesy of S. P. Levitan, U. Pittsburgh
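One possible starting point for HW #12, offered as a sketch rather than the intended answer: in a DAG, the number of source-to-sink paths through an edge (u, v) is the number of paths from any source to u times the number of paths from v to any sink. Both counts come from one forward and one backward pass in topological order, the same traversal STA already performs. The function name and the example graph are hypothetical, and the code assumes no parallel edges.

```python
from collections import defaultdict


def paths_through_edges(edges):
    """Count source-to-sink paths through each edge of a DAG given as (u, v) pairs."""
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Kahn's algorithm: build a topological order.
    indegree = {n: len(pred[n]) for n in nodes}
    order = [n for n in nodes if indegree[n] == 0]
    for n in order:  # the list grows while it is being iterated
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                order.append(m)

    # Forward pass: paths reaching each node from the sources.
    from_source = {n: 0 if pred[n] else 1 for n in nodes}
    for n in order:
        for m in succ[n]:
            from_source[m] += from_source[n]

    # Backward pass: paths leaving each node toward the sinks.
    to_sink = {n: 0 if succ[n] else 1 for n in nodes}
    for n in reversed(order):
        for p in pred[n]:
            to_sink[p] += to_sink[n]

    return {(u, v): from_source[u] * to_sink[v] for u, v in edges}


example = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e"), ("a", "e")]
counts = paths_through_edges(example)
print(counts[("a", "c")], counts[("a", "e")])  # 2 1
```

The same two passes can carry arrival and required times, so slack and path counts could be computed together and fed into F.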
Buffer Clustering
Sylvester / Shepard, 2001
Hierarchical clustering connecting clock source (= root) to clock sinks (= leaves) of clustering tree
Fanout at each level between 5 and 200 (depends on buffer library)
Often specify a clock topology in the tool as, e.g., (1)-6-8-5: root has 6 children, each of which has 8 children, each of which has 5 (leaf) children, for 6 × 8 × 5 = 240 clock sinks
Big question: how to perform the hierarchical buffer clustering? What makes a “good” cluster?
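The arithmetic behind a topology string such as (1)-6-8-5 can be written out as below. This is only a counting sketch; the function name is hypothetical and the fanout list is the one from the slide.

```python
def clock_tree_counts(fanouts):
    """Return the node count at each level below the root for a fanout spec like [6, 8, 5]."""
    per_level = []
    count = 1  # start from the single root
    for fanout in fanouts:
        count *= fanout
        per_level.append(count)
    return per_level


levels = clock_tree_counts([6, 8, 5])
sinks = levels[-1]               # the last level holds the clock sinks
buffers = 1 + sum(levels[:-1])   # root driver plus all internal buffers
print(levels, sinks, buffers)    # [6, 48, 240] 240 55
```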
Buffer Clustering by Space Partitioning
Sylvester / Shepard, 2001
Example: Cadence CT-Gen
Pick fanout (e.g., 6-4)
Pick “long axis” of bounding box of sinks
Place buffers at medians (essentially) of chunks of sinks identified by space-partitioning
Why is this good? Uses (or assumes) min wire; easily routed (Steiner routing); robust to ECOs; …
Why is it bad? Oversizes drivers; commits to skew which could be avoided
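The steps listed on this slide can be sketched as a recursive split along the long axis of the sinks' bounding box, with a buffer placed at the coordinate-wise median of each chunk. This is an illustration of the general idea only and is not CT-Gen's actual algorithm; the function name, the equal-size chunking, and the dictionary layout are all assumptions.

```python
from statistics import median


def partition(sinks, fanouts):
    """Build a buffer tree over (x, y) sinks for a fanout spec such as [6, 4]."""
    buffer_xy = (median(x for x, _ in sinks), median(y for _, y in sinks))
    if len(fanouts) <= 1:
        # Last level: this buffer drives its sinks directly.
        return {"buffer": buffer_xy, "sinks": sinks}

    xs = [x for x, _ in sinks]
    ys = [y for _, y in sinks]
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1  # long axis of bounding box
    ordered = sorted(sinks, key=lambda p: p[axis])

    # Cut the sorted sinks into fanouts[0] nearly equal chunks.
    k = fanouts[0]
    bounds = [round(i * len(ordered) / k) for i in range(k + 1)]
    chunks = [ordered[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
    return {"buffer": buffer_xy,
            "children": [partition(chunk, fanouts[1:]) for chunk in chunks]}


grid = [(x, y) for x in range(6) for y in range(4)]  # 24 sinks on a 6 x 4 grid
tree = partition(grid, [6, 4])
print(tree["buffer"], len(tree["children"]))  # (2.5, 1.5) 6
```

Because each split only looks at sink positions, the sketch shares the slide's drawback: it ignores load balance, so a chunk's driver may be oversized and skew between chunks is fixed by geometry.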
Buffer Clustering by Traditional Clustering
Sylvester / Shepard, 2001
Example: SPC, old Cell3 CTS
Pick fanout (e.g., 6)
Find clusters of size 6
Place buffers at centers or centroids or … of clusters
Recurse
Why is this good? Can get near-zero skew trees?
Why is this bad? ECOs; hard to route; more wire(?); difficult algorithms!
HW #13: Propose a hierarchical clustering strategy for buffered clock trees, and explain its pros and cons
Outline
Clocking
Storage elements
Clocking metrics and methodology
Clock distribution
Package and useful-skew degrees of freedom
Clock power issues
Gate timing models
Skew Reduction Using Package
• Most clock network latency occurs at global level (largest distances spanned)
• Latency ∝ Skew (more latency generally means more skew)
• With reverse scaling, routing low-RC signals at global level becomes more difficult & area-consuming
Sylvester / Shepard, 2001
Skew Reduction Using Package
Incorporate global clock distribution into the package
Flip-chip packaging allows for high-density, low-parasitic access from substrate to IC
• RC of package-level wiring up to 4 orders of magnitude smaller than on-chip wiring
• Global skew reduced
• Lower capacitance → lower power
• Opens up global routing tracks
• Results not yet conclusive
[Figure: system clock routed through the package substrate to the μP/ASIC via solder bumps]
Sylvester / Shepard, 2001
Useful Skew (= cycle-stealing)
Zero skew
• Global skew constraint
• All skew is bad
Useful skew
• Local skew constraints
• Shift slack to critical paths
[Figure: timing slacks (hold and setup windows) for FF → fast path → FF → slow path → FF, shown for zero skew and for useful skew]
W. Dai, UC Santa Cruz
Skew = Local Constraint
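D: longest path
d: shortest path
-d + t_hold < skew < T_period - D - t_setup
[Figure: FF → FF; skew axis with the safe (permissible) range lying between the race-condition region (below -d + t_hold) and the cycle-time-violation region (above T_period - D - t_setup)]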
Timing is correct as long as the signal arrives in the permissible skew range
W. Dai, UC Santa Cruz
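The permissible range for one flip-flop pair follows directly from the inequality on this slide. The sketch below takes skew as the clock arrival time at the launching FF minus that at the capturing FF, which is the sign convention under which the inequality holds; the numeric values in the usage lines are hypothetical.

```python
def permissible_skew_range(d_short, d_long, t_period, t_setup=0.0, t_hold=0.0):
    """Return (lower, upper) skew bounds for one FF-to-FF pair, all in the same time unit."""
    lower = -d_short + t_hold             # below this: race condition (hold violation)
    upper = t_period - d_long - t_setup   # above this: cycle time (setup) violation
    return lower, upper


def is_safe(skew, bounds):
    """True when the skew lies strictly inside the permissible range."""
    lower, upper = bounds
    return lower < skew < upper


# Hypothetical pair: d = 1.0 ns, D = 4.0 ns, T = 6.0 ns, t_setup = 0.2 ns, t_hold = 0.1 ns.
bounds = permissible_skew_range(1.0, 4.0, 6.0, t_setup=0.2, t_hold=0.1)
print(bounds, is_safe(0.0, bounds), is_safe(2.0, bounds))
```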
Skew Scheduling for Design Robustness
“0 0 0”: at verge of violation
“2 0 2”: more safety margin
[Figure: FF → 2 ns → FF → 6 ns → FF with T = 6 ns; figure labels: 4, 0, -2, 2, 4, 0]
Design will be more robust if clock signal arrival time is in the middle of permissible skew range, rather than on edge
Can solve a linear program to maximize robustness = determine prescribed sink skews
W. Dai, UC Santa Cruz
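The two schedules on this slide can be compared by computing the worst setup and hold slack over both paths. This sketch only evaluates a given schedule; it does not solve the linear program. It assumes zero setup and hold times and treats each path's single quoted delay as both its longest and shortest delay, since the slide gives no other numbers.

```python
T = 6.0  # clock period in ns, from the slide
# (launching FF index, capturing FF index, path delay in ns) for the three-FF chain
PATHS = [(0, 1, 2.0), (1, 2, 6.0)]


def margins(arrivals, t_setup=0.0, t_hold=0.0):
    """Return (worst setup slack, worst hold slack) for the given clock arrival times."""
    setup = min(arrivals[dst] + T - t_setup - (arrivals[src] + delay)
                for src, dst, delay in PATHS)
    hold = min(arrivals[src] + delay - (arrivals[dst] + t_hold)
               for src, dst, delay in PATHS)
    return setup, hold


zero_skew = margins([0, 0, 0])
scheduled = margins([2, 0, 2])
print(zero_skew)  # (0.0, 2.0): the 6 ns path has no setup slack
print(scheduled)  # (2.0, 4.0): both paths gain margin
```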
Potential Advantages of Useful Skew
[Figure: CLK, 0-skew vs. U-skew]
Reduce peak current consumption by distributing the FF switch point in the range of permissible skew
Affords extra margin to increase clock frequency or reduce sizing (= power)
W. Dai, UC Santa Cruz
Conventional Zero-Skew Flow
Placement
Synthesis
Extraction & Delay Calculation
Static Timing Analysis
0-Skew Clock Synthesis
Clock Routing
Signal Routing
W. Dai, UC Santa Cruz
Useful-Skew Flow
Existing Placement
Extraction & Delay Calculation
Static Timing Analysis
U-Skew Clock Synthesis
Clock Routing
Signal Routing
Permissible range generation
Initial skew scheduling
Clock tree topology synthesis
Clock net routing
Clock timing verification
W. Dai, UC Santa Cruz
Outline
Clocking
Storage elements
Clocking metrics and methodology
Clock distribution
Package and useful-skew degrees of freedom
Clock power issues
Gate timing models
Clock Power
Power consumption in clocks due to:
  - Clock drivers
  - Long interconnections
  - Large clock loads – all clocked elements (latches, FF’s) are driven
Different components dominate
  - Depending on type of clock network used
  - Ex. Grid – huge pre-drivers & wire cap. drown out load cap.
Sylvester / Shepard, 2001
Clock Power Is LARGE
Not only is the clock capacitance large, it switches every cycle!
P = C · Vdd^2 · f
Sylvester / Shepard, 2001
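A quick numeric illustration of the formula above. The capacitance, supply voltage, and frequency are hypothetical round numbers, not figures from the lecture; the `activity` parameter is an added assumption that models nets switching on only a fraction of cycles, with the clock at 1.0.

```python
def dynamic_power(c_farads, vdd_volts, f_hertz, activity=1.0):
    """P = activity * C * Vdd^2 * f; the clock switches every cycle, so its activity is 1."""
    return activity * c_farads * vdd_volts ** 2 * f_hertz


# Hypothetical values: 1 nF of switched capacitance, 1.5 V supply, 1 GHz clock.
clock_power = dynamic_power(1e-9, 1.5, 1e9)
data_power = dynamic_power(1e-9, 1.5, 1e9, activity=0.1)  # same C, switching 10% of cycles
print(f"{clock_power:.2f} W vs {data_power:.3f} W")  # 2.25 W vs 0.225 W
```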
Low-Power Clocking
Gated clocks
  - Prevent switching in areas of chip not being used
  - Easier in static designs
Edge-triggered flops in ARM rather than transparent latches in Alpha
  - Reduced load on clock for each latch/flop
  - Eliminated spurious power-consuming transitions during latch flow-through (transparency)
Sylvester / Shepard, 2001
Clock Area
Clock networks consume silicon area (clock drivers, PLL, etc.) and routing area
Routing area is most vital
Top-level metals are used to reduce RC delays
  - These levels are precious resources (unscaled)
  - Power routing, clock routing, key global signals
Reducing area also reduces wiring capacitance and power
Typical #’s: Intel Itanium – 4% of M4/5 used in clock routing
Sylvester / Shepard, 2001
Clock Slew Rates
To maintain signal integrity and latch performance, minimum slew rates are required
Too slow – clock is more susceptible to noise, latches are slowed down, setup times eat into timing budget [Tsetup = 200 + 0.33 * Tslew (ps)], more short-circuit power for large clock drivers
Too fast – burns too much power, overdesigned network, enhanced ground bounce
Rule-of-thumb: Trise and Tfall of clock are each between 10-20% of clock period (10% - aggressive target)
1 GHz clock; Trise = Tfall = 100-200ps
Sylvester / Shepard, 2001
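The rule of thumb and the setup-time expression quoted on this slide can be combined in a few lines. The function names are illustrative; the 10% and 20% fractions and the formula Tsetup = 200 + 0.33 * Tslew (ps) are taken from the slide.

```python
def clock_slew_targets_ps(f_hz, low_fraction=0.10, high_fraction=0.20):
    """Return the (aggressive, relaxed) rise/fall time targets in ps for a clock frequency."""
    period_ps = 1e12 / f_hz
    return low_fraction * period_ps, high_fraction * period_ps


def setup_time_ps(t_slew_ps):
    """Setup-time model quoted on the slide: Tsetup = 200 + 0.33 * Tslew (ps)."""
    return 200 + 0.33 * t_slew_ps


fast, slow = clock_slew_targets_ps(1e9)  # 1 GHz clock
print(fast, slow)                        # 100.0 200.0
print(round(setup_time_ps(fast)), round(setup_time_ps(slow)))  # 233 266
```

Under this model, relaxing the slew from 100 ps to 200 ps costs about 33 ps of setup time, roughly 3% of a 1 ns cycle.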
Example: Alpha 21264
Grid + H-tree approach
Power = 32% of total
Wire usage = 3% of metals 3 & 4
4 major clock quadrants, each with a large driver connected to local grid structures
Sylvester / Shepard, 2001
Alpha 21264 Skew Map
Ref: Compaq, ASP-DAC00
Sylvester / Shepard, 2001
Power vs. Skew
Fundamental design decision
  - Meeting skew requirements is easy with unlimited power budget
  - Wide wires reduce RC product but increase total C
  - Driver upsizing reduces latency (→ reduces skew as well) but increases buffer cap
SOC context: plastic package power limit is 2-3 W
Sylvester / Shepard, 2001
Clock Distribution Trends
Timing
  - Clock period dropping fast, skew must follow
  - Slew rates must also scale with cycle time
  - Jitter – PLL’s get better with CMOS scaling but other sources of noise increase
    - Power supply noise more important
    - Switching-dependent temperature gradients
Materials
  - Cu reduces RC slew degradation, potential skew
  - Low-k decreases power, improves latency, skew, slews
Power
  - Complexity, dynamic logic, pipelining → more clock sinks
  - Larger chips → bigger clock networks
Sylvester / Shepard, 2001