
Page 1: CSE241 VLSI Digital Circuits Winter 2003 Lecture 06: …vlsicad.ucsd.edu/courses/cse241a/web/lec_notes/Lec6_final.pdf · CSE241 VLSI Digital Circuits Winter 2003 Lecture 06: ... on

CSE241 L3 ASICs.1 Kahng & Cichy, UCSD ©2003

CSE241 VLSI Digital Circuits

Winter 2003

Lecture 06: Timing


This Class + Logistics

Timing

Flip-flop timing

Clock distribution

Clock tree synthesis

Reading: White papers on static timing analysis, papers on clock tree synthesis

Lab #2 due date: Monday January 27th

Slide courtesy of S. P. Levitan, U. Pittsburgh


Review

Static timing analysis (Lecture 4)

Pin-based timing graph

Directed acyclic graph (DAG) of timing arcs

Longest path in a DAG can be found in time linear in the number of arcs (edges)

Slack = required arrival time – actual arrival time (long path analysis)

Logic synthesis (Lecture 5)

Slide courtesy of S. P. Levitan, U. Pittsburgh
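The linear-time longest-path claim can be illustrated with a minimal sketch. It assumes a hypothetical arc-list format `(from_pin, to_pin, delay)` and a made-up five-arc circuit. Arrival times are propagated in topological order (Kahn's algorithm), keeping the latest arrival at each pin, so each arc is visited exactly once. Slack is then computed as RAT – AT.

```python
from collections import defaultdict, deque

def arrival_times(arcs, input_at):
    """Late-mode (longest-path) arrival times on a timing DAG.

    arcs: list of (from_pin, to_pin, delay) tuples (hypothetical format).
    input_at: dict mapping primary-input pins to their arrival times.
    Runs in O(#pins + #arcs): every arc is relaxed exactly once.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    pins = set(input_at)
    for u, v, d in arcs:
        succ[u].append((v, d))
        indeg[v] += 1
        pins.update((u, v))
    at = {p: input_at.get(p, float("-inf")) for p in pins}
    queue = deque(p for p in pins if indeg[p] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v, d in succ[u]:
            at[v] = max(at[v], at[u] + d)  # keep the latest (longest) arrival
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(pins):
        raise ValueError("timing graph has a cycle")
    return at

# Hypothetical circuit: inputs a, b, c; gates g1, g2; primary output out
arcs = [("a", "g1", 2), ("b", "g1", 1), ("g1", "g2", 3), ("c", "g2", 1), ("g2", "out", 6)]
at = arrival_times(arcs, {"a": 0, "b": 0, "c": 0})
rat_out = 10
print(at["out"], rat_out - at["out"])  # 11 -1
```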


Static Analysis vs. Dynamic Analysis

Figure: gate with inputs a, b, c and output z. The a-to-z delay depends on the side inputs: b=0, c=0: delay1; b=0, c=1: delay2; b=1, c=0: delay3; b=1, c=1: delay4

Why static analysis when dynamic simulation is more accurate?

Drawbacks of simulation:

Requires input vectors (stimuli for circuit)

Long runtimes

Exponential explosion with the number of possible design input states

Example: calculate worst-case rising delay from a to z
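A small sketch makes the explosion concrete. It assumes hypothetical a-to-z delays (in ps) standing in for delay1 through delay4 in the figure. Finding the worst case by simulation takes one run per side-input vector. A block with n free inputs needs at least 2^n vectors, and more when transitions (vector pairs) are required to measure a rising delay.

```python
from itertools import product

# Hypothetical a->z delays (ps) per side-input state (b, c); stand-ins for delay1..delay4
A_TO_Z_DELAY = {(0, 0): 40, (0, 1): 55, (1, 0): 48, (1, 1): 62}

def worst_case_by_simulation(delay_of_state, num_side_inputs):
    """Dynamic-style search: one 'simulation run' per side-input vector."""
    runs = 0
    worst = float("-inf")
    for state in product((0, 1), repeat=num_side_inputs):
        runs += 1
        worst = max(worst, delay_of_state[state])
    return worst, runs

worst, runs = worst_case_by_simulation(A_TO_Z_DELAY, 2)
print(worst, runs)                      # 62 4
# The number of runs doubles with every additional free input:
print([2 ** k for k in (2, 10, 20)])    # [4, 1024, 1048576]
```

Static analysis avoids the enumeration by keeping one worst-case delay per timing arc.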


STA Terminology

Figure: rising waveform (voltage from 0 to Vdd versus time) with the 10%, 50%, and 90% points marked

(Actual) arrival time (AAT, or AT) = time at which a pin switches state

Usually the 50% point on the voltage curve, i.e., AT = t50

Slew time = time over which the signal switches

Usually the difference between the 10% and 90% points on the voltage curve, i.e., tslew = t90 – t10

Required arrival time (RAT) = time at which a signal must arrive in order to avoid a chip failure

Slack = RAT – AAT

Positive slack is good (= margin); negative slack is bad
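The definitions above can be applied to a sampled waveform. The sketch below assumes a monotonically rising waveform and uses linear interpolation between samples. The ramp and the RAT value are hypothetical. It returns AT = t50 and tslew = t90 – t10, then computes the slack.

```python
def crossing_time(times, volts, level):
    """First time a sampled, rising waveform reaches `level` (linear interpolation)."""
    points = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if v0 <= level <= v1 and v1 > v0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")

def at_and_slew(times, volts, vdd):
    """AT = t50 and tslew = t90 - t10, as defined on the slide."""
    t10 = crossing_time(times, volts, 0.1 * vdd)
    t50 = crossing_time(times, volts, 0.5 * vdd)
    t90 = crossing_time(times, volts, 0.9 * vdd)
    return t50, t90 - t10

# Hypothetical ramp: 0 V at t = 0 ps rising to 1.0 V at t = 100 ps, then flat
times = [0.0, 100.0, 200.0]
volts = [0.0, 1.0, 1.0]
at, slew = at_and_slew(times, volts, vdd=1.0)
rat = 60.0                      # hypothetical required arrival time (ps)
print(at, slew, rat - at)       # slack = RAT - AAT
```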


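Figure: example circuit annotated with gate delays d and arrival times at (primary inputs at=0; intermediate nodes including temporary values at=3 and at=7). The primary output (PO) has at=11 and rat=10, so slack = 10 – 11 = –1.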

Example: What is slack at PO?


Example: Incremental Timing Analysis

Figure: the same circuit after a local change (new gate delays d=1 annotated). Arrival times are updated (temporary at=3 and at=7), the PO arrival time becomes at=10, and against rat=10 the slack is now 0.

Amount of work is bounded by sizes of fanin, fanout cones of logic
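Bounded incremental work can be sketched as follows. This toy model makes several assumptions: arcs are `(from_pin, to_pin) -> delay`, and only late-mode arrival times are kept. After one arc delay changes, only its sink and that sink's fanout cone are re-evaluated, and propagation stops wherever an arrival time does not change. Required times would need the symmetric backward update through the fanin cone, which is omitted here. Production timers use levelized queues rather than this simple worklist.

```python
from collections import defaultdict, deque

class TimingGraph:
    """Toy late-mode timing graph with incremental arrival-time updates (hypothetical model)."""

    def __init__(self, arcs, input_at):
        self.delay = {(u, v): d for u, v, d in arcs}
        self.fanin, self.fanout = defaultdict(list), defaultdict(list)
        for u, v, _ in arcs:
            self.fanin[v].append(u)
            self.fanout[u].append(v)
        self.input_at = dict(input_at)
        self.at, self.recomputed = {}, 0
        for pin in self._topo_order():          # full analysis once
            self.at[pin] = self._eval(pin)

    def _topo_order(self):
        pins = set(self.input_at) | set(self.fanin) | set(self.fanout)
        indeg = {p: len(self.fanin[p]) for p in pins}
        queue, order = deque(p for p in pins if indeg[p] == 0), []
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in self.fanout[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
        return order

    def _eval(self, pin):
        self.recomputed += 1
        if pin in self.input_at:
            return self.input_at[pin]
        return max(self.at[u] + self.delay[(u, pin)] for u in self.fanin[pin])

    def set_delay(self, u, v, new_delay):
        """Change one arc delay and re-propagate only through v's fanout cone."""
        self.delay[(u, v)] = new_delay
        self.recomputed = 0
        queue = deque([v])
        while queue:
            pin = queue.popleft()
            new_at = self._eval(pin)
            if new_at != self.at[pin]:          # stop where nothing changes
                self.at[pin] = new_at
                queue.extend(self.fanout[pin])

# Hypothetical circuit with an independent branch d -> g3 -> out2
arcs = [("a", "g1", 2), ("b", "g1", 1), ("g1", "g2", 3), ("c", "g2", 1),
        ("g2", "out", 6), ("d", "g3", 4), ("g3", "out2", 1)]
tg = TimingGraph(arcs, {"a": 0, "b": 0, "c": 0, "d": 0})
tg.set_delay("a", "g1", 1)
print(tg.at["out"], tg.recomputed)   # only g1, g2, out are re-evaluated
```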


Definitions change as follows:

RAT = lower bound on arrival time

Propagate shortest possible instead of longest possible delays

Slack = Arrival – Required

Figure: inputs a and b drive gate y; y and input c drive gate x; each gate has delay 1

ATa = 0, ATb = 1, ATc = 0, ATy = 1, ATx = 1, RATx = 2

SLx = 1 – 2 = –1, SLy = 1 – 1 = 0, SLa = 0 – 0 = 0, SLb = 1 – 0 = 1, SLc = 0 – 1 = –1

Example: negative slack because ATc is too small (early)

Early-Mode Analysis
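The early-mode rules can be checked against the example numbers with a minimal sketch. It assumes arcs are listed so that every arc into a pin appears before any arc out of it. Arrival times propagate forward with min. Required times (lower bounds) propagate backward with max. Slack is AT – RAT.

```python
def early_mode(arcs, input_at, output_rat):
    """Early-mode (short-path) analysis.

    arcs: (from_pin, to_pin, delay) tuples in topological order (simplifying assumption).
    AT = earliest arrival (min over fanins); RAT = lower bound on arrival,
    propagated backwards with max; slack = AT - RAT.
    """
    at = dict(input_at)
    for u, v, d in arcs:
        at[v] = min(at.get(v, float("inf")), at[u] + d)
    rat = dict(output_rat)
    for u, v, d in reversed(arcs):
        if v in rat:
            rat[u] = max(rat.get(u, float("-inf")), rat[v] - d)
    slack = {p: at[p] - rat[p] for p in rat}
    return slack, at, rat

# Slide example: a, b -> y; y, c -> x; unit gate delays
arcs = [("a", "y", 1), ("b", "y", 1), ("y", "x", 1), ("c", "x", 1)]
slack, at, rat = early_mode(arcs, {"a": 0, "b": 1, "c": 0}, {"x": 2})
print(slack)   # c and x are negative: c arrives too early
```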


Enhancements of STA

Incremental timing analysis

Nanometer-scale process effects – variation (probabilistic timing analysis)

Interference – crosstalk

Multiple inputs switching

Conservatism of delay propagation

HW #8: Suppose you change the size of one (combinational) gate in your design, thus invalidating the previous timing analysis. How much work must be done to regain a correct timing analysis?

Courtesy K. Keutzer et al. UCB


Timing Correction

Driven by STA: “incremental performance analysis backplane”

Fix electrical violations: resize cells, buffer nets, copy (clone) cells

Fix timing problems: local transforms (bag of tricks), path-based transforms

DAC-2002, Physical Chip Implementation


Local Synthesis Transforms

Resize cells

Buffer or clone to reduce load on critical nets

Decompose large cells

Swap connections on commutative pins or among equivalent nets

Move critical signals forward

Pad early paths

Area recovery

DAC-2002, Physical Chip Implementation


Transform Example

Figure: double inverter removal; the path delay drops from Delay = 4 (through two cascaded inverters) to Delay = 2 after the inverter pair is removed

DAC-2002, Physical Chip Implementation


Resizing

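Figure: delay d (0 to 0.05) versus load (0 to 1) for three sizes A, B, C of a cell. A gate with inputs a, b drives fanouts d, e, f with loads 0.2, 0.2, and 0.3. Which size should it be? Size A gives a delay of 0.035; resizing to C gives 0.026.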

DAC-2002, Physical Chip Implementation
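Resizing can be sketched with a linear delay model d = d0 + k · load for each cell size. The coefficients below are hypothetical. They were chosen only so that sizes A and C reproduce the two delays quoted on the slide at the total load of 0.7. The routine simply picks the size with the smallest delay for the given fanout load.

```python
# Hypothetical linear delay models (d0, k) per cell size: d = d0 + k * load
SIZES = {"A": (0.007, 0.04), "B": (0.010, 0.03), "C": (0.012, 0.02)}

def gate_delay(size, load):
    d0, k = SIZES[size]
    return d0 + k * load

def best_size(fanout_loads):
    """Pick the size that minimizes this gate's delay for the given fanout."""
    load = sum(fanout_loads)
    return min(SIZES, key=lambda s: gate_delay(s, load))

loads = [0.2, 0.2, 0.3]            # fanouts d, e, f
for s in SIZES:
    print(s, round(gate_delay(s, sum(loads)), 4))
print("best:", best_size(loads))
```

In a real flow, a larger cell also presents more input capacitance to its own driver, so resizing one gate can slow the stage before it.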


Cloning

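Figure: delay versus load curves for sizes A, B, C (as on the previous slide). A gate with inputs a, b drives five fanouts d, e, f, g, h, each with load 0.2. After cloning, the gate is duplicated into two copies (labeled A and B), each driving only part of the fanout (d, e, f and g, h), so each copy sees a smaller load.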

DAC-2002, Physical Chip Implementation
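The effect of cloning can be shown with the same kind of hypothetical linear delay model. In this sketch, both copies use size C, and the d, e, f / g, h split is an illustrative choice; a real tool would choose the split by criticality and placement. Each clone sees only part of the load, so each is faster than the single driver. The cost is that inputs a and b now drive two gate inputs instead of one.

```python
# Hypothetical linear delay models (d0, k) per cell size: d = d0 + k * load
SIZES = {"A": (0.007, 0.04), "B": (0.010, 0.03), "C": (0.012, 0.02)}

def gate_delay(size, load):
    d0, k = SIZES[size]
    return d0 + k * load

fanout = {"d": 0.2, "e": 0.2, "f": 0.2, "g": 0.2, "h": 0.2}

# Original: one size-C driver carries all five loads
single = gate_delay("C", sum(fanout.values()))

# Cloned: one copy drives d, e, f and the other drives g, h
clone1 = gate_delay("C", sum(fanout[p] for p in "def"))
clone2 = gate_delay("C", sum(fanout[p] for p in "gh"))
print(round(single, 4), round(clone1, 4), round(clone2, 4))
```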


Buffering

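Figure: delay versus load curves for sizes A, B, C. The gate with inputs a, b drives five fanouts d, e, f, g, h (load 0.2 each). After buffering, a buffer with input load 0.1 is inserted, so part of the fanout is driven by the buffer instead of the original gate.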

DAC-2002, Physical Chip Implementation
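Buffering trades delay between sinks. This sketch makes several assumptions. It uses hypothetical linear delay models and treats d as the only timing-critical sink. The buffer is size B with the 0.1 input load from the figure. The driver then sees only the critical load plus the buffer input, which speeds up the critical sink. The non-critical sinks become slower because they pay the extra buffer delay.

```python
# Hypothetical linear delay models (d0, k) per cell size: d = d0 + k * load
SIZES = {"A": (0.007, 0.04), "B": (0.010, 0.03), "C": (0.012, 0.02)}
BUF_SIZE = "B"          # hypothetical buffer size
BUF_INPUT_LOAD = 0.1    # buffer input load, as in the figure

def gate_delay(size, load):
    d0, k = SIZES[size]
    return d0 + k * load

loads = {"d": 0.2, "e": 0.2, "f": 0.2, "g": 0.2, "h": 0.2}
critical = {"d"}        # hypothetical: d is the only timing-critical sink

# Unbuffered: the size-C driver sees every fanout load
unbuffered = gate_delay("C", sum(loads.values()))

# Buffered: driver sees the critical sink plus the buffer input only
driver = gate_delay("C", sum(loads[p] for p in critical) + BUF_INPUT_LOAD)
behind_buffer = driver + gate_delay(
    BUF_SIZE, sum(v for p, v in loads.items() if p not in critical))

print(f"critical sink: {unbuffered:.3f} -> {driver:.3f}")
print(f"other sinks:   {unbuffered:.3f} -> {behind_buffer:.3f}")
```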


Redesign Fan-in Tree

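Figure: a tree of 2-input gates (delay 1 each) combining inputs with Arr(a)=4, Arr(b)=3, Arr(c)=1, Arr(d)=0. In the original tree, a and b share one gate and c and d another, so Arr(e)=6. In the redesigned tree, c and d are combined first, then b, and the latest input a enters at the last gate, so Arr(e)=5.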

DAC-2002, Physical Chip Implementation
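The fan-in restructuring can be sketched for an associative, commutative function (for example, a wide AND or OR) built from unit-delay 2-input gates. The greedy heuristic shown here repeatedly combines the two earliest-arriving signals, so late signals enter close to the output. On the slide's arrival times, it reproduces Arr(e)=5, compared with 6 for the balanced pairing.

```python
import heapq

def balanced_pairs_arrival(a, b, c, d, gate_delay=1):
    """Fixed balanced tree: (a, b) and (c, d) paired, then combined."""
    return max(max(a, b) + gate_delay, max(c, d) + gate_delay) + gate_delay

def greedy_tree_arrival(arrivals, gate_delay=1):
    """Repeatedly combine the two earliest signals with a 2-input gate."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        x = heapq.heappop(heap)
        y = heapq.heappop(heap)
        heapq.heappush(heap, max(x, y) + gate_delay)
    return heap[0]

arr = {"a": 4, "b": 3, "c": 1, "d": 0}
print(balanced_pairs_arrival(arr["a"], arr["b"], arr["c"], arr["d"]))  # 6
print(greedy_tree_arrival(arr.values()))                                # 5
```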


Redesign Fan-out Tree

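Figure: two buffered fan-out trees annotated with stage delays. Original: longest path = 5. Redesigned: longest path = 4, even though one buffer slows down (delay 2 instead of 1) because of its increased load.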

DAC-2002, Physical Chip Implementation


Decomposition

DAC-2002, Physical Chip Implementation


Swap Commutative Pins

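Figure: a gate whose commutative input pins are driven by signals a, b, c, shown before and after swapping the connections; arrival times and pin-to-output delays are annotated on each input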

Simple sorting on arrival times and delay works

DAC-2002, Physical Chip Implementation
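The "simple sorting" rule can be sketched as follows: assign the latest-arriving signal to the fastest pin. This pairing minimizes the worst AT + pin delay, which can be shown with an exchange argument. The arrival times and pin delays below are hypothetical. A brute-force search over all permutations is included to confirm the result.

```python
from itertools import permutations

def assign_pins(arrivals, pin_delays):
    """Latest-arriving signal -> fastest pin, by sorting both lists.

    arrivals: dict signal -> arrival time; pin_delays: list of pin-to-output delays.
    Returns (signal -> pin index mapping, output arrival time).
    """
    signals = sorted(arrivals, key=arrivals.get, reverse=True)
    pins = sorted(range(len(pin_delays)), key=lambda p: pin_delays[p])
    mapping = dict(zip(signals, pins))
    out = max(arrivals[s] + pin_delays[p] for s, p in mapping.items())
    return mapping, out

def brute_force(arrivals, pin_delays):
    """Best achievable output arrival over every signal-to-pin assignment."""
    sigs = list(arrivals)
    return min(max(arrivals[s] + pin_delays[p] for s, p in zip(sigs, perm))
               for perm in permutations(range(len(pin_delays))))

arrivals = {"a": 0, "b": 1, "c": 2}   # hypothetical arrival times
pin_delays = [1, 2, 3]                # hypothetical: pin 0 is the fastest
mapping, out = assign_pins(arrivals, pin_delays)
print(mapping, out, brute_force(arrivals, pin_delays))
```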


Outline

Clocking

Storage elements

Clocking metrics and methodology

Clock distribution

Package and useful-skew degrees of freedom

Clock power issues

Gate timing models


Why Clocks?

Clocks provide the means to synchronize

By allowing events to happen at known timing boundaries, we can sequence these events

Greatly simplifies building of state machines

No need to worry about variable delay through combinational logic (CL)

All signals delayed until clock edge (clock imposes the worst case delay)

Figure: dataflow (combinational logic between registers) and FSM (combinational logic with a register in a feedback loop) structures


Clock Cycle Time

Cycle time is determined by the delay through the CL

Signal must arrive before the latching edge

If too late, it waits until the next cycle

- Synchronization and sequential order become incorrect

tcycle > tprop_delay + toverhead

Can change circuit architecture to obtain a smaller tcycle: pipelining, parallelism


Pipelining

For dataflow:

Instead of one long critical path, split the critical path into chunks and insert registers to store intermediate results

This allows 2 waves of data to coexist within the CL

Can we extend this ad infinitum?

Overhead eventually limits the pipelining

- E.g., 1.5 to 2 gate delays for a latch or FF

Granularity limits as well

- Minimum time quantum: delay of a gate

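Figure: unpipelined dataflow (register, CL computing A+B with delay tpd, register) versus pipelined dataflow (register, CL A with delay tpd1, register, CL B with delay tpd2, register)

Unpipelined: tcycle > tpd + toverhead

Pipelined: tcycle > max(tpd1, tpd2) + toverhead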
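The two limits on pipelining (per-stage overhead and gate granularity) can be seen numerically. This sketch makes several simplifying assumptions: a hypothetical 20-gate-delay path that can be cut evenly at any gate boundary, and a register overhead of 2 gate delays (the upper end of the range quoted above). The cycle time stops improving once each stage is a single gate.

```python
import math

def cycle_time(total_gate_delays, stages, overhead):
    """Minimum cycle time (in gate delays) when the CL is cut into `stages` pieces.

    Stages can only be cut at gate boundaries (granularity), and every
    stage also pays the latch/FF overhead.
    """
    per_stage = math.ceil(total_gate_delays / stages)   # at least one gate per stage
    return per_stage + overhead

for k in (1, 2, 4, 10, 20, 40):
    print(k, cycle_time(20, k, overhead=2))
```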


Parallelism

For FSMs:

Same functionality and performance can be achieved at half the clock rate

However, the input and output signals must be doubled (to account for the outputs of each original cycle)

Instead of doubling the delay, the optimized logic is often logarithmically related to the degree of parallelism

Figure: three FSM implementations with an M-bit state register

Original: one CL block with delay tpd per cycle; tcycle1 > tpd + tov

Unrolled by N: N CL blocks in series per cycle; tcycle2 > N·tpd + tov

Unrolled and optimized (inputs and outputs widened from M to 2*M bits): tcycle3 > log(N)·tpd + tov


Outline

Clocking

Storage elements

Clocking metrics and methodology

Clock distribution

Package and useful-skew degrees of freedom

Clock power issues

Gate timing models


Storage Elements

Latches

Level sensitive – transparent when H, hold when L

Figure: latch schematic (inputs d, ck, ckb; internal node p_q; output q) and latch and flip-flop symbols (d, ck, q)

Flip-flops

Edge-triggered – data is sampled at the clock edge


Latch and Flip-Flop Gates

Figure: schematics of an active high latch (in, enable, out; D, clock, Q, QN) and a rising edge flip-flop (D, clock, Q, QN), showing their internal clock connections

Latch and flip-flop schematics from TSMC 0.13um LV Artisan Sage-X Standard Cell Library.


Latch and Flip-Flop Behavior

Figure: active high latch and rising edge flip-flop (D, Q, QN), showing the active signal path when the clock is high and when the clock is low

Latch: tDQ = 2 inverter delays

Flip-flop: tCQ = 4 inverter delays


Clock Skew and Jitter

Figure: (a) cycle-to-cycle edge jitter: successive edges of the clock at B may be only T – tj apart (edge uncertainty tj/2 on each side); (b) duty cycle jitter: the high phase of the clock at B can shrink to Thigh – tduty; (c) clock skew: the clock at A and the clock at B are offset by tsk,AB


Flip-Flop Timing Characteristics

Figure: rising edge flip-flops A and B with combinational logic between them and a non-ideal clock of period T

Setup time constraint: tCQ,max + tcomb,max + tsu + tsk + tj ≤ T

Hold time constraint: tCQ,min + tcomb,min ≥ th + tsk


Latch Setup Time and Transparency

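Figure: active high latches A and B with combinational logic between them and a non-ideal clock; labels tCQ, tcomb,max, tsu, tsk + tj, tduty, and tDQ through each latch

Setup time constraint: no penalty to the clock period for the setup time constraint! Data that arrives while latch B is transparent passes through with delay tDQ, as long as it arrives at least tsu before the latch closes.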


Outline

Clocking

Storage elements

Clocking metrics and methodology

Clock distribution

Package and useful-skew degrees of freedom

Clock power issues

Gate timing models


Setup Time

Important characteristics of storage elements: setup time, hold time, clock-to-q delay

Setup time, tsu: time before the clock edge by which the data must arrive in order for the new data to be stored

The setup time for a F/F occurs before the latching edge.

The setup time for a latch occurs before the transition from transparent to hold.

Figure: waveforms ck, d, q with tsetup marked before the clock edge


Hold Time

A second important characteristic is the hold time, th

Time after the clock edge during which the data must remain stable in order for the data to be properly held

Note that hold time (and setup time) can be negative

Why isn’t hold time just the negative of setup time?

Storage elements typically have some data dependence

- Capacitances and devices may be faster for one data value than for another

Specify the worst case for process technology and operating condition variations

Figure: waveforms ck, d, q with thold marked after the clock edge


Clocking Overhead

Inherent delay in any storage element

The delay is measured from:

Clock transition to output data transition, tc2q

Input data transition to output data transition, td2q

Flip-flop is edge triggered

The overhead is tc2q + tsu

Latch is level-sensitive

The overhead is td2q

Figure: waveforms ck, d, q with tc2q (clock to q) and td2q (data to q) marked


Clock Skew

Most “high-profile” of clock network metrics

Maximum difference in arrival times of the clock signal at any 2 latches/FFs fed by the network

Skew = max | t1 – t2 |

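Figure: a clock source (e.g., PLL) drives CLK1 and CLK2; their edges arrive at times t1 and t2 after the source edge (the latency), and the difference between them is the skew

Fig. from Zarkesh-Ha; Sylvester / Shepard, 2001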
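The skew definition can be applied directly to a set of clock arrival times. In this sketch, the arrival times (in ps) at four flip-flops are hypothetical. The maximum over all pairs of |t1 – t2| is the same as max – min, so the O(n²) pairwise form is only shown for clarity. Note that the latency (about 500 ps here) is much larger than the skew.

```python
from itertools import combinations

def clock_skew(arrivals):
    """Skew = max |t1 - t2| over all pairs of clock sinks (pairwise form)."""
    return max(abs(t1 - t2) for t1, t2 in combinations(arrivals.values(), 2))

# Hypothetical clock arrival times (ps) at four flip-flops
arr = {"ff1": 512, "ff2": 530, "ff3": 498, "ff4": 525}
print(clock_skew(arr), max(arr.values()) - min(arr.values()))  # 32 32
```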


Clock Skew Causes

Designed (unavoidable) variations – mismatch in buffer load sizes, interconnect lengths

Process variation – process spread across die yielding different Leff, Tox, etc. values

Temperature gradients – changes MOSFET performance across die

IR voltage drop in power supply – changes MOSFET performance across die

Note: Delay from clock generator to fan-out points (clock latency) is not important by itself

BUT: increased latency leads to larger skew for same amount of relative variation

Sylvester / Shepard, 2001


Clock Jitter

Clock network delay uncertainty

From one clock cycle to the next, the period is not exactly the same each time

Maximum difference in phase of the clock between any two periods is jitter

Must be considered in max path (setup) timing; typically O(50ps) for high-end designs

Sylvester / Shepard, 2001


Clock Jitter Causes

PLL oscillation frequency

Various noise sources affecting clock generation and distribution

E.g., power supply noise dynamically alters the drive strength of intermediate buffer stages

Jitter is reduced by minimizing IR and L*(di/dt) noise

Courtesy Cypress Semi

Sylvester / Shepard, 2001


Clocking Methodology (Edge-Triggered)

Max(tpd) < tper – tsu – tc2q – tskew; otherwise the delay is too long for data to be captured

Min(tpd) > th – tc2q + tskew; otherwise the delay is too short and data can race through, skipping a state

Figure: flip-flops separated by combinational logic, clocked with period tper
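The two edge-triggered constraints can be written as slack checks. This sketch uses hypothetical delays in ps. A negative value means the corresponding constraint is violated. Note that tper appears only in the setup check. Slowing the clock can therefore fix a tpdmax problem, but it can never fix a tpdmin (race-through) problem.

```python
def check_edge_triggered(tpd_max, tpd_min, tper, tsu, th, tc2q, tskew):
    """Slacks for the two edge-triggered constraints (negative = violation)."""
    setup_slack = (tper - tsu - tc2q - tskew) - tpd_max   # Max(tpd) < tper - tsu - tc2q - tskew
    hold_slack = tpd_min - (th - tc2q + tskew)            # Min(tpd) > th - tc2q + tskew
    return setup_slack, hold_slack

# Hypothetical numbers (ps)
setup, hold = check_edge_triggered(tpd_max=700, tpd_min=40, tper=1000,
                                   tsu=60, th=30, tc2q=80, tskew=50)
print(setup, hold)  # 110 40
```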


Example of tpdmax Violation

Suppose there is skew between the registers in a dataflow (regA after regB)

“i” gets its input values from regA at transition in Ck’

CL output “o” arrives after Ck transition due to skew

To correct this problem, can increase cycle time

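Figure: regA drives input i of the combinational logic (delay tpdmax), whose output o feeds regB; regA is clocked by Ck’ and regB by Ck, offset by tskew, so o arrives at regB too late!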


Example of tpdmin Violation: Race Through

Suppose clock skew causes regA to be clocked before regB

“i” passes through the CL with little delay (tpdmin)

“o” arrives before the rising Ck’ edge, so regB latches the new data one cycle too early

This problem cannot be fixed by changing the clock frequency: you have a rock instead of a chip

Figure: regA (clocked by Ck) drives i through the combinational logic (delay tpdmin) to o at regB (clocked by Ck’, tskew later); o arrives too early!


Time Borrowing (Cycle Stealing)

Cycle steal with flip-flops using delayed clocks

Figure: cycle stealing with flip-flops: the capturing flip-flop uses a delayed clock Ck, so tpd < tper + tw, where the intentional delay tw is a deliberate skew; tpd must remain safely > tpdmin

Time borrowing with latches: a stage may take tpd > tper and give the time back in later stages
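Time borrowing with latches can be sketched with a deliberately simplified model. The assumptions (not from the slides) are ideal latches with zero tDQ, setup time, and skew. Latch k opens at k·tper and stays transparent for a window of length `window`. Data that arrives while the latch is open passes straight through (borrowing time), and early data waits for the opening edge. The stage delays are hypothetical. The first stage is longer than tper, which would fail with flip-flops, but the latches absorb it.

```python
def latch_pipeline(stage_delays, tper, window):
    """Borrowed time per stage through a chain of ideal latches (simplified model).

    Latch k opens at k * tper and closes at k * tper + window.
    Raises ValueError if data misses a latch's transparent window.
    """
    depart = 0.0                              # data leaves the first latch at t = 0
    borrowed = []
    for k, d in enumerate(stage_delays, start=1):
        arrive = depart + d
        open_t, close_t = k * tper, k * tper + window
        if arrive > close_t:
            raise ValueError(f"stage {k} misses its latch window")
        borrowed.append(max(0.0, arrive - open_t))   # time stolen from the next stage
        depart = max(arrive, open_t)                 # early data waits for the opening edge
    return borrowed

def flip_flop_ok(stage_delays, tper):
    """With flip-flops, every stage must fit within one period."""
    return all(d <= tper for d in stage_delays)

delays = [13, 6, 9]                            # hypothetical stage delays
print(flip_flop_ok(delays, tper=10))           # False: stage 1 exceeds the period
print(latch_pipeline(delays, tper=10, window=5))
```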


Outline

Clocking

Storage elements

Clocking metrics and methodology

Clock distribution

Package and useful-skew degrees of freedom

Clock power issues

Gate timing models


Clock Distribution

General goal of clock distribution:

Deliver the clock to all memory elements with acceptable skew

Deliver clock edges with acceptable sharpness

Clocking network design is one of the greatest challenges in the design of a large chip

Clocks generally distributed via wiring trees (and meshes)

Low-resistance interconnect to minimize delay

Multiple drivers to distribute driver requirements

Use optimal sizing principles to design buffers

Clock lines can create significant crosstalk


Clock Distribution Problem Statement

Objective:

Minimum skew (performance and hold time issues)

Minimum cell area and metal use

(sometimes) minimal latency

(sometimes) particular latency

(sometimes) intermixed gating for power reduction

(sometimes) hold to a particular duty cycle, e.g., 50:50 ± 1 percent

Subject to:

Process variation from lot to lot

Process variation across the die

Radically different loading (FF density) around the die

Metal variation across the die

Power variation across the die (both static IR and dynamic)

Coupling (same and other layers)


Issues in Clock Distribution Network Design

Skew: process, voltage, and temperature; data dependence; noise coupling; load balancing

Power: CV²f (no ½ or α, since the clock net charges and discharges its full capacitance every cycle); clock gating

Flexibility/tunability; compactness (fit into existing layout/design)

Reliability: electromigration
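The clock power formula can be evaluated directly. The capacitance, supply, and frequency below are hypothetical. Gating is modeled crudely as a fraction of the clock capacitance that is assumed not to toggle. Real gating savings depend on where the gates sit in the tree and how often they are enabled.

```python
def clock_power(c_clock, vdd, freq, gated_fraction=0.0):
    """Dynamic clock power P = C * Vdd^2 * f.

    No 1/2 and no activity factor: the clock net charges and discharges its
    whole capacitance every cycle. `gated_fraction` is a simplified gating
    model: that share of the clock capacitance is assumed not to toggle.
    """
    return (1.0 - gated_fraction) * c_clock * vdd ** 2 * freq

# Hypothetical numbers: 1 nF of clock capacitance, 1.8 V supply, 500 MHz
p_full = clock_power(1e-9, 1.8, 500e6)
p_gated = clock_power(1e-9, 1.8, 500e6, gated_fraction=0.5)
print(f"{p_full:.2f} W ungated, {p_gated:.2f} W with half the clock load gated")
```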


Skew: Clock Delay Varies With Position


Clock Distribution Methods

RC-Tree: less capacitance, more accuracy, flexible wiring

Grids: reliable, less data dependency, tunable (late in design)

Shown here for final stage drivers driving F/F loads


RC-Trees

H-Tree X-Tree Binary-Tree

Asymmetric trees can be and are used due to uneven sink distribution, hard macros in the floorplan (hierarchical clock distribution), etc.; the basic goal is to have even RC delays


Grids

Gridded clock distribution common on earlier DEC Alpha microprocessors

Advantages:

Skew determined by grid density, not too sensitive to load position

Clock signals available everywhere

Tolerant to process variations

Usually yields extremely low skew values

Disadvantages:

Huge amount of wiring and power

Making the grid pitch coarser reduces these penalties, but loses the grid advantage

Figure: pre-drivers feeding a global grid

Sylvester / Shepard, 2001


Trees

H-tree (Bakoglu)

One large central driver, recursive structure to match wirelengths

Halve wire width at branching points to reduce reflections

Disadvantages

Slew degradation along long RC paths

Unrealistically large central driver

- Clock drivers can create large temperature gradients (ex. Alpha 21064 ~30° C)

Non-uniform load distribution

Inherently non-scalable (wire R growth)

Partial solution: intermediate buffers at branching points

courtesy of P. Zarkesh-Ha

Sylvester / Shepard, 2001
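The wirelength-matching property of the H-tree's recursive structure can be sketched geometrically. This toy model ignores buffers, wire widths, and loads. Each level connects a center to four tips at (x ± half, y ± half) with one horizontal and one vertical run, and each tip becomes the center of a half-size H at the next level. Every root-to-sink path then has the same length. Equal wirelength is not the same as equal delay when sink loads are non-uniform, which is one of the disadvantages listed above.

```python
def h_tree_sinks(x, y, half, levels, wire=0.0):
    """Recursively build an H-tree centred at (x, y).

    Each level connects the centre to four tips at (x +/- half, y +/- half):
    a horizontal run of length `half` followed by a vertical run of `half`.
    Returns (x, y, root-to-sink wirelength) for every leaf.
    """
    if levels == 0:
        return [(x, y, wire)]
    sinks = []
    for dx in (-half, half):
        for dy in (-half, half):
            sinks += h_tree_sinks(x + dx, y + dy, half / 2, levels - 1,
                                  wire + abs(dx) + abs(dy))
    return sinks

sinks = h_tree_sinks(0.5, 0.5, 0.25, levels=2)   # unit-square die, 16 sinks
lengths = {w for _, _, w in sinks}
print(len(sinks), lengths)                       # 16 {0.75}
```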


Buffered Tree

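Figure: buffered clock tree: the PLL feeds global buffers (NGBuf, SGBuf, EGBuf, WGBuf) through tree levels L2 and L3; each final buffer drives all clock loads within its region, and the other branches serve the other regions of the chip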

Sylvester / Shepard, 2001


Buffered H-tree

Advantages

Ideally zero skew

Can be low power (depending on skew requirements)

Low area (silicon and wiring)

CAD tool friendly (regular)

Disadvantages

Sensitive to process variations

Local clocking loads inherently non-uniform

Sylvester / Shepard, 2001


Tree Balancing

Some techniques:

a) Introduce dummy loads

b) Snaking of wirelength to match delays

Con: routing area is often more valuable than silicon

Sylvester / Shepard, 2001


Examples of Distribution

H-Tree, Asymmetric RC-Tree (IBM)

Grids: DEC [Alphas]

Serpentines: Intel x86 [Young ISSCC97]


Examples From Processor Chips

DEC-Alpha 21064 clock spines

DEC-Alpha 21064 RC delays

DEC-Alpha 21164 RC delays for Global Distribution (Spine + Grid)

DEC-Alpha 21164 RC local delays


ReShape Clocks Example

Balanced, shielded H-tree for pre-clock distribution

Mesh for Block level distribution


output mesh

Pre-clock 2 Level H-tree

All routes 5-6u M6/5, shielded with 1u grounds

~10 buffers per node

output mesh must hit every sub-block


Block Level Mesh (.18u)

Max 600u stride

1u m5 ribs every 20 - 30 u (4 to 6 rows)

Shielded input and output m6 shorting straps

Clumps of 1-6 clock buffers, surrounded by capacitor pads

Pre-clock connects to input shorting straps


Problems with Meshes

Burn more power at low frequencies

Blocks more routing resources (solution: integrated power distribution with ribs can provide shielding for ‘free’)

Difficult for ‘spare’ clock domains that will not tolerate regioning

Post placement (and routing) tuning required

No ‘beneficial skew’ (shudder) possible


Problems with Meshes (#2)

Clock gating only easy at root

Fighting tools to do analysis:

Clumped buffers are a problem for static timing analysis tools

Large shorted meshes are a problem for STA tools

Need full extraction and SPICE-like simulation (e.g., Avant! Star-Sim) to determine skew


Benefits of Meshes (#3)

Deterministic since shielded all the way down to rib distribution

No ECO placement required: all buffers preplaced before block placement

Low latency, since it uses shorted drivers, and therefore lower skew

Later ECO placements of FFs do not require rebalancing the tree

“Idealized” clocking environment for concurrent RTL design and timing convergence dance.

Mesh Example

~ 100k flops

6 blocks

Clock Skew Thermal Map

Pre-tuning

Clock Skew Thermal Map #2

50 ps block / 100 ps global skew, post-tuning

Alternative Clock Network Strategy

Globally – Tree

Power requirements reduced relative to global grid

Smaller routing requirements, frees up global tracks

Trees balanced easily at global level

Keeps global skew low (with minimal process variation)

Sylvester / Shepard, 2001

Outline

Clocking

Storage elements

Clocking metrics and methodology

Clock distribution

Package and useful-skew degrees of freedom

Clock power issues

Gate timing models

Skew Reduction Using Package

• Most clock network latency occurs at global level (largest distances spanned)

• Latency ∝ Skew

• With reverse scaling, routing low-RC signals at global level becomes more difficult & area-consuming

Sylvester / Shepard, 2001

Skew Reduction Using Package

[Figure: system clock, µP/ASIC, solder bumps, package substrate]

⇒ Incorporate global clock distribution into the package

⇒ Flip-chip packaging allows for high-density, low-parasitic access from substrate to IC

• RC of package-level wiring up to 4 orders of magnitude smaller than on-chip wiring

• Global skew reduced

• Lower capacitance ⇒ lower power

• Opens up global routing tracks

• Results not yet conclusive

Sylvester / Shepard, 2001

Useful Skew (= cycle-stealing)

[Figure: FF → fast path → FF → slow path → FF, with the hold and setup timing slacks of each path under zero skew and under useful skew]

Useful skew
• Local skew constraints
• Shift slack to critical paths

Zero skew
• Global skew constraint
• All skew is bad

W. Dai, UC Santa Cruz

Skew = Local Constraint

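D: longest path delay; d: shortest path delay between the two FFs

[Figure: skew axis divided into race condition | safe | cycle time violation]

Permissible range: $-d + t_{hold} < \text{Skew} < T_{period} - D - t_{setup}$, where Skew is the clock arrival time at the launching FF minus the clock arrival time at the capturing FF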

Timing is correct as long as the signal arrives in the permissible skew range

W. Dai, UC Santa Cruz
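As a minimal sketch of this check, the Python below computes the permissible skew range for one launch/capture pair and classifies a given skew against it. It assumes Skew is the launch-FF clock arrival minus the capture-FF clock arrival, that clock-to-Q delay is already folded into d and D, and it uses integer picoseconds to avoid rounding noise; the function names and numbers are illustrative only.

```python
def permissible_skew_range(d_min, d_max, t_period, t_setup, t_hold):
    """Bounds (lo, hi) such that timing is correct when lo < skew < hi.

    skew = t_launch - t_capture; clock-to-Q delay assumed included in d_min/d_max.
    """
    lo = -d_min + t_hold              # at or below lo: race condition (hold violation)
    hi = t_period - d_max - t_setup   # at or above hi: cycle time (setup) violation
    return lo, hi


def classify_skew(skew, lo, hi):
    """Label a skew value against the strict permissible range."""
    if skew <= lo:
        return "race condition"
    if skew >= hi:
        return "cycle time violation"
    return "safe"


# Hypothetical numbers in ps: shortest path 200, longest path 4800, 5 ns clock.
lo, hi = permissible_skew_range(d_min=200, d_max=4800, t_period=5000,
                                t_setup=100, t_hold=50)
print(lo, hi)                     # -150 100
print(classify_skew(0, lo, hi))   # safe
```

Zero skew falls inside this range, but a skew of +150 ps would violate setup and a skew of -200 ps would create a race.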

Skew Scheduling for Design Robustness

[Figure: FF → 2 ns → FF → 6 ns → FF, T = 6 ns]

“0 0 0” (clock arrival times, ns): at verge of violation

“2 0 2”: more safety margin

Design will be more robust if clock signal arrival time is in the middle of permissible skew range, rather than on the edge

W. Dai, UC Santa Cruz
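The two schedules can be checked with a short worked computation. The sketch assumes one reading of the figure: three flip-flops in a chain, FF1 → (2 ns) → FF2 → (6 ns) → FF3 with T = 6 ns, a single delay per path, ideal flops (t_setup = t_hold = 0), and “0 0 0” / “2 0 2” as the clock arrival times in ns at FF1, FF2, FF3. The names are illustrative.

```python
# Assumed reading of the slide figure (not confirmed by the source):
# FF1 -(2 ns)-> FF2 -(6 ns)-> FF3, period T = 6 ns, t_setup = t_hold = 0.
PATHS = [(0, 1, 2), (1, 2, 6)]  # (launch FF index, capture FF index, delay in ns)
PERIOD = 6


def setup_slacks(arrivals, paths=PATHS, period=PERIOD, t_setup=0):
    """Next-cycle capture edge (minus setup) minus data arrival, per path."""
    return [(arrivals[j] + period - t_setup) - (arrivals[i] + delay)
            for i, j, delay in paths]


def hold_slacks(arrivals, paths=PATHS, t_hold=0):
    """Data arrival minus same-cycle capture edge (plus hold), per path."""
    return [(arrivals[i] + delay) - (arrivals[j] + t_hold)
            for i, j, delay in paths]


for schedule in ([0, 0, 0], [2, 0, 2]):
    print(schedule, setup_slacks(schedule), hold_slacks(schedule))
# [0, 0, 0] [4, 0] [2, 6]
# [2, 0, 2] [2, 2] [4, 4]
```

Under this reading, zero skew leaves the 6 ns path with no setup margin, while delaying the clocks at FF1 and FF3 by 2 ns moves 2 ns of slack from the short path to the long one and keeps both hold slacks positive.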

Potential Advantages of Useful Skew

[Figure: CLK waveforms at the FFs, 0-skew vs. U-skew]

Reduce peak current consumption by distributing the FF switching points over the permissible skew range

Can exploit extra margin to increase clock frequency or reduce sizing (= power)

W. Dai, UC Santa Cruz

Conventional Zero-Skew Flow

Placement

Synthesis

Extraction & Delay Calculation

Static Timing Analysis

0-Skew Clock Synthesis

Clock Routing

Signal Routing

W. Dai, UC Santa Cruz

Useful-Skew Flow

Existing Placement

Extraction & Delay Calculation

Static Timing Analysis

U-Skew Clock Synthesis

Clock Routing

Signal Routing

Permissible range generation

Initial skew scheduling

Clock tree topology synthesis

Clock net routing

Clock timing verification

W. Dai, UC Santa Cruz
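The first two U-skew steps, permissible range generation and initial skew scheduling, can be sketched for the simplest case of a linear chain of flip-flops, placing each skew at the centre of its permissible range as the skew-scheduling slide recommends. This is an illustrative assumption, not the flow's actual algorithm: with reconvergent paths or feedback loops the per-pair targets conflict, and a general scheduler must solve all constraints jointly (for example as a linear program). The function name is hypothetical.

```python
from typing import List, Tuple


def midpoint_schedule(stages: List[Tuple[float, float]], period: float,
                      t_setup: float = 0.0, t_hold: float = 0.0) -> List[float]:
    """Clock arrival times for a chain FF0 -> FF1 -> ... -> FFn.

    stages[k] = (d_min, d_max) of the logic between FF k and FF k+1.
    Each skew t_k - t_{k+1} is placed at the middle of its permissible range.
    """
    arrivals = [0.0]
    for d_min, d_max in stages:
        lo = -d_min + t_hold                     # permissible range generation
        hi = period - d_max - t_setup
        if lo >= hi:
            raise ValueError("empty permissible range: no skew can fix this stage")
        target = (lo + hi) / 2                   # initial skew scheduling: centre of range
        arrivals.append(arrivals[-1] - target)   # t_{k+1} = t_k - skew_k
    earliest = min(arrivals)
    return [t - earliest for t in arrivals]      # shift so the earliest edge is at 0


# Chain from the skew-scheduling slide: 2 ns then 6 ns paths, T = 6 ns.
print(midpoint_schedule([(2, 2), (6, 6)], period=6))  # [1.0, 0.0, 3.0]
```

With ideal flops, these arrivals give every setup and hold slack in that example a value of 3 ns, i.e. each clock edge sits exactly in the middle of its window.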

Outline

Clocking

Storage elements

Clocking metrics and methodology

Clock distribution

Package and useful-skew degrees of freedom

Clock power issues

Gate timing models

Clock Power

Power consumption in clocks due to:
- Clock drivers
- Long interconnections
- Large clock loads – all clocked elements (latches, FF’s) are driven

Different components dominate depending on type of clock network used
- Ex. Grid – huge pre-drivers & wire cap. drown out load cap.

Sylvester / Shepard, 2001

Clock Power Is LARGE

Not only is the clock capacitance large, it switches every cycle!

$P = \alpha C V_{dd}^2 f$

Sylvester / Shepard, 2001
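Plugging numbers into this formula shows why the clock dominates. In the sketch below the clock net gets α = 1 because it makes a full charge/discharge every cycle; the capacitance, supply, frequency and the α = 0.1 used for comparison logic are hypothetical values, not figures from the lecture.

```python
def switching_power(alpha: float, c_farads: float, vdd_volts: float, f_hz: float) -> float:
    """Dynamic power P = alpha * C * Vdd^2 * f, in watts."""
    return alpha * c_farads * vdd_volts ** 2 * f_hz


# Hypothetical design point: 1 nF of switched capacitance, 1.5 V supply, 1 GHz.
p_clock = switching_power(alpha=1.0, c_farads=1e-9, vdd_volts=1.5, f_hz=1e9)
p_logic = switching_power(alpha=0.1, c_farads=1e-9, vdd_volts=1.5, f_hz=1e9)
print(f"clock (alpha=1): {p_clock:.2f} W; logic (alpha=0.1): {p_logic:.3f} W")
```

For the same capacitance, the 10x higher activity of the clock translates directly into 10x the power (2.25 W vs. 0.225 W at this assumed design point).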

Low-Power Clocking

Gated clocks
- Prevent switching in areas of chip not being used
- Easier in static designs

Edge-triggered flops in ARM rather than transparent latches in Alpha
- Reduced load on clock for each latch/flop
- Eliminated spurious power-consuming transitions during latch flow-through

Sylvester / Shepard, 2001

Clock Area

Clock networks consume silicon area (clock drivers, PLL, etc.) and routing area

Routing area is most vital

Top-level metals are used to reduce RC delays
- These levels are precious resources (unscaled)
- Power routing, clock routing, key global signals

Reducing area also reduces wiring capacitance and power

Typical #’s: Intel Itanium – 4% of M4/5 used in clock routing

Sylvester / Shepard, 2001

Clock Slew Rates

To maintain signal integrity and latch performance, minimum slew rates are required

Too slow – clock is more susceptible to noise, latches are slowed down, setup times eat into timing budget [$T_{setup} = 200 + 0.33\,T_{slew}$, in ps], more short-circuit power for large clock drivers

Too fast – burns too much power, overdesigned network, enhanced ground bounce

Rule-of-thumb: Trise and Tfall of clock are each between 10-20% of clock period (10% - aggressive target)

1 GHz clock; Trise = Tfall = 100-200ps

Sylvester / Shepard, 2001
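The two numeric rules on this slide can be combined in a quick calculation: take the rise/fall window from the 10-20% rule of thumb, then see what the quoted setup-time fit charges at each end of that window. The fit is the one given on the slide; the helper names are illustrative.

```python
def slew_window_ps(f_ghz: float) -> tuple:
    """Rule of thumb: Trise = Tfall = 10-20% of the clock period, in ps."""
    period_ps = 1000.0 / f_ghz
    return 0.10 * period_ps, 0.20 * period_ps


def setup_time_ps(t_slew_ps: float) -> float:
    """Setup-time fit quoted on the slide: Tsetup = 200 + 0.33 * Tslew (ps)."""
    return 200.0 + 0.33 * t_slew_ps


fast, slow = slew_window_ps(1.0)  # 1 GHz clock
for slew in (fast, slow):
    print(f"slew {slew:.0f} ps -> setup {setup_time_ps(slew):.0f} ps")
```

At 1 GHz the window is 100-200 ps, and moving from the aggressive to the relaxed end costs about 33 ps of setup time (233 ps vs. 266 ps), roughly 3% of the 1000 ps cycle.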

Example: Alpha 21264

Grid + H-tree approach

Power = 32% of total

Wire usage = 3% of metals 3 & 4

4 major clock quadrants, each with a large driver connected to local grid structures

Sylvester / Shepard, 2001

Alpha 21264 Skew Map

Ref: Compaq, ASP-DAC00

Sylvester / Shepard, 2001

Power vs. Skew

Fundamental design decision: meeting skew requirements is easy with an unlimited power budget

Wide wires reduce RC product but increase total C

Driver upsizing reduces latency (and hence skew as well) but increases buffer cap

SOC context: plastic package power limit is 2-3 W

Sylvester / Shepard, 2001

Clock Distribution Trends

Timing
- Clock period dropping fast, skew must follow
- Slew rates must also scale with cycle time
- Jitter – PLL’s get better with CMOS scaling but other sources of noise increase
  - Power supply noise more important
  - Switching-dependent temperature gradients

Materials
- Cu reduces RC slew degradation, potential skew
- Low-k decreases power, improves latency, skew, slews

Power
- Complexity, dynamic logic, pipelining ⇒ more clock sinks
- Larger chips ⇒ bigger clock networks

Sylvester / Shepard, 2001

Outline

Clocking

Storage elements

Clocking metrics and methodology

Clock distribution

Package and useful-skew degrees of freedom

Clock power issues

Gate timing models

Gate Timing Characterization

“Extract” exact transistor characteristics from layout
- Transistor width, length, junction area and perimeter
- Local wire length and inter-wire distance

Compute all transistor and wire capacitances

[Figure: gate with inputs A, B and output F driving load capacitance CL]

Cell Timing Characterization

Delay tables are generated using a detailed transistor-level circuit simulator, SPICE (a differential-equation solver)

Simulate the circuit of the cell for a number of different input slews and load capacitances, and record:
- Propagation time (50% Vdd at input to 50% Vdd at output)
- Output slew (10% Vdd at output to 90% Vdd at output)

[Figure: input and output voltage (0 to Vdd) vs. time, marking tpd and tslew]
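Once SPICE has produced sampled waveforms, these two measurements reduce to finding threshold crossings. The sketch below finds crossings by linear interpolation between samples; the piecewise-linear inverter waveforms are made up for illustration, and the output slew is taken as |t90 - t10| so that falling outputs are handled as well. Function names are illustrative.

```python
def crossing_time(t, v, level):
    """First time the sampled waveform v(t) crosses `level` (linear interpolation)."""
    for k in range(1, len(t)):
        v0, v1 = v[k - 1], v[k]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t[k - 1] + (level - v0) * (t[k] - t[k - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def tpd_and_slew(t, v_in, v_out, vdd):
    """Propagation delay (50% in -> 50% out) and 10%-90% output slew."""
    tpd = crossing_time(t, v_out, 0.5 * vdd) - crossing_time(t, v_in, 0.5 * vdd)
    t10 = crossing_time(t, v_out, 0.1 * vdd)
    t90 = crossing_time(t, v_out, 0.9 * vdd)
    return tpd, abs(t90 - t10)


# Made-up inverter waveforms (ps, volts): input rises over 0-20 ps, output falls over 20-60 ps.
t = [0, 20, 40, 60, 80]
v_in = [0.0, 1.0, 1.0, 1.0, 1.0]
v_out = [1.0, 1.0, 0.5, 0.0, 0.0]
tpd, slew = tpd_and_slew(t, v_in, v_out, vdd=1.0)
print(f"tpd = {tpd:.1f} ps, output slew = {slew:.1f} ps")
```

For these waveforms the sketch reports a propagation delay of 30 ps and an output slew of 32 ps; a characterization run would repeat this for every (input slew, load) point in the table.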

Non-linear effects reflected in tables

[Figure: input slew and output capacitance determine the delay at the gate (intrinsic delay) and the output slew of the resulting waveform]

$D_G = f_D(C_L, S_{in})$ and $S_{out} = f_S(C_L, S_{in})$, both non-linear

Interpolate between table entries

Interpolation error is usually below 10% of SPICE
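Table lookup with interpolation can be sketched as bilinear interpolation over the (input slew, load capacitance) grid, the scheme commonly used with Liberty-style non-linear delay model tables; individual timers may interpolate or extrapolate differently. The same routine serves the delay table and the output-slew table. The table values and names below are hypothetical.

```python
from bisect import bisect_right


def interp_table(slews, loads, table, s_in, c_load):
    """Bilinear interpolation in a 2-D cell table.

    slews, loads: ascending index values (at least two each).
    table[i][j]: value characterised at (slews[i], loads[j]).
    Queries outside the grid are linearly extrapolated from the edge cell.
    """
    def bracket(axis, x):
        # Index of the lower grid point and the fractional position within the cell.
        k = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
        return k, (x - axis[k]) / (axis[k + 1] - axis[k])

    i, u = bracket(slews, s_in)
    j, w = bracket(loads, c_load)
    low_slew = table[i][j] * (1 - w) + table[i][j + 1] * w
    high_slew = table[i + 1][j] * (1 - w) + table[i + 1][j + 1] * w
    return low_slew * (1 - u) + high_slew * u


# Hypothetical delay table (ps): rows = input slew (ps), columns = load (fF).
SLEWS, LOADS = [10.0, 50.0], [1.0, 5.0]
DELAY = [[20.0, 40.0],
         [30.0, 60.0]]
print(interp_table(SLEWS, LOADS, DELAY, s_in=30.0, c_load=3.0))  # 37.5
```

Because the true delay surface is non-linear, the interpolated value is only an approximation between characterized points, which is the source of the interpolation error mentioned above.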

Conservatism of Gate Delay Modeling

True gate delay depends on input arrival time patterns

STA will assume that only 1 input is switching

Will use worst slope among several inputs

[Figure: gate with inputs A, B, output F and load CL; waveforms (0 to Vdd) and tpd when A and B switch together vs. when only A switches]
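The single-input-switching assumption can be sketched as the merge a block-based timer performs at a multi-input gate: each input is evaluated as if it switched alone, the latest resulting arrival is kept, and the worst (slowest) input slope is used to compute the output slew. The linear delay and slew models, the helper names, and all numbers are hypothetical stand-ins for characterized tables.

```python
def merge_at_gate(inputs, delay_of, out_slew_of):
    """Conservative output timing of a multi-input gate.

    inputs: (arrival, slew) pairs, one per input pin, each treated as the only switching input.
    delay_of / out_slew_of: functions of input slew (stand-ins for cell tables).
    """
    arrival = max(at + delay_of(slew) for at, slew in inputs)
    worst_slew = max(slew for _, slew in inputs)
    return arrival, out_slew_of(worst_slew)


def delay_of(slew):
    return 20.0 + 0.5 * slew   # hypothetical arc delay model (ps)


def out_slew_of(slew):
    return 30.0 + 0.2 * slew   # hypothetical output-slew model (ps)


# Pin A arrives at 100 ps with a 40 ps slew; pin B arrives at 90 ps with an 80 ps slew.
print(merge_at_gate([(100.0, 40.0), (90.0, 80.0)], delay_of, out_slew_of))
```

Here pin B sets the output arrival (150 ps) even though it arrives earlier than pin A, because its slower slope increases the gate delay; the output slew (46 ps) is computed from B's 80 ps slope.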