© 2003-2009 Ran Ginosar 048878 Lecture 3: Handshake Ckt Implementations 1
VLSI Architectures 048878
Lecture 3
S&F Ch. 5: Handshake Ckt Implementations

Implementations
• We only consider simple circuits
• More aggressive circuits will come later
• First, reminder on latches

4- & 2-phase bundled data latches
[Figure: two latch pipelines controlled by C-elements with REQ/ACK handshakes. In the 4-phase version each C-element drives a latch enable (EN); in the 2-phase version each C-element drives the C and P inputs of a latch.]

4-phase dual rail – many bits
[Figure: dual-rail pipeline stages for a 2-bit word, built from C-elements on the rails d[0].t, d[0].f, d[1].t, d[1].f, with one ACK per stage.]

4-phase Fork, Join
[Figure: table of circuits with columns COMPONENT, 4-phase bundled data, 4-phase dual-rail, and rows Fork and Join (wait for all). Bundled-data signals: x-req, y-req, z-req, x-ack, y-ack, z-ack, combined by C-elements. Dual-rail signals: x.t, x.f, y.t, y.f, z.t, z.f for the fork and x.t, x.f, y.t, y.f, z0.t, z0.f, z1.t, z1.f for the join.]
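The fork and join handshakes can be sketched behaviourally. The following is a minimal Python model, assuming the usual textbook wiring (fork: req fans out and the acks meet in a C-element; join: the reqs meet in a C-element and ack fans out). The names `CElement`, `fork_bundled`, and `join_bundled` are illustrative and do not come from the slides.

```python
class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, *inputs):
        if all(v == 1 for v in inputs):
            self.out = 1
        elif all(v == 0 for v in inputs):
            self.out = 0
        return self.out  # unchanged when the inputs disagree


def fork_bundled(x_req, y_ack, z_ack, ack_c):
    """Fork x -> (y, z): req is wired to both outputs, acks are joined."""
    y_req = z_req = x_req
    x_ack = ack_c.update(y_ack, z_ack)
    return y_req, z_req, x_ack


def join_bundled(x_req, y_req, z_ack, req_c):
    """Join (x, y) -> z: reqs are joined, ack is wired back to both inputs."""
    z_req = req_c.update(x_req, y_req)
    x_ack = y_ack = z_ack
    return z_req, x_ack, y_ack


if __name__ == "__main__":
    c = CElement()
    # z_req rises only after both reqs rise, and falls only after both fall.
    print([join_bundled(x, y, 0, c)[0] for x, y in [(1, 0), (1, 1), (0, 1), (0, 0)]])
```

The model is untimed: it only captures the "wait for all" behaviour of the C-element, not gate delays.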

4-phase Bundled-data Mux
[Figure: multiplexer symbol (channels x, y, z, control ctl selecting 0 or 1) and its circuit. Signals: x-req, y-req, z-req, x-ack, y-ack, z-ack, dual-rail control ctl.t, ctl.f, and ctl-ack; the handshake logic uses C-elements.]

4-phase Bundled-data Demux
[Figure: demultiplexer symbol (channels x, y, z, control ctl selecting 0 or 1) and its circuit. Signals: x-req, y-req, z-req, x-ack, y-ack, z-ack, dual-rail control ctl.t, ctl.f, and ctl-ack; the handshake logic uses C-elements.]

4-phase Merge
[Figure: table of circuits with columns COMPONENT, 4-phase bundled data, 4-phase dual-rail, and a single row Merge (wait for one). Bundled-data signals: x-req, y-req, z-req, x-ack, y-ack, z-ack. Dual-rail signals: x.t, x.f, y.t, y.f, z.t, z.f. The circuits use C-elements and blocks labelled CD.]

4-phase Merge
[Figure: the Merge circuits of the previous slide, repeated with annotations.]
Mutually exclusive inputs. Guaranteed elsewhere! (more later...)
Assume X active… C-element sees input glitch
Relative timing: x-req < z-ack, so the C-element can be simplified

Asymmetric C Element
• Useful when we know the relative timing:
  b < a, so only a is needed to pull up
• Only one pMOS, hence faster
[Figure: asymmetric C-element symbol and transistor schematic, inputs a and b, output c.]
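A behavioural sketch of an asymmetric (generalized) C-element follows. Which output transition the single input controls depends on the schematic polarity, which did not survive in this transcript, so the model is parameterized: the class `AsymmetricC` and its arguments `set_on` and `reset_on` are hypothetical names chosen for illustration only.

```python
class AsymmetricC:
    """C-element whose set and reset conditions use different input subsets."""

    def __init__(self, set_on, reset_on, initial=0):
        self.set_on = tuple(set_on)      # inputs that must all be 1 to raise the output
        self.reset_on = tuple(reset_on)  # inputs that must all be 0 to lower the output
        self.out = initial

    def update(self, **inputs):
        if all(inputs[name] == 1 for name in self.set_on):
            self.out = 1
        elif all(inputs[name] == 0 for name in self.reset_on):
            self.out = 0
        return self.out  # holds its value otherwise


if __name__ == "__main__":
    # Assumed example: both inputs needed to set, input a alone resets.
    c = AsymmetricC(set_on=("a", "b"), reset_on=("a",))
    print(c.update(a=1, b=1), c.update(a=0, b=1))
```

With `set_on == reset_on` the model reduces to an ordinary symmetric C-element. Dropping an input from one condition is only safe when the relative-timing assumption guarantees that input has already switched.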

2-phase Merge
• Try it at home…
• This is not an assignment!

Mutual Exclusion: MUTEX
[Figure: MUTEX symbol with request inputs R1, R2 and grant outputs G1, G2, and its circuit with internal nodes x1, x2.]

Standard Gate MUTEXs
[Figure: two MUTEX circuits built from standard gates, each with inputs R1, R2, internal nodes O1, O2, and outputs G1, G2. One carries the annotation "Very low threshold".]
The outputs are not fully guaranteed to be mutually exclusive, but it is highly probable!
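The intended MUTEX behaviour (at most one grant at a time, and a grant held until its request is withdrawn) can be sketched as follows. This is an idealized logical model only: it ignores metastability, which is exactly what the real circuit has to resolve, and it breaks simultaneous requests in favour of R1 purely as an assumption. The class name `Mutex` is illustrative.

```python
class Mutex:
    """Idealized two-way mutual exclusion element (no metastability modelled)."""

    def __init__(self):
        self.g1 = 0
        self.g2 = 0

    def update(self, r1, r2):
        # A grant is released as soon as its request is withdrawn.
        if not r1:
            self.g1 = 0
        if not r2:
            self.g2 = 0
        # A request is granted only while the other side holds no grant.
        # Assumption: ties go to r1; a real MUTEX decides arbitrarily.
        if r1 and not self.g2:
            self.g1 = 1
        elif r2 and not self.g1:
            self.g2 = 1
        return self.g1, self.g2


if __name__ == "__main__":
    m = Mutex()
    for r1, r2 in [(1, 1), (0, 1), (1, 1), (1, 0), (0, 0)]:
        g1, g2 = m.update(r1, r2)
        assert not (g1 and g2)  # grants are never both high
        print((r1, r2), "->", (g1, g2))
```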

Arbiter
[Figure: ARBITER symbol with request/acknowledge pairs R1/A1, R2/A2 and R0/A0, and its circuit built from a MUTEX and C-elements. Internal signals: G1, G2, Y1, Y2.]

Arbitrating Merge
[Figure: ARB-MERGE symbol with channels x, y, z, and its circuit built from a MUTEX and C-elements. Signals: x-req, x-ack, y-req, y-ack, z-req, z-ack; internal signals Gx, Gy, Fx, Fy.]

Function Blocks
• We said “transparent” but…
– Need a matched delay for bundled-data
– Need to generate completion for dual-rail
– Need to join inputs, fork outputs:
[Figure: two function-block diagrams related by "=".]

Transparency Revisited
• Function blocks must not affect how the latches “shake hands” (except for timing)
[Figure: two latch diagrams with ack signals, related by "=".]

Indication Revisited
• FB(req_out) means
– FB(req_in)
– Computation finished, data out ready
• Simple “strong indication” for bundled data:
[Figure: function block with req_in and req_out.]
1: ALL DATA_IN VALID
2: REQ_IN
3: COMPUTE
4: ALL DATA_OUT VALID
5: REQ_OUT

Strong vs. Weak Indication
• Strong Indication: All inputs must arrive before any output is allowed (“indicated”).
– Even if some outputs are ready earlier, there is no REQ_OUT, so they cannot be used.
– Implies worst-case latency
• Weak Indication: Some outputs are allowed even before all inputs have arrived
– Only makes sense in dual-rail:

Weak Indication
• No REQ on dual-rail – each bit is “self-indicating”
• May lead to faster circuits
• Example chain of events:
[Figure: dual-rail (DR) function block with ack; events numbered 1 to 7.]

Composition of FBs
• Legal composition:
– All inputs and outputs are connected
– No cycles
• Legal composition of weakly indicating FBs is weakly indicating
• Legal composition of strongly indicating FBs is strongly indicating
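The two legality conditions can be checked mechanically. The sketch below is one possible encoding, not something taken from the slides: blocks are described by their port names, wires are 4-tuples, and the surrounding environment is a pseudo-block named "ENV". The function name `is_legal_composition` and the data layout are assumptions. It uses `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def is_legal_composition(blocks, wires, env="ENV"):
    """blocks: {name: (input_ports, output_ports)}
    wires:  [(src_block, src_port, dst_block, dst_port), ...]"""
    driven = {(dst, port) for _, _, dst, port in wires}
    used = {(src, port) for src, port, _, _ in wires}
    # Condition 1: every input is driven and every output goes somewhere.
    for name, (ins, outs) in blocks.items():
        if any((name, p) not in driven for p in ins):
            return False
        if any((name, p) not in used for p in outs):
            return False
    # Condition 2: the block-to-block graph has no cycles.
    graph = {name: set() for name in blocks}
    for src, _, dst, _ in wires:
        if src != env and dst != env:
            graph[dst].add(src)
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError:
        return False
    return True


# Two full adders, each with inputs a, b, c and outputs s, d.
blocks = {"FA1": (("a", "b", "c"), ("s", "d")),
          "FA2": (("a", "b", "c"), ("s", "d"))}
legal_wires = [
    ("ENV", "a1", "FA1", "a"), ("ENV", "b1", "FA1", "b"), ("ENV", "cin", "FA1", "c"),
    ("FA1", "s", "ENV", "s1"), ("FA1", "d", "FA2", "c"),
    ("ENV", "a2", "FA2", "a"), ("ENV", "b2", "FA2", "b"),
    ("FA2", "s", "ENV", "s2"), ("FA2", "d", "ENV", "cout"),
]
# Feeding FA2's carry back into FA1 creates a cycle.
cyclic_wires = [w for w in legal_wires if w[3] not in ("c", "cout")] + [
    ("FA1", "d", "FA2", "c"), ("FA2", "d", "FA1", "c")]
```

Here `is_legal_composition(blocks, legal_wires)` accepts the two-stage ripple chain, while the `cyclic_wires` variant is rejected.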

Example: Ripple-carry
[Figure: chain of n full adders. Stage i has inputs ai, bi, ci and outputs si, di; the first stage takes cin and the last stage produces cout.]

Example: Ripple-carry
• Full adder: (a, b, c) → (s, d)
– s = a ⊕ b ⊕ c
– d = ab + ac + bc
• Shortcuts for look-ahead (prop, gen, kill):
– p = a ⊕ b, so s = p ⊕ c
– g = ab, so d = g + pc, OR d' = k + pc'
– k = a' b'
• Sometimes d can be made valid without waiting for c
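The identities above can be checked exhaustively over the eight input combinations. This is a small verification sketch; the function names are illustrative.

```python
from itertools import product


def full_adder(a, b, c):
    """Direct form: s = a XOR b XOR c, d = ab + ac + bc."""
    s = a ^ b ^ c
    d = (a & b) | (a & c) | (b & c)
    return s, d


def full_adder_pgk(a, b, c):
    """Propagate/generate/kill form; also returns d' = k + p c'."""
    p = a ^ b
    g = a & b
    k = (1 - a) & (1 - b)
    s = p ^ c
    d = g | (p & c)
    d_not = k | (p & (1 - c))
    return s, d, d_not


early = 0
for a, b, c in product((0, 1), repeat=3):
    s, d = full_adder(a, b, c)
    s2, d2, d_not = full_adder_pgk(a, b, c)
    assert (s, d) == (s2, d2)   # both forms agree
    assert d_not == 1 - d       # d' really is the complement of d
    if a == b:                  # generate or kill: d does not depend on c
        assert d == a
        early += 1
print(f"{early} of 8 input combinations fix the carry without c")
```

When a = b the stage either generates or kills, so its carry-out equals a regardless of c. That is the case the last bullet refers to.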

Speculative / Strong Ripple Carry
• 16 bit ripple-carry adder, bundled-data
• Longest carry is 16 stages
• But if p8 = 0 then longest carry is 8 stages
• And if p12 = p8 = p4 = 0, then longest carry is 4 stages
• If willing to trade area and power for speed:

Speculative / Strong Ripple Carry
[Figure: ADDER between REQ_IN and REQ_OUT, with a DELAY SELECTOR choosing among short, medium, and long delays for REQ_OUT.]
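The selection rule from the previous slide can be sketched as a function of the two operands. The assumptions here are that bits are numbered 1 to 16 from the least significant end, and that "short", "medium", and "long" correspond to the 4-, 8-, and 16-stage cases. The function names are illustrative.

```python
def propagate(a, b, i):
    """p_i = a_i XOR b_i, with bit 1 taken as the least significant bit."""
    return ((a >> (i - 1)) ^ (b >> (i - 1))) & 1


def select_delay(a, b):
    """Pick the matched delay for a 16-bit bundled-data ripple-carry adder."""
    p4, p8, p12 = (propagate(a, b, i) for i in (4, 8, 12))
    if p4 == 0 and p8 == 0 and p12 == 0:
        return "short"    # carry chain broken at bits 4, 8 and 12
    if p8 == 0:
        return "medium"   # carry chain broken at bit 8 only
    return "long"         # worst case: carry may ripple through all stages


if __name__ == "__main__":
    print(select_delay(0x0000, 0x0000))  # all propagate bits are 0
    print(select_delay(0x0008, 0x0000))  # p4 = 1, p8 = 0
    print(select_delay(0x0080, 0x0000))  # p8 = 1
```

The selector only needs three XOR gates and a little logic, which is the area and power traded for the shorter average delay.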

ST-CL
Based on David, Ginosar, Yoeli, "An Efficient Implementation of Boolean Functions as Self-Timed Circuits," IEEE Trans. Computers, Jan. 1992

Dual-Rail DIMS PLA Notation
[Figure: two-input dual-rail gate (inputs a, b, output z) drawn with four C-elements over the rails a.t, a.f, b.t, b.f driving z.t and z.f, and the same circuit redrawn in PLA notation.]
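A DIMS gate can be modelled with one C-element per input minterm, OR-ed onto the output rail chosen by the truth table. The sketch below assumes the encoding (t, f) with (0, 0) as the empty codeword. The class names `CElement` and `DimsGate2` are illustrative.

```python
class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self):
        self.out = 0

    def update(self, x, y):
        if x == 1 and y == 1:
            self.out = 1
        elif x == 0 and y == 0:
            self.out = 0
        return self.out


class DimsGate2:
    """Two-input dual-rail DIMS gate: one C-element per input minterm."""

    def __init__(self, truth_table):
        # truth_table maps (a, b) -> z for all four Boolean input pairs.
        self.truth_table = dict(truth_table)
        self.minterms = {ab: CElement() for ab in self.truth_table}

    def update(self, a, b):
        """a, b and the result are (t, f) rail pairs."""
        z_t = z_f = 0
        for (va, vb), c in self.minterms.items():
            rail_a = a[0] if va else a[1]
            rail_b = b[0] if vb else b[1]
            fired = c.update(rail_a, rail_b)
            if self.truth_table[(va, vb)]:
                z_t |= fired
            else:
                z_f |= fired
        return z_t, z_f


EMPTY, TRUE, FALSE = (0, 0), (1, 0), (0, 1)
AND_TABLE = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

if __name__ == "__main__":
    g = DimsGate2(AND_TABLE)
    for a, b in [(FALSE, EMPTY), (FALSE, TRUE), (EMPTY, TRUE), (EMPTY, EMPTY)]:
        print(a, b, "->", g.update(a, b))
```

In the example trace the AND output stays empty while only a = FALSE has arrived, even though the result is already determined, and it stays valid until both inputs have returned to empty. The gate waits for all inputs in both directions.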

Dual-Rail DIMS Adders
[Figure: ADD block (inputs a, b, c; outputs s, d) in PLA notation over the rails a.t, a.f, b.t, b.f, c.t, c.f, s.t, s.f, d.t, d.f. The first version uses eight C-elements; the second uses ten, with the extra terms labelled GEN and KILL.]
Still slow: LF(V) = LF(E)

Transistor Level DIMS
• Too many P transistors - slow
• Some N paths can be shared:
[Figure: nMOS stacks for d.f gated by the rails a.f, a.t, b.f, b.t, c.f, c.t, shown before and after sharing common paths.]

Hybrid Adder
• Dual-rail carry (for flexible latency)
• Bundled-data data inputs and sum output (for lower area and power)
• Data-dependent data-forward (V) latency
• Constant empty-forward (E) latency
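The data-dependent forward latency can be illustrated with a unit-delay model of a dual-rail carry chain. The sketch assumes every stage can resolve generate and kill without its carry-in, and that operands and carry-in are all valid at time 0. The unit delay and the function name `carry_ready_times` are assumptions for illustration, not figures from the slides.

```python
def carry_ready_times(a_bits, b_bits, stage_delay=1):
    """Return the time at which each stage's carry-out becomes valid.

    a_bits, b_bits: operand bits, least significant first.
    """
    times = []
    t_in = 0  # carry-in of the first stage is valid at time 0
    for a, b in zip(a_bits, b_bits):
        if a == b:
            # generate (1,1) or kill (0,0): carry-out known without carry-in
            t_out = stage_delay
        else:
            # propagate: must wait for the incoming carry
            t_out = t_in + stage_delay
        times.append(t_out)
        t_in = t_out
    return times


if __name__ == "__main__":
    worst = carry_ready_times([1, 1, 1, 1], [0, 0, 0, 0])   # all propagate
    mixed = carry_ready_times([1, 0, 1, 0], [1, 0, 0, 1])   # gen, kill, prop, prop
    print(worst, max(worst))
    print(mixed, max(mixed))
```

In this model the completion time is set by the longest run of propagate stages, whereas a bundled-data design with a fixed matched delay always pays for the full chain.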

Hybrid Adder
[Figure: n-bit hybrid adder. Each stage has a CARRY block with dual-rail carry in (ci.t, ci.f) and carry out (di.t, di.f), and a SUM block with bundled-data inputs ai, bi and output si. REQ_IN accompanies cin; a CD block and a C-element produce REQ_OUT; the last stage produces cout. The carry path is marked Dual-rail and the sum path Bundled-data.]

Domino Logic Dual Rail
[Figure: dual-rail domino gate with outputs t and f, clocked by REQ_IN; REQ_OUT is marked with a question mark.]
Req Out: Either by (flexible) Completion Detection or by matched (worst case) delay

Hybrid Adder: Sum Ckt
[Figure: domino sum circuit clocked by REQ_IN, with inputs a, b and dual-rail carry c.t, c.f, and output s.]

Hybrid Adder: Two Carry Ckts
[Figure: two domino carry circuits clocked by REQ_IN, each with inputs a, b, c.t, c.f and outputs d.t, d.f. One is labelled Weak Indication and the other Strong Indication; KILL and GEN paths are marked.]

Hybrid Adder: Two Carry Ckts
[Figure: chain of six carry stages, numbered 1 to 6, in the repeating pattern WEAK CARRY, STRONG CARRY, STRONG CARRY, with a CD block.]
Slightly faster…