Introduction to Structured VLSI Design - Integer Arithmetic and Pipelining
Joachim Rodrigues
Lund University / EITF35/ Joachim Rodrigues 2012
Overview
• Fixed point representation
• Addition
• Multiplication in a processor/hardware
• Pipelining

Outline
• Multiplication in the digital domain
• HW-mapping
• Pipelining optimization

Signed and Unsigned Integers
Bit positions: n-1 ... 5 4 3 2 1 0
• Unsigned integer: $\sum_{i=0}^{n-1} \mathrm{bit}_i \cdot 2^i$
• Two's complement signed integer: $\mathrm{bit}_{n-1} \cdot (-2^{n-1}) + \sum_{i=0}^{n-2} \mathrm{bit}_i \cdot 2^i$
MSB defines sign
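
As a minimal sketch of the two formulas for unsigned and two's complement integers, the Python helpers below evaluate a bit string (written MSB first) both ways. The function names are illustrative only and do not come from the lecture.

```python
def unsigned_value(bits: str) -> int:
    """Evaluate sum(bit_i * 2**i) for a bit string written MSB first."""
    n = len(bits)
    return sum(int(b) << (n - 1 - pos) for pos, b in enumerate(bits))


def signed_value(bits: str) -> int:
    """Two's complement: the MSB has weight -2**(n-1), the other bits count as unsigned."""
    n = len(bits)
    msb = int(bits[0])
    return msb * -(1 << (n - 1)) + unsigned_value(bits[1:])
```

For example, `unsigned_value("11111111")` and `signed_value("11111111")` read the same 8-bit pattern as 255 and as -1 respectively.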

8-bit Signed/Unsigned Integers
MSB defines sign

Signed   Bit pattern   Unsigned
 -128    1000 0000                 (signed overflow ↑)
 -127    1000 0001
  ...    ...
         1111 1100
         1111 1101
   -2    1111 1110
   -1    1111 1111
    0    0000 0000        0
    1    0000 0001        1
    2    0000 0010        2
    3    0000 0011        3
  ...    ...            ...
  126    0111 1110      126
  127    0111 1111      127        (signed overflow ↓)
         1000 0000      128
         1000 0001      129
         ...            ...
         1111 1110      254
         1111 1111      255        (unsigned overflow ↓)
Exercise (5 min)
• Convert 32 and 15 into unsigned binary
  – Do an addition
  – Do a subtraction
• Convert back to decimal and check the result

Add/Subtract
[Figure: chain of full adders (+) with inputs A0, B0 ... An-1, Bn-1, sum outputs S0 ... Sn-1, carries C1, C2 ... Cn-1, Cn, and C0 = 0]
• The HW for sum/difference (S) does not care about signed/unsigned
• Unsigned overflow ⇔ (carry-out & add) OR (no carry-out & subtract)
• Signed overflow = Cn ⊕ Cn-1
• True sign = Sn-1 ⊕ signed overflow = (An-1 ⊕ Bn-1 ⊕ Cn-1) ⊕ (Cn ⊕ Cn-1) = An-1 ⊕ Bn-1 ⊕ Cn
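
The flag equations of the Add/Subtract slide can be tried out with a small bit-level model. This is a sketch under assumptions: `add_sub` and its dictionary keys are hypothetical names, and subtraction is modelled as A + (NOT B) + 1, so for a subtract the B bit entering the true-sign equation is the inverted one.

```python
def add_sub(a: int, b: int, n: int, subtract: bool = False) -> dict:
    """Model an n-bit ripple-carry adder/subtractor working on raw bit patterns."""
    mask = (1 << n) - 1
    a &= mask
    b = (~b if subtract else b) & mask        # subtract: invert B ...
    carry = 1 if subtract else 0              # ... and set C0 = 1
    s = 0
    carry_into_msb = 0                        # C(n-1)
    for i in range(n):
        if i == n - 1:
            carry_into_msb = carry
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ carry) << i               # full-adder sum bit
        carry = (ai & bi) | (carry & (ai ^ bi))   # full-adder carry-out
    cn = carry
    a_msb, b_msb = (a >> (n - 1)) & 1, (b >> (n - 1)) & 1
    return {
        "sum": s,
        "carry_out": cn,
        # carry-out & add, or no carry-out & subtract
        "unsigned_overflow": (cn == 0) if subtract else (cn == 1),
        "signed_overflow": bool(cn ^ carry_into_msb),
        "true_sign": a_msb ^ b_msb ^ cn,
    }
```

With 4-bit operands, `add_sub(0b1010, 0b0110, 4)` reports an unsigned overflow, and `add_sub(0b0110, 0b0111, 4)` reports a signed overflow with a true sign of 0.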

Unsigned Overflow Examples

10 + 6 = 16, outside [0..15]
    1010
  + 0110
  ------
    0000    C4 = 1
Cn = C4 = 1 & add ⇔ unsigned overflow
(carry-out & add ⇔ unsigned overflow)

7 - 10 = -3, outside [0..15]
    0111 - 1010
  same as
    0111
  + 0101 + 1
  ------
    1101    C4 = 0
Cn = C4 = 0 & subtract ⇔ unsigned overflow
(no carry-out & subtract ⇔ unsigned overflow)

Signed Overflow Example

6 + 7 = 13, outside [-8..7]
    0110
  + 0111
  ------
    1101    C4 = 0, C3 = 1
Cn ⊕ Cn-1 = C4 ⊕ C3 = 0 ⊕ 1 = 1 ⇔ carry-outs different ⇔ signed overflow
Sn-1 ⊕ signed overflow = An-1 ⊕ Bn-1 ⊕ Cn = A3 ⊕ B3 ⊕ C4 = 0 ⊕ 0 ⊕ 0 = 0 ⇔ true sign = positive/zero
Representation in VHDL
• numeric_std offers
  – signed
  – unsigned
• Use unsigned whenever feasible
• Proper use reduces HW cost
MULTIPLICATION IN HW

Multiplication: Some conventions
• Product = Multiplicand * Multiplier
• log(product) = log(multiplicand) + log(multiplier)
• Width of product is (worst-case) sum of widths of factors
• May overflow if single length product register is used
• Paper-and-pencil method
• Conditional add (controlled by bits of multiplier) and shift
• Partial product progressively develops into product
• 1 product bit/cycle
• Unsigned and signed multiplication
• Signs require extra attention
• Sequential, combinational or pipelined implementation
• Tradeoff between hardware resources, throughput, latency, power
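
The width rule (product width is at most the sum of the factor widths) and the single-length overflow risk can be checked numerically. The helpers below are a sketch for unsigned operands; their names are hypothetical.

```python
def product_width(n_bits: int, m_bits: int) -> int:
    """Bits needed to hold the largest unsigned n-bit x m-bit product."""
    largest = ((1 << n_bits) - 1) * ((1 << m_bits) - 1)
    return largest.bit_length()


def fits_single_length(a: int, b: int, n_bits: int) -> bool:
    """True if a * b still fits in an n-bit (single-length) product register."""
    return a * b < (1 << n_bits)
```

For two 4-bit factors the largest product is 15 * 15 = 225, which needs all 8 bits, whereas 11 * 14 = 154 already overflows a 4-bit register.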
Multiplication in a processor
• On the coming slides we will learn how a multiplication could be realized in a processor
• Hardware efficiency will be traded to achieve a higher throughput

Multiplying Using Paper and Pencil
Exercise: Multiply 1011 * 1110 (5 min)

Example:
    1011 * 1110
        0000     (*0 = zero)
  +    1011.     (*1 = copy)
  +   1011..     (*1 = copy)
  +  1011...     (*1 = copy)
    10011010
In decimal: 11 * 14 = 154
We will concentrate on unsigned integers for the next few slides!
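
The paper-and-pencil method can be sketched in a few lines of Python: each multiplier bit selects either zero or a shifted copy of the multiplicand, and the rows are summed. The function names are illustrative, not from the slides.

```python
def partial_products(multiplicand: int, multiplier: int, n: int) -> list[int]:
    """Return the n shifted rows of the paper-and-pencil multiplication."""
    rows = []
    for i in range(n):
        bit = (multiplier >> i) & 1
        # bit = 0 contributes zero, bit = 1 contributes a copy shifted left by i
        rows.append((multiplicand << i) if bit else 0)
    return rows


def multiply_paper(multiplicand: int, multiplier: int, n: int) -> int:
    """Sum all partial products to obtain the (up to 2n-bit) product."""
    return sum(partial_products(multiplicand, multiplier, n))
```

Calling `multiply_paper(0b1011, 0b1110, 4)` reproduces the slide example, 11 * 14 = 154.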

... more Paper and Pencil

Multiplicand * Multiplier              Partial product | Partial multiplier
    1011 * 1110                                   0000 | 1110
        0000    (0)    + 0000        shift >>    00000 | 111
       1011.    (1)    + 1011.             >>   010110 | 11
      1011..    (1)    + 1011..            >>  1000010 | 1
     1011...    (1)    + 1011...               10011010
    10011010

[Figure labels: partial product, part. mul.; adder inputs 0 and Multiplicand]
0: add zero, 1: add multiplicand. Shifting in carry-out prevents overflow.
Left (plain paper and pencil), disadvantage: 2n-bit ALU. Right (shared register), advantage: n-bit ALU.
LSB "controls" whether to add "0" or multiplicand to partial product
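
A register-level sketch of this scheme, assuming unsigned operands: one 2n-bit register holds the partial product in its upper half and the remaining multiplier bits in its lower half, the n-bit add is controlled by bit 0, and the carry-out is shifted in from the left. `sequential_multiply` is a hypothetical name, and the returned trace is added here only for inspection.

```python
def sequential_multiply(multiplicand: int, multiplier: int, n: int):
    """Unsigned n x n shift-and-add multiply using one shared 2n-bit register.

    Returns the product and the register contents after every step.
    """
    mask = (1 << n) - 1
    reg = multiplier & mask               # upper half = 0, lower half = multiplier
    trace = [reg]
    for _ in range(n):                    # repeat step n times
        upper = reg >> n                  # current partial product (n bits)
        if reg & 1:                       # bit 0 controls the conditional add
            upper += multiplicand & mask  # n-bit add; bit n of the result is Cn
        # shift right by one; Cn becomes the new MSB of the 2n-bit register
        reg = ((upper << n) | (reg & mask)) >> 1
        trace.append(reg)
    return reg, trace
```

For 1011 * 1110 the trace is 0000 1110, 0000 0111, 0101 1011, 1000 0101, 1001 1010, matching the partial product and partial multiplier columns of the worked example.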

Seq. Multiplication, Initialize
[Figure: n-bit reg. (Multiplicand, Load), adder (Add, Cn, control signal from bit 0), 2n-bit reg. loaded with 0 and the Multiplier (Load)]

Seq. Multiplication, Step
[Figure: n-bit reg. (Multiplicand), conditional add controlled by bit 0, carry-out Cn, 2n-bit reg. holding partial product and partial multiplier, shift right]
Repeat step n times

Seq. Multiplication, Result
[Figure: n-bit reg. (Multiplicand), adder (Add, Cn), 2n-bit reg. holding the Product]
One partial product per clock cycle => very slow. How can we increase throughput?

String of n-bit Adders
[Figure: string of adders summing Mp0*Mc, Mp1*Mc, Mp2*Mc, ..., Mpn-1*Mc (with a 0 input), producing product bits P0, P1, P2, ..., Pn-1, P2n-2..n, P2n-1]
• Unrolling the loop lowers latency compared to sequential add-and-shift, at the expense of much more hardware
• n x n multiplication requires n-1 n-bit adders
• t_saved_latency = n * (t_clk-out + t_set-up)

Multiplication by a Constant
• The synthesis engine may not be smart enough to do some evident optimizations. As a designer you need to ensure that multiplication/division by a small constant is accomplished by a number of shifts and adds.
Some numerical examples:
  *2   (*10₂):        multiplicand << 1
  *3   (*11₂):        (multiplicand << 1) + multiplicand
  *4   (*100₂):       multiplicand << 2
  *5   (*101₂):       (multiplicand << 2) + multiplicand
  *255 (*11111111₂):  (multiplicand << 8) - multiplicand
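
A quick way to convince yourself of these identities is to run them. The sketch below mirrors the numerical examples and adds a generic helper that places one shifted copy per set bit of the constant; all names are hypothetical. Note the parentheses: in Python, `+` binds tighter than `<<`.

```python
def times_3(x: int) -> int:
    return (x << 1) + x


def times_5(x: int) -> int:
    return (x << 2) + x


def times_255(x: int) -> int:
    return (x << 8) - x        # one subtraction instead of eight additions


def shift_amounts(constant: int) -> list[int]:
    """Positions of the set bits of a non-negative constant."""
    return [i for i in range(constant.bit_length()) if (constant >> i) & 1]


def times_constant(x: int, constant: int) -> int:
    """Multiply by summing one shifted copy of x per set bit of the constant."""
    return sum(x << shift for shift in shift_amounts(constant))
```

The `times_255` case shows why a subtraction can be cheaper: the plain shift-and-add decomposition of 255 needs eight shifted terms.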

Don't forget ... Signed Multiplication
• Either transform to multiply of non-negative integers:
  1. Record signs and negate any negative factors.
  2. Perform unsigned multiplication.
  3. Negate product if signs above differ.
• Or directly perform signed multiplication:
  1. Take into account the sign bit of the multiplicand by shifting in true sign bits rather than carry-outs, i.e., An-1 ⊕ Bn-1 ⊕ Cn rather than Cn.
  2. Take into account the sign bit of the multiplier by doing a conditional subtract rather than a conditional add during the last iteration.
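
Both approaches can be sketched in Python. This is a behavioural model under assumptions, not the lecture's hardware: operands are Python integers within the n-bit two's complement range, the function names are hypothetical, and Python's arithmetic right shift on negative integers stands in for "shifting in the true sign bit".

```python
def multiply_via_magnitudes(a: int, b: int) -> int:
    """Method 1: record signs, multiply magnitudes, negate if the signs differ."""
    negative = (a < 0) != (b < 0)
    product = abs(a) * abs(b)          # stands in for the unsigned multiplier
    return -product if negative else product


def multiply_direct(multiplicand: int, multiplier: int, n: int) -> int:
    """Method 2: shift-and-add directly on two's complement operands."""
    mask = (1 << n) - 1
    upper = 0                          # signed partial product
    lower = multiplier & mask          # n-bit pattern of the multiplier
    for step in range(n):
        if lower & 1:
            if step == n - 1:
                upper -= multiplicand  # sign bit of the multiplier: subtract
            else:
                upper += multiplicand
        # move the LSB of the partial product into the lower half, then
        # shift the upper half arithmetically (sign is preserved)
        lower = ((upper & 1) << (n - 1)) | (lower >> 1)
        upper >>= 1
    return (upper << n) + lower
```

For example, `multiply_direct(-3, -2, 4)` and `multiply_via_magnitudes(-3, -2)` both give 6.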

... Pipelined Version
[Figure: 4 x 4 array of adders (+) fed by the partial-product bits MP0,0 ... MP3,3 (with 0 inputs at the edges), three rows of pipeline registers, and a final carry-propagate adder producing P7 ... P0]
MPi,j = Multiplier_i AND Multiplicand_j

6 x 6 Parallel Array Multiplier

Sequential, Combinational, and Pipelined
• The sequential shift-and-add algorithm corresponds to a for-loop that may be implemented by:
  – a state machine, or
  – instructions (low-end microcontrollers)
• The sequential algorithm may be unrolled and implemented as a deep combinational circuit:
  – String of n-bit adders and AND-gates, or
  – Carry-save adders, AND-gates, and final (n-1)-bit adder
• Advantage: low latency
• Disadvantage: more hardware
• The deep combinational circuit may be pipelined
• Advantage: very high throughput
• Disadvantages: pipeline latency, more hardware, and higher power
Pipelining
Laundry process

Comparison
• Non-pipelined:
  – Delay: 60 min
  – Throughput: 1/60 load per min
• Pipelined:
  – Delay: 60 min
  – Throughput: k/(40 + k*20) load per min, about 1/20 when k is large
  – Throughput 3 times better than non-pipelined
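
The throughput expression can be reproduced with a short calculation. The sketch assumes three stages of 20 minutes each, which is consistent with the 60 min delay and the k/(40 + k*20) formula but is not stated explicitly on the slide; the function names are hypothetical.

```python
def pipelined_throughput(k: int, stage_min: float = 20.0, stages: int = 3) -> float:
    """Loads per minute when k loads pass through the pipeline back to back."""
    fill_time = stages * stage_min                 # first load takes the full delay
    total_time = fill_time + (k - 1) * stage_min   # each later load adds one stage time
    return k / total_time


def non_pipelined_throughput(stage_min: float = 20.0, stages: int = 3) -> float:
    """Loads per minute when each load finishes before the next one starts."""
    return 1.0 / (stages * stage_min)
```

With these assumptions `pipelined_throughput(k)` equals k/(40 + 20k), and its ratio to `non_pipelined_throughput()` approaches 3 as k grows.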

Pipelined combinational circuit

Adding pipeline to a comb circuit
• Candidate circuit for pipeline:
  – enough input data to feed the pipelined circuit
  – throughput is a main performance criterion
  – comb circuit can be divided into stages with similar propagation delays
  – propagation delay of a stage is much larger than the setup time and the clock-to-q delay of the register.
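
Why the last two criteria matter can be seen from a simple timing model. This is a sketch with hypothetical names, assuming that the clock period is set by the slowest stage plus register overhead (setup time and clock-to-q delay) and that the unpipelined circuit sits between one pair of registers.

```python
def clock_period(stage_delays_ps, t_setup_ps, t_clk_q_ps):
    """Minimum clock period: slowest stage plus register overhead."""
    return max(stage_delays_ps) + t_setup_ps + t_clk_q_ps


def throughput_gain(stage_delays_ps, t_setup_ps, t_clk_q_ps):
    """Ratio of pipelined to unpipelined throughput for the same logic."""
    unpipelined = sum(stage_delays_ps) + t_setup_ps + t_clk_q_ps
    return unpipelined / clock_period(stage_delays_ps, t_setup_ps, t_clk_q_ps)
```

With made-up numbers, four balanced 1000 ps stages and 100 ps of register overhead give a gain of 4100/1100, about 3.7. Unbalanced stages of 2500, 500, 500 and 500 ps give only 4100/2600, about 1.6, and stages as short as the register overhead gain even less per added register.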

Exercise (15 min)
• Pipeline two 4-bit adders which are connected in series. The FFs are ideal (t_setup = t_clk->Q = 0), t_pA = 400 ps. The carry out of the 2nd adder can be ignored.

How many pipeline stages?
Where do you put the FFs?
What's the gain in throughput?
How many FFs are required?

[Figure: eight full adders (FA) forming the two 4-bit adders, with signals a0..a3, b0..b3, s0p..s3p, c0..c3, s0..s3, c3]

Recipe
– Derive the block diagram of the original combinational circuit and arrange the circuit as a cascading chain
– Identify the major components and estimate the relative propagation delays of these components
– Divide the chain into stages of similar propagation delays
– Identify the signals that cross the boundary of the chain
– Insert registers for these signals in the boundary.

Datapath
• RTL description is characterized by registers in a design, and the combinational logic in between.
• This can be illustrated by a "register and cloud" diagram.
• Registers and the combinational logic are described separately in two different processes.

Datapath - Sequential part
architecture SPLIT of DATAPATH is
  signal X1, Y1, X2, Y2 : ...
begin
  -- Pipeline registers, updated on the rising clock edge.
  -- Y0 and X3 are not declared here; presumably they are ports of DATAPATH.
  seq : process (CLK)
  begin
    if (CLK'event and CLK = '1') then
      X1 <= Y0;
      X2 <= Y1;
      X3 <= Y2;
    end if;
  end process;

Datapath - Combinatorial part
  LOGIC : process (X1, X2)
  begin
    -- F(X1) and G(X2) can be replaced with the code
    -- implementing the desired combinational logic,
    -- or appropriate functions must be defined.
    Y1 <= F(X1);
    Y2 <= G(X2);
  end process;
end SPLIT;
Do not constrain the synthesis tool by splitting operations, e.g., y1=x1+x12.

Pipelining
• The instructions on the preceding slides introduced pipelining of the datapath (DP).
• The critical path is reduced from F(X1) + G(X2) to either F(X1) or G(X2), whichever is longer.
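
The register-and-cloud behaviour can be illustrated with a cycle-by-cycle Python model of the three registers X1, X2, X3 and the two combinational clouds F and G. This is a sketch: `simulate_pipeline` is a hypothetical helper, `None` marks a register that holds no valid data yet, and F and G are arbitrary stand-in functions.

```python
def simulate_pipeline(inputs, f, g, cycles):
    """Model X1 <= Y0; X2 <= Y1; X3 <= Y2 with Y1 = F(X1) and Y2 = G(X2)."""
    x1 = x2 = x3 = None                   # register contents before the first clock edge
    outputs = []
    for cycle in range(cycles):
        y0 = inputs[cycle] if cycle < len(inputs) else None
        # combinational clouds, evaluated from the current register values
        y1 = f(x1) if x1 is not None else None
        y2 = g(x2) if x2 is not None else None
        # rising clock edge: all registers update simultaneously
        x1, x2, x3 = y0, y1, y2
        outputs.append(x3)                # value visible in X3 after this edge
    return outputs
```

With `f = lambda x: x + 1`, `g = lambda x: x * 2` and inputs `[1, 2, 3]`, the first result appears in X3 after the third clock edge and a new result follows on every subsequent edge, while each stage only has to settle one of the two functions per cycle.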

What's next?
• Continue sequence detector
• Lab buddy?
Next Deadline: Preparation of sequence detector, Tuesday 11th
Group A: 13-15, Group B: 15-17