Lecture 04: Control Units
![Page 1: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/1.jpg)
Control Unit: Hardwired vs. Microprogrammed Approach
Dr Shankar Balachandran, Indian Institute of Technology Madras
[email protected], October 2006
![Page 2: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/2.jpg)
Two Major Blocks in a CPU
- Datapath
  - Adders, multipliers, dividers
  - Shifters, registers
  - Anything that changes or stores data
- Control Unit
  - Controls the data
  - How is data stored?
  - Where is it stored?
  - When should data be available?
![Page 3: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/3.jpg)
Control Unit
- Correct sequencing of control signals
- Much like the human brain controlling various parts of the body
- Sequence and timing are the key
  - Any aberration will result in wrong operation
![Page 4: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/4.jpg)
A Simplified Control Unit
Control Unit
Fetch Unit
Decode Unit
Execution Unit
Write Back Unit
Fetch
Decode
Execute
Write Back
![Page 5: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/5.jpg)
A Possible Implementation
2-to-4 Decoder
CLK
Mod-4 Counter (2 bits, so that the decoder can select each of the four phases)
![Page 6: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/6.jpg)
Timing Diagram
CLK
Fetch
Decode
Execute
Write Back
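The counter-plus-decoder implementation and its timing diagram can be sketched in a few lines of Python. This is only an illustration, assuming a 2-bit counter that cycles through four states on each clock tick; the names `decode_2to4` and `phase_signals` are made up for this sketch and do not come from the lecture.

```python
PHASES = ("Fetch", "Decode", "Execute", "Write Back")

def decode_2to4(count):
    """2-to-4 decoder: exactly one of the four outputs is 1 for each input 0..3."""
    return tuple(1 if count == i else 0 for i in range(4))

def phase_signals(num_clocks):
    """Return the (Fetch, Decode, Execute, Write Back) outputs for each clock tick."""
    outputs = []
    count = 0
    for _ in range(num_clocks):
        outputs.append(decode_2to4(count))
        count = (count + 1) % 4  # the 2-bit counter wraps after four states
    return outputs

if __name__ == "__main__":
    for tick, signals in enumerate(phase_signals(8)):
        print(tick, signals, PHASES[signals.index(1)])
```

Each tick activates exactly one phase line, and the pattern repeats every four clocks.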
![Page 7: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/7.jpg)
Let’s Sample The Signals
Rows are signals, columns are successive clock cycles:
Fetch      1 0 0 0
Decode     0 1 0 0
Execute    0 0 1 0
Write Back 0 0 0 1
![Page 8: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/8.jpg)
Another Way to Generate Signals
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
![Page 9: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/9.jpg)
Hardwired vs Microprogrammed
- Hardwired
  - Use gates to generate signals
  - Squeeze out the juice for performance
  - Different logic styles possible
- Microprogrammed
  - Store the control signals in sequence
  - Just read from the memory every clock cycle
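The microprogrammed idea, reduced to the four-phase example from the earlier slides, might look like the sketch below. It assumes the four sampled patterns are stored as words in a small read-only table and that an address register steps through them; `PHASE_ROM`, `read_control_word` and `run` are hypothetical names for illustration.

```python
# One stored word per clock cycle; bit order is Fetch, Decode, Execute, Write Back.
PHASE_ROM = [
    0b1000,  # Fetch
    0b0100,  # Decode
    0b0010,  # Execute
    0b0001,  # Write Back
]

def read_control_word(address):
    """Memory read: no gates compute the signals, they are simply looked up."""
    return PHASE_ROM[address % len(PHASE_ROM)]

def run(num_clocks):
    """Read one control word per clock and return them as 4-bit strings."""
    words = []
    address = 0
    for _ in range(num_clocks):
        words.append(format(read_control_word(address), "04b"))
        address = (address + 1) % len(PHASE_ROM)
    return words

if __name__ == "__main__":
    print(run(5))
```

Changing the behaviour means changing the table contents rather than rewiring logic, which is the trade the rest of the lecture explores.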
![Page 10: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/10.jpg)
A Model Computer (Richard Eckert, SIGCSE Bulletin, Vol. 20, No. 3, September 1988)
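[Figure: block diagram of the model computer. Accumulator, ALU, Register B, PC, MAR, MDR, RAM, IR and Control are connected through a shared bus; the paths are labeled with widths of 8, 12 and 4 bits. Control signals shown: R, W, LM, IP, LP, EP, LD, ED, LA, EA, S, A, EU, LB, LI, EI.]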
![Page 11: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/11.jpg)
More Details
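- L = Load
- E = Copy to bus
- A, S = Add and Subtract
- Sign bit to control unit
- IP = Increment PC
[Figure: the model computer block diagram again (ACC, ALU, B, PC, MAR, MDR, RAM, IR, Control, Bus), annotated with the control signals R, W, LM, IP, LP, EP, LD, ED, LA, EA, S, A, EU, LB, LI, EI.]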
![Page 12: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/12.jpg)
| Mnemonic | Opcode | Action | Register Transfers | Active Controls |
|---|---|---|---|---|
| LDA (Load Accumulator) | 1 | A ← (Mem) | 1. MAR ← IR; 2. MDR ← M(MAR); 3. A ← MDR | 1. EI, LM; 2. R; 3. ED, LA |
| STA (Store Accumulator) | 2 | (Mem) ← A | 1. MAR ← IR; 2. MDR ← A; 3. M(MAR) ← MDR | 1. EI, LM; 2. EA, LD; 3. W |
| ADD | 3 | A ← A+B | 1. A ← ALU(Add) | A, EU, LA |
| SUB | 4 | A ← A-B | 1. A ← ALU(Sub) | S, EU, LA |
| MBA | 5 | B ← A | 1. B ← A | EA, LB |
| JMP | 6 | PC ← Mem | 1. PC ← IR | EI, LP |
| JN | 7 | PC ← Mem if the negative flag is set | 1. PC ← IR if NF is set | NF: EI, LP |
| HLT | 8-15 | Stop clock | | |
| "Fetch" | | IR ← next instruction | 1. MAR ← PC; 2. MDR ← M(MAR); 3. IR ← MDR | 1. EP, LM; 2. R; 3. ED, LI, IP |
![Page 13: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/13.jpg)
Hardwired Unit
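[Figure: hardwired control unit. The opcode from the IR goes to a decoder, whose outputs are the instruction lines LDA, STA, ADD, SUB, MBA, JMP, JN and Halt. A ring counter driven by CLK supplies the timing signals (labeled T5 … T1 in the figure). The instruction lines, the timing signals and the NF flag feed the control matrix, which produces the control signals.]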
![Page 14: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/14.jpg)
Table with Sequencing
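|       | IP | LP | EP | LM | R | W | LD | ED | LI | EI | LA | EA | A | S | EU | LB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fetch | T2 |    | T0 | T0 | T1 |    |    | T2 | T2 |    |    |    |    |    |    |    |
| LDA   |    |    |    | T3 | T4 |    |    | T5 |    | T3 | T5 |    |    |    |    |    |
| STA   |    |    |    | T3 |    | T5 | T4 |    |    | T3 |    | T4 |    |    |    |    |
| MBA   |    |    |    |    |    |    |    |    |    |    |    | T3 |    |    |    | T3 |
| ADD   |    |    |    |    |    |    |    |    |    |    | T3 |    | T3 |    | T3 |    |
| SUB   |    |    |    |    |    |    |    |    |    |    | T3 |    |    | T3 | T3 |    |
| JMP   |    | T3 |    |    |    |    |    |    |    | T3 |    |    |    |    |    |    |
| JN    |    | T3*F |  |    |    |    |    |    |    | T3*F |  |    |    |    |    |    |

IP = T2
LP = T3*JMP + T3*JN*NF
EP = T0
LM = T0 + T3*LDA + T3*STA
R = T1 + T4*LDA
W = T5*STA
LD = T4*STA
ED = T2 + T5*LDA
LI = T2
A = T3*ADD
S = T3*SUB
…..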
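The equations listed on this slide translate directly into Boolean expressions. The sketch below covers only the eleven signals whose equations are written out here (the slide elides the rest with dots), and the function name `control_matrix`, its arguments, and the use of mnemonic strings for the decoder outputs are assumptions made for illustration.

```python
def control_matrix(t, opcode, nf):
    """Return the names of the active control signals at timing step t (0..5).

    t      : index of the active ring-counter output (T0..T5)
    opcode : decoded instruction line, e.g. "LDA" or "JN"
    nf     : negative flag
    Only the equations given on the slide are modelled.
    """
    T = [t == i for i in range(6)]
    LDA, STA, ADD, SUB, JMP, JN = (
        opcode == m for m in ("LDA", "STA", "ADD", "SUB", "JMP", "JN")
    )
    nf = bool(nf)
    signals = {
        "IP": T[2],
        "LP": (T[3] and JMP) or (T[3] and JN and nf),
        "EP": T[0],
        "LM": T[0] or (T[3] and LDA) or (T[3] and STA),
        "R": T[1] or (T[4] and LDA),
        "W": T[5] and STA,
        "LD": T[4] and STA,
        "ED": T[2] or (T[5] and LDA),
        "LI": T[2],
        "A": T[3] and ADD,
        "S": T[3] and SUB,
    }
    return sorted(name for name, active in signals.items() if active)

if __name__ == "__main__":
    for step in range(6):
        print("LDA", f"T{step}", control_matrix(step, "LDA", nf=False))
```

Steps T0 to T2 give the same outputs for every opcode, which is the shared fetch sequence from the table.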
![Page 15: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/15.jpg)
Control Matrix
- Implement using discrete gates
- Usually done using PLAs
- Large control matrices are implemented hierarchically
  - For speed
- A well known process, and design flows are widespread
![Page 16: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/16.jpg)
An Alternate Implementation
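[Figure: microprogrammed control unit. The 4-bit opcode from the IR goes to a starting address generator. The uPC, clocked by CLK, addresses a 32 x 24 control store (control ROM); each word read is held in the microinstruction register and supplies the control signals, the CD, MAP and HLT bits, and a jump address. The next uPC value is selected from uPC + 1, the jump address, or the starting address, using MAP, CD and NF.]

| MAP | CD | Meaning |
|---|---|---|
| 1 | * | Next address comes from the IR (starting address generator) |
| 0 | 0 | Unconditional branch within the microprogram |
| 0 | 1 | Conditional branch: NF=0 => increment, NF=1 => branch to the jump address |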
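The MAP/CD selection rules can be written as a small next-address function. This is a sketch under the assumption that the uPC is 5 bits wide (the control store is 32 words) and that "branch" means loading the jump address field; `next_upc` and its parameter names are hypothetical.

```python
def next_upc(upc, map_bit, cd_bit, nf, jump_addr, start_addr):
    """Select the next micro-address from the MAP, CD and NF bits.

    MAP=1        -> starting address derived from the opcode in the IR
    MAP=0, CD=0  -> unconditional branch to jump_addr
    MAP=0, CD=1  -> NF=0: increment the uPC, NF=1: branch to jump_addr
    """
    if map_bit:
        return start_addr
    if cd_bit and not nf:
        return (upc + 1) % 32  # 32-word control store, so the uPC wraps at 32
    return jump_addr

if __name__ == "__main__":
    print(hex(next_upc(0x0D, map_bit=0, cd_bit=1, nf=1, jump_addr=0x0F, start_addr=0)))
```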
![Page 17: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/17.jpg)
Control Store
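| Instruction | Op-Code | uInstruction Address | Control Signals | CD | MAP | HLT | Addr. of Next |
|---|---|---|---|---|---|---|---|
| Fetch | 0 | 00 | 0011000000000000 | 0 | 0 | 0 | 01 |
| | | 01 | 0000100000000000 | 0 | 0 | 0 | 02 |
| | | 02 | 1000000110000000 | 0 | 1 | 0 | XX |
| LDA | 1 | 03 | 0001000001000000 | 0 | 0 | 0 | 04 |
| | | 04 | 0000100000000000 | 0 | 0 | 0 | 05 |
| | | 05 | 0000000100100000 | 0 | 0 | 0 | 00 |
| STA | 2 | 06 | 0001000001000000 | 0 | 0 | 0 | 07 |
| | | 07 | 0000001000010000 | 0 | 0 | 0 | 08 |
| | | 08 | 0000010000000000 | 0 | 0 | 0 | 00 |
| ADD | 3 | 09 | 0000000000101010 | 0 | 0 | 0 | 00 |
| SUB | 4 | 0A | 0000000000100110 | 0 | 0 | 0 | 00 |
| MBA | 5 | 0B | 0000000000010001 | 0 | 0 | 0 | 00 |
| JMP | 6 | 0C | 0100000001000000 | 0 | 0 | 0 | 00 |
| JN | 7 | 0D | 0000000000000000 | 1 | 0 | 0 | 0F |
| | | 0E | 0000000000000000 | 0 | 0 | 0 | 00 |
| | | 0F | 0100000001000000 | 0 | 0 | 0 | 00 |
| Expansion | 8-E | 10-1E | | | | | |
| HLT | F | 1F | 0000000000000000 | 0 | 0 | 1 | XX |

Each row is one control word: the control signals, CD, MAP, HLT and the address of the next microinstruction.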
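To read the 16 control bits in the table, it helps to map each bit position to its signal name. The sketch below assumes the bit order is the column order of the "Table with Sequencing" slide (IP leftmost, LB rightmost); the helper name `active_signals` is made up for this illustration.

```python
# Assumed bit order, most significant bit first (taken from the sequencing table header).
SIGNALS = "IP LP EP LM R W LD ED LI EI LA EA A S EU LB".split()

def active_signals(word):
    """Return the names of the signals set to 1 in a 16-character bit string."""
    if len(word) != len(SIGNALS) or set(word) - {"0", "1"}:
        raise ValueError("expected a 16-character string of 0s and 1s")
    return [name for name, bit in zip(SIGNALS, word) if bit == "1"]

if __name__ == "__main__":
    for address, word in [("00", "0011000000000000"),
                          ("02", "1000000110000000"),
                          ("09", "0000000000101010")]:
        print(address, active_signals(word))
```

Decoding the fetch words this way reproduces the active controls listed for "Fetch" in the instruction table (EP, LM, then R, then ED, LI, IP).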
![Page 18: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/18.jpg)
Example 1 – MBA followed by ADD
(Same control store as on the previous slide; the highlighted microinstructions are at addresses 0B for MBA and 09 for ADD.)
Control signal bit order, left to right: IP LP EP LM R W LD ED LI EI LA EA A S EU LB
![Page 19: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/19.jpg)
Sequence for MBA,ADD
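| Instruction | Register transfer | Control signals |
|---|---|---|
| MBA (shown as MOV B,A) | 1. MAR ← PC | 0011000000000000 |
| | 2. MDR ← M(MAR) | 0000100000000000 |
| | 3. IR ← MDR | 1000000110000000 |
| | B ← A | 0000000000010001 |
| ADD | 1. MAR ← PC | 0011000000000000 |
| | 2. MDR ← M(MAR) | 0000100000000000 |
| | 3. IR ← MDR | 1000000110000000 |
| | A ← ALU(Add) | 0000000000101010 |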
![Page 20: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/20.jpg)
Example 2 – JN with Flag Set
(Same control store as on the previous slides; the highlighted microinstruction is at address 0D, where CD = 1.)
If the negative flag is set, jump to a new location by skipping to the microinstruction at 0F.
![Page 21: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/21.jpg)
Example 3 – JN with Flag Not Set
(Same control store as on the previous slides; the highlighted microinstruction is at address 0D, where CD = 1.)
With the negative flag not set, the uPC simply increments to 0E, whose next-address field returns to the fetch microprogram at 00.
![Page 22: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/22.jpg)
Let’s Review the Microprogramming Model
- Store the microprogram in the control store
- Fetch the instruction
- Get the set of control signals from the control word
- Move the microinstruction address
- Lather, rinse, repeat
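The review steps can be exercised with a small micro-sequencer that walks the control store from the earlier slides. This is a sketch, not the lecture's implementation: it ignores the datapath entirely, takes the opcodes from a Python list instead of memory, and uses hypothetical names (`CONTROL_STORE`, `START_ADDRESS`, `trace`). The table contents are copied from the control store slide.

```python
# micro-address -> (control signals, CD, MAP, HLT, address of next); None stands for XX.
CONTROL_STORE = {
    0x00: ("0011000000000000", 0, 0, 0, 0x01),
    0x01: ("0000100000000000", 0, 0, 0, 0x02),
    0x02: ("1000000110000000", 0, 1, 0, None),
    0x03: ("0001000001000000", 0, 0, 0, 0x04),
    0x04: ("0000100000000000", 0, 0, 0, 0x05),
    0x05: ("0000000100100000", 0, 0, 0, 0x00),
    0x06: ("0001000001000000", 0, 0, 0, 0x07),
    0x07: ("0000001000010000", 0, 0, 0, 0x08),
    0x08: ("0000010000000000", 0, 0, 0, 0x00),
    0x09: ("0000000000101010", 0, 0, 0, 0x00),
    0x0A: ("0000000000100110", 0, 0, 0, 0x00),
    0x0B: ("0000000000010001", 0, 0, 0, 0x00),
    0x0C: ("0100000001000000", 0, 0, 0, 0x00),
    0x0D: ("0000000000000000", 1, 0, 0, 0x0F),
    0x0E: ("0000000000000000", 0, 0, 0, 0x00),
    0x0F: ("0100000001000000", 0, 0, 0, 0x00),
    0x1F: ("0000000000000000", 0, 0, 1, None),
}

# Opcode -> starting micro-address, as listed in the control store table.
START_ADDRESS = {1: 0x03, 2: 0x06, 3: 0x09, 4: 0x0A, 5: 0x0B, 6: 0x0C, 7: 0x0D, 0xF: 0x1F}

def trace(opcodes, nf=False):
    """Return the micro-addresses visited while executing the given opcodes."""
    pending = list(opcodes)
    visited = []
    upc = 0x00
    while True:
        if upc == 0x00 and not pending:
            break  # nothing left to fetch
        _signals, cd, map_bit, hlt, next_addr = CONTROL_STORE[upc]
        visited.append(upc)
        if hlt:
            break  # HLT stops the clock
        if map_bit:
            upc = START_ADDRESS[pending.pop(0)]  # end of fetch: jump via the IR
        elif cd and not nf:
            upc += 1  # conditional branch not taken
        else:
            upc = next_addr  # unconditional branch, or conditional branch taken
    return visited

if __name__ == "__main__":
    print([f"{a:02X}" for a in trace([5, 3])])         # MBA then ADD
    print([f"{a:02X}" for a in trace([7], nf=True)])   # JN with flag set
    print([f"{a:02X}" for a in trace([7], nf=False)])  # JN with flag not set
```

The three calls correspond to Examples 1 to 3: MBA followed by ADD visits 0B and 09 after each fetch, and JN goes through 0F or 0E depending on the flag.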
![Page 23: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/23.jpg)
What is Microcode?
Michael Slater's "Microprocessor Based Design" (p. 42):
"Microcode tells the processor every detailed step required to execute each machine language instruction. Microcode is thus at an even more detailed level than machine language, and in fact defines the machine language. In a standard microprocessor, the microcode is stored in a ROM or a programmable logic array (PLA) that is part of the microprocessor chip and cannot be modified by the user."
![Page 24: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/24.jpg)
Thought Experiment
Why is the design a little clumsy? What can we do about it?
![Page 25: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/25.jpg)
Reason for Clumsiness
- JN: conditional flag check
- Without any condition check, the whole process is very smooth
- Solution: avoid all conditional checks
![Page 26: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/26.jpg)
Real Life
- A little American football story
- Theory vs. practice
  - In theory, there is no difference between theory and practice
  - In practice, theory and practice are two different things altogether
- Live with condition checks
- Keep designs as clean as possible
![Page 27: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/27.jpg)
A General Approach
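[Figure: general microprogrammed control unit. The IR, external inputs and condition codes feed a starting and branch address generator, which loads the uPC; the uPC addresses the control store, which outputs the control word.]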
![Page 28: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/28.jpg)
Format of Microinstructions
- Pick yours
  - Your choice is as good as your neighbor’s
- What we did:
  - One bit position per control signal
  - Order of the bits? Doesn’t matter
  - Can result in long microinstructions
    - Not the number of microinstructions, but the width
![Page 29: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/29.jpg)
A Note About Density
- Observe that only a few bits are set to 1
- Poor usage of bit space
- This scheme is called a horizontal microprogram
- Alternate version: encode the bits
  - Vertical microprogram
![Page 30: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/30.jpg)
Vertical Microprogram
- Encode the bits by grouping similar elements together
- General idea: group similar resources together
  - There can be only one source or destination register
- Some operations are mutually exclusive
  - Read vs. write of memory
![Page 31: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/31.jpg)
Design Issues
- Encoding reduces the bit-space
  - But requires decoders
- Cost of decoder vs. bit-space
  - Usually decoder cost is very low
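As a small illustration of the encoding trade-off, consider the five "copy to bus" signals of the model computer (EP, ED, EI, EA, EU). In the control store shown earlier, no control word asserts more than one of them, so they could share one encoded field. The grouping, the field layout and the names `encode_source` and `decode_source` are assumptions for this sketch, not something the lecture specifies.

```python
# Mutually exclusive bus-source enables; code 0 means "nothing drives the bus".
BUS_SOURCES = ["EP", "ED", "EI", "EA", "EU"]

MAX_CODE = len(BUS_SOURCES)            # codes 0..5
FIELD_WIDTH = MAX_CODE.bit_length()    # 3 bits instead of 5 one-hot bits

def encode_source(signal):
    """Vertical format: store a small code instead of one bit per signal."""
    return 0 if signal is None else BUS_SOURCES.index(signal) + 1

def decode_source(code):
    """The extra decoder: expand the code back into one-hot enable lines."""
    return {name: int(code == i + 1) for i, name in enumerate(BUS_SOURCES)}

if __name__ == "__main__":
    code = encode_source("EI")
    print(FIELD_WIDTH, code, decode_source(code))
```

The field saves two bits per microinstruction here, at the cost of a decoder between the control store and the enable lines.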
![Page 32: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/32.jpg)
Another Idea
- Group concurrently active signals
- Every meaningful combination gets a code
- Complex decoder to interpret every code
![Page 33: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/33.jpg)
Vertical vs Horizontal
- Horizontal
  - Faster
  - More area
  - More common currently
    - Cheap transistors
- Vertical
  - Slower
  - More microinstructions
![Page 34: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/34.jpg)
Microsequencing
- Other ways to save on hardware
- Every instruction had its own microprogram sequence
- Also, instructions have several addressing modes
  - Only the first few microinstructions differ
- Can we share microcode?
![Page 35: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/35.jpg)
A Powerful Technique in Sharing
- Bit-ORing
  - Example: two instructions share some microcode
  - Eventually, they must branch
  - The default branch (one instruction’s) is X0
  - The other branch is stored at X1
  - Change the least significant bit(s?) to get a new address
- Compare that with:
  - Having two conditional branches
  - Storing two fields, one for each branch
  - Both very unclean
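Bit-ORing can be shown in one line: the stored next address ends in 0, and a condition bit is ORed into the least significant position to pick between the two targets. The function name and the example address below are hypothetical.

```python
def bit_or_next_address(stored_address, condition):
    """OR a single condition bit into the LSB of the stored next address.

    stored_address is assumed to have its least significant bit clear (the X0 form).
    """
    return stored_address | (1 if condition else 0)

if __name__ == "__main__":
    base = 0b10100  # "X0": default target for the first instruction
    print(format(bit_or_next_address(base, False), "05b"))
    print(format(bit_or_next_address(base, True), "05b"))
```

Only one address field is stored, and no separate conditional-branch microinstruction is needed; the cost is that the two targets must sit at adjacent addresses.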
![Page 36: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/36.jpg)
Thought Experiment :
- What if we provided an explicit branch microinstruction instead of storing a next-address field in our microprogram?
  - A typical instruction set will need a lot of branches
  - A lot of time will be wasted on branching
![Page 37: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/37.jpg)
A Pat on Our Back
- We provided an explicit field for the address
  - Branch location is now data
  - It is already saved
- Caution: microinstructions can get very wide
- Solution: there is no free lunch.
![Page 38: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/38.jpg)
Can we pipeline microfetch?
- A neat idea:
  - Why wait till the current micro-op is over?
  - Branch field gives the next operation
  - Get the next op
- Caveat:
  - External inputs and status flags may change the order
  - What about interrupts?
    - They are going to follow you everywhere
  - Should have a mechanism that can invalidate microcode prefetch
    - Similar to a pipeline flush for instructions
- Commonly used
![Page 39: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/39.jpg)
Historical Perspectives
- Hardwired logic
  - Popular before the 60’s
    - Only way people did it
  - Popular now
    - Speed benefits
- Microprogram
  - Popular in the 70’s
    - Memory was slower than the CPU
    - No on-chip cache
    - Best way is to store the microcode
  - Now: depends on who you ask
- Shades of gray:
  - Extremes of the spectrum are harder to find nowadays
![Page 40: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/40.jpg)
Tools for Design
- Hardwired
  - Any state machine optimizer
  - Assigning states, minimizing transitions, races, hazards, ...
- Microcoding
  - Small ones can be in binary
  - Large ones: use a microassembler
    - Very useful debug tool
    - Can use the microassembler simultaneously with actual hardware development
![Page 41: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/41.jpg)
Hardwired vs Microcoding
- Hardwired units are faster and smaller
- Emulation is easy with microcoding
- Hardwired design is complex if large
- Bugs in hardwired design cannot be fixed in the field
- Hardwired control is not suited for loops
  - Looping with microcode can be made as fast
![Page 42: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/42.jpg)
Hardwired vs Microcode vs RISC
- RISC
  - Simpler instruction set
  - Hardwired implementation
- RISC instructions are like microcode
  - Instructions come from the I-cache instead of the control store
  - Difference: contents are not fixed
  - Advantage: only load what you want into the I-cache
    - Keeps size smaller as compared to control stores
![Page 43: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/43.jpg)
Microprogram vs Software
- Imagine floating point division
- Solution 1: write it in software
  - Long process
  - Error prone
  - Many repeated fetches from memory for the given sequence of operations
- Solution 2: microcode
  - Long process too, but done by designers, not programmers
  - Relatively error free: more thorough design
  - Requires many cycles, but fetched and used locally
![Page 44: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/44.jpg)
Emulation
- A very common use of microcoding
- IBM System/360
  - 32-bit architecture
  - 16 general-purpose registers, each 32 bits wide
  - Secret: the low-end implementations were 8-bit internally
    - Keep cost low
    - Heavy microcoding
    - Programmers oblivious
- In 1992, International Meta Systems (IMS) announced the 3250
  - Designed to emulate the x86, 68K, and 6502 architectures
  - Uses customizable microcode, among other techniques
  - Went bust, never released
![Page 45: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/45.jpg)
Another Interesting Note
- Writable control store
  - What if you, a programmer, could write your own control store?
  - Not a mad scientist thought
- Implemented in
  - VAX 8800
  - PDP-11/60
  - IBM System/370
![Page 46: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/46.jpg)
Current Trends
- Microcode update
  - Linux utility: microcode_ctl
    - Companion to the IA32 microcode driver
    - It decodes and sends new microcode to the kernel driver to be uploaded to Intel IA32 processors
    - Update is volatile: lost on reboot
  - Microcode updates are also typically rolled into BIOS updates
    - Ready even before an OS is loaded
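Because an update is lost on reboot, it can be useful to check which microcode revision is currently loaded. On x86 Linux systems, /proc/cpuinfo reports a `microcode` field per logical CPU; the sketch below parses that field from text. The function name and the sample text are made up for illustration, and the revision value in the sample is not from any real machine.

```python
def microcode_revisions(cpuinfo_text):
    """Return the distinct values of the 'microcode' field, sorted."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return sorted(revisions)

# Hypothetical excerpt in the /proc/cpuinfo layout (two logical CPUs).
SAMPLE = (
    "processor\t: 0\n"
    "microcode\t: 0xde\n"
    "processor\t: 1\n"
    "microcode\t: 0xde\n"
)

if __name__ == "__main__":
    print(microcode_revisions(SAMPLE))
    # On a real x86 Linux machine one could instead pass the file contents:
    # with open("/proc/cpuinfo") as f:
    #     print(microcode_revisions(f.read()))
```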
![Page 47: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/47.jpg)
Intel Said…..
The Pentium(R) Pro processor and Pentium(R) II processor may contain design defects or errors known as errata that may cause the product to deviate from published specifications. Many times, the effects of the errata can be avoided by implementing hardware or software work-arounds, which are documented in the Pentium Pro Processor Specification Update and the Pentium II Processor Specification Update. Pentium Pro and Pentium II processors include a feature called "reprogrammable microcode", which allows certain types of errata to be worked around via microcode updates. The microcode updates reside in the system BIOS and are loaded into the processor by the system BIOS during the Power-On Self Test, or POST.
![Page 48: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/48.jpg)
Current Trends
- Hyperthreading in P4
  - A second logical CPU
  - Complete state of the system in both CPUs
- Microcoding in P4
  - Two pointers control flow independently
  - Both processors share the ROM entries
  - Access is alternated between the CPUs
![Page 49: Lecture 04 Control Units](https://reader036.vdocuments.net/reader036/viewer/2022081718/5537d140550346592f8b4624/html5/thumbnails/49.jpg)
Thank You