architecting for vlsi implementation - smdp2vlsi.gov.in · architecting for vlsi implementation...
TRANSCRIPT
Architecting for VLSI Implementation
Presenter : Chandra Shekhar
Director, CEERI, Pilani – 333 031 (Rajasthan)
Logic Specification vs. Implementation
Logic Specification and Logic Implementation are two different things.
• Logic Specification precedes Logic Implementation.
• For a particular Logic Specification, there are many different possible Logic Implementations.
• These different Logic Implementations may differ widely in their cost, speed of operation and power consumption.
Logic Specification is also called the Behavioural Description of logic.
Logic Implementation is also called the Structural Description of logic.
© CEERI, Pilani
Specifying Logic
How do we specify logic?
1. Through Boolean Expressions.
2. Through Truth Tables.
3. Through Natural Language Statements.
4. Through Programming Language Statements.
5. Through Behavioural Description Constructs of Hardware Description Languages (VHDL, Verilog), e.g. the Process statement in VHDL.
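The first two styles of specification can be put side by side in a short sketch. The function used here (a 2-input exclusive-OR) is chosen purely for illustration and is not taken from the slides; the helper name `truth_table_of` is likewise hypothetical. Enumerating the Boolean expression over all input combinations shows that both specifications describe the same behaviour.

```python
from itertools import product


def xor_expression(a: int, b: int) -> int:
    """Specification as a Boolean expression: Z = A.B' + A'.B (illustrative choice)."""
    return int((a and not b) or (not a and b))


# Specification as a truth table: (A, B) -> Z.
XOR_TRUTH_TABLE = {
    (0, 0): 0,
    (0, 1): 1,
    (1, 0): 1,
    (1, 1): 0,
}


def truth_table_of(function, n_inputs: int) -> dict:
    """Tabulate a Boolean function over every combination of its inputs."""
    return {bits: function(*bits) for bits in product((0, 1), repeat=n_inputs)}


if __name__ == "__main__":
    same = truth_table_of(xor_expression, 2) == XOR_TRUTH_TABLE
    print("Expression and truth table agree:", same)
```

Neither form says anything about gates or wiring; that is left to the implementation.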
Implementing Logic
How do you implement logic efficiently, given the constraints on:
• Speed of Operation.
• Power Consumption.
• Design Time.
• Design Cost.
• Product Cost.
• Upgradability.
The strategic planning and selection of an optimal approach for the implementation of logic is typically called architecting or architecture design.
Specifying and Implementing Logic
Example Logic Specification
Z = A + B + C + D + E
Architecture #1 (Combinational) for Logic Implementation
[Figure: combinational network of four adders (+) combining inputs A, B, C, D and E into output Z]
Architecture #2 (Combinational) for Logic Implementation
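[Figure: an alternative combinational arrangement of four adders (+) combining inputs A, B, C, D and E into output Z]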
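To make the difference between two combinational implementations of one specification concrete, the sketch below assumes the example specification is the five-input sum Z = A + B + C + D + E and evaluates it with two adder arrangements: a linear chain and a balanced tree. Which arrangement each figure actually depicts is not recoverable here, so both arrangements, the function names, and the use of adder levels as a stand-in for delay are assumptions made for illustration.

```python
def chain_sum(values):
    """Linear chain of adders: each adder waits for the one before it.

    Returns (result, number_of_adders, adder_levels_on_longest_path).
    """
    total = values[0]
    adders = 0
    for value in values[1:]:
        total += value
        adders += 1
    return total, adders, adders      # in a chain, depth equals the adder count


def tree_sum(values):
    """Balanced tree of adders: adders on the same level operate in parallel."""
    level = list(values)
    adders = 0
    depth = 0
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            next_level.append(level[i] + level[i + 1])
            adders += 1
        if len(level) % 2:            # an odd input passes through to the next level
            next_level.append(level[-1])
        level = next_level
        depth += 1
    return level[0], adders, depth


if __name__ == "__main__":
    inputs = [1, 2, 3, 4, 5]          # A, B, C, D, E
    print("chain:", chain_sum(inputs))
    print("tree: ", tree_sum(inputs))
```

For five inputs both arrangements use four adders and produce the same Z, but the tree has three adder levels on its longest path instead of four: the same specification with a different speed.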
Architecture #3 (Sequential) for Logic Implementation
[Figure: sequential datapath with inputs A, B, C, D and E, a 2:1 Mux, a single adder (+), a register R, and a Control block driving the Mux Select]
Sequential Architectures
Characteristics of Sequential Architectures:
• They need storage elements besides combinational logic.
• They need a sequence of steps to implement the full logic specification.
• The next step should be taken only when the logic function of the previous step has been completed and its result saved.
• The stepping can be asynchronous, self-timed, or synchronous (with a timing signal called the clock).
• Depending upon the method of stepping selected, sequential architectures can be asynchronous, self-timed, or synchronous.
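These characteristics can be sketched in a few lines. The model below assumes, as one reading of the Architecture #3 figure, a single adder whose first input comes from a 2:1 multiplexer choosing between A and the register R. The function name and the recorded trace are illustrative only. Each loop iteration stands for one step (one clock period in the synchronous case).

```python
def sequential_sum(a, other_inputs):
    """Compute a + sum(other_inputs) with one adder, one register and a 2:1 mux."""
    register_r = None                 # storage element; holds no result yet
    trace = []                        # value saved in R after each step
    for step, operand in enumerate(other_inputs):
        select_a = (step == 0)        # control: pick A on the first step only
        adder_input = a if select_a else register_r   # 2:1 mux
        register_r = adder_input + operand            # result saved before the next step
        trace.append(register_r)
    return register_r, trace


if __name__ == "__main__":
    result, trace = sequential_sum(1, [2, 3, 4, 5])   # A, then B, C, D, E
    print("Z =", result, "after steps", trace)
```

One adder is reused four times, trading hardware cost for the number of steps.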
Architecture #4 (Pipelined; Synchronous) for Logic Implementation
[Figure: pipelined adder network combining inputs A, B, C, D and E into output Z]
Architecture #5 (Pipelined; Synchronous) for Logic Implementation
[Figure: a second pipelined adder network combining inputs A, B, C, D and E into output Z]
Pipelined Architectures
Characteristics of Pipelined Architectures:
• They increase the sustained throughput of logic function computation (roughly by a factor of 2 for a 2-stage pipelined architecture).
• They do not reduce the delay of computation of the logic function.
• Their cost is higher due to the need for pipeline registers.
• They can be coarse-grained or fine-grained.
• The pipeline can be balanced (all pipeline stages have identical delays) or unbalanced (different pipeline stages have different delays).
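The throughput and delay claims above follow from simple arithmetic, sketched below under idealised assumptions: pipeline register overhead is ignored, the clock period is set by the slowest stage, and the function names and delay values are hypothetical.

```python
def unpipelined_time(stage_delays, n_results):
    """Total time when each computation runs start to finish before the next begins."""
    return sum(stage_delays) * n_results


def pipelined_time(stage_delays, n_results):
    """Return (clock_period, latency_of_one_result, total_time) for a synchronous pipeline."""
    clock_period = max(stage_delays)              # slowest stage sets the clock
    n_stages = len(stage_delays)
    latency = clock_period * n_stages             # time for one result to pass through
    total = clock_period * (n_stages + n_results - 1)
    return clock_period, latency, total


if __name__ == "__main__":
    for delays in ([5, 5], [7, 3]):               # balanced, then unbalanced
        base = unpipelined_time(delays, 1000)
        clock, latency, total = pipelined_time(delays, 1000)
        print(delays, "latency:", latency, "speed-up:", round(base / total, 2))
```

With balanced stages the speed-up approaches 2 for a long run of results, while the latency of any single result is no better than the unpipelined delay. With unbalanced stages the speed-up is smaller and the latency is worse.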
Other Architectural Choices
• Parallel Combinational Architectures.
• Parallel Sequential Architectures.
• Parallel Pipelined Architectures.
• Mixed Architectures.
Architecture #6 (Control-Programmable; Sequential) for Logic Implementation
[Figure: sequential datapath with inputs A, B, C, D and E, a 2:1 Mux, an ALU, a register R, and a Control block driving the Mux Select and ALU Op_Select signals]
Control-Programmable Sequential Architectures
Characteristics of Control-Programmable Sequential Architectures:
• They have a fixed execution unit, but a programmable controller.
• By appropriately programming the controller, any logic function can be implemented.
• A popular choice for control programming is micro-programming via a Writable Control Store (WCS).
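A minimal sketch of this idea follows. The execution unit (mux, ALU, register R) is fixed; only the contents of the control store change. The microinstruction format (a mux select plus an ALU op select), the set of ALU operations, and all names are assumptions for illustration, not the format used in the slides.

```python
import operator

# Fixed execution unit: the ALU operations reachable through Op_Select (assumed set).
ALU_OPERATIONS = {
    "ADD": operator.add,
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
}


def run_control_store(control_store, a, operands):
    """Step through the control store, one microinstruction per operand."""
    register_r = None
    for (mux_select, op_select), operand in zip(control_store, operands):
        alu_input = a if mux_select == "A" else register_r   # 2:1 mux
        register_r = ALU_OPERATIONS[op_select](alu_input, operand)
    return register_r


# Two different control programs for the same hardware.
SUM_PROGRAM = [("A", "ADD"), ("R", "ADD"), ("R", "ADD"), ("R", "ADD")]
PARITY_PROGRAM = [("A", "XOR"), ("R", "XOR"), ("R", "XOR"), ("R", "XOR")]

if __name__ == "__main__":
    print("sum   :", run_control_store(SUM_PROGRAM, 1, [2, 3, 4, 5]))
    print("parity:", run_control_store(PARITY_PROGRAM, 1, [1, 0, 1, 1]))
```

Rewriting the control store changes the logic function without touching the datapath, which is the role a writable control store plays.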
Architecture #7 (Instruction-set Based; Programmable; Sequential) for Logic Implementation
The von Neumann architecture of a general-purpose stored-program digital computer (CISC).
[Figure: Memory and CPU blocks]
CPU Block Diagram
[Figure: CPU block diagram showing the Instruction Register, Instruction Decoder, Control Sequencer, State Generator, Clock Generator, Bus Controller, and the Execution Unit (Register Bank, MAR, PC, ALU, MDR)]
Instruction-Set Based Architectures
Characteristics of Instruction-Set Based Architectures:
• They completely decouple the implementation of hardware from the logic specification (the user logic specification).
• Each instruction in the instruction set specifies a soft gate (or virtual gate) with an appropriate logic function and its connectivity to other ‘soft gates’ (through operand address specification).
• A sequence of instructions (program), therefore, can be translated into an equivalent logic network of ‘soft gates’ (or a netlist of ‘soft gates’).
• The equivalent logic network of ‘virtual gates’ (‘soft gates’) can be easily modified by changing the order of instructions in the program (instruction sequence), by changing the operand addresses, or both.
• The implementation of each ‘soft logic gate’ (instruction) using hardware logic is done by the CPU.
• A user implements their logic specification using only ‘soft gates’.
• A Random Access Memory (RAM) is used to store the logic specification’s implementation in terms of ‘soft gates’, including the logical values of all the circuit nodes in the equivalent logic network of ‘soft gates’.
• The instruction set (which defines the ‘soft gates’) acts as a hardware-software interface for the implementation of a user-specified logic function.
• The hardware implementation of the logic function of each instruction (‘soft gate’) is decided by the CPU architect/designer (and is, therefore, beyond the control of the programmer).
• The ‘soft gates’ implementation of the user’s logic specification is completely under the control of the programmer.
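The ‘soft gate’ view can be illustrated with a toy interpreter. In the sketch below each instruction names a logic function together with the RAM addresses of its operands and its result, so a program is literally a netlist held in memory. The three-address instruction format, the opcode names and the full-adder program are hypothetical choices for illustration; the plain loop stands in for the CPU.

```python
import operator

# Hardware side of the interface: the logic function behind each opcode.
SOFT_GATES = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}

# Software side: a 1-bit full adder written as a sequence of soft gates.
# Each instruction is (opcode, source_1, source_2, destination).
FULL_ADDER = [
    ("XOR", "a", "b", "t1"),
    ("XOR", "t1", "cin", "sum"),
    ("AND", "a", "b", "t2"),
    ("AND", "t1", "cin", "t3"),
    ("OR", "t2", "t3", "cout"),
]


def run_program(program, ram):
    """Evaluate the soft gates in order; RAM holds every node value of the network."""
    ram = dict(ram)                    # work on a copy of the initial memory contents
    for opcode, source_1, source_2, destination in program:
        ram[destination] = SOFT_GATES[opcode](ram[source_1], ram[source_2])
    return ram


if __name__ == "__main__":
    result = run_program(FULL_ADDER, {"a": 1, "b": 1, "cin": 1})
    print("sum =", result["sum"], "cout =", result["cout"])
```

Changing an operand address in `FULL_ADDER` rewires the network, and changing an opcode swaps a gate; neither requires any change to the interpreter that plays the part of the hardware.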
CISC Architectures (Register-Memory Architectures)
Characteristics of CISC Architectures:
• Feature a large variety of addressing modes to address the memory operand (for implementing data structures in the memory, giving convenient specification of interconnections amongst ‘soft gates’).
• Typically 2 operands per instruction, at most one of which can be in the memory (the other is in a general purpose register).
• Most instructions can use most of the addressing modes.
Benefits of CISC Architectures
• Excellent support for data structuring and program structuring at assembly language level.
• Compact object code.
Disadvantages of CISC Architectures
• Variable instruction lengths and many different instruction formats greatly increase the complexity of CPU implementation (the instruction decoding and control generation part of the CPU).
• Widely varying clock cycle counts for completion of different instructions make the use of pipelining difficult.
• Increased complexity of the control part, which occupies a large part of the chip area (crowding out the execution unit).
• Increased complexity of the control part also becomes a speed bottleneck.
Architecture #8 (Instruction-set Based; Programmable; Sequential) for Logic Implementation
The Harvard architecture of a general-purpose stored-program digital computer (used in DSPs).
[Figure: CPU connected to separate Instruction Memory and Data Memory]
Benefits of Harvard Architectures
• Reduced clock cycle counts for completion of instructions due to concurrent fetching of operands and instructions (overlapped implementation of two ‘soft gates’ by the CPU).
• Increased throughput as a result.
Architecture #9 (Instruction-set Based; Programmable; Pipelined) for Logic Implementation
The RISC architecture of a general-purpose stored-program digital computer.
Pipelined RISC Architecture
[Figure: five-stage pipelined RISC datapath. Stages: Instruction Fetch, Instruction Decode / Register Fetch, Execute / Compute Address, Memory Access, Write Back. Labelled blocks include PC, a +4 adder, Instruction Memory, IR, Registers, sign extension (S-Ex), Imm, A, an adder (+), Data Memory (Address, Data) and LMD.]
RISC Architectures (Register-Register Architectures)
Characteristics of RISC Architectures:
• A reduced instruction set featuring only very frequently used instructions, encoded in a few simple, fixed-field instruction formats (fewer types of ‘soft gates’).
• Typically having only register operands (higher speed of interconnections between ‘soft gates’).
These drastically reduce the complexity of the control part, thereby releasing chip area for more resources in the execution unit, including larger register files.
Also, easier pipelining of the CPU becomes possible, leading to an increase in speed (overlapped implementation of several ‘soft gates’ by the CPU) and throughput.
Key Features of RISC Architectures
• Load-Store architectures:
  Only Load and Store instructions can transfer data from and to memory, using a few simple addressing modes.
  All other instructions operate only on register operands, typically 2 source operands and 1 destination operand.
• Simplified instruction decoding.
• Drastic reduction in the complexity of the control part.
• Easier pipelining of instruction execution.
• A much larger fraction of chip area becomes available for execution unit resources (e.g. a larger register file, more powerful operational units, more buses), which can lead to enhanced performance.
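The load-store discipline can be shown with a very small register machine. This is a sketch under assumptions: the three opcodes, the tuple instruction format, the register count and the example program (a five-input sum) are all hypothetical and only illustrate that memory is touched by LOAD and STORE alone, while ADD works on registers with two sources and one destination.

```python
def run_load_store(program, memory, n_registers=8):
    """Interpret a tiny load-store program; returns the final memory contents."""
    registers = [0] * n_registers
    memory = dict(memory)                       # work on a copy
    for instruction in program:
        opcode = instruction[0]
        if opcode == "LOAD":                    # LOAD rd, address
            _, rd, address = instruction
            registers[rd] = memory[address]
        elif opcode == "STORE":                 # STORE rs, address
            _, rs, address = instruction
            memory[address] = registers[rs]
        elif opcode == "ADD":                   # ADD rd, rs1, rs2 (registers only)
            _, rd, rs1, rs2 = instruction
            registers[rd] = registers[rs1] + registers[rs2]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory


# Z = A + B + C + D + E, using registers 1 and 2.
SUM_PROGRAM = [
    ("LOAD", 1, "A"), ("LOAD", 2, "B"), ("ADD", 1, 1, 2),
    ("LOAD", 2, "C"), ("ADD", 1, 1, 2),
    ("LOAD", 2, "D"), ("ADD", 1, 1, 2),
    ("LOAD", 2, "E"), ("ADD", 1, 1, 2),
    ("STORE", 1, "Z"),
]

if __name__ == "__main__":
    final = run_load_store(SUM_PROGRAM, {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5})
    print("Z =", final["Z"])
```

Every instruction has one of three fixed shapes, which is what keeps decoding simple in this style of machine.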
Architectural Evolution of CPUs
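Generation 1
[Figure: timing diagram of non-overlapped execution. Instructions 1, 2 and 3 each pass through F, D and E, one instruction after another along the time axis.]
Generation 2
[Figure: timing diagram of overlapped execution. The F, D and E phases of Instructions 1, 2 and 3 overlap along the time axis.]
Legend: F = Fetch Instruction, D = Decode, E = Execute.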
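The gain from overlapping can be counted directly. The sketch below assumes every phase takes exactly one time slot and that there are no stalls; the function names and the text rendering of the chart are illustrative, not taken from the slides.

```python
def non_overlapped_slots(n_instructions, n_phases):
    """Each instruction finishes all its phases before the next one starts."""
    return n_instructions * n_phases


def overlapped_slots(n_instructions, n_phases):
    """A new instruction starts in every time slot once the first has begun."""
    return n_phases + n_instructions - 1


def overlapped_chart(n_instructions, phases="FDE"):
    """One text row per instruction; '.' marks a slot in which it is not active."""
    total = overlapped_slots(n_instructions, len(phases))
    return ["." * i + phases + "." * (total - i - len(phases))
            for i in range(n_instructions)]


if __name__ == "__main__":
    print("non-overlapped slots:", non_overlapped_slots(3, 3))
    print("overlapped slots    :", overlapped_slots(3, 3))
    for row in overlapped_chart(3):
        print(row)
```

For three instructions and three phases, the overlapped schedule needs 5 time slots instead of 9.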
Generation 3
[Figure: timing diagram of six instructions, each passing through F, D, A, R, E and W, overlapped along the time axis.]
Legend: F = Fetch Instruction, D = Decode, A = Address Calculation, R = Read Operands, E = Execute, W = Write Result.
Generation 4
[Figure: timing diagram of twelve instructions, each passing through F, D, A, R, E and W, overlapped along the time axis.]
Legend: F = Fetch Instruction, D = Decode, A = Address Calculation, R = Read Operands, E = Execute, W = Write Result.
Generation 5
[Figure: timing diagram labelled "Dataflow Model". Twelve instructions pass through F, D, A and R, occupy a varying number of Execute (E) slots, and finish with W, overlapped along the time axis.]
Legend: F = Fetch Instruction, D = Decode, A = Address Calculation, R = Read Operands, E = Execute, W = Write Result.
Throughput and Performance Evolution
Throughput depends upon:
• How many bits does a microprocessor process simultaneously?
  4, 8, 16, 32, 64 bits (Improvement = 16 times)
• How many clock cycles does it take to complete 1 instruction (cycles per instruction, or CPI)?
  8, 1, 1/4 (Improvement = 32 times)
• What is the maximum clock speed at which the processor can run?
  0.5 MHz (in 1971) to 3.5 GHz (in 2004) (Improvement = 7000 times)
Performance ∝ Operand Bit-width
            ∝ 1 / CPI
            ∝ f, the maximum clock frequency
Total Improvement = 16 × 32 × 7000 = 3.584 Million times
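The three factors and their product can be checked with exact arithmetic. The variable names below are illustrative; the figures are the ones quoted above.

```python
from fractions import Fraction

bit_width_gain = Fraction(64, 4)                  # 4-bit to 64-bit operands
cpi_gain = Fraction(8) / Fraction(1, 4)           # CPI from 8 down to 1/4
clock_gain = Fraction(3_500_000_000, 500_000)     # 0.5 MHz to 3.5 GHz, in Hz

total_gain = bit_width_gain * cpi_gain * clock_gain

if __name__ == "__main__":
    print("bit-width gain:", bit_width_gain)
    print("CPI gain      :", cpi_gain)
    print("clock gain    :", clock_gain)
    print("total gain    :", total_gain)
```

Multiplying the factors treats them as independent contributions, which is the same simplification the proportionality above makes.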
Contributing Factors to Increased Throughput
1. Increase of operand bit-width (from 4 bits to 64 bits): a direct consequence of feature size reduction and chip size increase of MOS technologies.
2. Reduction of CPI: due to architectural innovations and pipelining (including multiple pipelines running concurrently).
3. Increase of clock frequency:
   • Due to architectural innovations and pipelining.
   • Due to feature size reduction of MOS technologies.
SoC and Embedded System Design
SoC and Embedded System Design represents the convergence of hardware and software design.
Besides digital functions, an SoC typically also integrates some analog and/or mixed-signal and/or RF functions on a single chip.
The boundary between what functions must necessarily be done in analog (or can be better done in analog) and what functions are better done in digital has been fairly clear and stable for quite some time.
However, it is only more recently that the boundary between what digital functions are better done in hardware and what functions are better done in software has been sought to be defined from the points of view of speed-power-cost, time-to-market and system upgradability of the proposed solution.
Hardware vs. Software Decision
It needs reminding that software is actually implemented through a hardware architecture (that of the processor), with the processor’s instruction set defining the hardware-software boundary/interface.
The logic functionality of the instruction set is realized in hardware, whereas the higher-level logic functionality is realized in software (using a sequence of instructions from the instruction set).
Obviously, software provides a more flexible way of performing logic functions. A change in the sequence of instructions or a change in the operands of instructions changes the logic function. However, this flexibility is afforded by software at a cost in terms of speed and power.
Memory provides the means of building a soft logic network (represented by software) as opposed to the hard logic network (represented by hardware).
Each software logic gate receives its configuration as well as its inputs from memory via the memory bus and stores its result in the memory via the memory bus.
A hardware logic gate, by contrast, receives its inputs directly from the output of a preceding hardware logic gate over a short wire.
Thus, there is typically an overhead of four memory transfers per logic operation when using software logic gates as opposed to hardware logic gates.
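One way to arrive at the figure of four transfers is to count bus accesses for a single two-input software gate: one to fetch the instruction (the gate's configuration), two to read its inputs, and one to write its result. That breakdown is an interpretation, and the `CountingMemory` class and the instruction format below are hypothetical, used only to make the count explicit.

```python
import operator

GATE_FUNCTIONS = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}


class CountingMemory:
    """A memory that counts every transfer over its bus."""

    def __init__(self, contents):
        self.cells = dict(contents)
        self.transfers = 0

    def read(self, address):
        self.transfers += 1
        return self.cells[address]

    def write(self, address, value):
        self.transfers += 1
        self.cells[address] = value


def execute_soft_gate(memory, instruction_address):
    """Execute one software logic gate stored at instruction_address."""
    # Transfer 1: fetch the gate's configuration (the instruction).
    opcode, source_1, source_2, destination = memory.read(instruction_address)
    input_1 = memory.read(source_1)               # transfer 2: first input
    input_2 = memory.read(source_2)               # transfer 3: second input
    result = GATE_FUNCTIONS[opcode](input_1, input_2)
    memory.write(destination, result)             # transfer 4: result


if __name__ == "__main__":
    memory = CountingMemory({"i0": ("AND", "x", "y", "z"), "x": 1, "y": 1})
    execute_soft_gate(memory, "i0")
    print("z =", memory.cells["z"], "after", memory.transfers, "memory transfers")
```

A hardware AND gate computing the same result would involve no memory transfers at all, only wires between gates.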
These memory transfers occur over the memory bus and the bit and bit-bar buses internal to the memory and are, therefore, very slow as well as power consuming, owing to the large capacitances associated with memory buses (~ tens of pF) and bit and bit-bar buses (~ several pF).
So, software logic, though very flexible, is both slow and very power consuming.
Besides, there isn’t much concurrency in software logic. Classical von Neumann CPU architectures of software logic have no concurrency.
Pipelined RISC architectures process instructions in an overlapped manner and hence have a concurrency equal to the number of stages in the pipeline.
Superscalars (with multiple pipelines) have still higher concurrencies (~ 10-15). However, this is nowhere close to the concurrency in hardware logic, which can be massive.
For these reasons, software logic provides a very flexible but slow and high power consuming logic implementation option, whereas hardware logic provides a totally rigid but fast and low power logic implementation option.
Software logic design is faster and its implementation is less expensive in many situations.
Hence, one needs to carefully partition one’s system logic into software logic and hardware logic.
Very often in the past, performance (speed) has been the sole criterion for deciding what portion of the system logic should be implemented in hardware.
More recently, in the context of battery-operated portable/hand-held devices, power consumption has emerged as an additional criterion for deciding the system logic to be realized in hardware.
Architecture #10 (Application Specific Instruction-set Based; Programmable; Non-pipelined/Pipelined) for Logic Implementation
Besides the standard hardware and software options, there is another option that draws upon the strengths of both hardware logic and software logic to provide a solution that optimally mixes the benefits of both approaches in the context of a given application or class of applications: that of the Application Specific Instruction Set Processor (ASIP).
This logic implementation is application specific, using a programmable processor, usually for embedded systems applications.
ASIP Architectures
ASIPs (Application Specific Instruction-Set Processors) fill the gap between two kinds of architectures for electronic system design:
1. General-purpose Instruction-set + CPU based system design: where neither the instruction set nor the CPU architecture is tailored for the application.
   Thus, while there is all the flexibility afforded by this approach, performance may be inadequate and power consumption excessive.
2. Application-specific dedicated hardware designs: where the architecture and design are optimized for performance and power, but there is no flexibility.
Example ASIP Block Diagram
[Figure: example ASIP block diagram. Labelled blocks include Host Comm. Controller, Data RAM, Speech Sample RAM, Output Reg., Program Memory, PC Logic, PC, Instruction Register, Instruction Decoder and Control Sequencer, Program Controller, Parameter RAM, Fetched Parameter Register, Address Gen. Unit, Temporary Registers, Floating-Point Adder-Sub, Floating-Point Multiplier and Multi-Function Unit.]
Reconfigurable Computing
Another interesting and potentially useful area with a bearing on embedded systems is reconfigurable computing. So far, by and large, logic reconfiguration has been provided by software running on a hard architecture. However, with the integration of FPGA technology in SoC-embedded systems, the hardware architecture is no longer that hard. It can be reconfigured, providing major advantages in speed and power consumption.
This adds one more dimension to programming: that of hardware programming (e.g. architectural configuration / reconfiguration).
FPGA blocks of SoCs or on SoC platforms provide a low cost means of implementing Application Specific Instruction Set Processor (ASIP) ideas and, indeed, dynamically reconfigurable instruction sets and implementation architectures, particularly where there are repetitive functions / long running loops.
This holds immense potential for enhancing the speed and reducing the power consumption of single-function or multi-function / multi-standard hand-held devices.
Acknowledgment
I wish to thank my colleagues of the IC Design Group at CEERI, Pilani for their continued support, interest in larger perspectives, and enthusiasm for visualizing the scenarios of the future in order to select new directions for R&D efforts.