EE37E 2005 1
Lesson 5: Processor Design
Topic 1 – Methods and Concepts
Introduction
References:
- Modern Processor Design Book (pp. 1–16)
- Computer Organization and Design Book (pp. 54–89)
• While introducing this topic we will focus on these points:
– Evolution of microprocessors
– Instruction set processor design
– Principles
• Microprocessors are instruction set processors (ISPs).
• An ISP executes instructions from a predefined instruction set.
• A microprocessor’s functionality is fully characterized by the instruction set it is capable of executing.
• This predefined instruction set is also called the instruction set architecture (ISA).
• An ISA serves as an interface between software and hardware.
• In terms of processor design methodology, an ISA is the specification of the design while the microprocessor or ISP is the implementation of a design.
Computer System Components
[Diagram: CPU with caches, connected by the system bus to the chipset (North Bridge, South Bridge); the memory controller reaches memory over the memory bus; I/O buses reach the I/O devices through controllers, adapters, and NICs.]
• CPU
– 1000 MHz – 3 GHz (a multiple of system bus speed)
– Pipelined (7–21 stages)
– Superscalar (max ~4 instructions/cycle), single-threaded
– Dynamically scheduled or VLIW
– Dynamic and static branch prediction
– Caches: L1, L2, L3
• System bus (support for one or more CPUs)
– Examples: Alpha, AMD K7: EV6, 400 MHz; Intel PII, PIII: GTL+, 133 MHz; Intel P4: 800 MHz
• Memory (on the memory bus)
– SDRAM PC100/PC133: 100–133 MHz, 64–128 bits wide, 2-way interleaved, ~900 MBytes/sec
– Double Data Rate (DDR) SDRAM PC3200: 400 MHz (effective 200x2), 64–128 bits wide, 4-way interleaved, ~3.2 GBytes/sec (second half 2002)
– Rambus DRAM (RDRAM) PC800, PC1060: 400–533 MHz (DDR), 16–32 bits wide channel, ~1.6–3.2 GBytes/sec (per channel)
• I/O buses
– Example: PCI-X 133 MHz; PCI 33–66 MHz, 32–64 bits wide, 133–1024 MBytes/sec
• I/O devices
– Disks, displays, keyboards
– Networks: Fast Ethernet, Gigabit Ethernet, ATM, Token Ring ...
Computer System Components
[Diagram: the same system as on the previous slide (CPU with L1, L2, L3 caches; system bus; chipset with North Bridge and South Bridge; memory controller, memory bus, memory; I/O buses, controllers, adapters, NICs; disks (RAID), displays, keyboards, networks), annotated with the enhancements below. The CPU is labeled SMT and CMP.]
Memory Latency Reduction:
• Conventional & block-based trace cache.
• Integrate memory controller & a portion of main memory with CPU: Intelligent RAM.
• Integrated memory controller: AMD Opteron, IBM Power5.
Enhanced CPU Performance & Capabilities:
• Support for Simultaneous Multithreading (SMT): Alpha EV8.
• VLIW & intelligent compiler techniques: Intel/HP EPIC IA-64.
• More advanced branch prediction techniques.
• Chip Multiprocessors (CMPs): The Hydra Project; IBM Power4, Power5.
• Vector processing capability: Vector Intelligent RAM (VIRAM), or multimedia ISA extension.
• Digital Signal Processing (DSP) capability in system.
• Re-configurable computing hardware capability in system.
Recent Trends in Computer Design
• The cost/performance ratio of computing systems has seen a steady decline due to advances in:
– Integrated circuit technology: decreasing feature size
  • Clock rate improves roughly in proportion to the improvement in feature size
  • Number of transistors improves in proportion to the square of that improvement (or faster)
– Architectural improvements in CPU design
• Microprocessor systems directly reflect IC improvement in terms of a yearly 35 to 55% improvement in performance.
• Assembly language has been mostly eliminated and replaced by other alternatives such as C or C++.
• Standard operating systems (UNIX, NT) lowered the cost of introducing new architectures.
• Emergence of RISC architectures and RISC-core architectures.
• Adoption of quantitative approaches to computer design based on empirical performance observations.
Microprocessor Architecture Trends
• CISC machines: instructions take variable times to complete
• RISC machines (microcode): simple instructions, optimized for speed
• RISC machines (pipelined): same individual instruction latency; greater throughput through instruction "overlap"
• Superscalar processors: multiple instructions executing simultaneously
• Multithreaded processors: additional HW resources (regs, PC, SP); each context gets the processor for x cycles
• VLIW: "super instructions" grouped together; decreased HW control complexity
• Single-chip multiprocessors (CMPs): duplicate entire processors (tech soon due to Moore's Law)
• Simultaneous multithreading (SMT): multiple HW contexts (regs, PC, SP); each cycle, any context may execute
• SMT/CMPs (e.g. IBM Power5 in 2004)
Evolution of microprocessors
[Figure 1: Evolution of microprocessors. Moore’s Law plot of transistor count (log scale, 1,000 to 100,000,000) against year (1970 to 2000), with points for the i4004, i8080, i8086, i80286, i80386, i80486, and Pentium, and a region marked “Graduation Window”.]
CMOS improvements:
• Die size: 2X every 3 yrs
• Line width: halves every 4-7 yrs
Transistor counts noted on the figure:
• Alpha 21264: 15 million
• Pentium Pro: 5.5 million
• PowerPC 620: 6.9 million
• Alpha 21164: 9.3 million
• Sparc Ultra: 5.2 million
• Three decades of the history of microprocessors tell a truly remarkable story of advances in the computer industry (Table 1).
                     1970–1980      1980–1990      1990–2000        2000–2010
Transistor count     2K – 100K      100K – 1M      1M – 100M        100M – 2B
Clock frequency      0.1 – 3 MHz    3 – 30 MHz     30 MHz – 1 GHz   1 – 15 GHz
Instructions/cycle   0.1            0.1 – 0.9      0.9 – 1.9        1.9 – 2.9
Table 1. The amazing decades of the evolution of microprocessors
Hierarchy of Computer Architecture
[Diagram: layers of a computer system, from top to bottom.]
• Software: Application; Operating System; Compiler; Firmware
• Software/Hardware Boundary: Instruction Set Architecture (Instr. Set Proc., I/O system)
• Hardware: Datapath & Control; Digital Design; Circuit Design; Layout
Representations used at successive levels: High-Level Language Programs; Assembly Language Programs; Machine Language Program; Microprogram; Register Transfer Notation (RTN); Logic Diagrams; Circuit Diagrams.
Instruction Set Processor Design
• Critical to an ISP is the instruction set architecture, which specifies the functionality that must be implemented by the instruction set processor (ISP).
The Design Process
• "To Design Is To Represent"
– Design activity yields description/representation of an object
  • Traditional craftsman does not distinguish between the conceptualization and the artifact
  • Separation comes about because of complexity
  • Concept is captured in one or more representation languages
– This process IS design
• Design Begins With Requirements
– Functional Capabilities: what it will do
– Performance Characteristics: Speed, Power, Area, Cost, . . .
Design Process (cont.)
• Design Finishes As Assembly
– Design understood in terms of components and how they have been assembled
– Top-down decomposition of complex functions (behaviors) into more primitive functions
– Bottom-up composition of primitive building blocks into more complex assemblies
[Diagram: CPU decomposed into Datapath and Control; Datapath into ALU, Regs, and Shifter; down to the Nand Gate.]
Design is a "creative process," not a simple method
Design as Search
Design involves educated guesses and verification
-- Given the goals, how should these be prioritized?
-- Given alternative design pieces, which should be selected?
-- Given design space of components & assemblies, which part will yield the best solution?
Feasible (good) choices vs. Optimal choices
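[Diagram: Problem A is approached through Strategy 1 or Strategy 2, decomposed into SubProb 1, SubProb 2, and SubProb 3, and finally into building blocks BB1, BB2, BB3, ... BBn.]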
Instruction Set Architecture (subset of Computer Architecture)
“... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.”
– Amdahl, Blaauw, and Brooks, 1964
• Organization of Programmable Storage
• Data Types & Data Structures: Encodings & Representations
• Instruction Set
• Instruction Formats
• Modes of Addressing and Accessing Data Items and Instructions
• Exceptional Conditions
The Instruction Set: a Critical Interface
[Figure 2: ISA. The instruction set is the interface between software (above) and hardware (below).]
Dynamic Static Interface
• We have discussed two critical roles played by the ISA:
– Contract between software and hardware, which facilitates the development of programs and machines
– Specification for microprocessor design
• The third role is an associated definition of an interface that separates what is done statically at compile time from what is done dynamically at run time. This interface is called the “dynamic-static interface” (DSI).
[Figure 3: The dynamic-static feature. The architecture (DSI) sits between the program (software) and the machine (hardware). Above the DSI: compiler complexity, exposed to software, “static”. Below the DSI: hardware complexity, hidden in hardware, “dynamic”.]
Computer Architecture Topics
[Diagram: topic areas and the subjects attached to each.]
• Instruction Set Architecture: addressing, protection, exception handling
• Pipelining and Instruction Level Parallelism: pipelining, hazard resolution, superscalar, reordering, prediction, speculation, vector, DSP
• Memory Hierarchy: L1 cache, L2 cache, DRAM; coherence, bandwidth, latency; emerging technologies, interleaving, bus protocols
• Input/Output and Storage: disks, WORM, tape; RAID
• VLSI
Principles of Processor Performance
Definitions
• Performance is in units of things per second
– bigger is better
• If we are primarily concerned with response time:
$$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$
• "X is n times faster than Y" means
$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_time}(Y)}{\text{Execution\_time}(X)}$$
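As a quick check of the "n times faster" definition, here is a minimal Python sketch. The function names and the timings (10 s and 15 s) are made up for illustration; they are not from the lecture.

```python
def performance(execution_time_s: float) -> float:
    """Performance as the reciprocal of execution time (response-time view)."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return 1.0 / execution_time_s


def times_faster(exec_time_x_s: float, exec_time_y_s: float) -> float:
    """Return n such that X is n times faster than Y."""
    return performance(exec_time_x_s) / performance(exec_time_y_s)


# Hypothetical timings: X finishes the task in 10 s, Y in 15 s.
n = times_faster(10.0, 15.0)
print(f"X is {n:.2f} times faster than Y")
```

Note that the ratio of performances equals the inverse ratio of execution times, so the same n comes out of 15 / 10.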
Cycles Per Instruction
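IC = Instruction Count
CPI = Clock cycles Per Instruction
$$\text{CPU time} = \text{Number of clock cycles} \times \text{Clock cycle time}$$
$$\text{CPU time} = \frac{\text{Number of clock cycles}}{\text{Clock frequency}}$$
$$\text{CPI} = \frac{\text{Number of clock cycles}}{\text{IC}}$$
$$\text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}$$
$$\text{CPU time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}}$$
$$\text{CPU time} = \text{Cycle time} \times \sum_{j=1}^{n} \text{CPI}_j \times \text{IC}_j$$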
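The relation CPU time = IC × CPI / clock rate can be tried out numerically. The sketch below is illustrative only; the function name and the sample numbers (2 million instructions, CPI of 2.5, 500 MHz clock) are assumptions, not values from the lecture.

```python
def cpu_time_seconds(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = IC x CPI / clock rate."""
    clock_cycles = instruction_count * cpi  # Number of clock cycles = IC x CPI
    return clock_cycles / clock_rate_hz


# Hypothetical program: 2 million instructions, CPI = 2.5, 500 MHz clock.
t = cpu_time_seconds(2_000_000, 2.5, 500e6)
print(f"CPU time: {t * 1e3:.1f} ms")
```

Halving the CPI or doubling the clock rate would each halve the result, which is the point of writing CPU time as a product of three separate factors.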
Cycles Per Instruction
We may separate the contribution of each type of instruction to the execution time by defining:
$$\text{Number of clock cycles} = \sum_{j=1}^{n} \text{CPI}_j \times \text{IC}_j$$
where $\text{IC}_j$ is the number of times that instruction $j$ is executed, and $\text{CPI}_j$ is the average number of clocks required to execute instruction $j$.
Processor pipelining and memory interactions limit the accuracy of this approach, but it's a good first guess. For accuracy, it is necessary to simulate the instructions of an entire program with issue, pipeline and memory interactions.
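The per-class cycle count can be sketched in a few lines of Python. The instruction mix below (class names, counts, and per-class CPI values) is entirely hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def total_clock_cycles(mix: dict[str, tuple[int, float]]) -> float:
    """Number of clock cycles = sum over classes j of CPI_j x IC_j."""
    return sum(ic_j * cpi_j for ic_j, cpi_j in mix.values())


def average_cpi(mix: dict[str, tuple[int, float]]) -> float:
    """Overall CPI = number of clock cycles / total instruction count."""
    instruction_count = sum(ic_j for ic_j, _ in mix.values())
    return total_clock_cycles(mix) / instruction_count


# Hypothetical mix: class name -> (IC_j, CPI_j).
mix = {
    "alu": (50, 1),
    "load": (20, 2),
    "store": (10, 2),
    "branch": (20, 2),
}
print(total_clock_cycles(mix))  # 50*1 + 20*2 + 10*2 + 20*2
print(average_cpi(mix))
```

As the slide cautions, this is a first-order estimate: it ignores pipeline and memory interactions between instructions.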
Aspects of CPU Performance (CPU Law)
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.
E.g. special instructions, memory, IO, parallel processing
Amdahl’s Law
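$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$
$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$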
Amdahl’s Law
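• Example: floating-point instructions are improved to run 2X faster, but only 10% of actual instructions are FP.
$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - 0.1) + \frac{0.1}{2} \right] = 0.95 \times \text{ExTime}_{old}$$
$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{\text{ExTime}_{old}}{0.95 \times \text{ExTime}_{old}} = \frac{1}{0.95} = 1.053$$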
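The floating-point example can be reproduced with a short Python sketch of Amdahl's Law. The function and parameter names are illustrative choices, not from the lecture.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Slide example: 10% of the work is floating point and is made 2x faster.
print(f"{amdahl_speedup(0.10, 2.0):.3f}")
```

Even an unboundedly large floating-point speedup would cap the overall speedup at 1 / 0.9, about 1.11, because the other 90% of the task is untouched.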
Topic 2: Instruction Set Architecture Design
Adapted from Prof. Jerry Breecher’s Notes + my CS21Q Notes
(http://babbage.clarku.edu/~jbreecher/arch/arch.html)
Introduction
7.1 Introduction
7.2 Classifying Instruction Set Architectures
7.3 Memory Addressing
7.4 Operations in the Instruction Set
7.5 Type and Size of Operands
7.6 Encoding an Instruction Set
7.7 The Role of Compilers
7.8 The MIPS Architecture and Bonus
7.9 Endianness
Introduction
The Instruction Set Architecture is that portion of the machine visible to the assembly level programmer or to the compiler writer.
Questions:
- What are the advantages and disadvantages of various instruction set alternatives?
- How do languages and compilers affect ISA?
[Diagram: the instruction set as the interface between software and hardware.]
Classifying Instruction Set Architectures
Classifications can be by:
1. Stack/accumulator/register
2. Number of memory operands.
3. Number of total operands.
Instruction Set Architectures
Basic ISA Classes

Accumulator:
  1 address      add A         acc <- acc + mem[A]
  1+x address    addx A        acc <- acc + mem[A + x]

Stack:
  0 address      add           tos <- tos + next

General Purpose Register:
  2 address      add A B       EA(A) <- EA(A) + EA(B)
  3 address      add A B C     EA(A) <- EA(B) + EA(C)
  ALU instructions can have two or three operands.

Load/Store:
  0 memory       load R1, Mem1
                 load R2, Mem2
                 add R1, R2
  1 memory       add R1, Mem2
  ALU instructions can have 0, 1, 2, or 3 memory operands. Shown here are cases of 0 and 1.
Instruction Set Architectures
Basic ISA Classes
Stack      Accumulator   Register             Register
                         (register-memory)    (load-store)
Push A     Load A        Load R1, A           Load R1, A
Push B     Add B         Add R1, B            Load R2, B
Add        Store C       Store C, R1          Add R3, R1, R2
Pop C                                         Store C, R3

The results of the different address classes are easiest to see in the examples here, all of which implement the sequence for C = A + B.
Registers are the class that won out. The more registers on the CPU, the better.
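To make the stack-class sequence for C = A + B concrete, here is a toy interpreter in Python. It is a sketch only: the tuple-based instruction format, the function name, and the sample values of A and B are assumptions made for this illustration, not a model of any real machine.

```python
def run_stack_program(program: list[tuple], memory: dict[str, int]) -> dict[str, int]:
    """Execute a tiny stack-machine program against a name-addressed memory."""
    stack: list[int] = []
    for op, *args in program:
        if op == "push":            # Push X: copy mem[X] onto the stack
            stack.append(memory[args[0]])
        elif op == "add":           # Add: tos <- tos + next (0-address)
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "pop":           # Pop X: store the top of stack into mem[X]
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory


# Hypothetical memory contents; the program is the stack column of the table.
memory = {"A": 2, "B": 3, "C": 0}
program = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
run_stack_program(program, memory)
print(memory["C"])
```

The add instruction names no operands at all, which is what "0 address" means; the register classes trade that compactness for explicit operand names.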
Instruction Set Architectures
Intel 80x86 Integer Registers
GPR0 EAX Accumulator
GPR1 ECX Count register, string, loop
GPR2 EDX Data Register; multiply, divide
GPR3 EBX Base Address Register
GPR4 ESP Stack Pointer
GPR5 EBP Base Pointer – for base of stack seg.
GPR6 ESI Index Register
GPR7 EDI Index Register
CS Code Segment Pointer
SS Stack Segment Pointer
DS Data Segment Pointer
ES Extra Data Segment Pointer
FS Data Seg. 2
GS Data Seg. 3
PC EIP Instruction Counter
Eflags Condition Codes
Memory Addressing
Sections Include:
Interpreting Memory Addresses
Addressing Modes
Displacement Address Mode
Immediate Address Mode
EE37E 2005 37
Memory Addressing
Interpreting Memory Addresses
What object is accessed as a function of the address and length?
Objects have byte addresses – an address refers to the number of bytes counted from the beginning of memory.
Little Endian – puts the byte whose address is xx00 at the least significant position in the word.
Big Endian – puts the byte whose address is xx00 at the most significant position in the word.
Alignment – data must be aligned on a boundary equal to its size. Misalignment typically results in an alignment fault that must be handled by the Operating System.
Memory Addressing
Addressing Modes
This table shows the most common modes. A more complete set is in Figure 2.6
Addressing Mode     Example Instruction   Meaning                            When Used
Register            Add R4, R3            R[R4] <- R[R4] + R[R3]             When a value is in a register.
Immediate           Add R4, #3            R[R4] <- R[R4] + 3                 For constants.
Displacement        Add R4, 100(R1)       R[R4] <- R[R4] + M[100 + R[R1]]    Accessing local variables.
Register Deferred   Add R4, (R1)          R[R4] <- R[R4] + M[R[R1]]          Using a pointer or a computed address.
Absolute            Add R4, (1001)        R[R4] <- R[R4] + M[1001]           Used for static data.
Memory Addressing
Displacement Addressing Mode
How big should the displacement be?
For addresses that do fit in the displacement size:
    Add R4, 10000(R0)
For addresses that don’t fit in the displacement size, the compiler must do the following:
    Load R1, address
    Add R4, 0(R1)
How big the displacement field should be depends on the displacements that typically occur.
On DLX, the space allocated is 16 bits. IA-32 encodes displacements of 8 or 32 bits (8 or 16 bits with 16-bit addressing).
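The compiler's choice between the two code sequences comes down to a range check. The Python sketch below assumes a signed (two's complement) 16-bit displacement field; the function names and the emitted instruction strings are illustrative only.

```python
def fits_in_displacement(value: int, bits: int = 16) -> bool:
    """True if value fits in a signed two's complement field of the given width."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


def emit_add(address: int) -> list[str]:
    """Pick the short or the long sequence from the slide (assumed 16-bit field)."""
    if fits_in_displacement(address):
        return [f"Add R4, {address}(R0)"]
    # Too large: load the full address into a register first.
    return [f"Load R1, {address}", "Add R4, 0(R1)"]


print(emit_add(10000))
print(emit_add(100000))
```

A wider field would make the long sequence rarer but lengthen every instruction that carries a displacement, which is the trade-off behind the sizing question.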
Memory Addressing
Immediate Address Mode
Used where we want to get to a numerical value in an instruction.
So how would you get a 32 bit value into a register?
At high level:
    a = b + 3;
    if ( a > 17 )
        goto Addr
At Assembler level:
    Load R2, 3
    Add R0, R1, R2
    Load R2, 17
    CMPBGT R1, R2
    Load R1, Address
    Jump (R1)
Operations In The Instruction Set
Sections Include:
Detailed information about types of instructions.
Instructions for Control Flow (conditional branches, jumps)
Operations In The Instruction Set
Operator Types
Operator type            Examples
Arithmetic and logical   and, add
Data transfer            move, load
Control                  branch, jump, call
System                   system call, traps
Floating point           add, mul, div, sqrt
Decimal                  add, convert
String                   move, compare
Multimedia               2D, 3D? e.g., Intel MMX and Sun VIS
Operations In The Instruction Set
Control Instructions
Control instruction issues:
– taken or not
– where is the target
– link return address
– save or restore
Instructions that change the PC:
– (conditional) branches, (unconditional) jumps
– function calls, function returns
– system calls, system returns
Conditional branches are 20% of all instructions!!
Type And Size of Operands
The type of the operand is usually encoded in the opcode: an LDW implies loading of a word.
Common sizes are:
• Character (1 byte)
• Half word (16 bits)
• Word (32 bits)
• Single Precision Floating Point (1 word)
• Double Precision Floating Point (2 words)
Integers are two’s complement binary.
Floating point is IEEE 754.
Some languages (like COBOL) use packed decimal.
The MIPS Architecture
MIPS is very RISC oriented.
The MIPS Architecture
MIPS Characteristics
There’s MIPS-32, which we learned in CS140:
• 32-bit byte addresses, aligned
• Load/store, only displacement addressing
• Standard datatypes
• 3 fixed-length formats
• 32 32-bit GPRs (r0 = 0)
• 16 64-bit (32 32-bit) FPRs
• FP status register
• No condition codes

There’s MIPS-64, the current architecture:
• Standard datatypes
• 4 fixed-length formats (8, 16, 32, 64)
• 32 64-bit GPRs (r0 = 0)
• 64 64-bit FPRs

Data transfer:
• load/store word, load/store byte/halfword (signed?)
• load/store FP single/double
• moves between GPRs and FPRs

ALU:
• add/subtract (signed? immediate?)
• multiply/divide (signed?)
• and, or, xor (immediate?); shifts: ll, rl, ra (immediate?)
• sets (immediate?)

Addressing Modes:
• Immediate
• Displacement
• (Register mode used only for ALU)
The MIPS Architecture
MIPS Characteristics
Control:
• branches == 0, <> 0
• conditional branch testing FP bit
• jump, jump register
• jump & link, jump & link register
• trap, return from exception

Floating Point:
• add/sub/mul/div
• single/double
• fp converts, fp set
The MIPS Architecture
The MIPS Encoding
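Instruction formats (bit 31 is the most significant bit, bit 0 the least):

Register-Register
  bits 31-26   25-21   20-16   15-11   10-0
       Op      Rs1     Rs2     Rd      Opx
  (the figure also marks a boundary between bits 6 and 5)

Register-Immediate
  bits 31-26   25-21   20-16   15-0
       Op      Rs1     Rd      immediate

Branch
  bits 31-26   25-21   20-16     15-0
       Op      Rs1     Rs2/Opx   immediate

Jump / Call
  bits 31-26   25-0
       Op      target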
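Packing and unpacking fixed fields like these is a matter of shifts and masks. The Python sketch below assumes the Register-Immediate layout as read from the encoding figure (Op in bits 31-26, Rs1 in 25-21, Rd in 20-16, immediate in 15-0); the function names and the field values in the example, including the opcode value 8, are hypothetical.

```python
def encode_register_immediate(op: int, rs1: int, rd: int, immediate: int) -> int:
    """Pack the four fields into one 32-bit instruction word."""
    return (((op & 0x3F) << 26)        # 6-bit opcode, bits 31-26
            | ((rs1 & 0x1F) << 21)     # 5-bit source register, bits 25-21
            | ((rd & 0x1F) << 16)      # 5-bit destination register, bits 20-16
            | (immediate & 0xFFFF))    # 16-bit immediate, bits 15-0


def decode_register_immediate(word: int) -> dict[str, int]:
    """Split a 32-bit instruction word back into its fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs1": (word >> 21) & 0x1F,
        "rd": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,
    }


# Hypothetical field values, chosen only to exercise every field.
word = encode_register_immediate(op=8, rs1=1, rd=4, immediate=100)
print(f"0x{word:08X}")
print(decode_register_immediate(word))
```

Because every format keeps Op in the same six bits, a decoder can read the opcode first and only then decide how to interpret the remaining 26 bits.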
Byte Ordering
• How should bytes within a multi-byte word be ordered in memory?
• Conventions
– Suns and Macs are “Big Endian” machines
  • Least significant byte has highest address
– Alphas and PCs are “Little Endian” machines
  • Least significant byte has lowest address
Byte Ordering Example
• Big Endian
– Least significant byte has highest address
• Little Endian
– Least significant byte has lowest address
• Example
– Variable x has 4-byte representation 0x01234567
– Address given by &x is 0x100

Address          0x100   0x101   0x102   0x103
Big Endian       01      23      45      67
Little Endian    67      45      23      01
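The two layouts of 0x01234567 can be reproduced with Python's built-in `int.to_bytes`. This is a small sketch; the helper name `byte_layout` and the way it pairs bytes with addresses are choices made for this illustration.

```python
def byte_layout(value: int, base_address: int, byteorder: str, size: int = 4) -> list[tuple[int, int]]:
    """Return (address, byte) pairs for value stored at base_address."""
    data = value.to_bytes(size, byteorder=byteorder)
    return [(base_address + offset, byte) for offset, byte in enumerate(data)]


x = 0x01234567
for order in ("big", "little"):
    cells = "  ".join(f"{addr:#x}:{byte:02x}" for addr, byte in byte_layout(x, 0x100, order))
    print(f"{order:<7}{cells}")
```

Reading the bytes back with `int.from_bytes` and the matching byte order recovers the original value; using the wrong order yields 0x67452301 instead.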
Machine-Level Code Representation
• Encode Program as Sequence of Instructions
– Each simple operation
  • Arithmetic operation
  • Read or write memory
  • Conditional branch
– Instructions encoded as bytes
  • Alphas, Suns, Macs use 4-byte instructions
    – Reduced Instruction Set Computer (RISC)
  • PCs use variable-length instructions
    – Complex Instruction Set Computer (CISC)
– Different instruction types and encodings for different machines
  • Most code not binary compatible
• Programs are Byte Sequences Too!
Classification of Processors
• We can classify processors according to the areas in which they are mostly used.
• We can identify four different groups of processors:
– General purpose processors, which are used in building computers
– Digital signal processors, which are designed specifically for signal processing
– Microcontrollers, which are small microcomputers that integrate on the same chip a core processor plus I/O elements and a small amount of memory
– Application-specific processors, which are designed to perform a specific function (e.g. network processors)
General Purpose Processors
• These processors are used to build major computer platforms.
• We can name:
– Intel / AMD based computers, also called IBM compatible
– Macintosh computers built using PowerPC processors
– Sun machines that use UltraSPARC processors
Examples of General Purpose Processors
Type of Computer   Processors Used                                Technology
Macintosh          PowerPC (IBM, Motorola)                        Superscalar
Sun                UltraSPARC (Sun)                               RISC
IBM Compatible     Intel processors; Athlon, Duron (AMD); Cyrix   Superscalar
DSP
• Digital Signal Processing (DSP) is used in a wide variety of applications, and it is hard to find a good definition that is general.
• We can start with dictionary definitions of the words:
– Digital: operating by the use of discrete signals to represent data in the form of numbers
– Signal: a variable parameter by which information is conveyed through an electronic circuit
– Processing: to perform operations on data according to programmed instructions
• Which leads us to a simple definition of digital signal processing: changing or analyzing information which is measured as discrete sequences of numbers
• Note two unique features of digital signal processing as opposed to plain old ordinary digital processing:
– signals come from the real world: this intimate connection with the real world leads to many unique needs, such as the need to react in real time and the need to measure signals and convert them to digital numbers
– signals are discrete: the information in between discrete samples is lost
• The advantages of DSP are common to many digital systems and include:
– Versatility:
  • digital systems can be reprogrammed for other applications (at least where programmable DSP chips are used)
  • digital systems can be ported to different hardware (for example a different DSP chip or board level product)
– Repeatability:
  • digital systems can be easily duplicated
  • digital systems do not depend on strict component tolerances
  • digital system responses do not drift with temperature
– Simplicity:
  • some things can be done more easily digitally than with analogue systems
• DSP is used in a very wide variety of applications.
• But most share some common features:
– they use a lot of math (multiplying and adding signals)
– they deal with signals that come from the real world
– they require a response in a certain time
• Where general purpose DSP processors are concerned, most applications deal with signal frequencies that are in the audio range.