03 isa computer architecture
TRANSCRIPT
INSTRUCTION SET ARCHITECTURE
1
Instruction Set Architecture (ISA)
2
What is an ISA? A functional contract
All ISAs similar in high-level ways But many design choices in
details Two “philosophies”:
CISC/RISC Difference is blurring
Good ISA… Enables high-performance At least doesn’t get in the way
Compatibility is a powerful force Tricks: binary translation,
µISAs
Application
OS
Firmware
Compiler
CPU I/O
Memory
Digital Circuits
Gates & Transistors
CISC vs RISC3
CISC RISC
Emphasis on hardware Emphasis on software
Includes multi-clockcomplex instructions
Single-clock,reduced instruction only
Memory-to-memory:"LOAD" and "STORE"incorporated in instructions
Register to register:"LOAD" and "STORE"are independent instructions
Transistors used for storingcomplex instructions
Spends more transistorson memory registers
Read article CISC vs RISC placed in generalshare folder of computer architecture
Program Compilation4
Program written in a “high-level” programming language C, C++, Java, C# structured control: loops, functions, conditionals structured data: scalars, arrays, pointers, structures
Compiler: translates program to assembly Parsing and straight-forward translation Compiler also optimizes Compiler itself another application … who compiled
compiler?
CPUMem I/O
System softwareAppApp App
int array[100], sum;
void array_sum() {
for (int i=0; i<100;i++) {
sum += array[i];
}
}
Assembly & Machine Language5
Assembly language Human-readable representation
Machine language Machine-readable representation 1s and 0s (often displayed in “hex”)
Assembler Translates assembly to machine
CPUMem I/O
System softwareAppApp App
Example is in “LC4” a toy instruction set architecture, or ISA
Example Assembly Language & ISA
6
MIPS: example of real ISA 32/64-bit operations 32-bit instructions 32/64 registers
32 integer, 64 floating point ~100 different instructions
CPUMem I/O
System softwareAppApp App .datadata
aarray: .space 100
sum: .word 0sum: .word 0
.text.text
array_sum:array_sum:
li $5, 0li $5, 0
la $1, array
la $2, sumla $2, sum
array_sum_loop:array_sum_loop:
lw $3, 0($1)
lw $4, 0($2)lw $4, 0($2)
add $4, $3, $4
sw $4, 0($2)sw $4, 0($2)
addi $1, $1, 1
addi $5, $5, 1addi $5, $5, 1
li $6, 100
blt $5, $6, array_sum_loopblt $5, $6, array_sum_loop
Example code is MIPS, but all ISAs are pretty similar
Instruction Execution Model7
A computer is just a finite state machine Registers (few of them, but fast) Memory (lots of memory, but slower) Program counter (next insn to execute)
Called “instruction pointer” in x86 A computer executes instructions
Fetches next instruction from memory Decodes it (figure out what it does) Reads its inputs (registers & memory) Executes it (adds, multiply, etc.) Write its outputs (registers & memory) Next insn (adjust the program counter)
Program is just “data in memory” Makes computers programmable
(“universal”)
CPUMem I/O
System softwareAppApp App
FetchFetchDecodeDecodeRead InputsRead InputsExecuteExecuteWrite OutputWrite OutputNext InsnNext Insn
Instruction = Insn
Designing a Computer8
Control
Datapath MemoryCentral Processing
Unit (CPU)or “processor”
Input
Output
FIVE PIECES OF HARDWARE
Start by Defining ISA9
What is instruction set architecture (ISA)? ISA
Defines registers Defines data transfer modes (instructions) between
registers, memory and I/O There should be s uffic ie nt instructions to efficiently
translate any program for machine processing Next, define instruction set format – binary
representation used by the hardware Variable-length vs. fixed-length instructions
Types of ISA10
Complex instruction set computer (CISC) Many instructions (several hundreds) An instruction takes many cycles to execute Example: Intel Pentium
Reduced instruction set computer (RISC) Small set of instructions (typically 32) Simple instructions, each executes in one clock
cycle –Well, almost. Effective use of pipelining Example: ARM, MIPS
On Two Types of ISA11
Brad Smith, “ARM and Intel Battle over the Mobile Chip’s Future,” Computer, vol. 41, no. 5, pp. 15-18, May 2008.
Compare 3Ps: Performance Power consumption Price
Pipelining of RISC Instructions12
FetchInstruction
DecodeOpcode
FetchOperands
ExecuteOperation
StoreResult
Although an instruction takes five clock cycles,one instruction can be completed every cycle.
MIPS Instruction Set (RISC)13
Instructions execute simple functions. Maintain regularity of format – each instruction
is one word, contains o p c o d e and a rg um e nts . Minimize memory accesses – whenever
possible use registers as arguments. Three types of instructions:
Register (R)-type – only registers as arguments. Immediate (I)-type – arguments are registers and
numbers (constants or memory addresses). Jump (J)-type – argument is an address.
MIPS Arithmetic Instructions14
All instructions have 3 operands Operand order is fixed (destination first)
Example:
C code: a = b + c;
MIPS ‘code’: add a, b, c
“The natural number of operands for an operation like addition is three… requiring every instruction to have exactly three operands conforms to the philosophy of keeping the hardware simple”
2004 © Morgan Kaufman Publishers
Arithmetic Instr. (Continued)15
Design Principle: simplicity favors regularity. Of course this complicates some things...
C code: a = b + c + d;
MIPS code: add a, b, cadd a, a, d
Operands must be registers 32 registers provided Each register contains 32 bits
Registers vs. Memory16
Arithmetic instructions operands must be registers 32 registers provided
Compiler associates variables with registers. What about programs with lots of variables? Mus t us e
m e m o ry .
Processor I/O
Control
Datapath
Memory
Input
Output
Memory Organization17
Viewed as a large, single-dimension array, with an address.
A memory address is an index into the array. "Byte addressing" means that the index points to a byte
of memory.
2004 © Morgan Kaufman Publishers
32 bit word
32 bit word
32 bit word
32 bit word
.
.
.
8 bits of data 8 bits of data 8 bits of data 8 bits of data
8 bits of data 8 bits of data 8 bits of data 8 bits of data
8 bits of data 8 bits of data
8 bits of data
8 bits of data
8 bits of data
8 bits of data 8 bits of data
8 bits of data
8 bits of data
8 bits of data
8 bits of data
8 bits of data
Byte 0 byte 1 byte 2 byte 3
byte 4 byte 10
Memory Organization18
Bytes are nice, but most data items use larger "words" For MIPS, a word contains 32 bits or 4 bytes.
word addresses
232 bytes with addresses from 0 to 232 – 1 230 words with addresses 0, 4, 8, ... 232 – 4 Words are aligned
i.e., what are the least 2 significant bits of a word address?
...
Registers hold 32 bits of data
Use 32 bit address
2004 © Morgan Kaufman Publishers
0
4
8
12
.
32 bits of data
32 bits of data
32 bits of data
32 bits of data
32 bits of data
Instructions19
Load and store instructions Example:
C code: A[12] = h + A[8];
MIPS code: lw $t0, 32($s3) #addr of A in reg s3add $t0, $s2, $t0 #h in reg s2sw $t0, 48($s3)
Can refer to registers by name (e.g., $s2, $t2) instead of number Store word has destination last Remember arithmetic operands are registers, not memory!
Can’t write: add 48($s3), $s2, 32($s3)
2004 © Morgan Kaufman Publishers
Our First Example20
Can we figure out the code of subroutine?
Initially, k is in reg 5; base address of v is in reg 4; return addr is in reg 31
swap(int v[], int k);{ int temp;
temp = v[k]v[k] = v[k+1];v[k+1] = temp;
}
swap:sll $2, $5, 2add $2, $4, $2lw $15, 0($2)lw $16, 4($2)sw $16, 0($2)sw $15, 4($2)jr $31
What Happens?21
When the program reaches “call swap” statement: Jump to swap routine
Registers 4 and 5 contain the arguments (register convention) Register 31 contains the return address (register convention)
Swap two words in memory Jump back to return address to continue rest of the
program
.
.call swap...
return address
Memory and Registers22
Word 0
Word 1
Word 2
v[0] (Word n)
048
12 .
4n . . .
4n+4k.
v[1] (Word n+1)
Register 0
Register 1
Register 2
Register 3
Register 4
Register 31
Register 5
v[k] (Word n+k)
4n
k
Memorybyte addr.
Ret. addr.v[k+1] (Word n+k+1)
.
.
Our First Example23
Now figure out the code:
swap(int v[], int k);{ int temp;
temp = v[k]v[k] = v[k+1];v[k+1] = temp;
}
swap:sll $2, $5, 2add $2, $4, $2lw $15, 0($2)lw $16, 4($2)sw $16, 0($2)sw $15, 4($2)jr $31
2004 © Morgan Kaufman Publishers
So Far We’ve Learned24
MIPS— loading words but addressing bytes— arithmetic on registers only
Instruction Meaning
add $s1, $s2, $s3 $s1 = $s2 + $s3sub $s1, $s2, $s3 $s1 = $s2 – $s3lw $s1, 100($s2) $s1 = Memory[$s2+100]
sw $s1, 100($s2) Memory[$s2+100] = $s1
2004 © Morgan Kaufman Publishers
Stored Program Concept25
Instructions are bits Programs are stored in memory
to be read or written just like data
Fetch and Execute Cycles Instructions are fetched and put into a special register Opcode bits in the register "control" the subsequent actions Fetch the “next” instruction and continue
Processor Memorymemory for data, programs,
compilers, editors, etc.
Control26
Decision making instructions alter the control flow, i.e., change the "next" instruction to be executed
MIPS conditional branch instructions:
bne $t0, $t1, Label beq $t0, $t1, Label
Example: if (i==j) h = i + j;
bne $s0, $s1, Labeladd $s3, $s0, $s1
Label: ....
2004 © Morgan Kaufman Publishers
Control27
MIPS unconditional branch instructions:j label
Example:if (i!=j) beq $s4, $s5, Lab1 h=i+j; add $s3, $s4, $s5else j Lab2 h=i-j; Lab1: sub $s3,
$s4, $s5Lab2: ...
Can you build a simple fo r loop?
So Far We’ve Learned28
Instruction Meaningadd $s1,$s2,$s3 $s1 = $s2 + $s3sub $s1,$s2,$s3 $s1 = $s2 – $s3lw $s1,100($s2) $s1 = Memory[$s2+100] sw $s1,100($s2) Memory[$s2+100] = $s1bne $s4,$s5,Label Next instr. is at Label if
$s4 ≠ $s5beq $s4,$s5,Label Next instr. is at Label if
$s4 = $s5j Label Next instr. is at Label
Formats:OP
OP
OP
rs rt rd sa funct
rs rt immediate
R Format
I Format
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
6 bits 5 bits 5 bits 16 bits
J Format6 bits 26 bits
jump target
Three Ways to Jump: j, jr, jal29
j ins tr # jump to machine instruction ins tr (unconditional jump)
jr $ra # jump to address in register ra (used by calee to go back to caller)
jal a ddr # set $ra = PC+4 and go to a ddr (jump and link; used to jump to a procedure)
Control Flow30
We have: beq, bne, what about Branch-if-less-than? New instruction:
if $s1 < $s2 then $t0 = 1
slt $t0, $s1, $s2 else $t0 = 0
Can use this instruction to build new “pseudoinstruction”blt $s1, $s2, Label
Note that the assembler needs a register to do this,— there are policy of use conventions for registers
2004 © Morgan Kaufman Publishers
Pseudoinstructions31
blt $s1, $s2, reladdr Assembler converts to:
slt $1, $s1, $s2bne $1, $zero, reladdr
Other pseudoinstructions: bgt, ble, bge, li, move Not implemented in hardware Assembler expands pseudoinstructions into
machine instructions Register 1, called $at, is reserved for converting
pseudoinstructions into machine code.
Constants32
Small constants are used quite frequently (50% of operands) e.g., A = A + 5;
B = B + 1;C = C – 18;
Solutions? Why not? put 'typical constants' in memory and load them. create hard-wired registers (like $zero) for constants like one.
MIPS Instructions: addi $29, $29, 4
slti $8, $18, 10andi $29, $29, 6ori $29, $29, 4
Design Principle: Make the common case fast. Which fo rm a t?2004 © Morgan Kaufman Publishers
How About Larger Constants?33
We'd like to be able to load a 32 bit constant into a register Must use two instructions, new "load upper immediate" instruction
lui $t0, 1010101010101010
Then must get the lower order bits right, i.e.,ori $t0, $t0, 1010101010101010
1010101010101010 0000000000000000
0000000000000000 1010101010101010
1010101010101010 1010101010101010
ori
1010101010101010 0000000000000000
filled with zeros
2004 © Morgan Kaufman Publishers
Assembly Language vs. Machine Language
34
Assembly provides convenient symbolic representation much easier than writing down numbers e.g., destination first
Machine language is the underlying reality e.g., destination is no longer first
Assembly can provide 'pseudoinstructions' e.g., “move $t0, $t1” exists only in Assembly implemented using “add $t0, $t1, $zero”
When considering performance you should count real instructions and clock cycles
2004 © Morgan Kaufman Publishers
Overview of MIPS35
simple instructions, all 32 bits wide very structured, no unnecessary baggage only three instruction formats
rely on compiler to achieve performance
R-type
I-type
J-type
2004 © Morgan Kaufman Publishers
Addresses in Branches and Jumps36
Instructions:bne $t4, $t5, Label Next instruction is at Label
if $t4 ≠ $t5
beq $t4, $t5, Label Next instruction is at Label if $t4 = $t5
j Label Next instruction is at Label Formats:
op rs rt 16 bit rel. address
op 26 bit absolute address
I
J
Addresses in Branches37
Instructions:bne $t4,$t5,Label Next instruction is at Label if $t4 ≠ $t5beq $t4,$t5,Label Next instruction is at Label if $t4 = $t5
Formats: – 215 to 215 – 1 ~ ±32 Kwords
Relative addressing 226 = 64 Mwords with respect to PC (program counter) most branches are local (principle of locality)
Jump instruction just uses high order bits of PC address boundaries of 256 MBytes (maximum jump 64 Mwords)
op rs rt 16 bit address
op 26 bit address
Example: Loop in C38
while ( save[i] == k ) i += 1;
Given a value for k, set i to the index of element in array save [ ] that does not equal k.
Alternative Architectures39
Design alternative: provide more powerful operations goal is to reduce number of instructions executed danger is a slower cycle time and/or a higher CPI
Let’s look (briefly) at IA-32
–“The path toward operation complexity is thus fraught with peril.
To avoid these problems, designers have moved toward simpler
instructions”
IA–32 (a.k.a. x86)40
1978: The Intel 8086 is announced (16 bit architecture) 1980: The 8087 floating point coprocessor is added 1982: The 80286 increases address space to 24 bits,
+instructions 1985: The 80386 extends to 32 bits, new addressing modes 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions
(mostly designed for higher performance) 1997: 57 new “MMX” instructions are added, Pentium II 1999: The Pentium III added another 70 instructions (SSE –
streaming SIMD extensions) 2001: Another 144 instructions (SSE2) 2003: AMD extends the architecture to increase address space to
64 bits, widens all registers to 64 bits and makes other changes (AMD64)
2004: Intel capitulates and embraces AMD64 (calls it EM64T) and adds more media extensions
“This his to ry illus tra te s the im p a c t o f the “g o ld e n ha ndcuffs ” o f c o m p a tibility :“a dd ing ne w fe a ture s a s s o m e o ne m ig ht a dd c lo thing to a p a cke d ba g ”“a n a rchite c ture tha t is d ifficult to e x p la in a nd im p o s s ible to lo ve ”
IA-32 Overview41
Complexity: Instructions from 1 to 17 bytes long one operand must act as both a source and destination one operand can come from memory complex addressing modes
e.g., “base or scaled index with 8 or 32 bit displacement” Saving grace:
the most frequently used instructions are not too difficult to build compilers avoid the portions of the architecture that are slow
“wha t the x 8 6 la c ks in s ty le is m a d e up in q ua ntity , m a king it be a utiful fro m the rig ht p e rs p e c tive ”
IA-32 Register Restrictions42
Fourteen major registers. Eight 32-bit general purpose registers.
ESP or EBP cannot contain memory address. ESP cannot contain displacement from base address. . . .
IA-32 Typical Instructions43
Four major types of integer instructions: Data movement including move, push, pop Arithmetic and logical (destination register or memory) Control flow (use of condition codes / flags ) String instructions, including string move and string compare
Some IA-32 Instructions44
PUSH 5-bit opcode, 3-bit register operand
JE 4-bit opcode, 4-bit condition, 8-bit jump offset
5-b | 3-b
4-b | 4-b | 8-b
Some IA-32 Instructions45
MOV 6-bit opcode, 8-bit register/mode*, 8-bit offset
XOR 8-bit opcode, 8-bit reg/mode*, 8-bit base, 8-b index
6-b |d|w| 8-b | 8-b
bit indicates move to or from memory
bit indicates byte or double word operation
8-b | 8-b | 8-b | 8-b
Some IA-32 Instructions46
ADD 4-bit opcode, 3-bit register, 32-bit immediate
TEST 7-bit opcode, 8-bit reg/mode, 32-bit immediate
4-b | 3-b |w| 32-b
7-b |w| 8-b | 32-b
Additional References47
IA-32, IA-64 (CISC) A. S. Tanenbaum, Struc ture d Co m p ute r O rg a niz a tio n, Fifth
Ed itio n, Upper Saddle River, New Jersey: Pearson Prentice-Hall, 2006, Chapter 5.
ARM (RISC) D. Seal, ARM Archite c ture Re fe re nc e Ma nua l, Se c o nd Ed itio n ,
Addison-Wesley Professional, 2000. SPARC (Scalable Processor Architecture) PowerPC
V. C. Hamacher, Z. G. Vranesic and S. G. Zaky, Computer Organization, Fourth Edition, New York: McGraw-Hill, 1996.
Instruction Complexity48
Increasing instruction complexityPro
gra
m s
ize
in m
ach
ine
inst
ruct
ion
s (P
)
Av.
exe
cutio
n tim
e pe
r in
stru
ctio
n (
T)
PT
P×T
Summary49
Instruction complexity is only one variable lower instruction count vs. higher CPI / lower clock rate
– we will s e e p e rfo rm anc e m e a s ure s la te r Design Principles:
simplicity favors regularity smaller is faster good design demands compromise make the common case fast
Instruction set architecture a very important abstraction indeed!