CS/ECE 552: Instruction Sets
Prof. Matthew D. Sinclair
Lecture notes based in part on slides created by Mikko Lipasti, Mark Hill, David Wood, Guri Sohi, Josh San Miguel, John Shen, and Jim Smith
This Class
• Instruction set architectures (ISAs)
• MIPS
• ALU ops, loads/stores, branches/jumps
Instruction Sets
• MIPS/MIPS-like ISA used in 552
  – Simple, sensible, regular, easy to design CPU
• Most common: x86 (IA-32) and ARM
  – x86: Intel Pentium/Core i7, AMD Athlon, etc.
  – ARM: cell phones, embedded systems, IoT
• Others:
  – PowerPC (IBM servers)
  – SPARC (Sun)
  – Alpha (DEC)
  – RISC-V (SiFive; also used internally at other companies, e.g., NVIDIA)
Basics
• C statement
    f = (g + h) - (i + j)
• MIPS instructions
    add t0, g, h
    add t1, i, j
    sub f, t0, t1
• Multiple instructions for one C statement
• Opcode/Mnemonic: Specifies operation
• Operands: Input and output data
  – Source
  – Destination
Why not bigger instructions?
• Why not “f = (g + h) - (i + j)” as one instruction?
• Church’s thesis: A very primitive computer can compute anything that a fancy computer can compute
  – You need only logical functions, read and write memory, and data-dependent decisions
• Therefore, ISA selected for practical reasons:
  – Performance and cost, not computability
• Regularity tends to improve both
  – E.g., H/W to handle an arbitrary number of operands is complex, slow, and UNNECESSARY
• Tradeoff between regularity and fewer instructions
  – Discussed in greater detail in CS/ECE 752
Arithmetic-Logic Unit Ops
• Some ALU ops:
  – add, addi, addu, addiu (immediate, unsigned)
  – sub …
  – mul, div – wider result
    • 32b x 32b = 64b product
    • 32b / 32b = 32b quotient and 32b remainder
  – and, andi
  – or, ori
  – sll, srl
• Why registers?
  – Short name fits in instruction word: log2(32) = 5 bits
• But are registers enough?
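The result widths listed for mul and div can be checked with a short C sketch. The helper names (`mul_wide`, `div_wide`) and the struct layouts are hypothetical, and unsigned operands are assumed for simplicity.

```c
#include <stdint.h>

/* 64-bit product split into two 32-bit halves (hypothetical layout). */
typedef struct { uint32_t hi, lo; } wide_product;

/* Quotient and remainder, 32 bits each. */
typedef struct { uint32_t quotient, remainder; } div_result;

/* 32b x 32b = 64b: widen before multiplying so no bits are lost. */
wide_product mul_wide(uint32_t a, uint32_t b) {
    uint64_t p = (uint64_t)a * (uint64_t)b;
    wide_product r = { (uint32_t)(p >> 32), (uint32_t)p };
    return r;
}

/* 32b / 32b = 32b quotient and 32b remainder. The caller must ensure b != 0. */
div_result div_wide(uint32_t a, uint32_t b) {
    div_result r = { a / b, a % b };
    return r;
}
```

Both results need two 32-bit registers, which is why these operations do not fit the usual one-destination pattern.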
Memory and Load/Store
• Need more than 32 words of storage
• An array of locations M[j] indexed by j
• Data movement (on words or integers)
  – Load word for register <= memory
      lw $t1, 0($s1)   # where [$s1]=4008; get input g
  – Store word for register => memory
      sw $t1, 0($s0)   # where [$s0]=4004; save output f
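A minimal C sketch of the memory model above, assuming a small word-addressed array stands in for M[j]. The names `mem`, `load_word`, and `store_word` and the 16 KiB size are assumptions made for illustration, and addresses are assumed to be word-aligned and in range.

```c
#include <stdint.h>

#define MEM_WORDS 4096               /* hypothetical memory size: 16 KiB */
static uint32_t mem[MEM_WORDS];      /* M[j], one entry per 32-bit word */

/* lw: register <= memory. The byte address is base + offset. */
uint32_t load_word(uint32_t base, int32_t offset) {
    return mem[(base + (uint32_t)offset) / 4];
}

/* sw: register => memory. */
void store_word(uint32_t value, uint32_t base, int32_t offset) {
    mem[(base + (uint32_t)offset) / 4] = value;
}
```

With g stored at byte address 4008, `store_word(load_word(4008, 0), 4004, 0)` copies g into f, mirroring the lw/sw pair above.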
Memory and Load/Store
[Figure: processor (registers $0–$31 and ALU) connected to memory, addressed from 0 to maxmem]
    Address   Contents
    4004      f
    4008      g
    8000      A[0]
    8004      A[1]
    8008      A[2]
Branches and Jumps
    while (i != j) {
        j = j + i;
        i = i + 1;
    }

    # [$s1] is i, [$s2] is j
    Loop: beq  $s1, $s2, Exit
          add  $s2, $s2, $s1
          addi $s1, $s1, 1
          j    Loop
    Exit:
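The branch structure of the loop can be mirrored in C with goto, one statement per instruction. This is only a sketch: the function name `run_loop` is hypothetical, and, as in the original loop, termination depends on the starting values of i and j.

```c
/* s1 holds i and s2 holds j, as in the assembly. Returns the final i (== j). */
int run_loop(int s1, int s2) {
loop:
    if (s1 == s2) goto done;   /* beq  $s1, $s2, Exit */
    s2 = s2 + s1;              /* add  $s2, $s2, $s1  */
    s1 = s1 + 1;               /* addi $s1, $s1, 1    */
    goto loop;                 /* j    Loop           */
done:
    return s1;
}
```

The while condition `i != j` turns into a branch on the opposite condition (`==`) that skips past the loop body.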
Branches and Jumps
• MIPS branches
    beq $s1, $s2, imm   # if ($s1 == $s2) PC = PC + (imm << 2); else PC += 4;
    bne …
    slt, sle, sgt, sge
• Unconditional jumps
    j   addr            # PC = addr
    jr  $s3             # PC = $s3
    jal addr            # $31 = PC + 4; PC = addr; (used for function calls)
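The beq comment can be turned into a small next-PC function. The sketch follows the slide's simplified formula; in the MIPS32 architecture the scaled offset is added to the address of the instruction after the branch (PC + 4), so adjust accordingly for a real implementation. The name `beq_next_pc` is hypothetical.

```c
#include <stdint.h>

/* Next PC for beq, per the slide: taken -> PC + (imm << 2), else PC + 4.
   imm is the 16-bit field; it is sign-extended so backward branches work. */
uint32_t beq_next_pc(uint32_t pc, uint32_t rs, uint32_t rt, uint16_t imm) {
    int32_t offset = (int32_t)(int16_t)imm * 4;   /* sign-extend, scale to bytes */
    return (rs == rt) ? pc + (uint32_t)offset : pc + 4;
}
```

The shift by 2 works because every instruction is 4 bytes, so the 16-bit field counts instructions rather than bytes.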
Exercise – MIPS Assembly
What does this assembly code do?
    lw   $t0, 0($s0)
    nor  $t0, $t0, $zero
    addi $t0, $t0, 1
    sw   $t0, 0($s0)
Computes two’s complement of integer at ($s0).
Reduce the number of instructions executed?
    lw   $t0, 0($s0)
    sub  $t0, $zero, $t0
    sw   $t0, 0($s0)
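A quick C check that the two versions agree, assuming wraparound (unsigned) arithmetic so that overflow exceptions are not modelled. The function names are hypothetical.

```c
#include <stdint.h>

/* nor $t0, $t0, $zero ; addi $t0, $t0, 1 */
uint32_t negate_nor_addi(uint32_t x) {
    uint32_t t0 = ~(x | 0u);   /* nor with $zero is a bitwise NOT */
    return t0 + 1u;            /* invert and add one */
}

/* sub $t0, $zero, $t0 */
uint32_t negate_sub(uint32_t x) {
    return 0u - x;
}
```

Both return the two's complement of x; the second does it in one ALU instruction instead of two.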
Exercise – MIPS Assembly
What does this assembly code do?
    addi $t0, $zero, 4096
    lw   $t1, 0($s0)
    slt  $t2, $t1, $zero
    nor  $t2, $zero, $t2
    slt  $t3, $t1, $t0
    and  $t4, $t2, $t3
    bne  $t4, $zero, TARGET
Branches to TARGET if 0 ≤ ($s0) < 4096.
Reduce the number of instructions executed?
    addi $t0, $zero, -4096
    lw   $t1, 0($s0)
    and  $t4, $t0, $t1
    beq  $t4, $zero, TARGET
Can reduce even more?
    lw    $t1, 0($s0)
    sltiu $t4, $t1, 4096
    bne   $t4, $zero, TARGET
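The three versions of the range check can be compared in C. This is a sketch with hypothetical function names; each function returns nonzero exactly when the branch to TARGET would be taken.

```c
#include <stdint.h>

/* Version 1: two signed comparisons (slt / nor / slt / and). */
int in_range_compare(int32_t x) {
    return (x >= 0) && (x < 4096);
}

/* Version 2: mask with -4096 (0xFFFFF000); the upper 20 bits must all be zero. */
int in_range_mask(int32_t x) {
    return ((uint32_t)x & 0xFFFFF000u) == 0;
}

/* Version 3: one unsigned comparison (sltiu); negative values look huge. */
int in_range_unsigned(int32_t x) {
    return (uint32_t)x < 4096u;
}
```

The mask trick relies on 4096 being a power of two; the unsigned comparison works for any upper bound that fits in the immediate field.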
CS/ECE 552: Instruction Sets (Part 2)
Prof. Matthew D. Sinclair
Lecture notes based in part on slides created by Mikko Lipasti, Mark Hill, David Wood, Guri Sohi, Josh San Miguel, John Shen, and Jim Smith
This Class
• Machine code
• Stack and procedures
• Endianness
MIPS Machine Language
• All instructions are 32 bits wide
• Assembly: add $1, $2, $3
• Machine language:
    33222222222211111111110000000000   (bit position, tens digit)
    10987654321098765432109876543210   (bit position, ones digit)
    00000000010000110000100000010000
    000000 00010 00011 00001 00000 010000
    alu-rr 2     3     1     zero  add (signed)
Instruction Format
• R-format
  – opc rs rt rd shamt function
  – 6   5  5  5  5     6
• Digression:
  – How do you store the number 4,392,976?
    • Same as add $1, $2, $3
• Stored program: instructions are represented as numbers
  – Programs can be read/written in memory like numbers
• Other R-format: addu, sub, …
Instruction Format
• Assembly: lw $1, 72($2)
• Machine: 100011 00010 00001 0000000001001000
           lw     2     1     72
• I-format
  – opc rs rt address/immediate
  – 6   5  5  16
Summary: Instruction Formats
    R: opcode rs rt rd shamt function
       6      5  5  5  5     6
    I: opcode rs rt address/immediate
       6      5  5  16
    J: opcode addr
       6      26
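The three formats amount to packing fields into a 32-bit word with shifts and ORs. The sketch below uses hypothetical helper names. The field values for add and lw are the ones shown on the slides (function code 010000 for add); MIPS32 assigns add the function code 100000, so check the constants against the ISA actually being targeted.

```c
#include <stdint.h>

/* R-format: opcode(6) rs(5) rt(5) rd(5) shamt(5) function(6) */
uint32_t encode_r(uint32_t opc, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct) {
    return ((opc & 0x3Fu) << 26) | ((rs & 0x1Fu) << 21) | ((rt & 0x1Fu) << 16) |
           ((rd & 0x1Fu) << 11) | ((shamt & 0x1Fu) << 6) | (funct & 0x3Fu);
}

/* I-format: opcode(6) rs(5) rt(5) address/immediate(16) */
uint32_t encode_i(uint32_t opc, uint32_t rs, uint32_t rt, uint16_t imm) {
    return ((opc & 0x3Fu) << 26) | ((rs & 0x1Fu) << 21) | ((rt & 0x1Fu) << 16) | imm;
}

/* J-format: opcode(6) addr(26) */
uint32_t encode_j(uint32_t opc, uint32_t addr) {
    return ((opc & 0x3Fu) << 26) | (addr & 0x03FFFFFFu);
}
```

With the slide's field values, `encode_r(0, 2, 3, 1, 0, 0x10)` gives 4,392,976, the number from the digression, and `encode_i(0x23, 2, 1, 72)` gives the lw word 0x8C410048.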
Stack
• Stack: parameters, return values, return address
• Stack grows from higher to lower addresses (indefinitely)
• $sp ($29) is stack pointer: points to top (most recent) word
• Push $t2:
    addi $sp, $sp, -4
    sw   $t2, 0($sp)
• Pop $t2:
    lw   $t2, 0($sp)
    addi $sp, $sp, 4
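A minimal C model of push and pop on a downward-growing stack. The array `stack_mem`, its size, and the function names are assumptions for illustration, and no overflow or underflow checks are included.

```c
#include <stdint.h>

#define STACK_WORDS 256
static uint32_t stack_mem[STACK_WORDS];
static uint32_t sp = STACK_WORDS * 4;   /* byte offset; starts just past the highest word */

void push(uint32_t value) {
    sp -= 4;                             /* addi $sp, $sp, -4 */
    stack_mem[sp / 4] = value;           /* sw   $t2, 0($sp)  */
}

uint32_t pop(void) {
    uint32_t value = stack_mem[sp / 4];  /* lw   $t2, 0($sp)  */
    sp += 4;                             /* addi $sp, $sp, 4  */
    return value;
}
```

Push decrements before storing and pop loads before incrementing, so `sp` always points at the most recently pushed word.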
Procedure Calls
• Calling convention is part of ABI
  – Caller
    • Save registers ($t regs are caller-saved)
      – Push current values of registers that you will read after the procedure returns
    • Set up arguments
    • Call procedure (jal)
    • Get results
    • Restore (pop) registers
  – Callee
    • Save registers ($s regs are callee-saved)
      – Push current values of registers that you will (over)write within the procedure
    • Perform procedure work, set up result
    • Restore (pop) registers
    • Return (jr $ra)
Procedure Example
    void swap(int v[], int k) {
        int temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

    # $a0 is &v[0] and $a1 is k (i.e., 1st and 2nd incoming arguments)
    # $t0, $t1 and $t2 are caller-saved temporaries
    swap: sll $t1, $a1, 2      # [$t1] = k*4
          add $t1, $a0, $t1    # [$t1] = v + k*4 = &(v[k])
          lw  $t0, 0($t1)      # [$t0] = temp = v[k]
          lw  $t2, 4($t1)      # [$t2] = v[k+1]
          sw  $t2, 0($t1)      # v[k] = v[k+1]
          sw  $t0, 4($t1)      # v[k+1] = temp
          jr  $ra              # return
What can go wrong if we use $s1 instead of $t1? Consider this caller:
    lw  $s1, 0($t2)
    jal swap
    add $t3, $s4, $s1
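One likely answer to the $s1 question can be sketched in C by modelling $s1 as a shared variable. Everything here is hypothetical scaffolding (`reg_s1`, the two callees, the scratch value 1234); it only illustrates why a callee must save and restore a callee-saved register before using it.

```c
#include <stdint.h>

static int32_t reg_s1;   /* stands in for $s1 */

/* Callee that uses $s1 as scratch without saving it: breaks the convention. */
void callee_clobbers(int32_t scratch) {
    reg_s1 = scratch;
}

/* Callee that saves $s1 on entry and restores it before returning. */
void callee_preserves(int32_t scratch) {
    int32_t saved = reg_s1;   /* push $s1             */
    reg_s1 = scratch;         /* use $s1 as scratch   */
    reg_s1 = saved;           /* pop $s1              */
}

/* Caller from the slide: it expects $s1 to survive the call. */
int32_t caller(void (*callee)(int32_t), int32_t loaded, int32_t s4) {
    reg_s1 = loaded;          /* lw  $s1, 0($t2)      */
    callee(1234);             /* jal swap             */
    return s4 + reg_s1;       /* add $t3, $s4, $s1    */
}
```

With the clobbering callee, the caller's add silently uses the callee's scratch value instead of the word it loaded.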
Endianness
• How bytes within a word are addressed
• Big endian: LS byte at address xxxxxx11 (binary)
  – E.g. IBM, SPARC
• Little endian: LS byte at address xxxxxx00 (binary)
  – E.g. Intel x86
• Mode selectable
  – E.g. PowerPC, MIPS
• Causes headaches for
  – Ugly pointer arithmetic
  – Multibyte datatype transfers from one machine to another
Endianness
• All instructions are 32 bits wide
• Assembly: add $1, $2, $3
• Machine language:
    33222222222211111111110000000000   (bit position, tens digit)
    10987654321098765432109876543210   (bit position, ones digit)
    00000000010000110000100000010000
    000000 00010 00011 00001 00000 010000
    alu-rr 2     3     1     zero  add (signed)

    Address   Little endian   Big endian
    addr 00   0x10            0x00
    addr 01   0x08            0x43
    addr 10   0x43            0x08
    addr 11   0x00            0x10
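The two byte orders can be reproduced in C by writing the instruction word one byte at a time. The function names are hypothetical; the word 0x00430810 is the hexadecimal form of the bit pattern shown above.

```c
#include <stdint.h>

/* Little endian: least significant byte at the lowest address. */
void store_little_endian(uint32_t word, uint8_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(word >> (8 * i));
}

/* Big endian: most significant byte at the lowest address. */
void store_big_endian(uint32_t word, uint8_t out[4]) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(word >> (8 * (3 - i)));
}
```

Extracting bytes with shifts rather than pointer casts keeps the result independent of the byte order of the machine running the code.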
Exercise – MIPS Assembly
Fill in the blanks:
    // N set to 64
    int A[N];        // &A[0] = 0x100
    int sum = 0;
    int i = 0;
    do {
        sum += A[i];
        i++;
    } while (i < N);

          addi $s0, $zero, ___
          add  $t0, $zero, $zero
          add  $t1, $zero, $zero
          ___  $s1, $zero, 64
    LOOP: sll  $t2, $t1, ___
          add  $t2, $t2, $s0
          lw   $t3, 0($t2)
          add  $t0, ___, $t3
          addi $t1, $t1, 1
          slt  $t4, ___, ___
          bne  $t4, $zero, LOOP
Filled in:
          addi $s0, $zero, 256
          add  $t0, $zero, $zero
          add  $t1, $zero, $zero
          addi $s1, $zero, 64
    LOOP: sll  $t2, $t1, 2
          add  $t2, $t2, $s0
          lw   $t3, 0($t2)
          add  $t0, $t0, $t3
          addi $t1, $t1, 1
          slt  $t4, $t1, $s1
          bne  $t4, $zero, LOOP
Exercise – MIPS Assembly
Does the assembly correspond to the C code?
    // N set to 64
    int A[N];        // &A[0] = 0x100
    int sum = 0;
    for (int i = 0; i < N; i++) {
        sum += A[i];
    }

          addi $s0, $zero, 256
          add  $t0, $zero, $zero
          add  $t1, $zero, $zero
          addi $s1, $zero, 64
    LOOP: sll  $t2, $t1, 2
          add  $t2, $t2, $s0
          lw   $t3, 0($t2)
          add  $t0, $t0, $t3
          addi $t1, $t1, 1
          slt  $t4, $t1, $s1
          bne  $t4, $zero, LOOP
Exercise – MIPS Assembly
Fill in the blanks:
    // N set to 64
    int A[N];        // &A[0] = 0x100
    int sum = 0;
    for (int i = 0; i < N; i++) {
        sum += A[i];
    }

          addi $s0, $zero, 256
          add  $t0, $zero, $zero
          add  $t1, $zero, $zero
          addi $s1, $zero, 64
    LOOP: slt  $t4, $t1, $s1
          ___  $t4, $zero, ___
          sll  $t2, $t1, 2
          add  $t2, $t2, $s0
          lw   $t3, 0($t2)
          add  $t0, $t0, $t3
          addi $t1, $t1, 1
          ___  LOOP
    EXIT:
Filled in:
          addi $s0, $zero, 256
          add  $t0, $zero, $zero
          add  $t1, $zero, $zero
          addi $s1, $zero, 64
    LOOP: slt  $t4, $t1, $s1
          beq  $t4, $zero, EXIT
          sll  $t2, $t1, 2
          add  $t2, $t2, $s0
          lw   $t3, 0($t2)
          add  $t0, $t0, $t3
          addi $t1, $t1, 1
          j    LOOP
    EXIT:
Exercise – MIPS Assembly
Reduce the number of instructions executed?
    // N set to 64
    int A[N];        // &A[0] = 0x100
    int sum = 0;
    for (int i = 0; i < N; i++) {
        sum += A[i];
    }

Starting point:
          addi $s0, $zero, 256
          add  $t0, $zero, $zero
          add  $t1, $zero, $zero
          addi $s1, $zero, 64
    LOOP: slt  $t4, $t1, $s1
          beq  $t4, $zero, EXIT
          sll  $t2, $t1, 2
          add  $t2, $t2, $s0
          lw   $t3, 0($t2)
          add  $t0, $t0, $t3
          addi $t1, $t1, 1
          j    LOOP
    EXIT:
Advance the pointer in $s0 by 4 each iteration instead of recomputing &A[i]. The index $t1 and the sll/add address computation are then no longer needed, and the loop bound in $s1 must be an address rather than a count. Fill in the blank:
          addi $s0, $zero, 256
          add  $t0, $zero, $zero
          addi $s1, $zero, ___
    LOOP: slt  $t4, $s0, $s1
          beq  $t4, $zero, EXIT
          lw   $t3, 0($s0)
          add  $t0, $t0, $t3
          addi $s0, $s0, 4
          j    LOOP
    EXIT:
Filled in (0x100 + 64*4 = 512, the address just past A[63]):
          addi $s0, $zero, 256
          add  $t0, $zero, $zero
          addi $s1, $zero, 512
    LOOP: slt  $t4, $s0, $s1
          beq  $t4, $zero, EXIT
          lw   $t3, 0($s0)
          add  $t0, $t0, $t3
          addi $s0, $s0, 4
          j    LOOP
    EXIT:
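The same transformation can be written in C: replace the index with a pointer that is compared against the address one past the end of the array. This is a sketch with hypothetical function names; `end_address` just reproduces the bound arithmetic for 4-byte elements.

```c
#include <stdint.h>

#define N 64

/* Index form: the element address is recomputed from i on every iteration. */
int32_t sum_indexed(const int32_t a[N]) {
    int32_t sum = 0;
    for (int i = 0; i < N; i++)
        sum += a[i];
    return sum;
}

/* Pointer form: p plays the role of $s0, end plays the role of $s1. */
int32_t sum_pointer(const int32_t a[N]) {
    int32_t sum = 0;
    const int32_t *end = a + N;
    for (const int32_t *p = a; p < end; p++)
        sum += *p;
    return sum;
}

/* Byte address just past the last element, for 4-byte elements. */
uint32_t end_address(uint32_t base, uint32_t count) {
    return base + 4u * count;
}
```

`end_address(0x100, 64)` evaluates to 512, matching the constant loaded into $s1. Optimizing compilers often perform this rewrite themselves.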