
ECE 154A Introduction to Computer Architecture
Dmitri Strukov

MIPS Instruction Set Architecture & Single Cycle Datapath and Control

Outline

• Admin
• Choices and Basic Design Principles
• RISC Architecture
• Major Types of Instructions
  – Arithmetic Instructions
  – Load Store Instructions
  – Control Instructions
• Datapath & Control
  – Single Cycle Implementation
  – Multi Cycle Implementation

Admin

‐ Website with lecture slides: http://www.ece.ucsb.edu/~strukov/ece154aFall2013/ece154A.htm
‐ Check the reading assignments on the web
‐ HW#1 is online and due next Monday 11 pm (hw box)
‐ Tentative midterm dates:
   ‐ #1: October 29th
   ‐ #2: November 19th

Choices and Basic Design Principles in Computer Architecture

Simple Computer
Stored-program (von Neumann) computer
[Figure: memory, control, and datapath; labels: addresses, data, control, operation]

Algorithm for F = A x B + C / D
Step 1: Temp1 = A x B
Step 2: Temp2 = C / D
Step 3: F = Temp1 + Temp2

Execution timeline:
1. Load first instruction to control; read A and B from memory, compute temp1, write temp1 to memory
2. Load second instruction to control; read C and D from memory, compute temp2, write temp2 to memory
3. Load third instruction to control; read temp1 and temp2 from memory, compute F, write F to memory

Performance = 1 / Exec Time = 1 / (CCT x IC x CPI)
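The performance equation can be checked numerically. The short Python sketch below uses hypothetical values (a 1 ns clock cycle, the three instructions of the timeline, and an assumed CPI of 4); none of these numbers come from the slides, and the function names are illustrative.

```python
def exec_time(cct_s: float, ic: int, cpi: float) -> float:
    """Execution time = clock cycle time (s) x instruction count x cycles per instruction."""
    return cct_s * ic * cpi


def performance(cct_s: float, ic: int, cpi: float) -> float:
    """Performance is the reciprocal of execution time (higher is better)."""
    return 1.0 / exec_time(cct_s, ic, cpi)


if __name__ == "__main__":
    # Hypothetical numbers: 1 ns clock, 3 instructions, CPI of 4.
    t = exec_time(1e-9, 3, 4.0)
    print(f"execution time: {t:.2e} s")
    print(f"performance:    {performance(1e-9, 3, 4.0):.3e} per second")
```

Because the three factors multiply, halving any one of CCT, IC, or CPI halves the execution time; design choices that trade one factor against another only pay off if the product shrinks.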

Improving Performance

• Performance depends on
  – Algorithm: affects IC, possibly CPI
  – Programming language: affects IC, CPI
  – Compiler: affects IC, CPI
  – Instruction set architecture: affects IC, CPI, CCT

Performance = 1 / Exec Time = 1 / (CCT x IC x CPI)

Control

• An instruction is encoded as a sequence of bits (in memory)
  – E.g., an instruction may encode the type of operation together with the memory addresses of its data
• Control circuitry
  – decodes the instruction into a set of (or, for a multicycle implementation, a sequence of) control signals, which orchestrate data movement and data processing
  – should also take care of choosing the next instruction (FSM)

Control: Program Counter

• Program counter = register holding the address of the currently executed instruction
• With sequential execution, PC = PC + 1 after each instruction

Stored-Program Architecture

• Where do we store the program (code)?
• Two options, with their pros and cons:
  – In a separate memory, or hardwired (Harvard architecture)
     ‐ Higher throughput / faster reads, i.e., better CCT and/or CPI
  – In the same memory where the data are kept (stored-program architecture)
     ‐ The program can modify its own code (a plus in general, but could be a problem for security)
     ‐ More efficient use of memory

Two Key Principles of Machine Design

1. Instructions are represented as numbers and, as such, are indistinguishable from data
2. Programs are stored in alterable memory (that can be read or written to) just like data

[Figure: one memory holding an accounting program (machine code), a C compiler (machine code), payroll data, and the C source code for the accounting program]

Operands of Instructions

• Choices?
  – Directly on memory
     • have instructions in which the addresses of input and output data are specified directly as addresses in main memory
  – Only with a small (local) memory called the register file (so-called LOAD-STORE ARCHITECTURE)
     • have instructions in which the addresses of input and output data are specified as addresses in local memory
     • have additional instructions that move data between local and main memory

Load Store Architecture

BEFORE (operands directly on memory):
1. Load first instruction to control; read A and B from memory, compute temp1, write temp1 to memory
2. Load second instruction to control; read C and D from memory, compute temp2, write temp2 to memory
3. Load third instruction to control; read temp1 and temp2 from memory, compute F, write F to memory

UCSB | ECE 154A | Fall 2013

LOAD-STORE ARCHITECTURE
[Figure: memory, control, and a datapath with a local register file (RF)]

Timeline for Step 1:
1. Load first instruction to control
2. Read A and B from main memory to local memory (RF)
3. Compute temp1 and write temp1 to local memory (RF)
4. Write temp1 to main memory

Algorithm for F = A x B + C / D
Step 1: Temp1 = A x B
Step 2: Temp2 = C / D
Step 3: F = Temp1 + Temp2

Load Store Architecture: Effect on Performance?

‐ IC is worse
‐ CPI x CCT is better
   ‐ Large memory = large delay (CCT or CPI)
   ‐ Temporal locality of data
‐ Better code density (smaller operand fields)

[Figure: the same load-store timeline, with the step "write temp1 to main memory" marked "Do not have to do this step!": temp1 can stay in the register file until Step 3 uses it]

Operands of Instructions: Variations

• Accumulator architecture
  ‐ Results of operations are always stored in a special (accumulator) register
• Stack architecture
  ‐ Datapath always operates on the most recent data (which are at the top of a stack)
‐ Pros and cons?

Choice of Instructions?

• Fixed length vs. flexible length
• Size of the instruction set, i.e., few vs. many instructions
  – 1 instruction is enough (OISC, one instruction set computer)
     • Subtract and Branch if Less than or Equal to zero
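The one-instruction idea can be made concrete with a tiny interpreter. This is a sketch under assumptions not stated in the slides: each instruction is three consecutive memory words a, b, c; it computes Mem[b] = Mem[b] - Mem[a] and jumps to c if the result is less than or equal to zero (otherwise it falls through to the next instruction); a negative jump target halts the machine. The example program (hypothetical) adds X into Y using a scratch cell Z that starts at 0.

```python
def run_subleq(mem: list[int], pc: int = 0, max_steps: int = 10_000) -> list[int]:
    """Run a SUBLEQ program in place; halt when the PC becomes negative."""
    steps = 0
    while pc >= 0:
        if steps == max_steps:
            raise RuntimeError("step limit reached without halting")
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]                      # subtract ...
        pc = c if mem[b] <= 0 else pc + 3     # ... and branch if <= 0
        steps += 1
    return mem


# Hypothetical program: Y = Y + X, using Z (initially 0) as scratch.
X, Y, Z = 9, 10, 11
program = [
    X, Z, 3,    # addr 0: Z = Z - X  (= -X)
    Z, Y, 6,    # addr 3: Y = Y - Z  (= Y + X)
    Z, Z, -1,   # addr 6: Z = 0, result <= 0, so jump to -1 and halt
    7, 5, 0,    # addr 9..11: X = 7, Y = 5, Z = 0
]

if __name__ == "__main__":
    print(run_subleq(program.copy())[Y])  # 12
```

Even addition needs three instructions here, which illustrates the trade-off the slide hints at: a tiny instruction set keeps the hardware simple but inflates IC.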

Choice of Instructions?

• Fixed length = simpler design
  – Easy decoding (faster CCT) …
  – … but could be sparser code (higher IC)

CISC (complex instruction set computing) examples: x86 (Intel Atom, Intel Core, AMD Opteron), Motorola 68k, VAX
vs.
RISC (reduced instruction set computing) examples: MIPS (focus of this class, Sony PlayStation 2), ARM (Apple A5x (iPad), Qualcomm Snapdragon, Cortex-A9 (Microsoft Surface), Nvidia Tegra)

How Many Bits in One Register?

• 8-bit Intel 8080 processor (1974)
• 32-bit for mobile and 64-bit for high performance processors today
  – Could be much larger for vector processors

[Figure: memory, control, and datapath block diagram]

RISC (MIPS) Architecture 

MIPS (RISC) Design Principles

• Simplicity favors regularity
  – fixed size instructions
  – small number of instruction formats
  – opcode always the first 6 bits
• Smaller is faster
  – limited instruction set
  – limited number of registers in register file
  – limited number of addressing modes (TBD)
• Make the common case fast
  – arithmetic operands from the register file (load-store machine)
  – allow instructions to contain immediate operands (TBD)
• Good design demands good compromises
  – three instruction formats

MIPS-32 ISA

• Instruction Categories
  – Computational: Arith, Shift, Logical
  – Memory transfer: Load/Store
  – Control: Jump and Branch
  – Others:
     • Floating Point – coprocessor
     • Memory Management
     • Special
• Registers: R0 - R31, PC, HI, LO

3 Instruction Formats: all 32 bits wide (field widths in bits)
   R format:  op (6) | rs (5) | rt (5) | rd (5) | sa (5) | funct (6)
   I format:  op (6) | rs (5) | rt (5) | immediate (16)
   J format:  op (6) | jump target (26)

MIPS Register File

• Holds thirty-two 32-bit registers
  – Two read ports and
  – One write port
[Figure: register file with 5-bit src1 addr, src2 addr, and dst addr inputs, a 32-bit write data input, a write control input, and 32-bit src1 data and src2 data outputs]

Registers are
• Faster than main memory
  ‐ But register files with more locations are slower (e.g., a 64 word file could be as much as 50% slower than a 32 word file)
  ‐ Increasing the number of read/write ports impacts speed quadratically
• Easier for a compiler to use
  ‐ e.g., (A*B) – (C*D) – (E*F) can do the multiplies in any order, unlike on a stack
• Able to hold variables, so that
  ‐ code density improves (since registers are named with fewer bits than a memory location)

Aside: MIPS Register Convention

Name          | Register Number | Usage                   | Preserve on call?
$zero         | 0               | constant 0 (hardware)   | n.a.
$at           | 1               | reserved for assembler  | n.a.
$v0 - $v1     | 2-3             | returned values         | no
$a0 - $a3     | 4-7             | arguments               | no
$t0 - $t7     | 8-15            | temporaries             | no
$s0 - $s7     | 16-23           | saved values            | yes
$t8 - $t9     | 24-25           | temporaries             | no
$gp           | 28              | global pointer          | yes
$sp           | 29              | stack pointer           | yes
$fp           | 30              | frame pointer           | yes
$ra           | 31              | return addr (hardware)  | yes

Memory Operands

• To apply computational operations
  – Load values from memory into registers
  – Store result from register to memory
• Memory is byte addressed (for historic reasons)
  – Each address identifies an 8-bit byte
• Words are aligned in memory
  – Address must be a multiple of 4 (last two bits are always 0)

MIPS Instruction Fields

• MIPS fields are given names to make them easier to refer to

op           rs rt rd         shamt funct

op 6‐bits opcode that specifies the operation

rs 5‐bits register file address of the first source operand

rt 5‐bits register file address of the second source operand

rd 5‐bits register file address of the result’s destination

shamt 5‐bits shift amount (for shift instructions)

funct 6‐bits function code augmenting the opcode
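To make the field layout concrete, here is a small Python sketch (the helper names are hypothetical) that packs the six R-format fields into a 32-bit word and unpacks them again. The fields 0, 17, 18, 8, 0, 0x22 of sub $t0, $s1, $s2 from these slides serve as a check.

```python
FIELDS = (("op", 6), ("rs", 5), ("rt", 5), ("rd", 5), ("shamt", 5), ("funct", 6))


def encode_r(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack the R-format fields (6/5/5/5/5/6 bits, op first) into one 32-bit word."""
    word = 0
    for (name, width), value in zip(FIELDS, (op, rs, rt, rd, shamt, funct)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode_r(word: int) -> dict[str, int]:
    """Split a 32-bit R-format word back into its named fields."""
    fields = {}
    for name, width in reversed(FIELDS):  # peel fields off from the low end
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


if __name__ == "__main__":
    print(f"0x{encode_r(0, 17, 18, 8, 0, 0x22):08X}")  # sub $t0, $s1, $s2
```

The same op/rs/rt/rd/shamt/funct split is what the control unit and register file see when the instruction is decoded.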

Levels of Representation

High Level Language Program (e.g., C)
     temp = v[k];
     v[k] = v[k+1];
     v[k+1] = temp;
   ↓ Compiler
Assembly Language Program (e.g., ARM)   (focus of discussion now)
     ldr r0, [r2]
     ldr r1, [r2, #4]
     str r1, [r2]
     str r0, [r2, #4]
   ↓ Assembler
Machine Language Program (ARM)
     0000 1001 1100 0110 1010 1111 0101 1000
     1010 1111 0101 1000 0000 1001 1100 0110
     1100 0110 1010 1111 0101 1000 0000 1001
     0101 1000 0000 1001 1100 0110 1010 1111
   ↓ Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
   ↓ Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)

(Slide credit: Dan Garcia)

Major Types of Instructions

MIPS Arithmetic Instructions

• MIPS assembly language arithmetic statement
     add $t0, $s1, $s2
     sub $t0, $s1, $s2
• Each arithmetic instruction performs one operation
• Each specifies exactly three operands that are all contained in the datapath's register file ($t0, $s1, $s2)
     destination = source1 op source2
• Instruction Format (R format), for sub $t0, $s1, $s2:
     0 | 17 | 18 | 8 | 0 | 0x22

MIPS Shift Operations

• Need operations to pack and unpack 8-bit characters into 32-bit words
• Shifts move all the bits in a word left or right
     sll $t2, $s0, 8   #$t2 = $s0 << 8 bits
     srl $t2, $s0, 8   #$t2 = $s0 >> 8 bits
• Instruction Format (R format), for sll $t2, $s0, 8 (rs is unused and set to 0):
     0 | 0 | 16 | 10 | 8 | 0x00
• Such shifts are called logical because they fill with zeros. Notice that a 5-bit shamt field is enough to shift a 32-bit value 2^5 – 1 or 31 bit positions.
• MIPS also has sllv, srlv, and srav
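A minimal sketch of these shifts, assuming 32-bit registers are modelled as Python integers masked to 32 bits (the function names are illustrative). It also includes sra, which fills with copies of the sign bit instead of zeros, and shows srl plus a mask unpacking one 8-bit character.

```python
MASK32 = 0xFFFFFFFF


def sll(x: int, sa: int) -> int:
    """Logical shift left: bits pushed past bit 31 are lost, zeros enter on the right."""
    return (x << sa) & MASK32


def srl(x: int, sa: int) -> int:
    """Logical shift right: zeros enter on the left."""
    return (x & MASK32) >> sa


def sra(x: int, sa: int) -> int:
    """Arithmetic shift right: copies of bit 31 (the sign bit) enter on the left."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> sa) & MASK32


if __name__ == "__main__":
    word = 0x0000ABCD
    print(hex(srl(word, 8) & 0xFF))  # unpack the second byte: 0xab
```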

MIPS Logical Operations

• There are a number of bit-wise logical operations in the MIPS ISA
     and $t0, $t1, $t2   #$t0 = $t1 & $t2
     or  $t0, $t1, $t2   #$t0 = $t1 | $t2
     nor $t0, $t1, $t2   #$t0 = not($t1 | $t2)
• Instruction Format (R format), for and:
     0 | 9 | 10 | 8 | 0 | 0x24
     andi $t0, $t1, 0xFF00   #$t0 = $t1 & ff00
     ori  $t0, $t1, 0xFF00   #$t0 = $t1 | ff00
• Instruction Format (I format), for ori:
     0x0D | 9 | 8 | 0xFF00
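The sketch below models these operations on 32-bit values (function names are hypothetical). It assumes, as MIPS specifies, that andi and ori zero-extend their 16-bit constant, so andi always clears the upper half of the result; it also shows that nor with $zero acts as a bitwise NOT.

```python
MASK32 = 0xFFFFFFFF


def zero_extend16(imm: int) -> int:
    """andi/ori treat the 16-bit constant as unsigned: the upper 16 bits become 0."""
    return imm & 0xFFFF


def and32(rs: int, rt: int) -> int:
    return rs & rt & MASK32


def or32(rs: int, rt: int) -> int:
    return (rs | rt) & MASK32


def nor32(rs: int, rt: int) -> int:
    """not(rs | rt); with rt = 0 ($zero) this is a bitwise NOT of rs."""
    return ~(rs | rt) & MASK32


def andi(rs: int, imm: int) -> int:
    return rs & zero_extend16(imm)


def ori(rs: int, imm: int) -> int:
    return (rs | zero_extend16(imm)) & MASK32
```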

MIPS Immediate Instructions

• Small constants are used often in typical code
• Possible approaches?
  – put "typical constants" in memory and load them
  – create hard-wired registers (like $zero) for constants like 1
  – have special instructions that contain constants!
     addi $sp, $sp, 4    #$sp = $sp + 4
     slti $t0, $s2, 15   #$t0 = 1 if $s2 < 15
• Machine format (I format), for slti $t0, $s2, 15:
     0x0A | 18 | 8 | 0x0F
• Best approach: the constant is kept inside the instruction itself! The immediate format limits values to the range –2^15 to +2^15 – 1.
• Note that how the constant is treated is determined by the type of instruction: e.g., addi (and also addiu) sign-extends the constant as a two's complement number, while andi and ori zero-extend it as an unsigned number. The "u" in addiu only means that overflow is not trapped.
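A sketch of I-format packing, with hypothetical helper names. It assumes a negative immediate is stored as 16-bit two's complement, and shows that the same 16 bits read back as -4 or 0xFFFC depending on whether the instruction sign-extends or zero-extends the field.

```python
def encode_i(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack op (6) | rs (5) | rt (5) | immediate (16); negative imm stored as two's complement."""
    if not -(1 << 15) <= imm < (1 << 16):
        raise ValueError("immediate does not fit in 16 bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def imm_field(word: int, signed: bool) -> int:
    """Read the low 16 bits back, sign-extended (e.g., addi, slti) or zero-extended (e.g., andi, ori)."""
    imm = word & 0xFFFF
    if signed and imm & 0x8000:
        imm -= 1 << 16
    return imm


if __name__ == "__main__":
    print(f"0x{encode_i(0x0A, 18, 8, 15):08X}")   # slti $t0, $s2, 15
    print(f"0x{encode_i(8, 29, 29, -4):08X}")     # addi $sp, $sp, -4
```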

Review: Unsigned Binary Integers

• Given an n-bit number
$$x = x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$$
• Range: $0$ to $+2^n - 1$
• Example
$$0000\,0000\,0000\,0000\,0000\,0000\,0000\,1011_2 = 0 + \cdots + 1\times2^3 + 0\times2^2 + 1\times2^1 + 1\times2^0 = 0 + \cdots + 8 + 0 + 2 + 1 = 11_{10}$$
• Using 32 bits: 0 to +4,294,967,295

Review: 2s-Complement Signed Integers

• Given an n-bit number
$$x = -x_{n-1}2^{n-1} + x_{n-2}2^{n-2} + \cdots + x_1 2^1 + x_0 2^0$$
• Range: $-2^{n-1}$ to $+2^{n-1} - 1$
• Example
$$1111\,1111\,1111\,1111\,1111\,1111\,1111\,1100_2 = -1\times2^{31} + 1\times2^{30} + \cdots + 1\times2^2 + 0\times2^1 + 0\times2^0 = -2{,}147{,}483{,}648 + 2{,}147{,}483{,}644 = -4_{10}$$
• Using 32 bits: –2,147,483,648 to +2,147,483,647
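Both weightings can be evaluated directly. This Python sketch (function names are illustrative) reads a bit string most-significant-bit first; the two's complement version relies on the identity that giving the top bit weight $-2^{n-1}$ instead of $+2^{n-1}$ is the same as subtracting $2^n$ from the unsigned value.

```python
def as_unsigned(bits: str) -> int:
    """Sum of x_i * 2^i, reading the bit string most significant bit first."""
    value = 0
    for b in bits.replace(" ", ""):
        value = value * 2 + int(b)
    return value


def as_twos_complement(bits: str) -> int:
    """Same weights, except the top bit counts as -2^(n-1) instead of +2^(n-1)."""
    bits = bits.replace(" ", "")
    u = as_unsigned(bits)
    return u - (1 << len(bits)) if bits[0] == "1" else u


if __name__ == "__main__":
    pattern = "1111 1111 1111 1111 1111 1111 1111 1100"
    print(as_unsigned(pattern), as_twos_complement(pattern))
```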

Review: 2s-Complement Signed Integers

• Bit 31 is the sign bit
  – 1 for negative numbers
  – 0 for non-negative numbers
• $-(-2^{n-1})$ can't be represented
• Non-negative numbers have the same unsigned and 2s-complement representation
• Some specific numbers
  – 0: 0000 0000 … 0000
  – –1: 1111 1111 … 1111
  – Most-negative: 1000 0000 … 0000
  – Most-positive: 0111 1111 … 1111

Review: Signed Negation

• Complement and add 1
  – Complement means 1 → 0, 0 → 1
$$x + \bar{x} = 1111\ldots111_2 = -1 \quad\Rightarrow\quad \bar{x} + 1 = -x$$
• Example: negate +2
  – +2 = 0000 0000 … 0010₂
  – –2 = 1111 1111 … 1101₂ + 1 = 1111 1111 … 1110₂
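"Complement and add 1" translates directly into code. In this sketch (the function name is illustrative) XOR with all ones performs the complement, and the final mask keeps the result within 32 bits. The last test case shows why the most-negative number cannot be negated: the rule returns it unchanged.

```python
MASK32 = 0xFFFFFFFF


def negate32(x: int) -> int:
    """Two's complement negation: invert every bit, then add 1 (modulo 2^32)."""
    return ((x ^ MASK32) + 1) & MASK32


if __name__ == "__main__":
    print(hex(negate32(2)))  # 0xfffffffe, i.e. -2
```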

Sign Extension

• Representing a number using more bits
  – Preserve the numeric value
• In MIPS instruction set
  – addi: extend immediate value
  – lb, lh: extend loaded byte/halfword (will discuss later)
  – beq, bne: extend the displacement (will discuss later)
• Replicate the sign bit to the left
  – c.f. unsigned values: extend with 0s
• Examples: 8-bit to 16-bit
  – +2: 0000 0010 => 0000 0000 0000 0010
  – –2: 1111 1110 => 1111 1111 1111 1110
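Here is a sketch of both extension rules for arbitrary widths (function names are hypothetical). It reproduces the 8-bit to 16-bit examples: when the sign bit is 1, every new upper bit is set to 1; zero extension always fills with 0s.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Replicate the sign bit of a from_bits-wide value into a to_bits-wide field."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        # Set all bits between from_bits and to_bits to 1.
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value


def zero_extend(value: int, from_bits: int) -> int:
    """Unsigned extension: the new upper bits are 0."""
    return value & ((1 << from_bits) - 1)


if __name__ == "__main__":
    print(f"{sign_extend(0b11111110, 8, 16):016b}")  # -2 as 16 bits
```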

MIPS Memory Access Instructions

• MIPS has two basic data transfer instructions for accessing memory
     lw $t0, 4($s3)   #load word from memory
     sw $t0, 8($s3)   #store word to memory
• The data is loaded into (lw) or stored from (sw) a register in the register file
• The memory address (a 32 bit address) is formed by adding the contents of the base address register to the offset value
  – The offset is a 16-bit signed field, so access is limited to memory locations within ±2^13 or 8,192 words (±2^15 or 32,768 bytes) of the address in the base register

Machine Language - Load Instruction

• Load/Store Instruction Format (I format):
     lw $t0, 24($s3)
     35 | 19 | 8 | 24₁₀
[Figure: memory from 0x00000000 to 0xffffffff; the effective address is 24₁₀ + $s3 = 0x12004094 + 0x18 = 0x120040ac (low byte: … 1001 0100 + … 0001 1000 = … 1010 1100), and the data word at 0x120040ac is loaded into $t0]
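The address arithmetic in the figure can be reproduced with a short sketch, assuming 32-bit wrap-around and a sign-extended 16-bit offset (helper names are hypothetical). An offset field of 0xFFFC therefore means -4 bytes.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits as a two's complement offset."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base: int, offset16: int) -> int:
    """Base register + sign-extended 16-bit offset, wrapped to 32 bits."""
    return (base + sign_extend16(offset16)) & MASK32


if __name__ == "__main__":
    print(hex(effective_address(0x12004094, 24)))  # lw $t0, 24($s3)
```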

Byte Addresses

• Most architectures address individual bytes in memory
  – Alignment restriction: the memory address of a word must be on natural word boundaries (a multiple of 4 in MIPS-32)
• Big Endian: leftmost byte is word address
     IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
• Little Endian: rightmost byte is word address
     Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
[Figure: within a word (msb … lsb), little endian numbers the bytes 3 2 1 0 from left to right (byte 0 is the lsb); big endian numbers them 0 1 2 3 (byte 0 is the msb)]
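Python's standard `struct` module can show both byte orders: the format `">I"` packs an unsigned 32-bit value big endian and `"<I"` packs it little endian. The helper names below are illustrative, and the alignment check simply tests that the last two address bits are 0.

```python
import struct


def word_bytes(word: int, big_endian: bool) -> bytes:
    """Bytes of a 32-bit word in memory order, starting at the word address."""
    return struct.pack(">I" if big_endian else "<I", word)


def is_word_aligned(addr: int) -> bool:
    """A word address must be a multiple of 4 (last two bits are 0)."""
    return addr & 0b11 == 0


if __name__ == "__main__":
    print(word_bytes(0x12345678, big_endian=True).hex())   # 12345678
    print(word_bytes(0x12345678, big_endian=False).hex())  # 78563412
```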

Aside: Loading and Storing Bytes

• MIPS provides special instructions to move bytes
     lb $t0, 1($s3)   #load byte from memory
     sb $t0, 6($s3)   #store byte to memory
• Instruction format (I format), for sb:
     0x28 | 19 | 8 | 16 bit offset
• What 8 bits get loaded and stored?
  – load byte places the byte from memory in the rightmost 8 bits of the destination register
     ‐ what happens to the other bits in the register?
  – store byte takes the byte from the rightmost 8 bits of a register and writes it to a byte in memory
     ‐ what happens to the other bits in the memory word?
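In MIPS, lb sign-extends the loaded byte into the upper 24 register bits, lbu zero-extends it, and sb changes only the addressed byte, leaving the rest of the memory word as it was. The sketch below models memory as a Python `bytearray` (an assumption for illustration) and uses hypothetical function names.

```python
def lb(mem: bytes, addr: int) -> int:
    """Load byte: the byte goes to bits 7..0, bits 31..8 are copies of its sign bit."""
    byte = mem[addr]
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte


def lbu(mem: bytes, addr: int) -> int:
    """Load byte unsigned: bits 31..8 are cleared."""
    return mem[addr]


def sb(mem: bytearray, addr: int, reg: int) -> None:
    """Store byte: only the rightmost 8 bits of reg are written; other bytes stay unchanged."""
    mem[addr] = reg & 0xFF


if __name__ == "__main__":
    m = bytearray(b"\x00\xF0\x7F\x00")
    print(hex(lb(m, 1)), hex(lbu(m, 1)))  # 0xfffffff0 0xf0
```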

Aside: How About Larger Constants?

• Small (<16 bit) constants can be loaded with the addi instruction
• We'd also like to be able to load a 32 bit constant into a register; for this we must use two instructions
• a new "load upper immediate" instruction
     lui $t0, 1010101010101010
     0x0F | 0 | 8 | 1010101010101010₂
• Then we must get the lower order bits right, using
     ori $t0, $t0, 1010101010101010
[Figure: after lui, $t0 = 1010101010101010 0000000000000000; ori with 0000000000000000 1010101010101010 gives 1010101010101010 1010101010101010]
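The two-instruction sequence can be simulated with a short sketch (function names are hypothetical). ori is used for the low half because it zero-extends its constant; addi would sign-extend a low half of 0x8000 or more and corrupt the upper bits.

```python
def lui(imm16: int) -> int:
    """Load upper immediate: imm16 goes to bits 31..16, bits 15..0 are cleared."""
    return (imm16 & 0xFFFF) << 16


def ori(reg: int, imm16: int) -> int:
    """OR with a zero-extended 16-bit constant."""
    return reg | (imm16 & 0xFFFF)


def load_const32(value: int) -> int:
    """Build a 32-bit constant the way lui + ori would."""
    t0 = lui(value >> 16)
    return ori(t0, value & 0xFFFF)


if __name__ == "__main__":
    print(f"{load_const32(0xAAAAAAAA):032b}")
```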

MIPS Control Flow Instructions

• MIPS conditional branch instructions:
     bne $s0, $s1, Lbl   #go to Lbl if $s0≠$s1
     beq $s0, $s1, Lbl   #go to Lbl if $s0=$s1
  – Ex: if (i==j) h = i + j;
           bne $s0, $s1, Lbl1
           add $s3, $s0, $s1
     Lbl1: ...
• Instruction Format (I format), for bne:
     0x05 | 16 | 17 | 16 bit offset
• How is the branch destination address specified?

Specifying Branch Destinations

• Use a register (like in lw and sw) added to the 16-bit offset
  – which register? The Instruction Address Register (the PC)
     • its use is automatically implied by the instruction
     • the PC has already been updated (PC+4), so it holds the address of the next instruction
  – limits the branch distance to –2^15 to +2^15 – 1 (word) instructions from the (instruction after the) branch instruction, but most branches are local anyway
[Figure: the offset from the low order 16 bits of the branch instruction is sign-extended to 32 bits, shifted left 2 (appending 00), and added to PC + 4 to form the branch destination address]
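The target computation in the figure can be sketched as follows (the function name is hypothetical). The checks reuse the bne at address 80012 from the loop example later in these slides, whose offset of 2 reaches 80024, and show that an offset of -1 branches back to the branch itself.

```python
MASK32 = 0xFFFFFFFF


def branch_target(pc: int, offset16: int) -> int:
    """PC-relative target: (PC + 4) + (sign-extended 16-bit offset << 2)."""
    offset16 &= 0xFFFF
    offset = offset16 - 0x10000 if offset16 & 0x8000 else offset16
    return (pc + 4 + (offset << 2)) & MASK32


if __name__ == "__main__":
    print(branch_target(80012, 2))  # 80024
```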

In Support of Branch Instructions

• We have beq and bne, but what about other kinds of branches (e.g., branch-if-less-than)? For this, we need yet another instruction, slt
• Set on less than instruction:
     slt $t0, $s0, $s1   # if $s0 < $s1 then $t0 = 1 else $t0 = 0
• Instruction format (R format):
     0 | 16 | 17 | 8 | 0 | 0x2A
• Alternate versions of slt
     slti  $t0, $s0, 25    # if $s0 < 25 then $t0=1 ...
     sltu  $t0, $s0, $s1   # if $s0 < $s1 then $t0=1 ...
     sltiu $t0, $s0, 25    # if $s0 < 25 then $t0=1 ...
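The difference between slt and sltu is only in how the same 32 bits are compared. In this sketch (function names are illustrative) 0xFFFFFFFF is -1 when read as signed but 4,294,967,295 when read as unsigned, so the two instructions disagree on it.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def slt(rs: int, rt: int) -> int:
    """Set on less than: compare the registers as signed values."""
    return 1 if to_signed32(rs) < to_signed32(rt) else 0


def sltu(rs: int, rt: int) -> int:
    """Set on less than unsigned: same bits, compared as unsigned values."""
    return 1 if (rs & MASK32) < (rt & MASK32) else 0
```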

Aside: More Branch Instructions

• Can use slt, beq, bne, and the fixed value of 0 in register $zero to create other conditions
  – less than: blt $s1, $s2, Label
         slt $at, $s1, $s2       #$at set to 1 if
         bne $at, $zero, Label   #$s1 < $s2
  – less than or equal to: ble $s1, $s2, Label
  – greater than: bgt $s1, $s2, Label
  – greater than or equal to: bge $s1, $s2, Label
• Such branches are included in the instruction set as pseudo instructions, recognized (and expanded) by the assembler
  – That is why the assembler needs a reserved register ($at)
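The assembler's job here can be sketched as a small expansion table. This shows one common expansion for each pseudo instruction (the function name is hypothetical, and a particular assembler may choose a different sequence): swapping the slt operands turns "less than" into "greater than", and choosing beq instead of bne negates the condition.

```python
def expand_branch(pseudo: str, rs: str, rt: str, label: str) -> list[str]:
    """Expand blt/bge/bgt/ble into slt + bne/beq using the reserved $at register."""
    table = {
        "blt": (rs, rt, "bne"),  # rs <  rt  <=>  slt(rs, rt) == 1
        "bge": (rs, rt, "beq"),  # rs >= rt  <=>  slt(rs, rt) == 0
        "bgt": (rt, rs, "bne"),  # rs >  rt  <=>  slt(rt, rs) == 1
        "ble": (rt, rs, "beq"),  # rs <= rt  <=>  slt(rt, rs) == 0
    }
    a, b, branch = table[pseudo]
    return [f"slt $at, {a}, {b}", f"{branch} $at, $zero, {label}"]


if __name__ == "__main__":
    for line in expand_branch("ble", "$s1", "$s2", "Label"):
        print(line)
```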

Other Control Flow Instructions

• MIPS also has an unconditional branch instruction or jump instruction:
     j label   #go to label
• Instruction Format (J Format):
     0x02 | 26-bit address
[Figure: the low order 26 bits of the jump instruction are shifted left 2 (appending 00) and combined with the upper 4 bits of the PC to form the 32-bit jump target]
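A sketch of the pseudodirect jump target (the function name is hypothetical). It assumes, as in most MIPS references, that the upper 4 bits come from PC + 4; this differs from taking them from the PC itself only when the jump occupies the last word of a 256 MB region, which the third check illustrates. The first check reuses "j Loop" at 80020 with field 20000 from the loop example later in these slides.

```python
def jump_target(pc: int, target26: int) -> int:
    """Upper 4 bits of PC + 4, followed by the 26-bit field shifted left by 2."""
    return ((pc + 4) & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


if __name__ == "__main__":
    print(jump_target(80020, 20000))  # 80000
```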

Aside: Branching Far Away

• What if the branch destination is further away than can be captured in 16 bits?
• The assembler comes to the rescue: it inserts an unconditional jump to the branch target and inverts the condition
     beq $s0, $s1, L1
  becomes
         bne $s0, $s1, L2
         j   L1
     L2:

Another Example: If Statements

• C code:
     if (i==j) f = g+h;
     else f = g-h;
  – f, g, … in $s0, $s1, …
• Compiled MIPS code:
           bne $s3, $s4, Else
           add $s0, $s1, $s2
           j   Exit
     Else: sub $s0, $s1, $s2
     Exit: …
• Assembler calculates addresses

Another Way of Describing What Instructions Do

Assembly instruction   | What it does (Verilog-like format)
add  Rd, Rs, Rt        | RF[Rd] = RF[Rs] + RF[Rt]
addi Rt, Rs, Imm       | RF[Rt] = RF[Rs] + se Imm
and  Rd, Rs, Rt        | RF[Rd] = RF[Rs] AND RF[Rt]
andi Rt, Rs, Imm       | RF[Rt] = RF[Rs] AND ze Imm
sll  Rd, Rt, sa        | RF[Rd] = RF[Rt] << sa
sllv Rd, Rt, Rs        | RF[Rd] = RF[Rt] << RF[Rs]
sra  Rd, Rt, sa        | RF[Rd] = RF[Rt] >> sa (padding with msb)
srl  Rd, Rt, sa        | RF[Rd] = RF[Rt] >> sa (padding with 0)
lb   Rt, offset(Rs)    | RF[Rt] = se (Mem[RF[Rs] + se Offset])
lbu  Rt, offset(Rs)    | RF[Rt] = ze (Mem[RF[Rs] + se Offset])
lui  Rt, Imm           | RF[Rt] = Imm << 16 | 0x0000
lw   Rt, offset(Rs)    | RF[Rt] = Mem[RF[Rs] + se Offset]
sw   Rt, offset(Rs)    | Mem[RF[Rs] + se Offset] = RF[Rt]
beq  Rs, Rt, Label     | If (RF[Rs] == RF[Rt]) then PC = PC + 4 + ((se Imm) << 2)
j    Label             | PC = PC(31:28) || (Imm << 2)
slti Rt, Rs, Imm       | If (RF[Rs] < se Imm) then RF[Rt] = 1 else RF[Rt] = 0

(se = sign extend, ze = zero extend, || = bit concatenation)

MIPS Addressing Modes

[Figure: schematic representation of addressing modes in MIPS, showing for each mode the instruction, the other elements involved, and the operand:
 – Implied: the operand is some place in the machine
 – Immediate: the operand is in the instruction (extended, if required)
 – Register: a register specifier selects the operand in the register file
 – Base: register base + constant offset gives the memory address
 – PC-relative: incremented PC + constant offset gives the memory address
 – Pseudodirect: PC bits combined with the instruction's address field give the memory address]

More Elaborate Addressing Modes

[Figure: schematic representation of more elaborate addressing modes not supported in MIPS:
 – Indexed: base reg + index reg gives the memory address; x := B[i]
 – Update (with base): the base reg gives the memory address and is then incremented; x := Mem[p], p := p + 1
 – Update (with indexed): base reg + index reg gives the memory address, then the index is incremented; x := B[i], i := i + 1
 – Indirect: a first memory access returns the address for a second access; t := Mem[p], x := Mem[t], i.e., x := Mem[Mem[p]] (the first address may be replaced with any other form of address specification)]

C to Assembly for Loop Statements

• C code:
     while (save[i] == k) i += 1;
  – i in $s3, k in $s5, address of save in $s6
• Compiled MIPS code:
     Loop: sll  $t1, $s3, 2      #t1 = i*4
           add  $t1, $t1, $s6
           lw   $t0, 0($t1)
           bne  $t0, $s5, Exit
           addi $s3, $s3, 1
           j    Loop
     Exit: …

‐ There are multiple ways of translating C code to assembly!
‐ The lower the instruction count, the faster the execution (neglecting other complications like the effect on CPI)!

Assembly to Binary for Loop Example

• Loop code from earlier example
  – Assume Loop at location 80000

     Loop: sll  $t1, $s3, 2     80000   0 | 0 | 19 | 9 | 4 | 0
           add  $t1, $t1, $s6   80004   0 | 9 | 22 | 9 | 0 | 32
           lw   $t0, 0($t1)     80008   35 | 9 | 8 | 0
           bne  $t0, $s5, Exit  80012   5 | 8 | 21 | 2
           addi $s3, $s3, 1     80016   8 | 19 | 19 | 1
           j    Loop            80020   2 | 20000
     Exit: …                    80024
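The decimal fields above can be packed into 32-bit words with a short sketch (helper names are hypothetical). It also derives the two non-obvious fields: the bne offset is (80024 - (80012 + 4)) / 4 = 2 words, and the jump field is 80000 / 4 = 20000.

```python
def r_type(rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """R format with opcode 0."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def i_type(op: int, rs: int, rt: int, imm: int) -> int:
    """I format; the immediate is stored as 16-bit two's complement."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def j_type(op: int, target: int) -> int:
    """J format with a 26-bit word address."""
    return (op << 26) | (target & 0x03FFFFFF)


LOOP, EXIT = 80000, 80024
BNE_ADDR = LOOP + 12

program = [
    r_type(0, 19, 9, 4, 0),                          # Loop: sll  $t1, $s3, 2
    r_type(9, 22, 9, 0, 32),                         #       add  $t1, $t1, $s6
    i_type(35, 9, 8, 0),                             #       lw   $t0, 0($t1)
    i_type(5, 8, 21, (EXIT - (BNE_ADDR + 4)) // 4),  #       bne  $t0, $s5, Exit
    i_type(8, 19, 19, 1),                            #       addi $s3, $s3, 1
    j_type(2, LOOP // 4),                            #       j    Loop
]

if __name__ == "__main__":
    for i, word in enumerate(program):
        print(LOOP + 4 * i, f"0x{word:08X}")
```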

More Examples: Okay For-Loop Code

C code: for(i = 1; i <= 10; i++) A[i] = A[i] + 1;

Okay assembly code: assume i: $s1, base of A: $s2

           addi $s1, $0, 1      # i = 1
     LOOP: slti $t0, $s1, 11    # LOOP: if (i < 11) $t0 = 1 else $t0 = 0
           beq  $t0, $0, END    # if ($t0 == 0) goto END; else {
           sll  $t0, $s1, 2     #   $t0 = i * 4;
           add  $t0, $s2, $t0   #   $t0 = $t0 + addr A;
           lw   $t1, 0($t0)     #   $t1 = A[i];
           addi $t1, $t1, 1     #   $t1 = $t1 + 1;
           sw   $t1, 0($t0)     #   A[i] = $t1;
           addi $s1, $s1, 1     #   i = i + 1;
           j    LOOP            #   goto LOOP }
     END:                       # END:

2 control flow instructions + 7 other instructions in the loop

More Examples: Better For-Loop Code

C code: for(i = 1; i <= 10; i++) A[i] = A[i] + 1;

Better assembly code: assume i: $s1, base of A: $s2

           addi $s1, $0, 1      # i = 1;
           addi $t0, $s2, 4     # $t0 = addr A + 4; (* pointer to A[1] *)
           addi $t2, $0, 11     # $t2 = 11;
     LOOP: lw   $t1, 0($t0)     # do { $t1 = A[i];
           addi $t1, $t1, 1     #      $t1 = $t1 + 1;
           sw   $t1, 0($t0)     #      A[i] = $t1;
           addi $s1, $s1, 1     #      i = i + 1;
           addi $t0, $t0, 4     #      $t0 = $t0 + 4; }
           bne  $s1, $t2, LOOP  # while (i != 11);

1 control flow instruction + 5 other instructions in the loop

More Examples: Even Better For-Loop Code

C code: for(i = 1; i <= 10; i++) A[i] = A[i] + 1;

Even better assembly code: assume i: $s1, base of A: $s2

           addi $t0, $s2, 4     # $t0 = addr A + 4; (* pointer to A[1] *)
           addi $t2, $t0, 40    # $t2 = $t0 + 40; (* pointer to A[11] *)
     LOOP: lw   $t1, 0($t0)     # do { $t1 = A[i];
           addi $t1, $t1, 1     #      $t1 = $t1 + 1;
           sw   $t1, 0($t0)     #      A[i] = $t1;
           addi $t0, $t0, 4     #      $t0 = $t0 + 4; }
           bne  $t0, $t2, LOOP  # while ($t0 != $t2);
           addi $s1, $0, 11     # i = 11

(Note that in this case the variable i is not used at all. The last line only makes the assembly code functionally equivalent to the C code, since in C the variable i equals 11 after the loop completes.)

1 control flow instruction + 4 other instructions in the loop
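The per-iteration counts translate into total executed instructions. This sketch tallies them for the 10 iterations of the three versions, assuming (not stated in the slides) that every instruction costs the same, so the totals track execution time. The setup and after-loop counts are read off the code: the okay version exits through one final slti plus a taken beq, the better version's bne simply falls through, and the even better version adds the trailing addi $s1.

```python
def dynamic_count(setup: int, per_iteration: int, iterations: int, after_loop: int) -> int:
    """Instructions actually executed: setup + loop body x iterations + exit work."""
    return setup + per_iteration * iterations + after_loop


VERSIONS = {
    # name: (setup, per_iteration, after_loop)
    "okay": (1, 9, 2),         # addi; 9 per pass; final slti + taken beq
    "better": (3, 6, 0),       # three addi; 6 per pass; bne falls through
    "even better": (2, 5, 1),  # two addi; 5 per pass; trailing addi $s1
}

counts = {name: dynamic_count(s, p, 10, a) for name, (s, p, a) in VERSIONS.items()}

if __name__ == "__main__":
    for name, n in counts.items():
        print(f"{name:12s} {n}")
```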

Single Cycle and Multi Cycle Datapath and Control

Processor Datapath and Control

• Our implementation of the MIPS is simplified
  – memory-reference instructions: lw, sw
  – arithmetic-logical instructions: add, sub, and, or, slt
  – control flow instructions: beq, j
• Generic implementation
  – use the program counter (PC) to supply the instruction address and fetch the instruction from memory (and update the PC)
  – decode the instruction (and read registers)
  – execute the instruction
[Figure: Fetch (PC = PC+4) → Decode → Exec cycle]
• All instructions (except j) use the ALU after reading the registers

CSE431 Chapter 4A.61 Irwin, PSU, 2008

How? memory-reference? arithmetic? control flow?

Review: Clocking Methodology

• The clocking methodology defines when data in a state element is valid and stable relative to the clock
  – State elements: a memory element such as a register
  – Edge-triggered: all state changes occur on a clock edge
• Typical execution
  – read contents of state elements → send values through combinational logic → write results to one or more state elements
[Figure: state element 1 → combinational logic → state element 2, all within one clock cycle]
• Assumes state elements are written on every clock cycle; if not, need an explicit write control signal
  – write occurs only when both the write control is asserted and the clock edge occurs

Fetching Instructions

• Fetching instructions involves
  – reading the instruction from the Instruction Memory
  – updating the PC value to be the address of the next (sequential) instruction
[Figure: the PC drives the Read Address of the Instruction Memory, which outputs the Instruction; an adder computes PC + 4, which is written back to the PC on each clock]
• PC is updated every clock cycle, so it does not need an explicit write control signal, just a clock signal
• Reading from the Instruction Memory is a combinational activity, so it doesn't need an explicit read control signal

Decoding Instructions

• Decoding instructions involves
  – sending the fetched instruction's opcode and function field bits to the control unit
  – reading two values from the Register File
     ‐ Register File addresses are contained in the instruction
[Figure: instruction fields feed the Control Unit and the Register File's Read Addr 1 and Read Addr 2 inputs; the Register File also has Write Addr and Write Data inputs and Read Data 1 and Read Data 2 outputs]

Executing R Format Operations

• R format operations (add, sub, slt, and, or)
     R-type: op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
  – perform operation (op and funct) on values in rs and rt
  – store the result back into the Register File (into location rd)
[Figure: Register File Read Data 1 and Read Data 2 feed the ALU (under ALU control), whose result goes to the Register File's Write Data; RegWrite enables the write; the ALU also outputs overflow and zero]
• Note that the Register File is not written every cycle (e.g., sw), so we need an explicit write control signal for the Register File

Executing Load and Store Operations
Load and store operations involve
• computing the memory address by adding the base register (read from the Register File during decode) to the 16-bit sign-extended offset field in the instruction
• store: the value (read from the Register File during decode) is written to the Data Memory
• load: the value read from the Data Memory is written to the Register File

[Figure: load/store datapath. Read Data 1 (the base register) and the offset, sign-extended from 16 to 32 bits, feed the ALU, whose result is the Data Memory Address; Read Data 2 supplies the Data Memory Write Data for stores, and the Data Memory Read Data returns to the Register File Write Data input for loads. Control signals: ALU control, RegWrite, MemWrite, MemRead.]
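A minimal Python sketch of the address calculation and the two memory operations, assuming a word-addressed view of the Data Memory as a dict (unwritten words read as 0) and ignoring alignment checks. `sign_extend16`, `effective_address`, `load_word` and `store_word` are hypothetical helper names; the comments note which control signals from the figure each operation would assert.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Sign-extend a 16-bit field to a 32-bit pattern (the 16 -> 32 box)."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def effective_address(base, imm):
    # The ALU adds the base register to the sign-extended offset.
    return (base + sign_extend16(imm)) & MASK32


def load_word(regs, dmem, rs, rt, imm):
    # MemRead and RegWrite asserted: Data Memory Read Data goes to register rt.
    addr = effective_address(regs[rs], imm)
    if rt != 0:                      # $0 is hardwired to zero
        regs[rt] = dmem.get(addr, 0)


def store_word(regs, dmem, rs, rt, imm):
    # MemWrite asserted, RegWrite not: Read Data 2 (rt) goes to Data Memory.
    addr = effective_address(regs[rs], imm)
    dmem[addr] = regs[rt]


regs = [0] * 32
dmem = {}
regs[9] = 0x10010000                 # base address in $t1
regs[8] = 42                         # value to store from $t0
store_word(regs, dmem, rs=9, rt=8, imm=4)    # sw $t0, 4($t1)
load_word(regs, dmem, rs=9, rt=10, imm=4)    # lw $t2, 4($t1)
```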

Executing Branch Operations
Branch operations involve
• comparing the operands read from the Register File during decode for equality (zero ALU output)
• computing the branch target address by adding the updated PC (PC+4) to the 16-bit sign-extended offset field in the instruction, shifted left by 2 bits

[Figure: branch datapath. Read Data 1 and Read Data 2 from the Register File go to the ALU, whose zero output goes to the branch control logic; the offset, sign-extended from 16 to 32 bits and shifted left 2, is added to PC+4 by a separate adder to form the branch target address.]
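The beq next-PC computation fits in one function. In this sketch (with the hypothetical name `beq_next_pc`), the ALU's equality test is modeled as "subtract and check for zero", and the target adder sums PC+4 with the sign-extended offset shifted left by 2, so the offset counts words rather than bytes.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def beq_next_pc(pc, rs_val, rt_val, imm):
    pc_plus_4 = (pc + 4) & MASK32
    # Branch target adder: PC+4 plus the sign-extended offset shifted left 2.
    target = (pc_plus_4 + (sign_extend16(imm) << 2)) & MASK32
    # The ALU subtracts the operands; "zero" is asserted when they are equal.
    zero = ((rs_val - rt_val) & MASK32) == 0
    return target if zero else pc_plus_4


print(hex(beq_next_pc(0x00400000, 5, 5, 3)))    # taken: 3 words past PC+4
```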

Executing Jump Operations
Jump operation involves
• replacing the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits

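[Figure: jump datapath. The lower 26 bits of the instruction read from the Instruction Memory are shifted left 2 (26 to 28 bits) to form the jump address, which is combined with the upper 4 bits of the PC; a separate adder computes PC+4.]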
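The jump address computation can be sketched in Python as follows. `jump_target` is a hypothetical name. One assumption deserves mention: in the MIPS definition of `j`, the upper 4 bits come from PC+4 (the PC as already incremented during fetch), so the sketch uses PC+4. This only differs from "the current PC" when the jump instruction sits in the last word of a 256 MB region.

```python
def jump_target(pc, instr):
    target26 = instr & 0x03FFFFFF              # lower 26 bits of the instruction
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Keep the upper 4 bits of PC+4; the shifted target fills the lower 28 bits.
    return (pc_plus_4 & 0xF0000000) | (target26 << 2)


j_instr = (0x02 << 26) | 0x0100004             # j with a 26-bit word target
print(hex(jump_target(0x00400000, j_instr)))
```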

Creating a Single Datapath from the Parts
• Assemble the datapath segments and add control lines and multiplexors as needed
• Single cycle design: fetch, decode and execute each instruction in one clock cycle
– no datapath resource can be used more than once per instruction, so some must be duplicated (e.g., several adders)
– multiplexors are needed at the input of shared elements, with control lines to do the selection
– write signals control writing to the Register File and Data Memory
• Cycle time is determined by the length of the longest path

Fetch, R, and Memory Access Portions
[Figure: the fetch, R-type and memory-access segments merged into one datapath. An ALUSrc multiplexor selects the second ALU input (Register File Read Data 2 or the offset sign-extended from 16 to 32 bits), and a MemtoReg multiplexor selects the Register File Write Data (ALU result or Data Memory Read Data). Control signals: RegWrite, ALUSrc, ALU control, MemWrite, MemRead, MemtoReg.]
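The two multiplexors that let R-type and load instructions share one ALU and one write-back path can be sketched like this. The sketch assumes the common convention that a select value of 0 picks Read Data 2 (ALUSrc) and the ALU result (MemtoReg). It models only the ALU "add" operation and reads Data Memory unconditionally (MemRead is not modeled), treating unwritten words as 0. `mux2` and `execute_and_write_back` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def mux2(sel, in0, in1):
    """2-to-1 multiplexor: in0 when sel == 0, in1 when sel == 1."""
    return in1 if sel else in0


def execute_and_write_back(read_data_1, read_data_2, sign_ext_imm, dmem,
                           alu_src, mem_to_reg):
    """Return the value that reaches the Register File's Write Data input."""
    # ALUSrc mux: Read Data 2 (R-type) or the sign-extended offset (lw/sw).
    alu_b = mux2(alu_src, read_data_2, sign_ext_imm)
    alu_result = (read_data_1 + alu_b) & MASK32     # only "add" is modeled
    mem_read_data = dmem.get(alu_result, 0)
    # MemtoReg mux: write back the ALU result or the loaded word.
    return mux2(mem_to_reg, alu_result, mem_read_data)


dmem = {0x10010008: 99}
r_type = execute_and_write_back(3, 4, 0, dmem, alu_src=0, mem_to_reg=0)
lw = execute_and_write_back(0x10010000, 4, 8, dmem, alu_src=1, mem_to_reg=1)
```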

Adding the Control
• Selecting the operations to perform (ALU, Register File and Memory read/write)
• Controlling the flow of data (multiplexor inputs)
R-type: op [31-26] | rs [25-21] | rt [20-16] | rd [15-11] | shamt [10-6] | funct [5-0]
I-type: op [31-26] | rs [25-21] | rt [20-16] | address offset [15-0]
J-type: op [31-26] | target address [25-0]
Observations
• op field always in bits 31-26
• addr. of registers to be read are always specified by the rs field (bits 25-21) and rt field (bits 20-16); for lw and sw, rs is the base register
• addr. of register to be written is in one of two places: in rt (bits 20-16) for lw; in rd (bits 15-11) for R-type instructions
• offset for beq, lw, and sw always in bits 15-0

Single Cycle Datapath with Control Unit
[Figure: complete single-cycle datapath. The Control Unit decodes Instr[31-26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; the ALU control block uses ALUOp and Instr[5-0]. The Instruction Memory outputs Instr[31-0]; Instr[25-21] and Instr[20-16] address the Register File reads; the RegDst multiplexor selects Instr[20-16] or Instr[15-11] as the write address; Instr[15-0] is sign-extended from 16 to 32 bits. PCSrc, derived from Branch and the ALU zero output, selects PC+4 or the branch target (sign-extended offset shifted left 2, plus PC+4).]
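The Control Unit and ALU control block can be written as lookup tables. The sketch below assumes the widely used textbook encoding (opcodes R-type 0x00, lw 0x23, sw 0x2B, beq 0x04; a 2-bit ALUOp; 4-bit ALU control codes add 0010, sub 0110, and 0000, or 0001, slt 0111). These encodings are an assumption, since the slide figure shows the signal names but not their values. `None` marks a don't-care output; `MAIN_CONTROL`, `alu_control` and `pc_src` are hypothetical names.

```python
# Main control outputs per opcode; None = don't care.
MAIN_CONTROL = {
    0x00: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,            # R-type
               MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0x23: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,            # lw
               MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0x2B: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,      # sw
               MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0x04: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,      # beq
               MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

# 4-bit ALU control inputs selected by the R-type funct field.
FUNCT_TO_ALU = {0x20: 0b0010, 0x22: 0b0110, 0x24: 0b0000,
                0x25: 0b0001, 0x2A: 0b0111}


def alu_control(alu_op, funct):
    if alu_op == 0b00:        # lw / sw: add to form the address
        return 0b0010
    if alu_op == 0b01:        # beq: subtract to test for equality
        return 0b0110
    return FUNCT_TO_ALU[funct]   # R-type: decode the funct field


def pc_src(branch, zero):
    # Select the branch target only for a beq whose operands are equal.
    return int(bool(branch) and zero)


lw_signals = MAIN_CONTROL[0x23]
print(lw_signals["MemtoReg"], alu_control(lw_signals["ALUOp"], 0))
```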

R-type Instruction Data/Control Flow
[Figure: the complete single-cycle datapath with control unit, highlighting the datapath and control signals used by an R-type instruction.]

Load Word Instruction Data/Control Flow
[Figure: the complete single-cycle datapath with control unit, highlighting the datapath and control signals used by a load word instruction.]

Branch Instruction Data/Control Flow
[Figure: the complete single-cycle datapath with control unit, highlighting the datapath and control signals used by a branch instruction.]

Adding the Jump Operation
[Figure: the single-cycle datapath extended for jump. Instr[25-0] is shifted left 2 (26 to 28 bits) and concatenated with PC+4[31-28] to form the 32-bit jump address; a new Jump signal from the Control Unit drives an added multiplexor that chooses between the jump address and the output of the PCSrc multiplexor (PC+4 or branch target).]
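The complete next-PC logic, with the PCSrc multiplexor feeding the new Jump multiplexor, can be sketched as one function. `next_pc` is a hypothetical name, and the `branch`, `jump` and `zero` arguments stand in for the Control Unit and ALU outputs. Both targets are computed on every call, just as the hardware computes them in parallel before the multiplexors pick one.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


def next_pc(pc, instr, branch, jump, zero):
    pc_plus_4 = (pc + 4) & MASK32
    branch_target = (pc_plus_4 + (sign_extend16(instr & 0xFFFF) << 2)) & MASK32
    jump_address = (pc_plus_4 & 0xF0000000) | ((instr & 0x03FFFFFF) << 2)
    # PCSrc mux: branch target only when Branch is set and the ALU says zero.
    pcsrc_out = branch_target if (branch and zero) else pc_plus_4
    # Jump mux: the jump address overrides everything else.
    return jump_address if jump else pcsrc_out


print(hex(next_pc(0x00400000, 0x11090003, branch=1, jump=0, zero=True)))
```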

Instruction Critical Paths
What is the clock cycle time assuming negligible delays for muxes, control unit, sign extend, PC access, shift left 2, wires, setup and hold times except:
• Instruction and Data Memory (200 ps)
• ALU and adders (200 ps)
• Register File access (reads or writes) (100 ps)

Instr.   I Mem  Reg Rd  ALU Op  D Mem  Reg Wr  Total (ps)
R-type   200    100     200            100     600
load     200    100     200     200    100     800
store    200    100     200     200            700
beq      200    100     200                    500
jump     200                                   200

The clock cycle must accommodate the slowest instruction (load), so the cycle time is 800 ps.
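The table's arithmetic can be reproduced with a short Python sketch. The latencies are the slide's assumptions, and `PATHS`, `totals`, `cycle_time` and `idle` are hypothetical names. The `idle` values show how much of each 800 ps cycle an instruction leaves unused in a single cycle design.

```python
# Component latencies in ps (the slide's assumptions; all others negligible).
I_MEM, REG_RD, ALU, D_MEM, REG_WR = 200, 100, 200, 200, 100

# Components each instruction class uses, in order along its critical path.
PATHS = {
    "R-type": [I_MEM, REG_RD, ALU, REG_WR],
    "load":   [I_MEM, REG_RD, ALU, D_MEM, REG_WR],
    "store":  [I_MEM, REG_RD, ALU, D_MEM],
    "beq":    [I_MEM, REG_RD, ALU],
    "jump":   [I_MEM],
}

totals = {name: sum(stages) for name, stages in PATHS.items()}
cycle_time = max(totals.values())   # the slowest instruction sets the clock
idle = {name: cycle_time - t for name, t in totals.items()}

for name in PATHS:
    print(f"{name:7s} needs {totals[name]} ps, idles {idle[name]} ps")
```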

Single Cycle Disadvantages & Advantages
• Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
– especially problematic for more complex instructions like floating point multiply
[Figure: timing diagram over Cycle 1 and Cycle 2. lw fills its whole cycle, while sw in the next cycle finishes early and the rest of its cycle is waste.]
• May be wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
but
• Is simple and easy to understand

Multi-Cycle Datapath
Main Idea: Break the execution of an instruction into smaller steps (cycles) and let each instruction execute in a variable number of cycles
• Note that CCT is smaller but CPI is now larger compared to the single cycle design
• Instruction execution time is no longer defined by the slowest instruction
Issues with multicycle datapath
• Equal amount of work per cycle. Typical steps are IF (Instruction Fetch), ID (Instruction Decode and Register Fetch), EXE (Execution), MEM (Memory Transfer), WB (Write Back to Registers)
• More complicated control

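[Figure: two clock timelines. Single cycle: Instr 1 to Instr 4 each get one long cycle, so for most instructions the time allotted exceeds the time needed. Multi-cycle: the instructions take 3, 3, 4 and 5 short cycles, and the difference from the single-cycle timeline is time saved.]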
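The CCT-versus-CPI trade-off can be checked with Exec Time = IC × CPI × CCT. The sketch below makes several assumptions that are not from the slides: a 200 ps multi-cycle clock (the slowest single step in the critical-path table, ignoring the extra registers and muxes), 4/5/4/3/3 cycles for R-type/load/store/beq/jump in a five-step IF/ID/EXE/MEM/WB design, and an invented instruction mix. With this particular mix, the multi-cycle design comes out slightly slower: a CPI of 4.05 at 200 ps per cycle is 810 ps per instruction, versus 800 ps for the single cycle design. Multi-cycle wins only when short instructions dominate the mix, or when one instruction is far slower than the rest (e.g., floating point multiply).

```python
from fractions import Fraction

SINGLE_CCT_PS = 800   # single cycle: set by load (see the critical-path table)
MULTI_CCT_PS = 200    # multi-cycle: set by the slowest step (memory or ALU)

# Assumed cycles per instruction class in a five-step multi-cycle design.
MULTI_CYCLES = {"R-type": 4, "load": 5, "store": 4, "beq": 3, "jump": 3}

# Hypothetical instruction mix; the fractions sum to 1.
MIX = {"R-type": Fraction(45, 100), "load": Fraction(25, 100),
       "store": Fraction(10, 100), "beq": Fraction(15, 100),
       "jump": Fraction(5, 100)}


def exec_time_ps(ic, cpi, cct_ps):
    """Exec Time = IC x CPI x CCT."""
    return ic * cpi * cct_ps


multi_cpi = sum(MIX[k] * MULTI_CYCLES[k] for k in MIX)
single = exec_time_ps(1_000_000, 1, SINGLE_CCT_PS)
multi = exec_time_ps(1_000_000, multi_cpi, MULTI_CCT_PS)
print(f"multi-cycle CPI = {float(multi_cpi)}, single = {single} ps, multi = {float(multi)} ps")
```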

Multi Cycle Implementation

• FSM in control to implement variable cycle time execution
• Option #1: Use the same datapath
• Option #2: Reuse of resources, i.e., one ALU for branch, PC increment, and execution stage (also one memory in the next figure)
– Cons and pros: smaller area, so it could be faster, but extra registers are needed to keep intermediate values

Option #2: Datapath and FSM Example

Note the extra registers, which make it possible to reuse one ALU and one memory. For example, an R-type instruction uses the same ALU in the EXE stage and in WB to calculate PC+4.

Option #2: FSM Example
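The FSM figure itself did not survive extraction, so the following is only a hypothetical control FSM built from the five typical steps (IF, ID, EXE, MEM, WB) named on the multi-cycle slide, not the slide's actual state diagram. It assumes that beq and j finish in EXE, that R-type skips MEM, and that sw ends after MEM. These assumptions give 3, 3, 4, 4 and 5 cycles. `State`, `next_state` and `run` are hypothetical names.

```python
from enum import Enum
from typing import List, Optional


class State(Enum):
    IF = "Instruction Fetch"
    ID = "Instruction Decode and Register Fetch"
    EXE = "Execution"
    MEM = "Memory Transfer"
    WB = "Write Back"


def next_state(state: State, op: str) -> Optional[State]:
    """Return the next state, or None when the instruction has finished."""
    if state is State.IF:
        return State.ID
    if state is State.ID:
        return State.EXE
    if state is State.EXE:
        if op in ("beq", "j"):
            return None                        # branch/jump complete in EXE
        return State.WB if op == "R-type" else State.MEM
    if state is State.MEM:
        return State.WB if op == "lw" else None   # sw finishes after MEM
    return None                                # WB is always the last step


def run(op: str) -> List[State]:
    """List the states one instruction visits, one per clock cycle."""
    states: List[State] = []
    s: Optional[State] = State.IF
    while s is not None:
        states.append(s)
        s = next_state(s, op)
    return states


for op in ("R-type", "lw", "sw", "beq", "j"):
    print(op, [s.name for s in run(op)])
```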

Acknowledgments
Some of the slides contain material developed and copyrighted by M.J. Irwin (Penn State), B. Parhami (UCSB) and instructor material for the textbook