Computer Architecture and Organization (CS-507)

Lecture 6

Instruction Set Architectures & Addressing Modes

Muhammad Zeeshan Haider Ali

Lecturer, ISP

[email protected]

https://zeeshanaliatisp.wordpress.com/

Instruction Set Design Issues

Instruction set design issues include:

Where are operands stored?

registers, memory, stack, accumulator

How many explicit operands are there?

0, 1, 2, or 3

How is the operand location specified?

register, immediate, indirect, . . .

What type & size of operands are supported?

byte, int, float, double, string, vector. . .

What operations are supported?

add, sub, mul, move, compare . . .

Accumulator Architecture

• Instruction set:

add A, sub A, mul A, div A, . . .

load A, store A

• Example: A*B - (A+C*B)

Instruction    Accumulator after
load B         B
mul C          B*C
add A          A+B*C
store D        A+B*C
load A         A
mul B          A*B
sub D          result

acc = acc +,-,*,/ mem[A]
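The accumulator program above can be traced with a small simulator. This is a minimal sketch rather than a model of any real machine: the function name `run_accumulator`, the tuple encoding of instructions, and the sample values for A, B, and C are all assumptions made for illustration.

```python
def run_accumulator(program, memory):
    """Execute (opcode, address) pairs against a single accumulator."""
    acc = 0
    trace = []  # accumulator value after each instruction
    for op, addr in program:
        if op == "load":
            acc = memory[addr]
        elif op == "store":
            memory[addr] = acc
        elif op == "add":
            acc = acc + memory[addr]
        elif op == "sub":
            acc = acc - memory[addr]
        elif op == "mul":
            acc = acc * memory[addr]
        else:
            raise ValueError(f"unknown opcode: {op}")
        trace.append(acc)
    return acc, trace


# A*B - (A+C*B), using the instruction order from the slide
program = [
    ("load", "B"), ("mul", "C"), ("add", "A"), ("store", "D"),
    ("load", "A"), ("mul", "B"), ("sub", "D"),
]
memory = {"A": 2, "B": 3, "C": 4, "D": 0}  # sample values (assumed)
result, trace = run_accumulator(program, memory)
print(result)  # -8
```

Every instruction except the store reads memory, and the intermediate value A+C*B has to be spilled to D, which is the memory traffic the accumulator design is criticized for.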

Accumulator Architecture

• Pros

– Very low hardware requirements

– Easy to design and understand

• Cons

– Accumulator becomes the bottleneck

– Little ability for parallelism or pipelining

– High memory traffic

Stack Architectures

Instruction set:

add, sub, mul, div, . . .

push A, pop A

Example: A*B - (A+C*B)

Instruction    Stack after (top first)
push A         A
push B         B, A
mul            A*B
push A         A, A*B
push C         C, A, A*B
push B         B, C, A, A*B
mul            B*C, A, A*B
add            A+B*C, A*B
sub            result
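The same expression can be evaluated on a toy stack machine. The sketch below assumes a zero-address encoding in which arithmetic instructions pop the top two items and push the result; `run_stack` and the sample values are illustrative names, not part of the lecture.

```python
def run_stack(program, memory):
    """Execute a zero-address program; the last list element is the top of stack."""
    binary_ops = {
        "add": lambda left, right: left + right,
        "sub": lambda left, right: left - right,
        "mul": lambda left, right: left * right,
    }
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":
            stack.append(memory[instr[1]])
        elif op == "pop":
            memory[instr[1]] = stack.pop()
        elif op in binary_ops:
            right = stack.pop()  # top of stack
            left = stack.pop()   # item below the top
            stack.append(binary_ops[op](left, right))
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# A*B - (A+C*B), using the instruction order from the slide
program = [
    ("push", "A"), ("push", "B"), ("mul",),
    ("push", "A"), ("push", "C"), ("push", "B"), ("mul",),
    ("add",), ("sub",),
]
memory = {"A": 2, "B": 3, "C": 4}  # sample values (assumed)
final_stack = run_stack(program, memory)
print(final_stack)  # [-8]
```

Because `sub` computes the lower item minus the top item, A*B must be pushed first so that it sits below A+C*B when the subtraction runs.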

Stacks: Pros and Cons

Pros

Good code density (implicit top of stack)

Low hardware requirements

Easy to write a simple compiler for stack architectures

Cons

Stack becomes the bottleneck

Little ability for parallelism or pipelining

Data is not always at the top of the stack when needed, so additional instructions like TOP and SWAP are needed

Difficult to write an optimizing compiler for stack architectures

Memory – Memory Architecture

• Instruction set:

(3 operands) add A, B, C    sub A, B, C    mul A, B, C

(2 operands) add A, B    sub A, B    mul A, B

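• Example: A*B - (A+C*B)

– 3 operands

mul D, A, B    /* D = A*B */
mul E, C, B    /* E = C*B */
add E, A, E    /* E = A + C*B */
sub E, D, E    /* E = A*B - (A+C*B) */

– 2 operands

mov D, A
mul D, B       /* D = A*B */
mov E, C
mul E, B       /* E = C*B */
add E, A       /* E = A + C*B */
sub D, E       /* D = A*B - (A+C*B) */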

Memory - Memory Architecture

• Pros

– Requires fewer instructions (especially if 3 operands)

– Easy to write compilers for (especially if 3 operands)

• Cons

– Very high memory traffic (especially if 3 operands)

– Variable number of clocks per instruction

– With two operands, more data movements are required

Register – Memory Architectures

• Instruction set:

add R1, A    sub R1, A    mul R1, B

load R1, A    store R1, A

• Example: A*B - (A+C*B)

load R1, C
mul R1, B      /* C*B */
add R1, A      /* A + C*B */
store R1, D
load R2, A
mul R2, B      /* A*B */
sub R2, D      /* A*B - (A + C*B) */

R1 = R1 +,-,*,/ mem[B]

Register – Memory Architectures

• Pros

– Some data can be accessed without loading first

– Instruction format easy to encode

– Good code density

• Cons

– Operands are not equivalent (poor orthogonality)

– Variable number of clocks per instruction

– May limit number of registers

Load – Store Architectures

• Instruction set:

add R1, R2, R3    sub R1, R2, R3    mul R1, R2, R3

load R1, &A    store R1, &A    move R1, R2

• Example: A*B - (A+C*B)

load R1, &A
load R2, &B
load R3, &C
mul R7, R3, R2      /* C*B */
add R8, R7, R1      /* A + C*B */
mul R9, R1, R2      /* A*B */
sub R10, R9, R8     /* A*B - (A+C*B) */

R3 = R1 +,-,*,/ R2
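The load-store sequence can be checked with a small register-file simulator that also counts data memory accesses. This is a sketch under stated assumptions: `run_load_store`, the `ALU` table, the tuple encoding, and the sample values are illustrative, and addresses are modelled as dictionary keys rather than real `&A` pointers.

```python
import operator

# Register-to-register operations; only these touch the ALU
ALU = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_load_store(program, memory):
    """Execute a load-store program; returns (registers, data memory accesses)."""
    regs = {}
    memory_accesses = 0
    for op, *args in program:
        if op == "load":
            dst, addr = args
            regs[dst] = memory[addr]
            memory_accesses += 1
        elif op == "store":
            src, addr = args
            memory[addr] = regs[src]
            memory_accesses += 1
        elif op == "move":
            dst, src = args
            regs[dst] = regs[src]
        elif op in ALU:
            dst, left, right = args  # dst = left op right
            regs[dst] = ALU[op](regs[left], regs[right])
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs, memory_accesses


# A*B - (A+C*B), using the instruction order from the slide
program = [
    ("load", "R1", "A"), ("load", "R2", "B"), ("load", "R3", "C"),
    ("mul", "R7", "R3", "R2"),
    ("add", "R8", "R7", "R1"),
    ("mul", "R9", "R1", "R2"),
    ("sub", "R10", "R9", "R8"),
]
memory = {"A": 2, "B": 3, "C": 4}  # sample values (assumed)
regs, memory_accesses = run_load_store(program, memory)
print(regs["R10"], memory_accesses)  # -8 3
```

Each variable is loaded once and then reused from a register, so the seven instructions need only three data memory accesses.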

Load – Store Architectures

• Pros

– Simple, fixed length instruction encodings

– Instructions take similar number of cycles

– Relatively easy to pipeline and make superscalar

• Cons

– Higher instruction count

– Not all instructions need three operands

– Dependent on good compiler

Code Sequence C = A + B for Four Instruction Sets

Stack     Accumulator    Register             Register
                         (register-memory)    (load-store)
Push A    Load A         Load R1, A           Load R1, A
Push B    Add B          Add R1, B            Load R2, B
Add       Store C        Store C, R1          Add R3, R1, R2
Pop C                                         Store C, R3

ALU operation: acc = acc + mem[C] (accumulator); R1 = R1 + mem[C] (register-memory); R3 = R1 + R2 (load-store)

Addressing Modes

Immediate

Direct

Indirect

Register

Register Indirect

Displacement (Indexed)

Stack

Immediate Addressing

Operand is part of instruction

Operand = address field

e.g. ADD 5

Add 5 to contents of accumulator

5 is operand

No memory reference to fetch data

Fast

Limited range

Immediate Addressing Diagram

Instruction: [ Opcode | Operand ]

Direct Addressing

• Address field contains address of operand

• Effective address (EA) = address field (A)

• e.g. ADD A

• Add contents of cell A to accumulator

• Look in memory at address A for operand

• Single memory reference to access data

• No additional calculations to work out effective address

• Limited address space

Direct Addressing Diagram

Instruction: [ Opcode | Address A ]

Memory: the cell at address A holds the operand

Indirect Addressing (1)

Memory cell pointed to by address field contains the address of (pointer to) the operand

EA = (A)

Look in A, find address (A) and look there for operand

e.g. ADD (A)

Add contents of cell pointed to by contents of A to accumulator

Indirect Addressing (2)

Large address space

2^n where n = word length

May be nested, multilevel, cascaded

e.g. EA = (((A)))

Draw the diagram yourself

Multiple memory accesses to find operand

Hence slower

Indirect Addressing Diagram

Instruction: [ Opcode | Address A ]

Memory: the cell at address A holds a pointer to the operand; the cell it points to holds the operand
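The three modes covered so far differ only in how many memory references are needed to reach the operand. The sketch below makes that explicit; `fetch_operand`, the mode names, and the sample memory contents are assumptions for illustration.

```python
def fetch_operand(mode, field, memory):
    """Return (operand, memory references) for the instruction's address field."""
    if mode == "immediate":
        # The address field is the operand itself
        return field, 0
    if mode == "direct":
        # EA = A
        return memory[field], 1
    if mode == "indirect":
        # EA = (A): first read the pointer, then the operand
        pointer = memory[field]
        return memory[pointer], 2
    raise ValueError(f"unknown mode: {mode}")


memory = {10: 20, 20: 99}  # cell 10 holds 20, cell 20 holds 99 (assumed values)
print(fetch_operand("immediate", 10, memory))  # (10, 0)
print(fetch_operand("direct", 10, memory))     # (20, 1)
print(fetch_operand("indirect", 10, memory))   # (99, 2)
```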

Register Addressing (1)

Operand is held in register named in address field

EA = R

Limited number of registers

Very small address field needed

Shorter instructions

Faster instruction fetch

Register Addressing (2)

No memory access

Very fast execution

Very limited address space

Multiple registers help performance

Requires good assembly programming or compiler writing

N.B. C programming: register int a;

c.f. Direct addressing

Register Addressing Diagram

Instruction: [ Opcode | Register Address R ]

Registers: register R holds the operand

Register Indirect Addressing

c.f. indirect addressing

EA = (R)

Operand is in memory cell pointed to by contents of register R

Large address space (2^n)

One fewer memory access than indirect addressing

Register Indirect Addressing Diagram

Instruction: [ Opcode | Register Address R ]

Registers: register R holds a pointer to the operand

Memory: the cell that R points to holds the operand
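Register and register indirect addressing can be contrasted in the same style. In this sketch `fetch_register_operand`, the register name `R`, and the sample contents are hypothetical; the second element of the result counts memory references.

```python
def fetch_register_operand(mode, reg, regs, memory):
    """Return (operand, memory references) for a register-based mode."""
    if mode == "register":
        # EA = R: the operand is the register's contents
        return regs[reg], 0
    if mode == "register_indirect":
        # EA = (R): the register holds the operand's memory address
        return memory[regs[reg]], 1
    raise ValueError(f"unknown mode: {mode}")


regs = {"R": 40}   # assumed register contents
memory = {40: 7}   # assumed memory contents
print(fetch_register_operand("register", "R", regs, memory))           # (40, 0)
print(fetch_register_operand("register_indirect", "R", regs, memory))  # (7, 1)
```

Register indirect needs one memory reference where memory indirect needs two, because the pointer is already in a register.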

Displacement Addressing

EA = A + (R)

Address field holds two values

A = base value

R = register that holds displacement

or vice versa

Displacement Addressing Diagram

Instruction: [ Opcode | Register R | Address A ]

Registers: contents of register R are added (+) to A

Memory: the cell at address A + (R) holds the operand

Relative Addressing

A version of displacement addressing

R = Program counter, PC

EA = A + (PC)

i.e. get operand A cells away from the current location pointed to by PC

c.f. locality of reference & cache usage

Base-Register Addressing

A holds displacement

R holds pointer to base address

R may be explicit or implicit

e.g. segment registers in 80x86

Indexed Addressing

A = base

R = displacement

EA = A + (R)

Good for accessing arrays

EA = A + (R), then R++ (step to the next element)
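Indexed addressing with an auto-incremented index register maps directly onto an array walk. The sketch below assumes a flat word-addressed memory modelled as a Python list; `sum_array_indexed` and the array placement are illustrative.

```python
def sum_array_indexed(memory, base, length):
    """Sum `length` words starting at `base` using EA = A + (R), then R++."""
    r = 0      # index register R
    total = 0
    for _ in range(length):
        ea = base + r        # effective address: A + (R)
        total += memory[ea]
        r += 1               # auto-index to the next element
    return total


memory = [0] * 16
memory[4:8] = [10, 20, 30, 40]  # a 4-element array stored at address 4 (assumed)
print(sum_array_indexed(memory, base=4, length=4))  # 100
```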

Combinations

Postindex

EA = (A) + (R)

Preindex

EA = (A+(R))

(Draw the diagrams)
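The two combinations differ in whether the index is applied after or before the memory indirection. A minimal sketch, with hypothetical function names and sample contents:

```python
def postindex_ea(a, reg, regs, memory):
    """EA = (A) + (R): fetch the pointer stored at A, then add the index."""
    return memory[a] + regs[reg]


def preindex_ea(a, reg, regs, memory):
    """EA = (A + (R)): add the index to A, then fetch the pointer stored there."""
    return memory[a + regs[reg]]


regs = {"R1": 3}                # assumed index value
memory = {100: 500, 103: 700}   # assumed pointer table
print(postindex_ea(100, "R1", regs, memory))  # 503
print(preindex_ea(100, "R1", regs, memory))   # 700
```

With postindexing the index selects an element within the structure that the pointer at A refers to; with preindexing it selects which pointer in a table starting at A is followed.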

Stack Addressing

Operand is (implicitly) on top of stack

e.g. ADD

Pop top two items from stack and add

Types of Addressing Modes (VAX)

Addressing Mode         Example                  Action
1. Register direct      Add R4, R3               R4 <- R4 + R3
2. Immediate            Add R4, #3               R4 <- R4 + 3
3. Displacement         Add R4, 100(R1)          R4 <- R4 + M[100 + R1]
4. Register indirect    Add R4, (R1)             R4 <- R4 + M[R1]
5. Indexed              Add R4, (R1 + R2)        R4 <- R4 + M[R1 + R2]
6. Direct               Add R4, (1000)           R4 <- R4 + M[1000]
7. Memory Indirect      Add R4, @(R3)            R4 <- R4 + M[M[R3]]
8. Autoincrement        Add R4, (R2)+            R4 <- R4 + M[R2]; R2 <- R2 + d
9. Autodecrement        Add R4, (R2)-            R4 <- R4 + M[R2]; R2 <- R2 - d
10. Scaled              Add R4, 100(R2)[R3]      R4 <- R4 + M[100 + R2 + R3*d]

Studies by [Clark and Emer] indicate that modes 1-4 account for 93% of all operands on the VAX.
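The Action column of the table can be transcribed into a single operand-fetch function. This is only a sketch: `source_operand`, the keyword names (`r`, `x`, `disp`, `addr`, `value`), the default `d=4` (taken here to be the operand size in bytes), and the sample register and memory contents are all assumptions.

```python
def source_operand(mode, regs, mem, d=4, **field):
    """Return the source operand for one mode from the table; may update regs."""
    r = field.get("r")  # base or pointer register name
    x = field.get("x")  # index register name
    if mode == "register direct":
        return regs[r]
    if mode == "immediate":
        return field["value"]
    if mode == "displacement":
        return mem[field["disp"] + regs[r]]
    if mode == "register indirect":
        return mem[regs[r]]
    if mode == "indexed":
        return mem[regs[r] + regs[x]]
    if mode == "direct":
        return mem[field["addr"]]
    if mode == "memory indirect":
        return mem[mem[regs[r]]]
    if mode == "autoincrement":
        value = mem[regs[r]]
        regs[r] += d
        return value
    if mode == "autodecrement":
        # Order follows the table above. On a real VAX, autodecrement is
        # written -(Rn) and decrements the register before the access.
        value = mem[regs[r]]
        regs[r] -= d
        return value
    if mode == "scaled":
        return mem[field["disp"] + regs[r] + regs[x] * d]
    raise ValueError(f"unknown mode: {mode}")


# Sample machine state (assumed values)
mem = {2: 8, 8: 22, 16: 55, 24: 33, 108: 11, 124: 66, 1000: 44}
regs = {"R1": 8, "R2": 16, "R3": 2, "R4": 0}

# Add R4, 100(R1)
regs["R4"] += source_operand("displacement", regs, mem, disp=100, r="R1")
print(regs["R4"])  # 11
```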

CISC

Pros

Complex instructions operate directly on main memory.

Programmer is no longer required to issue explicit LOAD and STORE operations, as they are now handled by hardware.

Compiler has less work to do translating statements in a high-level language to assembly language.

Cons

Microcode became more difficult to test and debug as systems became more complex, requiring numerous patches to fix bugs.

Programmers tended to avoid the more complex instructions in favor of simpler instructions that accomplished the same result.

The use of memory operands caused structural hazards, preventing concurrent execution of instructions (pipelining).

RISC: Reduced Instruction Set Computer

RISC is a CPU design that recognizes only a limited number of instructions

Simple instructions

Instructions are executed quickly

Executes a series of simple instructions instead of a complex instruction

Most instructions are executed within one clock cycle

Incorporates a large number of general registers for arithmetic operations to avoid storing variables on a stack in memory

Only the load and store instructions operate directly on memory

Pipelining = speed

RISC

Pros

Reduced instruction set.

Less complex, simple instructions.

Hardwired control unit and machine instructions.

Few addressing schemes for memory operands, with only two basic instructions, LOAD and STORE.

Many symmetric registers, which are organised into a register file.

Cons

Greater burden on the software.

VLIW

A typical VLIW (very long instruction word) machine has instruction words hundreds of bits in length.

Multiple functional units are used concurrently in a VLIW processor.

All functional units share the use of a common large register file.

VLIW - Advantages

Compiler prepares fixed packets of multiple operations that give the full "plan of execution"

Dependencies are determined by the compiler and used to schedule according to function unit latencies

Function units are assigned by the compiler and correspond to the position within the instruction packet ("slotting")

Compiler produces fully-scheduled, hazard-free code, so hardware doesn't have to "rediscover" dependencies or schedule

VLIW - Disadvantages

Compatibility across implementations is a major problem

VLIW code won't run properly with a different number of function units or different latencies

Unscheduled events (e.g., cache miss) stall the entire processor

Code density is another problem: low slot utilization
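Slot utilization can be quantified as the fraction of instruction-packet slots that carry useful work. The sketch below assumes a hypothetical 4-slot machine in which unused slots are padded with `nop`; the function name and the schedule are illustrative, not taken from any real compiler output.

```python
def slot_utilization(packets):
    """Fraction of slots across all packets that hold a non-nop operation."""
    total_slots = sum(len(packet) for packet in packets)
    useful_ops = sum(1 for packet in packets for op in packet if op != "nop")
    return useful_ops / total_slots


# A made-up schedule for a 4-slot VLIW machine
packets = [
    ["add", "mul", "nop", "nop"],
    ["load", "nop", "nop", "nop"],
    ["sub", "nop", "store", "nop"],
]
print(f"{slot_utilization(packets):.2f}")  # 0.42
```

Here 7 of the 12 encoded slots are padding, which is the code density cost of fixed-width packets when the compiler cannot find enough independent operations.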

Comparison
