chapter 2 instruction sets

Post on 15-Jan-2016

38 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

DESCRIPTION

Chapter 2 Instruction Sets. 金仲達教授 清華大學資訊工程學系 (Slides are taken from the textbook slides). Outline. Computer Architecture Introduction ARM Processor SHARC Processor. von Neumann Architecture. Memory holds data and instructions CPU fetches instructions from memory - PowerPoint PPT Presentation

TRANSCRIPT

Chapter 2

Instruction Sets

金仲達教授清華大學資訊工程學系

(Slides are taken from the textbook slides)

Instruction Sets-2

Outline Computer Architecture Introduction ARM Processor SHARC Processor

Instruction Sets-3

von Neumann Architecture Memory holds data and instructions CPU fetches instructions from memory

Separate CPU and memory distinguishes programmable computer

CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.

Instruction Sets-4

Von Neumann Architecture

memoryCPU

PC

address

data

IRADD r5,r1,r3200

200

ADD r5,r1,r3

Instruction Sets-5

Harvard Architecture (NOT von Neumann Architecture)

CPU

PCdata memory

program memory

address

data

address

data

Instruction Sets-6

von Neumann vs. Harvard Harvard can’t use self-modifying code Harvard allows two simultaneous memory fetc

hes Most DSPs use Harvard architecture for stream

ing data: greater memory bandwidth more predictable bandwidth

Instruction Sets-7

RISC vs. CISC Complex instruction set computer (CISC):

many addressing modes many operations

Reduced instruction set computer (RISC): load/store pipelinable instructions

Instruction Sets-8

Instruction Set Characteristics Fixed vs. variable length Addressing modes Number of operands Types of operands

Instruction Sets-9

Programming model Programming model: registers visible to

the programmer. Some registers are not visible (IR).

Instruction Sets-10

Multiple implementations Successful architectures have several

implementations: varying clock speeds; different bus widths; different cache sizes; etc.

Instruction Sets-11

Assembly language One-to-one with instructions (more or

less). Basic features:

One instruction per line. Labels provide names for addresses (usually in

first column). Instructions often start in later columns. Columns run to end of line.

Instruction Sets-12

ARM Assembly Language Examplelabel1 ADR r4,c

LDR r0,[r4] ; a commentADR r4,dLDR r1,[r4]SUB r0,r0,r1 ; comment

Instruction Sets-13

Pseudo-ops Some assembler directives don’t correspond

directly to instructions: Define current address. Reserve storage. Constants.

Examples: In ARM:

BIGBLOCK %10 ; allocate a block of 10-bytes ; memory and initialize to 0

In SHARC.global BIGBLOCK.var BIGBLOCK[10] = 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

Instruction Sets-14

Outline Computer Architecture Introduction ARM Processor

ARM assembly language ARM programming model ARM memory organization ARM data operations ARM flow of control

SHARC Processor

Instruction Sets-15

ARM Versions ARM architecture has been extended over

several versions We will concentrate on ARM7 ARM7 is a von Neumann architecture ARM9 is a Harvard architecture

Instruction Sets-16

ARM assembly language Fairly standard assembly language:

LDR r0,[r8] ; a commentlabel ADD r4,r0,r1

Instruction Sets-17

r0r1r2r3r4r5r6r7

r8r9r10r11r12r13r14

r15 (PC)

CPSR

31 0

N Z C V

ARM programming model 16 general-purpose registers (including

PC) One status register

Instruction Sets-18

Endianness Relationship between bit and byte/word order

ing defines endianness:

byte 3 byte 2 byte 1 byte 0

byte 0 byte 1 byte 2 byte 3

MSB LSB

MSB LSBlittle-endian

big-endian

word 4

word 0

word 4

word 0

Instruction Sets-19

ARM data types Word is 32 bits long Word can be divided into four 8-bit bytes ARM addresses can be 32 bits long Address refers to byte

Address 4 starts at byte 4 Can be configured at power-up as either little-

or bit-endian mode

Instruction Sets-20

ARM status bits Every arithmetic, logical, or shifting

operation sets CPSR bits: N (negative), Z (zero), C (carry), V (overflow).

Examples: -1 + 1 = 0

0xffffffff + 0x1 = 0x0 NZCV = 0110 231-1+1 = -231

0x7fffffff + 0x1 = 0x80000000 NZCV = 0101

Instruction Sets-21

ARM data instructions Basic format:

ADD r0,r1,r2 Computes r1+r2, stores in r0

Immediate operand:ADD r0,r1,#2 Computes r1+2, stores in r0

Instruction Sets-22

ARM data instructions ADD, ADC : add (w.

carry) SUB, SBC : subtract

(w. carry) RSB, RSC : reverse

subtract (w. carry) MUL, MLA : multiply

(and accumulate)

AND, ORR, EOR BIC : bit clear LSL, LSR : logical shift

left/right ASL, ASR : arithmetic

shift left/right ROR : rotate right RRX : rotate right

extended with C

Instruction Sets-23

Data operation varieties Logical shift:

fills with zeroes Arithmetic shift:

fills with sign bit on shift right RRX performs 33-bit rotate, including C bit

from CPSR above sign bit.

Instruction Sets-24

ARM comparison instructions CMP : compare CMN : negated compare TST : bit-wise test (AND) TEQ : bit-wise negated test (XOR) These instructions set only the NZCV bits

of CPSR.

Instruction Sets-25

ARM move instructions MOV, MVN : move (negated)

MOV r0, r1 ; sets r0 to r1

Instruction Sets-26

ARM load/store instructions LDR, LDRH, LDRB : load (half-word, byte) STR, STRH, STRB : store (half-word, byte) Addressing modes:

register indirect : LDR r0,[r1] with second register : LDR r0,[r1,-r2] with constant : LDR r0,[r1,#4]

Cannot refer to address directly in an instruction Generate value by performing arithmetic on PC

(r15) ADR pseudo-op generates instruction required to

calculate address:ADR r1,FOO

Instruction Sets-27

Example: C assignments C:

x = (a + b) - c; Assembler:

ADR r4,a ; get address for aLDR r0,[r4] ; get value of aADR r4,b ; get address for b, reusing r4LDR r1,[r4] ; get value of bADD r3,r0,r1 ; compute a+bADR r4,c ; get address for cLDR r2,[r4] ; get value of cSUB r3,r3,r2 ; complete computation of xADR r4,x ; get address for xSTR r3[r4] ; store value of x

Instruction Sets-28

Example: C assignment C:

y = a*(b+c); Assembler:

ADR r4,b ; get address for bLDR r0,[r4] ; get value of bADR r4,c ; get address for cLDR r1,[r4] ; get value of cADD r2,r0,r1 ; compute partial resultADR r4,a ; get address for aLDR r0,[r4] ; get value of aMUL r2,r2,r0 ; compute final value for yADR r4,y ; get address for ySTR r2,[r4] ; store y Register reuse

Instruction Sets-29

Example: C assignment C:

z = (a << 2) | (b & 15); Assembler: (register reuse)ADR r4,a ; get address for aLDR r0,[r4] ; get value of aMOV r0,r0,LSL 2 ; perform shiftADR r4,b ; get address for bLDR r1,[r4] ; get value of bAND r1,r1,#15 ; perform ANDORR r1,r0,r1 ; perform ORADR r4,z ; get address for zSTR r1,[r4] ; store value for z

Instruction Sets-30

Additional addressing modes Base-plus-offset addressing:

LDR r0,[r1,#16] Loads from location r1+16

Auto-indexing increments base register:LDR r0,[r1,#16]! Adds 16 to r1, then use new value as address

Post-indexing fetches, then does offset:LDR r0,[r1],#16 Loads r0 from r1, then adds 16 to r1.

Instruction Sets-31

ARM flow of control Branch operation:

B #100 PC-relative: add 400 to PC Can be performed conditionally.

All operations can be performed conditionally, testing CPSR: EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT,

GT, LE

Instruction Sets-32

Example: if statement C:

if (a > b) { x = 5; y = c + d; } else x = c - d; Assembler:; compute and test condition

ADR r4,a ; get address for aLDR r0,[r4] ; get value of aADR r4,b ; get address for bLDR r1,[r4] ; get value for bCMP r0,r1 ; compare a < bBGE fblock ; if a >= b, branch to false block

; true blockMOV r0,#5 ; generate value for xADR r4,x ; get address for xSTR r0,[r4] ; store xADR r4,c ; get address for c

Instruction Sets-33

If statement, cont’dLDR r0,[r4] ; get value of cADR r4,d ; get address for dLDR r1,[r4] ; get value of dADD r0,r0,r1 ; compute yADR r4,y ; get address for ySTR r0,[r4] ; store yB after ; branch around false block

; false blockfblock ADR r4,c ; get address for c

LDR r0,[r4] ; get value of cADR r4,d ; get address for dLDR r1,[r4] ; get value for dSUB r0,r0,r1 ; compute a-bADR r4,x ; get address for xSTR r0,[r4] ; store value of x

after ...

Instruction Sets-34

Example: conditional execution Use predicates to control which instructions ar

e executed:; true block, condition codes updated only by CMP; no need for “BGE fblock” and “B after”

MOVLT r0,#5 ; generate value for xADRLT r4,x ; get address for xSTRLT r0,[r4] ; store xADRLT r4,c ; get address for cLDRLT r0,[r4] ; get value of cADRLT r4,d ; get address for dLDRLT r1,[r4] ; get value of dADDLT r0,r0,r1 ; compute yADRLT r4,y ; get address for ySTRLT r0,[r4] ; store y

Instruction Sets-35

Conditional execution, cont’d; false block

ADRGE r4,c ; get address for cLDRGE r0,[r4] ; get value of cADRGE r4,d ; get address for dLDRGE r1,[r4] ; get value for dSUBGE r0,r0,r1 ; compute a-bADRGE r4,x ; get address for xSTRGE r0,[r4] ; store value of x

Conditional execution works best for small conditionals

Instruction Sets-36

Example: switch statement C:

switch (test) { case 0: … break; case 1: … } Assembler:

ADR r2,test ; get address for testLDR r0,[r2] ; load value for testADR r1,switchtab ; load address for switch tableLDR r15,[r1,r0,LSL #2] ; index switch table

switchtab DCD case0DCD case1...

LDR: Shift r0 2 bits to get word address Load content of M[r0+r1] to r15 (PC)

Instruction Sets-37

Example: FIR filter C for finite impulse response (FIR) filter:

for (i=0, f=0; i<N; i++)f = f + c[i]*x[i]; /* x[i]: periodic samples */

Assembler; loop initiation code

MOV r0,#0 ; use r0 for IMOV r8,#0 ; use separate index for arraysADR r2,N ; get address for NLDR r1,[r2] ; get value of NMOV r2,#0 ; use r2 for fADR r3,c ; load r3 with base of cADR r5,x ; load r5 with base of x

Instruction Sets-38

FIR filter, cont’d; loop bodyloop LDR r4,[r3,r8] ; get c[i]

LDR r6,[r5,r8] ; get x[i]MUL r4,r4,r6 ; compute c[i]*x[i]ADD r2,r2,r4 ; add into running sumADD r8,r8,#4 ; add 1 word offset to array indexADD r0,r0,#1 ; add 1 to iCMP r0,r1 ; exit?BLT loop ; if i < N, continue

Instruction Sets-39

ARM subroutine linkage Branch and link instruction:

BL foo Copies current PC to r14.

To return from subroutine:MOV r15,r14

Instruction Sets-40

Nested subroutine calls Nesting/recursion requires coding convention: C:

void f1(int a) { f2(a); } Assembly:f1 LDR r0,[r13] ; load arg into r0 from stack

; r13 is stack pointer; call f2()STR r13!,[r14] ; store f1’s return adrsSTR r13!,[r0] ; store arg to f2 on stackBL f2 ; branch and link to f2; return from f1()SUB r13,#4 ; pop f2’s arg off stackLDR r13!,r15 ; restore reg and return

Instruction Sets-41

Summary of ARM Load/store architecture Most instructions are RISC, operate in

single cycle Some multi-register operations take longer

All instructions can be executed conditionally

Details: please refer to Chapter 2 of the textbook

Instruction Sets-42

Outline Computer Architecture Introduction ARM Processor SHARC Processor

SHARC programming model SHARC assembly language SHARC memory organization SHARC data operations SHARC flow of control

Pads Lab

SHARC programming model Register files:

R0-R15 (aliased as F0-F15 for floating point) Status registers.

ASTAT: arithmetic status. STKY: sticky. MODE 1: mode 1.

Loop registers. Data address generator registers. Interrupt registers.

Instruction Sets-44

SHARC assembly language Algebraic notation terminated by

semicolon:

R1=DM(M0,I0), R2=PM(M8,I8); ! comment

label: R3=R1+R2;data memory access program memory access

Instruction Sets-45

SHARC data types 32-bit IEEE single-precision floating-point. 40-bit IEEE extended-precision floating-

point. 32-bit integers. Memory organized internally as 32-bit

words with a 32-bit address. An instruction is 48 bits. Floating-point can be:

rounded toward zero or nearest. ALU supports saturation arithmetic

(ALUSAT bit in MODE1). Overflow results in max value, not rollover.

Pads Lab

SHARC microarchitecture Modified Harvard architecture.

Program memory can be used to store some data.

Register file connects to: multiplier shifter; ALU.

Pads Lab

Multiplier Fixed-point operations can accumulate

into local MR registers or be written to register file. Fixed-point result is 80 bits.

Floating-point results always go to register file.

Status bits: negative, under/overflow, invalid, fixed-point underflow, floating-point underflow, floating-point invalid.

Pads Lab

ALU/shifter status flags ALU:

zero, overflow, negative, fixed-point carry, inputsign, floating-point invalid, last op was floating-point, compare accumulation registers, floating-point under/overflow, fixed-point overflow, floating-point invalid

Shifter: zero, overflow, sign

Instruction Sets-49

Flag operations All ALU operations set AZ (zero), AN

(negative), AV (overflow), AC (fixed-point carry), AI (floating-point invalid) bits in ASTAT.

STKY is sticky version of some ASTAT bits.

Instruction Sets-50

Example: data operations Fixed-point -1 + 1 = 0:

AZ = 1, AU = 0, AN = 0, AV = 0, AC = 1, AI = 0.

STKY bit AOS (fixed point underflow) not set. Fixed-point -2*3:

MN = 1, MV = 0, MU = 1, MI = 0. Four STKY bits, none of them set.

LSHIFT 0x7fffffff BY 3: SZ=0,SV=1,SS=0.

Pads Lab

Multifunction computations Can issue some computations in parallel:

dual add-subtract; fixed-point multiply/accumulate and

add,subtract,average floating-point multiply and ALU operation multiplication and dual add/subtract

Multiplier operand from R0-R7, ALU operand from R8-R15.

Instruction Sets-52

SHARC load/store Load/store architecture: no memory-direct op

erations. Two data address generators (DAGs):

program memory; data memory.

Must set up DAG registers to control loads/stores.

Instruction Sets-53

DAG1 registers

I0

I1

I2

I3

I4

I5

I6

I7

M0

M1

M2

M3

M4

M5

M6

M7

L0

L1

L2

L3

L4

L5

L6

L7

B0

B1

B2

B3

B4

B5

B6

B7

Pads Lab

Data address generators Provide indexed, modulo, bit-reverse

indexing. MODE1 bits determine whether primary or

alternate registers are active.

Instruction Sets-55

Basic addressing Immediate value:

R0 = DM(0x20000000); Direct load:

R0 = DM(_a); ! Loads contents of _a Direct store:

DM(_a)= R0; ! Stores R0 at _a

Instruction Sets-56

Post-modify with update I register specify base address. M register/immediate holds modifier value.R0 = DM(I3,M3) ! LoadDM(I2,1) = R1 ! Store I register is updated by the modifier value

Base-plus offset:R0 = DM(M1,I0) ! Load from M1+I0

Circular buffer: L register is buffer start index, B is buffer base address.

Instruction Sets-57

Data in program memory Can put data in program memory to read

two values per cycle:F0 = DM(M0,I0), F1 = PM(M8,I9);

Compiler allows programmer to control which memory values are stored in.

Instruction Sets-58

Example: C assignments C:

x = (a + b) - c; Assembler:

R0 = DM(_a); ! Load aR1 = DM(_b); ! Load bR3 = R0 + R1;R2 = DM(_c); ! Load cR3 = R3-R2;DM(_x) = R3; ! Store result in x

Instruction Sets-59

Example, cont’d. C:

y = a*(b+c); Assembler:

R1 = DM(_b); ! Load bR2 = DM(_c); ! Load cR2 = R1 + R2;R0 = DM(_a); ! Load aR2 = R2*R0;DM(_y) = R23; ! Store result in y

Instruction Sets-60

Example, cont’d. Shorter version using pointers:! Load b, cR2=DM(I1,M5), R1=PM(I8,M13);R0 = R2+R1, R12=DM(I0,M5);R6 = R12*R0(SSI);DM(I0,M5)=R8; ! Store in y

Instruction Sets-61

Example, cont’d. C:

z = (a << 2) | (b & 15); Assembler:R0=DM(_a); ! Load aR0=LSHIFT R0 by #2; ! Left shiftR1=DM(_b); R3=#15; ! Load immediateR1=R1 AND R3;R0 = R1 OR R0;DM(_z) = R0;

Pads Lab

SHARC program sequencer Features:

instruction cache; PC stack; status registers; loop logic; data address generator;

Pads Lab

Conditional instructions Instructions may be executed

conditionally. Conditions come from:

arithmetic status (ASTAT); mode control 1 (MODE1); flag inputs; loop counter.

Instruction Sets-64

SHARC jump Unconditional flow of control change:

JUMP foo Three addressing modes:

Direct: 24-bit address in immediate to set PC Indirect: address from DAG2 PC-relative: immediate plus PC to give new address

Pads Lab

Branches Types: CALL, JUMP, RTS, RTI. Can be conditional. Address can be direct, indirect, PC-

relative. Can be delayed or non-delayed. JUMP causes automatic loop abort.

Instruction Sets-66

Example: C if statement C:

if (a > b) { x = 5; y = c + d; }else x = c - d;

Assembler:! TestR0 = DM(_a);R1 = DM(_b);COMP(R0,R1); ! CompareIF GE JUMP fblock;

Instruction Sets-67

C if statement, cont’d.! True blocktblock: R0 = 5; ! Get value for xDM(_x) = R0;R0 = DM(_c); R1 = DM(_d);R1 = R0+R1;DM(_y)=R1;JUMP other; ! Skip false block

! False blockfblock: R0 = DM(_c);R1 = DM(_d);R1 = R0-R1;DM(_x) = R1;

other: ! Code after if

Instruction Sets-68

Fancy if implementation C:if (a>b)

y = c-d; else

y = c+d; Use parallelism to speed it up---compute

both cases, then choose which one to store.

Instruction Sets-69

Fancy if implementation, cont’d.! Load valuesR1=DM(_a); R2=DM(_b);R3=DM(_c); R4=DM(_d);

! Compute both sum and differenceR12 = r2+r4, r0 = r2-r4;

! Choose which one to savecomp(r8,r1);if ge r0=r12;dm(_y) = r0 ! Write to y

Pads Lab

DO UNTIL loopsDO UNTIL instruction provides efficient

looping:

LCNTR=30, DO label UNTIL LCE;R0=DM(I0,M0), F2=PM(I8,M8);R1=R0-R15;

label: F4=F2+F3;

Loop length Last instruction in loopTerminationcondition

Instruction Sets-71

Example: FIR filter C:

for (i=0, f=0; i<N; i++)f = f + c[i]*x[i];

! setupI0=_a; I8=_b; ! a[0] (DAG0), b[0] (DAG1)M0=1; M8=1 ! Set up increments

! Loop bodyLCNTR=N, DO loopend UNTIL LCE;! Use postincrement modeR1=DM(I0,M0), R2=PM(I8,M8);R8=R1*R2;

loopend: R12=R12+R8;

Instruction Sets-72

Optimized FIR filter codeI4=_a; I12=_b;R4 = R4 xor R4, R1=DM(I4,M6), R2=PM(I12,M14);MR0F = R4, MODIFY(I7,M7);

! Start loopLCNTR=20, DO(PC,loop) UNTIL LCE;

loop: MR0F=MR0F+42*R1 (SSI), R1=DM(I4,M6), R2=PM(I12,M14);

! Loop cleanupR0=MR0F;

Instruction Sets-73

SHARC subroutine calls Use CALL instruction:

CALL foo; Can use absolute, indirect, PC-relative address

ing modes. Return using RTS instruction.

Pads Lab

PC stack PC stack: 30 locations X 24 instructions. Return addresses for subroutines,

interrupt service routines, loops held in PC stack.

Instruction Sets-75

Example: C function C:

void f1(int a) { f2(a); } Assembler:f1: R0=DM(I1,-1); ! Load arg into R0DM(I1,M1)=R0; ! Push f2’s argCALL f2;MODIFY(I1,-1); ! Pop elementRTS;

top related