arm chap 3_last

7/31/2019 ARM Chap 3_last


ARM Instruction Set

ARM (Advanced RISC Machines)



Stack Processing

A stack is usually implemented as a linear data structure that grows up (an ascending stack) or down (a descending stack) in memory.

A stack pointer holds the address of the current top of the stack, either by pointing to the last valid data item pushed onto the stack (a full stack) or by pointing to the vacant slot where the next data item will be placed (an empty stack).

ARM multiple register transfer instructions support all four forms of stack:
- Full ascending: grows up; the base register points to the highest address containing a valid item.
- Empty ascending: grows up; the base register points to the first empty location above the stack.
- Full descending: grows down; the base register points to the lowest address containing a valid item.
- Empty descending: grows down; the base register points to the first empty location below the stack.
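To make the four stack forms concrete, here is a minimal C model (not ARM code) of pushing and popping a single word. The toy 64-word memory, the Stack type and the push/pop names are illustrative assumptions; the point is only the order of "move sp" versus "access memory" for full and empty stacks.

```c
#include <stdint.h>

/* The four ARM stack forms. */
typedef enum { FULL_ASC, EMPTY_ASC, FULL_DESC, EMPTY_DESC } StackKind;

/* Toy memory (hypothetical): sp is a byte address into mem, a multiple of 4. */
typedef struct {
    uint32_t mem[64];
    uint32_t sp;
    StackKind kind;
} Stack;

/* Push one word: full stacks move sp first and then store;
   empty stacks store first and then move sp. */
void push(Stack *s, uint32_t value)
{
    switch (s->kind) {
    case FULL_ASC:   s->sp += 4; s->mem[s->sp / 4] = value; break;
    case FULL_DESC:  s->sp -= 4; s->mem[s->sp / 4] = value; break;
    case EMPTY_ASC:  s->mem[s->sp / 4] = value; s->sp += 4; break;
    case EMPTY_DESC: s->mem[s->sp / 4] = value; s->sp -= 4; break;
    }
}

/* Pop one word: the exact reverse of push. */
uint32_t pop(Stack *s)
{
    uint32_t value = 0;
    switch (s->kind) {
    case FULL_ASC:   value = s->mem[s->sp / 4]; s->sp -= 4; break;
    case FULL_DESC:  value = s->mem[s->sp / 4]; s->sp += 4; break;
    case EMPTY_ASC:  s->sp -= 4; value = s->mem[s->sp / 4]; break;
    case EMPTY_DESC: s->sp += 4; value = s->mem[s->sp / 4]; break;
    }
    return value;
}
```

With sp = 128, a full descending push writes at byte address 124 and leaves sp there; an empty descending push writes at 128 and leaves sp at 124.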



The ARM architecture uses the load-store multiple instructions to carry out stack operations.

The pop operation (removing data from a stack) uses a load multiple instruction; similarly, the push operation (placing data onto the stack) uses a store multiple instruction.

When using a stack you have to decide whether the stack will grow up or down in memory. A stack is either ascending (A) or descending (D). Ascending stacks grow towards higher memory addresses; in contrast, descending stacks grow towards lower memory addresses.

When you use a full stack (F), the stack pointer sp points to an address that is the last used or full location (i.e., sp points to the last item on the stack). In contrast, if you use an empty stack (E), sp points to an address that is the first unused or empty location (i.e., it points after the last item on the stack).

There are a number of load-store multiple addressing mode aliases available to support stack operations (see Table). Next to the pop column is the actual load multiple instruction equivalent.


For example, a full ascending stack has the notation FA appended to the load multiple instruction, giving LDMFA. This is translated into an LDMDA instruction.
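The alias table can be summarized in code. This sketch, assuming the standard ARM assembler stack aliases (pop = LDM, push = STM), maps each stack suffix to the block-transfer mode it assembles to; ALIASES and resolve_alias are hypothetical names.

```c
#include <string.h>

/* Stack suffix -> real LDM (pop) / STM (push) addressing mode. */
typedef struct {
    const char *stack;  /* FA, FD, EA or ED */
    const char *pop;    /* LDM equivalent */
    const char *push;   /* STM equivalent */
} StackAlias;

static const StackAlias ALIASES[] = {
    { "FA", "LDMDA", "STMIB" },
    { "FD", "LDMIA", "STMDB" },
    { "EA", "LDMDB", "STMIA" },
    { "ED", "LDMIB", "STMDA" },
};

/* Return the instruction an LDMxx (is_push = 0) or STMxx (is_push = 1)
   stack alias assembles to, or NULL for an unknown suffix. */
const char *resolve_alias(const char *stack, int is_push)
{
    for (size_t i = 0; i < sizeof ALIASES / sizeof ALIASES[0]; i++) {
        if (strcmp(ALIASES[i].stack, stack) == 0)
            return is_push ? ALIASES[i].push : ALIASES[i].pop;
    }
    return NULL;
}
```

Each push/pop pair uses opposite directions (for example STMDB/LDMIA for FD), which is what makes a pop undo the matching push.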



Example 20

The STMFD instruction pushes registers onto the stack, updating the sp. The figure shows a push onto a full descending stack. You can see that when the stack grows, the stack pointer points to the last full entry in the stack.

PRE  r1 = 0x00000002
     r4 = 0x00000003
     sp = 0x00080014

     STMFD sp!, {r1,r4}

POST r1 = 0x00000002
     r4 = 0x00000003
     sp = 0x0008000c

NOTE: The stack pointer points to the last full entry in the stack.
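A minimal C sketch of Example 20, assuming a small simulated RAM starting at a hypothetical base 0x00080000 (ram, word_at and stmfd are illustrative names). STMFD is modeled as STMDB with write-back: sp is first lowered by 4 bytes per register, and the lowest-numbered register lands at the lowest address.

```c
#include <stdint.h>

#define MEM_BASE 0x00080000u   /* hypothetical base of the simulated RAM */
static uint32_t ram[16];       /* covers 0x00080000..0x0008003f */

/* Word access into the simulated RAM by byte address. */
uint32_t *word_at(uint32_t addr)
{
    return &ram[(addr - MEM_BASE) / 4];
}

/* STMFD sp!, {regs}: full descending push (STMDB with write-back).
   regs[] is in register-number order; returns the new sp. */
uint32_t stmfd(uint32_t sp, const uint32_t *regs, int count)
{
    uint32_t lowest = sp - 4u * (uint32_t)count;
    for (int i = 0; i < count; i++)
        *word_at(lowest + 4u * (uint32_t)i) = regs[i];
    return lowest;  /* sp now points at the last full entry */
}
```

With sp = 0x00080014 and {r1, r4}, r1 is stored at 0x0008000c, r4 at 0x00080010, and sp becomes 0x0008000c, matching the POST state.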



Example 21

In contrast, the next figure shows a push operation on an empty descending stack using the STMED instruction. The STMED instruction pushes the registers onto the stack but updates register sp to point to the next empty location.

PRE  r1 = 0x00000002
     r4 = 0x00000003
     sp = 0x00080010

     STMED sp!, {r1,r4}

POST r1 = 0x00000002
     r4 = 0x00000003
     sp = 0x00080008

NOTE: sp points to the next empty location.
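The same kind of sketch for Example 21, again with a hypothetical simulated RAM and illustrative function names. STMED behaves like STMDA with write-back: the stored block ends at the old sp, and sp is left one word below the block, on the next empty slot.

```c
#include <stdint.h>

#define MEM_BASE 0x00080000u   /* hypothetical base of the simulated RAM */
static uint32_t ram[16];       /* covers 0x00080000..0x0008003f */

/* Word access into the simulated RAM by byte address. */
uint32_t *word_at(uint32_t addr)
{
    return &ram[(addr - MEM_BASE) / 4];
}

/* STMED sp!, {regs}: empty descending push (STMDA with write-back).
   The highest-numbered register goes to the old sp; returns the new sp. */
uint32_t stmed(uint32_t sp, const uint32_t *regs, int count)
{
    uint32_t lowest = sp - 4u * (uint32_t)(count - 1);
    for (int i = 0; i < count; i++)
        *word_at(lowest + 4u * (uint32_t)i) = regs[i];
    return lowest - 4u;  /* next empty location below the stack */
}
```

With sp = 0x00080010, r1 goes to 0x0008000c and r4 to 0x00080010, and sp becomes 0x00080008, as in the POST state.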



Block Copy Addressing



Stack Examples

[Figure: the effect of STMFD, STMED, STMFA and STMEA sp!, {r0,r1,r3-r5} on a stack. Each panel shows the stored registers r0, r1, r3, r4 and r5 (r0 at the lowest address), the old SP and the new SP; the address markers shown are 0x3e8, 0x400 (old SP) and 0x418.]



Load-Store Instructions

There are three basic forms of instruction that move data between ARM registers and memory:

1. Single register load and store instructions: transfer a byte, a 16-bit halfword, or a 32-bit word.
2. Multiple register load and store instructions: save or restore workspace registers on procedure entry and exit, and copy blocks of data.
3. Single register swap instruction: exchanges a value in a register with a value in memory; used to implement semaphores that ensure mutual exclusion on accesses.


Single Register Swap Instruction

The swap instruction is a special case of a load-store instruction. It swaps the contents of memory with the contents of a register.

This instruction is an atomic operation: it reads and writes a location in the same bus operation, preventing any other instruction from reading or writing to that location until it completes.

Swap cannot be interrupted by any other instruction or any other bus access. We say the system holds the bus until the transaction is complete.



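Syntax: SWP{B}{<cond>} Rd,Rm,[Rn]

SWP    word swap: tmp = mem32[Rn]; mem32[Rn] = Rm; Rd = tmp
SWPB   byte swap: tmp = mem8[Rn];  mem8[Rn] = Rm;  Rd = tmp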



Example 21

The swap instruction loads a word from memory into register r0 and overwrites the

memory with register r1.

PRE mem32[0x9000] = 0x12345678

r0 = 0x00000000

r1 = 0x11112222

r2 = 0x00009000

SWP r0, r1, [r2]

POST mem32[0x9000] = 0x11112222

r0 = 0x12345678

r1 = 0x11112222

r2 = 0x00009000

This instruction is particularly useful when implementing semaphores and mutual

exclusion in an operating system. You can see from the syntax that this instruction can

also have a byte size qualifier B, so this instruction allows for both a word and a byte

swap.
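The SWP/SWPB semantics can be written as ordinary C to check the example's PRE/POST values. This is only a functional sketch: plain C reads and writes are not atomic, whereas the real instruction holds the bus across both accesses. The function names are illustrative.

```c
#include <stdint.h>

/* SWP Rd, Rm, [Rn]: tmp = mem32[Rn]; mem32[Rn] = Rm; Rd = tmp.
   Returns the value that would end up in Rd. */
uint32_t swp(uint32_t *mem_rn, uint32_t rm)
{
    uint32_t tmp = *mem_rn;  /* load phase */
    *mem_rn = rm;            /* store phase */
    return tmp;
}

/* SWPB: only the byte at [Rn] is swapped; Rd receives it zero-extended
   and only the low byte of Rm is stored. */
uint32_t swpb(uint8_t *mem_rn, uint32_t rm)
{
    uint32_t tmp = *mem_rn;
    *mem_rn = (uint8_t)(rm & 0xFFu);
    return tmp;
}
```

For the example, swp on a word holding 0x12345678 with rm = 0x11112222 returns 0x12345678 (the new r0) and leaves 0x11112222 in memory.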



[Figure: SWP r0, r1, [r2] performed as a LOAD followed by a STORE to the same address. Before: mem32[0x9000] = 0x12345678, r0 = 0x00000000, r1 = 0x11112222, r2 = 0x00009000. After: mem32[0x9000] = 0x11112222, r0 = 0x12345678; r1, r2 and the other memory words are unchanged.]



Concept of Semaphore

In computer science, a semaphore is a variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment.

A semaphore, in its most basic form, is a protected integer variable that can facilitate and restrict access to shared resources in a multi-processing environment.

The two most common kinds of semaphores are counting semaphores and binary semaphores. Counting semaphores represent multiple resources, while binary semaphores, as the name implies, represent two possible states (generally 0 or 1; locked or unlocked).


A semaphore can only be accessed using the following operations: wait() and signal() (signal() is also called release()).

wait() is called when a process wants access to a resource. This is like a customer arriving at a restaurant and trying to get an open table. If there is an open table, that is, the semaphore is greater than zero, then he can take that resource (the semaphore is decremented) and sit at the table. If there is no open table and the semaphore is zero, that process must wait until one becomes available. signal() is called when a process is done using a resource (the semaphore is incremented), or when the patron is finished with his meal.

The following is an implementation of this counting semaphore (where the value

can be greater than 1):



In this implementation, a process that wants to enter its critical section has to acquire the binary semaphore, which then gives it mutual exclusion until it signals that it is done.

For example, suppose we have a semaphore s, initialized to 1, and two processes, P1 and P2, that want to enter their critical sections at the same time. P1 first calls wait(s). The value of s is decremented to 0 and P1 enters its critical section. While P1 is in its critical section, P2 calls wait(s), but because the value of s is zero, it must wait until P1 finishes its critical section and executes signal(s).

When P1 calls signal(s), the value of s is incremented to 1, and P2 can then proceed to execute in its critical section (after decrementing the semaphore again). Mutual exclusion is achieved because only one process can be in its critical section at any time.
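A minimal counting-semaphore sketch, assuming C11 <stdatomic.h>. Unlike a real OS semaphore, which would put the waiting process to sleep, this version busy-waits; Semaphore, csem_init, csem_try_wait, csem_wait and csem_signal are hypothetical names. The compare-and-swap loop makes "check that the count is above zero, then decrement" indivisible, which plays the same role as the atomic SWP in Example 22.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Counting semaphore: value = number of free resources. */
typedef struct {
    atomic_int value;
} Semaphore;

void csem_init(Semaphore *s, int count)
{
    atomic_init(&s->value, count);
}

/* Non-blocking wait(): take one resource if the count is above zero. */
bool csem_try_wait(Semaphore *s)
{
    int v = atomic_load(&s->value);
    while (v > 0) {
        /* Decrement only if nobody changed the count in between;
           on failure, v is refreshed with the current count. */
        if (atomic_compare_exchange_weak(&s->value, &v, v - 1))
            return true;
    }
    return false;
}

/* wait(): spin until a resource is free (a real OS would sleep instead). */
void csem_wait(Semaphore *s)
{
    while (!csem_try_wait(s)) {
        /* busy-wait */
    }
}

/* signal(): return one resource by incrementing the count. */
void csem_signal(Semaphore *s)
{
    atomic_fetch_add(&s->value, 1);
}
```

The P1/P2 scenario maps onto this as: s starts at 1, P1's wait takes it to 0, P2's try fails, P1's signal restores 1, and P2's next attempt succeeds.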


Example 22

This example shows a simple data guard that can be used to protect data from being

written by another task. The SWP instruction holds the bus until the transaction is

complete.

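loop
        LDR   r1, =semaphore   ; r1 = address of the semaphore
        MOV   r2, #1           ; 1 marks the service as in use
        SWP   r3, r2, [r1]     ; r3 = old value, semaphore = 1; hold the bus until complete
        CMP   r3, #1           ; was the service already in use?
        BEQ   loop             ; yes: try again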

The memory location addressed by semaphore contains either the value 0 or 1. When the semaphore equals 1, the service in question is being used by another process. The routine continues to loop until the service is released by the other process, in other words, until the semaphore location contains the value 0.
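The same data guard can be sketched in C11 with atomic_exchange, which, like SWP, writes a new value and returns the old one in one indivisible step. This is an analogy rather than what the example compiles to; Guard and the guard_* function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = service free, 1 = service in use (as in the example). */
typedef atomic_uint Guard;

/* One SWP attempt: write 1 and get the old value back.
   Returns true if the guard was free (old value 0) and is now ours. */
bool guard_try_acquire(Guard *g)
{
    return atomic_exchange(g, 1u) == 0u;
}

/* The loop from the example: keep swapping until the old value was 0. */
void guard_acquire(Guard *g)
{
    while (!guard_try_acquire(g)) {
        /* another task holds the guard; spin */
    }
}

/* Release: the owner writes 0 back. */
void guard_release(Guard *g)
{
    atomic_store(g, 0u);
}
```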



ARM instructions by instruction class

1. Data Processing Instructions

2. Branch Instructions

3. Load-Store Instructions

4. Software Interrupt Instruction

5. Program Status Register Instructions



Software Interrupt Instruction

The software interrupt instruction is used for calls to the operating system

and is often called a 'supervisor call'.

It puts the processor into supervisor mode and begins executing

instructions from address 0x08.


Binary encoding

Introduction

Bits 31-28: COND | Bits 27-24: OPCODE | Bits 23-0: 24-bit (interpreted) immediate



The 24-bit immediate field does not influence the operation of the instruction but may be interpreted by the system code.

If the condition passes, the instruction enters supervisor mode using the standard ARM exception entry sequence. In detail, the processor actions are:

1. Save the address of the instruction after the SWI in r14_svc.
2. Save the CPSR in SPSR_svc.
3. Enter supervisor mode and disable IRQs (but not FIQs) by setting CPSR[4:0] to 10011 (binary) and CPSR[7] to 1.
4. Set the PC to 0x08 and begin executing the instructions there.


Description

To return to the instruction after the SWI the system routine must not only copy r14_svc

back into the PC, but it must also restore the CPSR from SPSR_svc.
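The four entry steps and the return can be modeled in a few lines of C. This sketch assumes ARM state (so the return address is the SWI address + 4) and ignores every banked register except r14_svc and SPSR_svc; the Core struct and function names are illustrative. The constants follow the text: mode bits 10011 (0x13) for supervisor mode and CPSR bit 7 for IRQ disable.

```c
#include <stdint.h>

#define MODE_MASK 0x1Fu        /* CPSR[4:0] */
#define MODE_SVC  0x13u        /* 10011 binary: supervisor mode */
#define I_BIT     (1u << 7)    /* CPSR[7]: IRQ disable */

typedef struct {
    uint32_t pc;        /* address of the SWI being executed */
    uint32_t cpsr;
    uint32_t r14_svc;
    uint32_t spsr_svc;
} Core;

/* SWI entry sequence, steps 1-4 above. FIQs (CPSR[6]) are left alone. */
void swi_enter(Core *c)
{
    c->r14_svc  = c->pc + 4;                              /* 1. return address */
    c->spsr_svc = c->cpsr;                                /* 2. save CPSR */
    c->cpsr = (c->cpsr & ~MODE_MASK) | MODE_SVC | I_BIT;  /* 3. SVC mode, IRQs off */
    c->pc = 0x08;                                         /* 4. SWI vector */
}

/* Return: copy r14_svc back into the PC and restore the CPSR from
   SPSR_svc (in ARM code, typically MOVS pc, r14 does both at once). */
void swi_return(Core *c)
{
    c->pc = c->r14_svc;
    c->cpsr = c->spsr_svc;
}
```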



Syntax: SWI{<cond>} SWI_number


Example 23

Here we have a simple example of an SWI call with SWI number 0x123456, used by ARM

toolkits as a debugging SWI. Typically the SWI instruction is executed in user mode.

PRE  cpsr = nzcVqift_USER
     pc = 0x00008000
     lr = 0x003fffff    ; lr = r14
     r0 = 0x12

     0x00008000    SWI 0x123456

POST cpsr = nzcVqIft_SVC
     spsr = nzcVqift_USER
     pc = 0x00000008
     lr = 0x00008004
     r0 = 0x12

Since SWI instructions are used to call operating system routines, you need some form of parameter passing. This is achieved using registers. In this example, register r0 is used to pass the parameter 0x12. The return values are also passed back via registers.

Code called the SWI handler is required to process the SWI call. The handler obtains the SWI number using the address of the executed instruction, which is calculated from the link register lr (in ARM state the SWI is at lr - 4).
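A sketch of how a handler can recover the SWI number, assuming the SWI was executed in ARM state: the instruction word sits at lr - 4, and its low 24 bits hold the number. (In ARM assembly a handler commonly does this with an LDR from [lr, #-4] followed by a BIC that clears the top 8 bits.) The helper names are hypothetical; is_swi checks that bits 27-24 of the word are 1111, the SWI opcode.

```c
#include <stdint.h>

/* Extract the 24-bit SWI number from the instruction word at lr - 4. */
uint32_t swi_number(uint32_t instruction)
{
    return instruction & 0x00FFFFFFu;  /* bits [23:0] */
}

/* Is this word an ARM SWI instruction? (bits [27:24] == 1111) */
int is_swi(uint32_t instruction)
{
    return ((instruction >> 24) & 0xFu) == 0xFu;
}
```

For SWI 0x123456 with the "always" condition, the instruction word is 0xEF123456, so swi_number returns 0x123456.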



ARM instructions by instruction class

1. Data Processing Instructions

2. Branch Instructions

3. Load-Store Instructions

4. Software Interrupt Instruction

5. Program Status Register Instructions (MSR, MRS)

(Self study: refer to Steve Furber.)



Byte organizations

Little-endian mode:
- the lowest-addressed byte of a word resides in the low-order bits of the word (bits 7:0)

Big-endian mode:
- the lowest-addressed byte is stored in the highest-order bits of the word (bits 31:24)
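To see the difference between the two byte organizations, this sketch stores the word 0x12345678 into four bytes of memory both ways, independently of the host's own endianness. The function names are illustrative.

```c
#include <stdint.h>

/* Little-endian: address 0 holds bits 7:0, address 3 holds bits 31:24. */
void store_le(uint8_t m[4], uint32_t w)
{
    for (int i = 0; i < 4; i++)
        m[i] = (uint8_t)(w >> (8 * i));
}

/* Big-endian: address 0 holds bits 31:24, address 3 holds bits 7:0. */
void store_be(uint8_t m[4], uint32_t w)
{
    for (int i = 0; i < 4; i++)
        m[i] = (uint8_t)(w >> (8 * (3 - i)));
}
```

For 0x12345678, little-endian memory reads 78 56 34 12 from the lowest address upward, big-endian reads 12 34 56 78.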



Thumb Mode

Thumb is a 16-bit instruction set:
- Optimized for code density from C code
- Improved performance from narrow memory
- A subset of the functionality of the ARM instruction set

The core has two execution states, ARM and Thumb, and switches between them using the BX instruction.

Thumb has these characteristic features:
- Most Thumb instructions are executed unconditionally.
- Many Thumb data processing instructions use a 2-address format.
- Thumb instruction formats are less regular than ARM instruction formats, as a result of the dense encoding.


Thumb has higher code density!

Code density is defined as the space taken up in memory by an executable program.

On average, a Thumb implementation of the same code takes up around 30% less memory than the equivalent ARM implementation.

Figure 4.1 shows the same divide code routine implemented in ARM and Thumb assembly code. Even though the Thumb implementation uses more instructions, the overall memory footprint is reduced. Code density was the main driving force for the Thumb instruction set.


Because Thumb was also designed as a compiler target, rather than for hand-written assembly code, we recommend that you write Thumb-targeted code in a high-level language like C or C++.


Thumb Register Usage

In Thumb state, you do not have direct access to all registers.

Only the low registers r0 to r7 are fully accessible.

The high registers r8 to r12 are only accessible with the MOV, ADD, or CMP instructions.

CMP and all the data processing instructions that operate on low registers update the condition flags in the cpsr.


Thumb Instruction Set (1/3)


Thumb Instruction Entry and Exit

The T bit is bit 5 of the CPSR:
- If T = 1, the processor interprets the instruction stream as 16-bit Thumb instructions.
- If T = 0, the processor interprets it as standard ARM instructions.

Thumb entry
- After reset, ARM cores start up executing ARM instructions.
- Executing a Branch and Exchange instruction (BX):
  - sets the T bit if the bottom bit of the specified register was set, and
  - switches the PC to the address given in the remainder of the register.

Thumb exit
- Executing a Thumb BX instruction.


ARM-Thumb Interworking

ARM-Thumb interworking is the name given to the method of linking ARM and Thumb code together for both assembly and C/C++.

To call a Thumb routine from an ARM routine, the core has to change state. This state change is shown in the T bit of the cpsr.

The BX and BLX branch instructions cause a switch between ARM and Thumb state while branching to a routine.

The BX lr instruction returns from a routine, also with a state switch if necessary.


There are two versions of the BX or BLX instructions: an ARM

instruction and a Thumb equivalent.

The ARM BX instruction enters Thumb state only if bit 0 of the

address in Rn is set to binary 1; otherwise it enters ARM state. The

Thumb BX instruction does the same.


Syntax: BX Rn

BLX Rn | label



Interworking Instructions

Interworking is achieved using the Branch Exchange instructions

In Thumb state

BX Rn

In ARM state (on Thumb-aware cores only)

BX Rn

where Rn can be any register (R0 to R15).

This performs a branch to an absolute address in the 4GB address space by copying Rn to the program counter.

Bit 0 of Rn specifies the state to change to.
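The BX rule can be captured in a tiny C model: bit 0 of Rn becomes the T bit and the remaining bits become the new PC. This is a simplification (it does not check the extra alignment rule for ARM-state targets, whose bit 1 must also be 0); CpuState and bx are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t pc;
    bool thumb;   /* models the CPSR T bit */
} CpuState;

/* BX Rn: bit 0 selects the new state (1 = Thumb, 0 = ARM);
   the branch target is Rn with bit 0 cleared. */
CpuState bx(uint32_t rn)
{
    CpuState s;
    s.thumb = (rn & 1u) != 0;
    s.pc = rn & ~1u;
    return s;
}
```

This is why Example 24 uses Into_Thumb+1 to enter Thumb state and a word-aligned Back_to_ARM address to return to ARM state.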


Switching between States



Example 24

        CODE32                   ; start off in ARM state
        ADR   r0, Into_Thumb+1   ; generate branch target address
                                 ; and set bit 0, hence arrive in Thumb state
        BX    r0                 ; branch exchange to Thumb
        CODE16                   ; assemble subsequent code as Thumb
Into_Thumb
        ADR   r5, Back_to_ARM    ; generate branch target to a word-aligned
                                 ; address, hence bit 0 is cleared
        BX    r5                 ; branch exchange to ARM
        CODE32                   ; assemble subsequent code as ARM
Back_to_ARM



Summary


ARM data instructions

ADD   Add
ADC   Add with carry
SUB   Subtract
SBC   Subtract with carry
RSB   Reverse subtract               RSB r0,r1,r2      ; r0 = r2 - r1
RSC   Reverse subtract with carry
MUL   Multiply
MLA   Multiply and accumulate        MLA r0,r1,r2,r3   ; r0 = r1 x r2 + r3
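As a rough guide to the arithmetic mnemonics, here are C equivalents on 32-bit unsigned values (which wrap modulo 2^32, like the registers), assuming the carry flag is passed in as 0 or 1. Flag updates are not modeled; SBC and RSC subtract NOT(carry), which is ARM's borrow convention. The function names are illustrative.

```c
#include <stdint.h>

uint32_t add(uint32_t a, uint32_t b)                { return a + b; }
uint32_t adc(uint32_t a, uint32_t b, uint32_t c)    { return a + b + c; }
uint32_t sub(uint32_t a, uint32_t b)                { return a - b; }
/* SBC: a - b - NOT(carry) */
uint32_t sbc(uint32_t a, uint32_t b, uint32_t c)    { return a - b - (1u - c); }
/* RSB r0,r1,r2 computes r0 = r2 - r1: operands reversed. */
uint32_t rsb(uint32_t a, uint32_t b)                { return b - a; }
uint32_t rsc(uint32_t a, uint32_t b, uint32_t c)    { return b - a - (1u - c); }
/* MUL keeps only the low 32 bits of the product. */
uint32_t mul(uint32_t a, uint32_t b)                { return a * b; }
/* MLA r0,r1,r2,r3 computes r0 = r1 x r2 + r3. */
uint32_t mla(uint32_t a, uint32_t b, uint32_t acc)  { return a * b + acc; }
```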




AND   Bit-wise AND
ORR   Bit-wise OR
EOR   Bit-wise exclusive-OR
BIC   Bit clear

BIC r0,r1,r2 sets r0 to r1 AND NOT r2: it uses the second source operand as a mask; where a bit in the mask is 1, the corresponding bit in the first source operand is cleared.
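The logical instructions map directly onto C bit-wise operators. This sketch (illustrative function names, flags not modeled) shows BIC as AND with the complement of the mask.

```c
#include <stdint.h>

uint32_t and_op(uint32_t a, uint32_t b) { return a & b; }     /* AND */
uint32_t orr(uint32_t a, uint32_t b)    { return a | b; }     /* ORR */
uint32_t eor(uint32_t a, uint32_t b)    { return a ^ b; }     /* EOR */
/* BIC: every bit that is 1 in mask is cleared in a. */
uint32_t bic(uint32_t a, uint32_t mask) { return a & ~mask; }
```

For example, bic(0xFF, 0x0F) clears the low four bits and gives 0xF0.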




LSL   Logical shift left (zero fill)
LSR   Logical shift right (zero fill)
ASL   Arithmetic shift left
ASR   Arithmetic shift right, copies the sign bit
ROR   Rotate right
RRX   Rotate right extended with C, performs a 33-bit rotate
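The shift and rotate operations can be sketched in C as follows, assuming a shift amount in the range 1 to 31 (ARM encodes amounts of 0 and 32 specially, which is not modeled). ASR is written without relying on C's implementation-defined signed right shift; the function names are illustrative.

```c
#include <stdint.h>

/* n is assumed to be 1..31 throughout. */
uint32_t lsl(uint32_t x, unsigned n) { return x << n; }
uint32_t lsr(uint32_t x, unsigned n) { return x >> n; }

/* ASR: shift right and replicate the sign bit into the vacated bits. */
uint32_t asr(uint32_t x, unsigned n)
{
    uint32_t sign_fill = (x & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
    return (x >> n) | sign_fill;
}

/* ROR: bits shifted out at the bottom re-enter at the top. */
uint32_t ror(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* RRX: rotate right by one through the carry flag (33 bits in total):
   the old C enters bit 31, and bit 0 becomes the new C. */
uint32_t rrx(uint32_t x, unsigned carry_in, unsigned *carry_out)
{
    *carry_out = x & 1u;
    return (x >> 1) | ((uint32_t)(carry_in & 1u) << 31);
}
```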



ARM comparison instructions

These only set the values of the NZCV bits:

CMP   Compare
CMN   Negated compare, uses an addition to set the status bits
TST   Bit-wise test, a bit-wise AND
TEQ   Bit-wise negated test, an exclusive-OR
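This sketch computes the NZCV flags that CMP sets, assuming the usual ARM convention that C after a subtraction means "no borrow" (a >= b as unsigned). Flags and cmp are hypothetical names; CMN, TST and TEQ would differ in the operation performed (addition, AND, exclusive-OR) and, for TST and TEQ, in how C and V are set.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Flags;

/* CMP a, b: compute a - b, discard the result, keep NZCV. */
Flags cmp(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    Flags f;
    f.n = (r >> 31) != 0;                     /* negative */
    f.z = r == 0;                             /* zero */
    f.c = a >= b;                             /* no borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;   /* signed overflow */
    return f;
}
```

For example, cmp(0x80000000, 1) sets V: the most negative 32-bit value minus one overflows to a positive result.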



ARM move instructions

MOV   Move            MOV r0,r1 ; r0 = r1
MVN   Move negated    MVN r0,r1 ; r0 = NOT(r1)



ARM load-store instructions

LDR     Load
STR     Store
LDRH    Load half-word
STRH    Store half-word
LDRSH   Load half-word signed
LDRB    Load byte
STRB    Store byte
ADR     Set register to address



C Assignments in ARM Instructions

x = (a + b) - c;

Use r0 for a, r1 for b, r2 for c, and r3 for x, and use r4 as the register for indirect addressing:
- load the values of a, b, and c into registers
- store the value of x back to memory




x = (a + b) - c;

        ADR r4,a        ; get address for a
        LDR r0,[r4]     ; get value of a
        ADR r4,b        ; get address for b, reusing r4
        LDR r1,[r4]     ; load value of b
        ADD r3,r0,r1    ; set result for x to a + b
        ADR r4,c        ; get address for c
        LDR r2,[r4]     ; get value of c
        SUB r3,r3,r2    ; complete computation of x
        ADR r4,x        ; get address for x
        STR r3,[r4]     ; store x at proper location




y = a*(b + c);

Use r0 for both a and b, r1 for c, and r2 for y; use r4 to store addresses for indirect addressing.




y = a*(b + c);

        ADR r4,b        ; get address for b
        LDR r0,[r4]     ; get value of b
        ADR r4,c        ; get address for c
        LDR r1,[r4]     ; get value of c
        ADD r2,r0,r1    ; compute partial result of y = b + c
        ADR r4,a        ; get address for a
        LDR r0,[r4]     ; get value of a
        MUL r2,r0,r2    ; compute final value of y = a*(b+c)
                        ; (Rd differs from Rm, as pre-ARMv6 cores require)
        ADR r4,y        ; get address for y
        STR r2,[r4]     ; store value of y at proper location




z = (a << 2) | (b & 15);

Use r0 for a and z, r1 for b, and r4 for addresses.





z = (a << 2) | (b & 15);

        ADR r4,a            ; get address for a
        LDR r0,[r4]         ; get value of a
        MOV r0,r0,LSL #2    ; perform shift (a << 2)
        ADR r4,b            ; get address for b
        LDR r1,[r4]         ; get value of b
        AND r1,r1,#15       ; perform logical AND (b & 15)
        ORR r0,r0,r1        ; compute final value of z (kept in r0)
        ADR r4,z            ; get address for z
        STR r0,[r4]         ; store value of z
