arm chap 3_last

Upload: apurv-modi

Post on 04-Apr-2018


  • 7/31/2019 ARM Chap 3_last

    1/51

    ARM Instruction Set

    ARM

    Advanced RISC Machines


    Stack Processing

    A stack is usually implemented as a linear data structure which grows up (an ascending stack) or down (a descending stack) in memory.

    A stack pointer holds the address of the current top of the stack, either by pointing to the last valid data item pushed onto the stack (a full stack), or by pointing to the vacant slot where the next data item will be placed (an empty stack).

    ARM multiple register transfer instructions support all four forms of stacks:

    - Full ascending: grows up; base register points to the highest address containing a valid item

    - Empty ascending: grows up; base register points to the first empty location above the stack

    - Full descending: grows down; base register points to the lowest address containing a valid item

    - Empty descending: grows down; base register points to the first empty location below the stack

    The ARM architecture uses the load-store multiple instructions to carry out stack operations.

    The pop operation (removing data from a stack) uses a load multiple instruction; similarly, the push operation (placing data onto the stack) uses a store multiple instruction.

    When using a stack you have to decide whether the stack will grow up or down in memory. A stack is either ascending (A) or descending (D). Ascending stacks grow towards higher memory addresses; in contrast, descending stacks grow towards lower memory addresses.

    When you use a full stack (F), the stack pointer sp points to an address that is the last used or full location (i.e., sp points to the last item on the stack). In contrast, if you use an empty stack (E) the sp points to an address that is the first unused or empty location (i.e., it points after the last item on the stack).

    There are a number of load-store multiple addressing mode aliases available to support stack operations (see Table). Next to the pop column is the actual load multiple instruction equivalent.

    For example, a full ascending stack would have the notation FA appended to the load multiple instruction: LDMFA. This would be translated into an LDMDA instruction.
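The alias translation described above can be sketched as a small lookup in Python. This is only an illustration: the names `STACK_ALIASES` and `translate` are made up for this sketch, while the instruction pairs are the standard ARM equivalents for each stack type.

```python
# Stack-type suffixes and the block-transfer instructions they assemble to.
# Key: stack type; value: (push instruction, pop instruction).
STACK_ALIASES = {
    "FA": ("STMIB", "LDMDA"),  # full ascending
    "FD": ("STMDB", "LDMIA"),  # full descending
    "EA": ("STMIA", "LDMDB"),  # empty ascending
    "ED": ("STMDA", "LDMIB"),  # empty descending
}


def translate(mnemonic):
    """Translate a stack alias such as 'LDMFA' into its addressing-mode form."""
    op, kind = mnemonic[:3].upper(), mnemonic[3:].upper()
    if op not in ("STM", "LDM") or kind not in STACK_ALIASES:
        raise ValueError("not a stack alias: " + mnemonic)
    push, pop = STACK_ALIASES[kind]
    return push if op == "STM" else pop


print(translate("LDMFA"))  # LDMDA
print(translate("STMFD"))  # STMDB
```

A push and the matching pop always use opposite addressing modes, which is why the suffix is easier to remember than the underlying instruction.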

    Example 20

    The STMFD instruction pushes registers onto the stack, updating the sp. The figure shows a push onto a full descending stack. You can see that when the stack grows, the stack pointer points to the last full entry in the stack.

    PRE  r1 = 0x00000002
         r4 = 0x00000003
         sp = 0x00080014

         STMFD sp!, {r1,r4}

    POST r1 = 0x00000002
         r4 = 0x00000003
         sp = 0x0008000c

    NOTE: The stack pointer points to the last full entry in the stack.

    Example 21

    In contrast, the next figure shows a push operation on an empty stack using the STMED instruction. The STMED instruction pushes the registers onto the stack but updates register sp to point to the next empty location.

    PRE  r1 = 0x00000002
         r4 = 0x00000003
         sp = 0x00080010

         STMED sp!, {r1,r4}

    POST r1 = 0x00000002
         r4 = 0x00000003
         sp = 0x00080008

    NOTE: The sp points to the next empty location.
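The two push examples (STMFD and STMED) can be checked with a small Python model of the four stack types. This is a sketch under assumptions: `stm_push` is a hypothetical helper, memory is modelled as a dictionary of word addresses, and only the write-back form `sp!` is covered.

```python
def stm_push(kind, sp, regs):
    """Model STM<kind> sp!, {regs} and return (new_sp, {address: value}).

    regs is a list of (register_number, value) pairs. The lowest-numbered
    register always ends up at the lowest address.
    """
    regs = sorted(regs)
    size = 4 * len(regs)
    if kind == "FD":        # decrement before
        low, new_sp = sp - size, sp - size
    elif kind == "ED":      # decrement after
        low, new_sp = sp - size + 4, sp - size
    elif kind == "FA":      # increment before
        low, new_sp = sp + 4, sp + size
    elif kind == "EA":      # increment after
        low, new_sp = sp, sp + size
    else:
        raise ValueError(kind)
    memory = {low + 4 * i: value for i, (_, value) in enumerate(regs)}
    return new_sp, memory


sp_fd, mem_fd = stm_push("FD", 0x00080014, [(1, 2), (4, 3)])
sp_ed, mem_ed = stm_push("ED", 0x00080010, [(1, 2), (4, 3)])
print(hex(sp_fd))  # 0x8000c
print(hex(sp_ed))  # 0x80008
```

In the full descending case the new sp is itself one of the written addresses; in the empty descending case it lies just below them.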


    Block Copy Addressing

    Stack Examples

    [Figure: memory layout after STMFD, STMED, STMFA and STMEA sp!, {r0,r1,r3-r5}, each starting from an old SP of 0x400; the diagram spans addresses 0x3e8 to 0x418 and marks the new SP in each case.]

    Load-Store Instructions

    Three basic forms to move data between ARM registers and memory:

    Single register load and store instruction
    - A byte, a 16-bit half word, a 32-bit word

    Multiple register load and store instruction
    - To save or restore workspace registers for procedure entry and exit
    - To copy blocks of data

    Single register swap instruction
    - A value in a register to be exchanged with a value in memory
    - To implement semaphores to ensure mutual exclusion on accesses

    Single Register Swap Instruction

    The swap instruction is a special case of a load-store instruction. It swaps the contents of memory with the contents of a register.

    This instruction is an atomic operation: it reads and writes a location in the same bus operation, preventing any other instruction from reading or writing to that location until it completes.

    Swap cannot be interrupted by any other instruction or any other bus access. We say the system holds the bus until the transaction is complete.

    Syntax: SWP{B}{<cond>} Rd,Rm,[Rn]


    Example 21

    The swap instruction loads a word from memory into register r0 and overwrites the

    memory with register r1.

    PRE mem32[0x9000] = 0x12345678

    r0 = 0x00000000

    r1 = 0x11112222

    r2 = 0x00009000

    SWP r0, r1, [r2]

    POST mem32[0x9000] = 0x11112222

    r0 = 0x12345678

    r1 = 0x11112222

    r2 = 0x00009000

    This instruction is particularly useful when implementing semaphores and mutual

    exclusion in an operating system. You can see from the syntax that this instruction can

    also have a byte size qualifier B, so this instruction allows for both a word and a byte

    swap.
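The PRE and POST values of the swap example can be reproduced with a short Python model. It is a sketch only: `swp` is a hypothetical function, registers and memory are plain dictionaries, and the atomicity that the real instruction provides on the bus is not modelled.

```python
def swp(regs, mem, rd, rm, rn):
    """Model SWP Rd, Rm, [Rn]: load the old word, then store Rm."""
    address = regs[rn]
    tmp = mem[address]         # load phase
    mem[address] = regs[rm]    # store phase
    regs[rd] = tmp             # Rd receives the old memory contents


regs = {"r0": 0x00000000, "r1": 0x11112222, "r2": 0x00009000}
mem = {0x9000: 0x12345678}
swp(regs, mem, "r0", "r1", "r2")
print(hex(mem[0x9000]), hex(regs["r0"]))  # 0x11112222 0x12345678
```

The temporary matters when Rd and Rm name the same register: the old memory word must be read before the register value is overwritten.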

    [Figure: memory and register contents before and after SWP r0, r1, [r2]. LOAD: the word 0x12345678 at address 0x00009000 is read into r0. STORE: r1 (0x11112222) is written to address 0x00009000. r1 and r2 are unchanged.]

    Concept of SEMAPHORE

    In computer science, a semaphore is a variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment.

    A semaphore, in its most basic form, is a protected integer variable that can facilitate and restrict access to shared resources in a multi-processing environment.

    The two most common kinds of semaphores are counting semaphores and binary semaphores. Counting semaphores represent multiple resources, while binary semaphores, as the name implies, represent two possible states (generally 0 or 1; locked or unlocked).

    A semaphore can only be accessed using the following operations: wait() and signal().

    wait() is called when a process wants access to a resource. This would be equivalent to an arriving customer trying to get an open table at a restaurant. If there is an open table, or the semaphore is greater than zero, then he can take that resource and sit at the table. If there is no open table and the semaphore is zero, that process must wait until one becomes available. signal() is called when a process is done using a resource, or when the patron is finished with his meal.

    The following is an implementation of this counting semaphore (where the value can be greater than 1):
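A minimal Python sketch of a counting semaphore with wait() and signal(), assuming a design built on `threading.Condition`. The class name `CountingSemaphore` and the `tables` variable are illustrative and are not taken from the slides.

```python
import threading


class CountingSemaphore:
    """Counting semaphore: value is the number of free resources."""

    def __init__(self, value):
        self.value = value
        self._cond = threading.Condition()

    def wait(self):
        """Block until a resource is free, then take it."""
        with self._cond:
            while self.value == 0:
                self._cond.wait()
            self.value -= 1

    def signal(self):
        """Give a resource back and wake one waiting thread."""
        with self._cond:
            self.value += 1
            self._cond.notify()


tables = CountingSemaphore(2)   # two open tables
tables.wait()
tables.wait()
print(tables.value)             # 0
tables.signal()
print(tables.value)             # 1
```

Initialising the value to 1 gives the binary semaphore used for mutual exclusion.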

    In this implementation, a process wanting to enter its critical section has to acquire the binary semaphore, which will then give it mutual exclusion until it signals that it is done.

    For example, we have semaphore s, and two processes, P1 and P2, that want to enter their critical sections at the same time. P1 first calls wait(s). The value of s is decremented to 0 and P1 enters its critical section. While P1 is in its critical section, P2 calls wait(s), but because the value of s is zero, it must wait until P1 finishes its critical section and executes signal(s).

    When P1 calls signal(s), the value of s is incremented to 1, and P2 can then proceed to execute in its critical section (after decrementing the semaphore again). Mutual exclusion is achieved because only one process can be in its critical section at any time.

    Example 22

    This example shows a simple data guard that can be used to protect data from being written by another task. The SWP instruction holds the bus until the transaction is complete.

    loop
            LDR r1, =semaphore      ; load the address of the semaphore
            MOV r2, #1
            SWP r3, r2, [r1]        ; hold the bus until complete
            CMP r3, #1
            BEQ loop

    The address pointed to by the semaphore contains either the value 0 or 1. When the semaphore equals 1, the service in question is being used by another process. The routine will continue to loop around until the service is released by the other process; in other words, when the semaphore address location contains the value 0.
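The logic of the loop can be followed in a small Python model. This is a sketch under assumptions: `swap_word` stands in for SWP and is only atomic here because the model is single-threaded, and `try_acquire` and `release` are hypothetical helper names (the slides show only the acquiring loop).

```python
FREE, BUSY = 0, 1


def swap_word(mem, address, new_value):
    """Stand-in for SWP: return the old word and store the new one."""
    old = mem[address]
    mem[address] = new_value
    return old


def try_acquire(mem, address):
    """One pass of the loop: True if the semaphore was free and is now ours."""
    return swap_word(mem, address, BUSY) != BUSY


def release(mem, address):
    """Mark the service as free again (assumed counterpart of the loop)."""
    mem[address] = FREE


mem = {0x9000: FREE}
print(try_acquire(mem, 0x9000))  # True
print(try_acquire(mem, 0x9000))  # False
release(mem, 0x9000)
print(try_acquire(mem, 0x9000))  # True
```

Writing 1 on every pass is harmless: if the semaphore was already 1 the value does not change, and if it was 0 the caller has just claimed it.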

    ARM instructions by instruction class

    1. Data Processing Instructions

    2. Branch Instructions

    3. Load-Store Instructions

    4. Software Interrupt Instruction

    5. Program Status Register Instructions

    Software Interrupt Instruction

    Introduction

    The software interrupt instruction is used for calls to the operating system and is often called a 'supervisor call'.

    It puts the processor into supervisor mode and begins executing instructions from address 0x08.

    Binary encoding

    Bits 31-28: COND
    Bits 27-24: OPCODE
    Bits 23-0: 24-BIT (INTERPRETED) IMMEDIATE

    Description

    The 24-bit immediate field does not influence the operation of the instruction but may be interpreted by the system code.

    If the condition is passed the instruction enters supervisor mode using the standard ARM exception entry sequence. In detail, the processor actions are:

    1. Save the address of the instruction after the SWI in r14_svc.

    2. Save the CPSR in SPSR_svc.

    3. Enter supervisor mode and disable IRQs (but not FIQs) by setting CPSR[4:0] to 10011 (binary) and CPSR[7] to 1.

    4. Set the PC to 0x08 and begin executing the instructions there.

    To return to the instruction after the SWI the system routine must not only copy r14_svc back into the PC, but it must also restore the CPSR from SPSR_svc.
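The entry and return steps can be traced with a small Python model of the processor state. This is a sketch under assumptions: the state is a plain dictionary, `pc` holds the address of the SWI instruction itself rather than the pipelined program counter, and `swi_entry` and `swi_return` are hypothetical names.

```python
MODE_MASK = 0x1F        # CPSR[4:0], the mode field
MODE_SVC = 0b10011      # supervisor mode
I_BIT = 1 << 7          # CPSR[7], IRQ disable
SWI_VECTOR = 0x08


def swi_entry(state):
    """Apply the four SWI entry steps to a dict with 'pc' and 'cpsr'."""
    new = dict(state)
    new["lr_svc"] = state["pc"] + 4                  # 1. instruction after the SWI
    new["spsr_svc"] = state["cpsr"]                  # 2. save the CPSR
    new["cpsr"] = (state["cpsr"] & ~MODE_MASK) | MODE_SVC | I_BIT  # 3.
    new["pc"] = SWI_VECTOR                           # 4. jump to the vector
    return new


def swi_return(state):
    """Undo the entry: restore the PC from r14_svc and the CPSR from SPSR_svc."""
    new = dict(state)
    new["pc"] = state["lr_svc"]
    new["cpsr"] = state["spsr_svc"]
    return new


user = {"pc": 0x00008000, "cpsr": 0b10000}           # user mode, IRQs enabled
inside = swi_entry(user)
print(hex(inside["pc"]), hex(inside["lr_svc"]), bin(inside["cpsr"]))
back = swi_return(inside)
print(hex(back["pc"]), bin(back["cpsr"]))
```

The FIQ disable bit, CPSR[6], is left untouched by the entry sequence, which matches the statement that only IRQs are disabled.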

    Syntax: SWI{<cond>} SWI_number

    Example 23

    Here we have a simple example of an SWI call with SWI number 0x123456, used by ARM toolkits as a debugging SWI. Typically the SWI instruction is executed in user mode.

    PRE  cpsr = nzcVqift_USER
         pc = 0x00008000
         lr = 0x003fffff ; lr = r14
         r0 = 0x12

         0x00008000 SWI 0x123456

    POST cpsr = nzcVqIft_SVC
         spsr = nzcVqift_USER
         pc = 0x00000008
         lr = 0x00008004
         r0 = 0x12

    Since SWI instructions are used to call operating system routines, you need some form of parameter passing. This is achieved using registers. In this example, register r0 is used to pass the parameter 0x12. The return values are also passed back via registers.

    Code called the SWI handler is required to process the SWI call. The handler obtains the SWI number using the address of the executed instruction, which is calculated from the link register lr.
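How a handler might recover the SWI number from lr can be sketched in Python. The assumptions are that the SWI was executed in ARM state (so the instruction is the word at lr - 4), that memory is modelled as a dictionary of words, and that `swi_number` is a hypothetical helper name. The word 0xEF123456 is the encoding of SWI 0x123456 with the "always" condition.

```python
def swi_number(memory, lr):
    """Return the 24-bit SWI number of the instruction that set lr."""
    instruction = memory[lr - 4]       # the SWI sits one word before lr
    return instruction & 0x00FFFFFF    # keep bits 23..0, drop COND and OPCODE


memory = {0x00008000: 0xEF123456}      # SWI 0x123456, condition AL
print(hex(swi_number(memory, 0x00008004)))  # 0x123456
```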

    ARM instructions by instruction class

    1. Data Processing Instructions

    2. Branch Instructions

    3. Load-Store Instructions

    4. Software Interrupt Instruction

    5. Program Status Register Instructions (MSR, MRS)

    (Self study: refer to Steve Furber)

    Byte organizations

    Little-endian mode:
    - the lowest-order byte resides in the low-order bits of the word

    Big-endian mode:
    - the lowest-order byte is stored in the highest bits of the word
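The two byte orders can be seen by laying out one 32-bit word in memory with Python's standard `struct` module. The helper name `byte_layout` and the sample value are illustrative.

```python
import struct


def byte_layout(word, order):
    """Return the bytes of a 32-bit word, lowest address first, as hex text."""
    fmt = "<I" if order == "little" else ">I"
    return struct.pack(fmt, word).hex()


word = 0x12345678
print(byte_layout(word, "little"))  # 78563412
print(byte_layout(word, "big"))     # 12345678
```

In the little-endian layout the byte at the lowest address (0x78) is the low-order byte of the word; in the big-endian layout the byte at the lowest address (0x12) is the high-order byte.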


    Thumb Mode

    Thumb is a 16-bit instruction set
    - Optimized for code density from C code
    - Improved performance from narrow memory
    - Subset of the functionality of the ARM instruction set

    Core has two execution states, ARM and Thumb
    - Switch between them using the BX instruction

    Thumb has characteristic features:
    - Most Thumb instructions are executed unconditionally
    - Many Thumb data processing instructions use a 2-address format
    - Thumb instruction formats are less regular than ARM instruction formats, as a result of the dense encoding

    Thumb has higher code density!

    Code density is defined as the space taken up in memory by an executable program.

    On average, a Thumb implementation of the same code takes up around 30% less memory than the equivalent ARM implementation.

    Figure 4.1 shows the same divide code routine implemented in ARM and Thumb assembly code. Even though the Thumb implementation uses more instructions, the overall memory footprint is reduced. Code density was the main driving force for the Thumb instruction set.

    Because the Thumb instruction set was also designed as a compiler target, rather than for hand-written assembly code, we recommend that you write Thumb-targeted code in a high-level language like C or C++.

    Thumb Register Usage

    In Thumb state, you do not have direct access to all registers.

    Only the low registers r0 to r7 are fully accessible.

    The higher registers r8 to r12 are only accessible with MOV, ADD, or CMP instructions.

    CMP and all the data processing instructions that operate on low registers update the condition flags in the cpsr.

    Thumb Instruction Set (1/3)

    Thumb Instruction Set (2/3)

    Thumb Instruction Set (3/3)

    Thumb Instruction Entry and Exit

    T bit, bit 5 of CPSR
    - If T = 1, the processor interprets the instruction stream as 16-bit Thumb instructions
    - If T = 0, the processor interprets it as standard ARM instructions

    Thumb Entry
    - ARM cores start up, after reset, executing ARM instructions
    - Executing a Branch and Exchange instruction (BX) sets the T bit if the bottom bit of the specified register was set, and switches the PC to the address given in the remainder of the register

    Thumb Exit
    - Executing a Thumb BX instruction

    ARM-Thumb Interworking

    ARM-Thumb interworking is the name given to the method of linking ARM and Thumb code together for both assembly and C/C++.

    To call a Thumb routine from an ARM routine, the core has to change state. This state change is shown in the T bit of the cpsr.

    The BX and BLX branch instructions cause a switch between ARM and Thumb state while branching to a routine.

    The BX lr instruction returns from a routine, also with a state switch if necessary.

    There are two versions of the BX or BLX instructions: an ARM instruction and a Thumb equivalent.

    The ARM BX instruction enters Thumb state only if bit 0 of the address in Rn is set to binary 1; otherwise it enters ARM state. The Thumb BX instruction does the same.

    Syntax: BX Rn

            BLX Rn | label

    Interworking Instructions

    Interworking is achieved using the Branch Exchange instructions

    In Thumb state

        BX Rn

    In ARM state (on Thumb-aware cores only)

        BX Rn

    where Rn can be any register (R0 to R15)

    This performs a branch to an absolute address in the 4GB address space by copying Rn to the program counter

    Bit 0 of Rn specifies the state to change to
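The role of bit 0 can be shown with a short Python model of BX. It is a sketch only: `bx` is a hypothetical function that returns the new program counter and the name of the state entered, and it ignores the further alignment rules that apply to ARM-state targets.

```python
def bx(rn):
    """Model BX Rn: return (new_pc, state) for a 32-bit register value."""
    state = "Thumb" if rn & 1 else "ARM"   # bit 0 selects the state
    new_pc = rn & ~1 & 0xFFFFFFFF          # remaining bits form the address
    return new_pc, state


print(bx(0x00008001))  # (32768, 'Thumb')
print(bx(0x00009000))  # (36864, 'ARM')
```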

    Switching between States

    Example 24

            ; Start off in ARM state
            CODE32
            ADR r0, Into_Thumb+1    ; generate branch target address and set bit 0,
                                    ; hence arrive in Thumb state
            BX  r0                  ; branch exchange to Thumb

            CODE16                  ; assemble subsequent code as Thumb
    Into_Thumb
            ADR r5, Back_to_ARM     ; generate branch target to word-aligned address,
                                    ; hence bit 0 is cleared
            BX  r5                  ; branch exchange to ARM

            CODE32                  ; assemble subsequent code as ARM
    Back_to_ARM

    Summary

    ARM data instructions

    ADD   Add
    ADC   Add with carry
    SUB   Subtract
    SBC   Subtract with carry
    RSB   Reverse subtract; RSB r0,r1,r2 gives r0 = r2 - r1
    RSC   Reverse subtract with carry
    MUL   Multiply
    MLA   Multiply and accumulate; MLA r0,r1,r2,r3 gives r0 = r1 x r2 + r3

    ARM data instructions

    AND   Bit-wise and
    ORR   Bit-wise or
    EOR   Bit-wise exclusive-or
    BIC   Bit clear

    BIC r0,r1,r2 sets r0 to r1 AND NOT r2
    - uses the second source operand as a mask: where a bit in the mask is 1, the corresponding bit in the first source operand is cleared
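The bit-clear operation is easy to check with a one-line Python model. The function name `bic` and the sample values are illustrative, and results are kept to 32 bits to mimic a register.

```python
MASK32 = 0xFFFFFFFF


def bic(rn, mask):
    """Model BIC Rd, Rn, mask: clear every bit of rn that is set in mask."""
    return rn & ~mask & MASK32


print(bin(bic(0b1111, 0b0101)))  # 0b1010
```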

    ARM data instructions

    LSL   Logical shift left (zero fill)
    LSR   Logical shift right (zero fill)
    ASL   Arithmetic shift left
    ASR   Arithmetic shift right, copies the sign bit
    ROR   Rotate right
    RRX   Rotate right extended with C, performs a 33-bit rotate
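The difference between the right shifts and rotates is clearest on concrete 32-bit values. The following Python sketch models them for shift amounts from 0 to 31; the function names mirror the mnemonics but are otherwise made up for this illustration, and `rrx` returns the new carry alongside the result.

```python
MASK32 = 0xFFFFFFFF


def lsr(x, n):
    """Logical shift right: zeros enter from the left."""
    return (x & MASK32) >> n


def asr(x, n):
    """Arithmetic shift right: the sign bit (bit 31) is copied in."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32              # reinterpret as a negative number
    return (x >> n) & MASK32


def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32


def rrx(x, carry):
    """33-bit rotate through carry: return (result, new_carry)."""
    x &= MASK32
    return (carry << 31) | (x >> 1), x & 1


print(hex(lsr(0x80000000, 4)))  # 0x8000000
print(hex(asr(0x80000000, 4)))  # 0xf8000000
print(hex(ror(0x00000001, 1)))  # 0x80000000
print(rrx(0x00000001, 0))       # (0, 1)
```

With RRX the bit shifted out of bit 0 goes into the carry rather than straight back into bit 31, which is what makes it a 33-bit rotate.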

    ARM comparison instructions

    These only set the values of the NZCV bits.

    CMP   Compare
    CMN   Negated compare, uses an addition to set the status bits
    TST   Bit-wise test, a bit-wise AND
    TEQ   Bit-wise negated test, an exclusive-or

    ARM move instructions

    MOV   Move
          MOV r0,r1 ; r0 = r1

    MVN   Move negated
          MVN r0,r1 ; r0 = NOT(r1)

    ARM load-store instructions

    LDR    Load
    STR    Store
    LDRH   Load half-word
    STRH   Store half-word
    LDRSH  Load half-word signed
    LDRB   Load byte
    STRB   Store byte
    ADR    Set register to address

    C Assignments in ARM Instructions

    x = (a + b) - c;

    - using r0 for a, r1 for b, r2 for c, and r3 for x
    - using r4 as the register for indirect addressing
    - load values of a, b, and c into registers
    - store value of x back to memory

            ADR r4,a        ; get address for a
            LDR r0,[r4]     ; get value of a
            ADR r4,b        ; get address for b, using r4
            LDR r1,[r4]     ; load value of b
            ADD r3,r0,r1    ; set result for x to a + b
            ADR r4,c        ; get address for c
            LDR r2,[r4]     ; get value of c
            SUB r3,r3,r2    ; complete computation of x
            ADR r4,x        ; get address for x
            STR r3,[r4]     ; store x at proper location

    C Assignments in ARM Instructions

    y = a*(b + c);

    - using r0 for both a and b, r1 for c, and r2 for y
    - use r4 to store addresses for indirect addressing

            ADR r4,b        ; get address for b
            LDR r0,[r4]     ; get value of b
            ADR r4,c        ; get address for c
            LDR r1,[r4]     ; get value of c
            ADD r2,r0,r1    ; compute partial result of y = b + c
            ADR r4,a        ; get address for a
            LDR r0,[r4]     ; get value of a
            MUL r2,r0,r2    ; compute final value of y = a*(b+c); Rd and Rm are kept
                            ; different, as cores before ARMv6 require for MUL
            ADR r4,y        ; get address for y
            STR r2,[r4]     ; store value of y at proper location

    C Assignments in ARM Instructions

    z = (a << 2) | (b & 15);

    - using r0 for a and z, r1 for b
    - r4 for addresses

            ADR r4,a        ; get address for a
            LDR r0,[r4]     ; get value of a
            MOV r0,r0,LSL #2 ; perform shift (a << 2)
            ADR r4,b        ; get address for b
            LDR r1,[r4]     ; get value of b
            AND r1,r1,#15   ; perform logical AND (b & 15)
            ORR r1,r0,r1    ; compute final value of z
            ADR r4,z        ; get address for z
            STR r1,[r4]     ; store value of z
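The instruction sequence can be mirrored step by step in Python to check that it computes the C expression. This is a sketch under assumptions: variables live in a dictionary keyed by name instead of at addresses held in r4, registers are local variables, and `compute_z` is a hypothetical name.

```python
def compute_z(memory):
    """Follow the load, shift, AND, ORR, store sequence for z."""
    r0 = memory["a"]                 # LDR r0,[r4]
    r0 = (r0 << 2) & 0xFFFFFFFF      # MOV r0,r0,LSL #2
    r1 = memory["b"]                 # LDR r1,[r4]
    r1 = r1 & 15                     # AND r1,r1,#15
    r1 = r0 | r1                     # ORR r1,r0,r1
    memory["z"] = r1                 # STR r1,[r4]
    return r1


memory = {"a": 5, "b": 0x23}
print(compute_z(memory))  # 23
```

With a = 5 and b = 0x23 the shift gives 20 and the mask gives 3, so the stored value is 20 | 3 = 23.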