More ARM Assembler: Getting Down to Raw Bits

Post on 07-May-2020


Line Syntax

• Each part is {optional}

• Comment ends at end of line (like C++ //)

• \ at end of line allows continuation

• String literals can’t be continued

• Line can be up to 4095 characters long

{symbol} {instruction | directive | pseudo} {; comment}

FYI: Assembly Process

• Make first pass through source

• Identify all user-defined symbols and their locations

• Requires calculating size of instructions and data

• Generates symbol table

• Second pass through source uses symbol table to compute operand values

• Generate object code (binary instructions and linker information)

Assembly Process

[Figure: Source Code feeds Pass 1, which produces the Symbol Table; Pass 2 uses the Source Code and the Symbol Table to produce Object Code]
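The two passes can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions that are not in the slides: every instruction occupies 4 bytes, a `;` never appears inside a string literal, and "object code" is just a list of tuples. The names `split_line`, `first_pass` and `second_pass` are made up for illustration.

```python
def split_line(line):
    """Return (label, code) for one source line, with the comment removed."""
    code = line.split(";", 1)[0]
    label = None
    if code[:1].strip():                      # non-blank text in column 1 is a label
        fields = code.split(None, 1)
        label = fields[0]
        code = fields[1] if len(fields) > 1 else ""
    return label, code.strip()


def first_pass(lines, base=0, item_size=4):
    """Pass 1: walk the source, track addresses, and build the symbol table."""
    symbols, address = {}, base
    for line in lines:
        label, code = split_line(line)
        if label:
            symbols[label] = address          # a lone label takes the next item's address
        if code:
            address += item_size              # assumption: every item is 4 bytes
    return symbols


def second_pass(lines, symbols, base=0, item_size=4):
    """Pass 2: use the symbol table to replace symbolic operands with values."""
    object_code, address = [], base
    for line in lines:
        _, code = split_line(line)
        if not code:
            continue
        mnemonic, _, rest = code.partition(" ")
        operands = [op.strip() for op in rest.split(",")] if rest.strip() else []
        resolved = [symbols.get(op, op) for op in operands]
        object_code.append((address, mnemonic, resolved))
        address += item_size
    return object_code


SOURCE = [
    "start   MOV R0, #10      ;counter",
    "loop                     ;label on its own line",
    "        SUBS R0, R0, #1",
    "        BNE loop",
    "        B start",
]

symbols = first_pass(SOURCE)                  # {'start': 0, 'loop': 4}
object_code = second_pass(SOURCE, symbols)
```

The forward and backward references (`loop`, `start`) are why two passes are needed: an operand can name a label whose address is not known until the whole source has been sized.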

Back to Line Syntax

• First look at symbols

• Have seen instruction mnemonics, so will return to them later

• Will focus on operand syntax

• Directives and assembler values

{symbol} {instruction | directive | pseudo} {; comment}

Symbol

• Acts as a label for a line: usually an instruction that is a branch target

• Label is a name whose value is the address of the item on the line

• If nothing else is on the line, it takes the address of the next line with an item

• Can name a variable or constant being defined by a directive

Symbol Syntax

• Must begin in first column of line

• Must be unique within scope

• Uppercase, lowercase, number, _

• Don’t start with a number (special meaning)

• Convention is to use lowercase

Symbol Examples

load1   MOV R0, #123      ;load1 labels this
load2   MOV R1, #0x1C     ;load2 labels this
outer_loop_start          ;One line can have
inner_loop_start          ;multiple labels
        ADD R2, R0, R1

Putting the label on the preceding line makes a break that helps the label stand out. It also lets us keep instructions in a column, regardless of the length of the label, improving readability.

Instructions, Pseudo-Instructions, Directives

• Must be indented at least one space!

• Otherwise assembler treats as a symbol

• May be UPPERCASE or lowercase but not Mixed (convention is UPPERCASE)

• Followed by operands (literals, symbols, expressions), separated by commas

Instruction Examples

load1   MOV R0, #123      ;load1 labels this
load2   MOV R1, #0x1C     ;load2 labels this
outer_loop_start          ;One line can have
inner_loop_start          ;multiple labels
        ADD R2, R0, R1

The instruction is usually all upper case, and must NOT start in the first column.
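The column rules above (symbol in column 1, operation indented, `;` starts a comment, no mixed-case mnemonics) can be expressed as a small Python line splitter. This is an illustrative sketch, not how armasm is implemented; `parse_line` is a hypothetical helper, and it assumes operands contain no commas inside braces or strings.

```python
import re

# Symbol rule from the slides: letters, digits and _, not starting with a digit.
SYMBOL_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def parse_line(line):
    """Split '{symbol} {operation operands} {; comment}' into its parts."""
    body, sep, comment = line.partition(";")
    symbol = None
    if body[:1].strip():                       # text in column 1 is a symbol
        fields = body.split(None, 1)
        symbol = fields[0]
        body = fields[1] if len(fields) > 1 else ""
        if not SYMBOL_RE.match(symbol):
            raise ValueError("bad symbol name: " + symbol)
    fields = body.split(None, 1)
    operation = fields[0] if fields else None
    if operation and not (operation.isupper() or operation.islower()):
        raise ValueError("mixed-case mnemonic: " + operation)
    operands = [op.strip() for op in fields[1].split(",")] if len(fields) > 1 else []
    return {
        "symbol": symbol,
        "operation": operation,
        "operands": operands,
        "comment": comment.strip() if sep else None,
    }


labelled = parse_line("load1   MOV R0, #123      ;load1 labels this")
bare_label = parse_line("outer_loop_start          ;One line can have")
indented = parse_line("        ADD R2, R0, R1")
```

Here `labelled["symbol"]` is `"load1"`, `bare_label["operation"]` is `None`, and `indented["symbol"]` is `None` because the line starts with a space.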

Operand Syntax

• Register names (R1)

• Literals (usually #0xFF, for LDR pseudo op: =186400)

• Defined variables and constants

• Operators

• String substitutions

Register Names

• R0 - R15: The raw register file (many people just use this)

• A1 - A4: Argument registers (R0 - R3)

• V1 - V8: Variable registers (R4 - R11)

• SB: Static base register (R9)

• IP: Intra-procedure scratch register (R12)

• SP: Stack pointer (R13)

• LR: Link register for subroutine return (R14)

• PC: Program counter (R15)

Register Operand Examples

load1   MOV R0, #123      ;load1 labels this
load2   MOV R1, #0x1C     ;load2 labels this
outer_loop_start          ;One line can have
inner_loop_start          ;multiple labels
        ADD R2, R0, R1

A register can be upper or lower case. One instruction can refer to multiple registers.

Numeric Literals

• decimal: 12345

• hexadecimal: 0xFF02A3b7, &ff02a3B7

• base 2 - 9: 2_10001001, 8_27056177

• char: 'A', '3', '\'', '\\'

• Range is 0 to 2^32 - 1, except

• DCQ and DCQU directives are 0 to 2^64 - 1

• Many assemblers require # preceding literals

• ARM doesn't, but will give a warning, so use it anyway
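To make the literal forms concrete, here is a hedged Python sketch that converts each notation listed above into an integer and enforces the 32-bit range. `parse_literal` is a hypothetical helper written for this note; it does not cover the `=` form used by the LDR pseudo-instruction or the 64-bit DCQ range.

```python
def parse_literal(text):
    """Convert one armasm-style numeric literal to an int in 0 .. 2**32 - 1."""
    t = text.strip()
    if t.startswith("#"):                      # optional immediate marker
        t = t[1:]
    if len(t) >= 3 and t[0] == "'" and t[-1] == "'":
        inner = t[1:-1]                        # character literal, maybe escaped
        if len(inner) == 2 and inner[0] == "\\":
            inner = inner[1]
        if len(inner) != 1:
            raise ValueError("bad character literal: " + text)
        value = ord(inner)
    elif t[:2].lower() == "0x":
        value = int(t[2:], 16)                 # C-style hexadecimal
    elif t.startswith("&"):
        value = int(t[1:], 16)                 # alternative hexadecimal prefix
    elif "_" in t:
        base, digits = t.split("_", 1)         # base_digits, base 2 to 9
        if not 2 <= int(base) <= 9:
            raise ValueError("base must be 2 to 9: " + text)
        value = int(digits, int(base))
    else:
        value = int(t, 10)
    if not 0 <= value <= 2**32 - 1:
        raise ValueError("literal out of 32-bit range: " + text)
    return value


examples = {
    "12345": parse_literal("12345"),
    "#0x1C": parse_literal("#0x1C"),           # 28
    "&ff02a3B7": parse_literal("&ff02a3B7"),
    "2_10001001": parse_literal("2_10001001"), # 137
    "'A'": parse_literal("'A'"),               # 65
}
```

Both hexadecimal spellings of the same digits give the same value, and a value of 2^32 or more is rejected.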

Literal Operand Examples

        MOV r0, #123       ;decimal value
        MOV r1, #0x1C      ;hex value
        LDR r3, =2_11110000111100001111000011110000

A literal number is usually preceded by a # sign (which was known as the number sign before being co-opted as the hashtag) and can be decimal or other bases. Note that the last form (=) is special for the LDR pseudo-op; an ordinary # immediate goes with an instruction such as MOV.

Directives

• Guide the assembler, provide meta-info

• Variable, constant, string definitions

• Conditional assembly (e.g., debug, model)

• Linker info (entry point, visible symbols)

• ALIGN pads out to a full word boundary, needed when 16-bit Thumb instructions are used

• END of code

AREA Directive

• Marks the start of an ELF section

• Supplies attributes for the section

• Example: AREA SEC1, CODE, READONLY

• So what’s an ELF section?

ELF

• Executable and Linkable Format

• Object file format output by assembler for input to linker, or by linker for loader

• ELF header has file-wide info

• Program header lists executable segments

• Section header lists linkable sections

• Can use readelf, elfdump, objdump to view

Basic Structure

• ELF Header

• Program Header Table

• Text segment

• Data segment

• BSS segment

• ".symtab" section

• ".strtab" section

• ".shstrtab" section

• Debug sections

• Section Header Table

ELF Components

• ELF Header specifies machine type

• Text segment holds executable code

• Data segment holds initialized values

• BSS segment reserves empty data space

• .symtab section contains symbol table

• .strtab has text names of symbols

• .shstrtab has text names of sections

• Debug sections contain other debug info
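As a rough illustration of "ELF Header specifies machine type", the Python sketch below reads the first fields of an ELF header from a byte string. The field layout (16-byte identification block, then 16-bit type and machine fields) and the value 40 for ARM come from the ELF specification; the function name `read_elf_ident` and the hand-built sample header are assumptions made for this example, not output from a real assembler.

```python
import struct

EM_ARM = 40          # e_machine value assigned to ARM
ET_REL = 1           # e_type for a relocatable object file (assembler output)


def read_elf_ident(data):
    """Decode the class, byte order, file type and machine from an ELF header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    endian = "<" if ei_data == 1 else ">"      # 1 = little-endian, 2 = big-endian
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "bits": 32 if ei_class == 1 else 64,   # 1 = 32-bit class, 2 = 64-bit class
        "little_endian": ei_data == 1,
        "type": e_type,
        "machine": e_machine,
    }


# Synthetic 20-byte prefix of a 32-bit little-endian ARM object file header.
sample = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", ET_REL, EM_ARM)
info = read_elf_ident(sample)
```

For a real file, the same function could be applied to the first bytes read with `open(path, "rb")`; tools such as readelf report the same fields.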

END directive

• Tells assembler where to stop reading

• Everything after END is ignored

• Assembler will place a literal pool in the space after END

• Literal pool is storage for implicit values that are generated for certain pseudo-instructions like LDR, and must be within 4KB of uses (LTORG can explicitly insert a pool area)

EXPORT Directive

• Declares a label to be visible externally

• Enables calling from other modules (including C code)

• EXPORT label_name

• Has various other options including protections, types, instruction set

EQU Directive

• name EQU expression {, type}

• allOnes EQU 0xFFFFFFFF

• LDR r1, =allOnes

• expression is a register relative address, PC-relative address, absolute address, or 32-bit integer constant

• type can be ARM, THUMB, CODE32, CODE16, DATA and applies only to an absolute address

• Defines a symbol in the symbol table and assigns it a value for later use in code

Many More

• Mostly for larger programs

• Macro definition

• Importing files, etc.

• See http://www.keil.com/support/man/docs/armasmref/armasmref_caccehia.htm

ARM Assembly Programming: Using MBED to Access the Lowest Level of Hardware

New Program

• Can use blinky template

• Change code to:

New File

• asm_func.s

You can omit the comments

Run It

• Compile and download

• Run your screen or terminal program

• Push the button

Before: 42
After: 83

Basic Assembly Paradigm

• Get values into registers (pass in or load)

• Operate on values in registers

• Use scratch memory as necessary if not enough registers

• Store results in memory or pass back

Preserving Registers

• Assembly needs working registers

• C caller may have values in registers

• Need to preserve and restore via stack

Preserving Registers

        PUSH {R2, R3, R4}       ; on entry, now available
        POP {R2, R3, R4}        ; before return, put them back

• If assembly also calls a subroutine, it must save LR upon entry, restore before return

        PUSH {R2, R3, R4, LR}   ; on entry also save LR
        BL subroutine           ; overwrites LR
        POP {R2, R3, R4, LR}    ; before return restore LR
        BX LR

Calling Assembly

• Need to put assembly in .s file

• Declare as extern in the calling source; mbed code is compiled as C++, so use extern "C" to keep the name unmangled, e.g.,

• extern "C" int my_asm(int value);

• Assembly has to export same name

• Define as many parameters as you need

• If 4 or fewer, will be in r0 to r3, else on stack

In C

• Call as usual

• int a = my_asm(somevalue);

• somevalue will be in R0

• Return address is in link register (LR, or R14)

• Result is returned in R0

In Assembly

        AREA asm_func, CODE, READONLY
        EXPORT my_asm
my_asm
        RSB R0, R0, #0    ;Reverse sub
        BX LR             ;Return
        ALIGN             ;Directive to fill words
        END

Return via BX LR (branch indirect through link register). Assembler sometimes uses 16-bit instructions -- ALIGN tells it to pad the last one to get 32-bit alignment.

ARM ISA

• 3-address RISC (destination, source, source)

• 32-bit word (16-bit version called Thumb)

• One instruction per word, except Thumb

• File of 16 registers (R0 - R15)

• All general purpose except LR (return address, R14) and PC (R15)

• By convention, R13 is stack pointer

• Status register

Status Register

• Condition codes: Negative, Zero, Carry, Overflow (bits 31 - 28) NZCV

• Operating mode: Enable/disable interrupts, set endianness, set Thumb mode, user, supervisor, etc., model specific modes

• Don’t change this with an MSR instruction

Set Status Register

• Most instructions can optionally set the condition code (CC) register

• Add S to the end of the mnemonic

• SUB subtracts without setting CC

• SUBS subtracts and sets the CC bits

• Compare (CMP) and test (TST) instructions

Compare and Test

• No result generated - just does arithmetic and sets flags (no S tag)

• CMP subtracts operands

• CMN adds operands (subtracts negative)

• TEQ does XOR (bitwise equality test: Z is set when the operands are identical)

• TST does AND, used for masked compare

Branch on Condition

• B alone is unconditional; add a condition code after it to branch conditionally:

• EQ (equal), NE (not equal), CS (carry set), CC (carry clear), VS (overflow), VC (no overflow), MI (minus), PL (plus)

• Signed GE (>=), LT (<), GT (>), LE (<=)

Examples

        SUBS R1, R0, R1      ;R1 = R0-R1
        BEQ r0equalsr1       ;if Z=1
        BCC nocarry          ;if C=0
        BLE r0lessequalR1    ;if Z=1 or N!=V

        CMP R0, R1           ;R0-R1
        BEQ r0equalsr1

        TST R0, #2_0010      ;R0 bit 1 set
        BNE bit1set          ;if Z=0

        TEQ R0, #2_0100      ;XOR bit 2
        BEQ onlybit2set      ;if Z=1
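The flag conditions in the comments above can be checked with a small Python model of how a 32-bit subtract sets N, Z, C and V, and of how each branch suffix reads them. This is a sketch for reasoning about the examples, not a full processor model; the function names `subs`, `tst` and `teq` simply mirror the mnemonics, and only the conditions listed on the slides are included.

```python
MASK = 0xFFFFFFFF


def subs(a, b):
    """Model SUBS/CMP: return (result, flags) for the 32-bit subtraction a - b."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    flags = {
        "N": result >> 31,                          # sign bit of the result
        "Z": int(result == 0),
        "C": int(a >= b),                           # carry set means no borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,        # signed overflow
    }
    return result, flags


def tst(a, b):
    """Model TST: AND the operands and report N and Z only."""
    result = a & b & MASK
    return {"N": result >> 31, "Z": int(result == 0)}


def teq(a, b):
    """Model TEQ: XOR the operands and report N and Z only."""
    result = (a ^ b) & MASK
    return {"N": result >> 31, "Z": int(result == 0)}


CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "CS": lambda f: f["C"] == 1,
    "CC": lambda f: f["C"] == 0,
    "MI": lambda f: f["N"] == 1,
    "PL": lambda f: f["N"] == 0,
    "VS": lambda f: f["V"] == 1,
    "VC": lambda f: f["V"] == 0,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
}

_, equal_flags = subs(5, 5)         # BEQ and BLE would both be taken
_, less_flags = subs(3, 5)          # borrow occurs, so BCC and BLT are taken
_, overflow_flags = subs(0x80000000, 1)   # most negative value minus 1 overflows
```

The overflow case shows why the signed conditions compare N with V rather than testing N alone: the result looks positive, yet LT is still true.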

Conditional Execute

• ARM ISA allows most instructions to execute conditionally (if a specified condition code isn't set, the results don't commit)

        CMP R1, #0          ;Sets flags for R1 - 0
        RSBLT R1, R1, #0    ;R1 = 0 - R1 if R1 negative

• Thumb ISA has an If-Then (IT) instruction that applies conditional execution to up to four following instructions

• Why would we want this?

(Predicated Execution)

Branch vs. CE

• A branch changes the value in the PC

• In a pipelined system, this results in some instructions being mis-issued; they must be squashed and the pipeline restarted

• CE doesn’t change the PC, so instructions flow smoothly through the pipe, with some cancelled

• CE adds condition bits to every instruction, but saves space by avoiding branch instructions

• Thumb saves space by using a CE block with IT

CE Example

        CMP R1, #0
        RSBLT.W R1, R1, #0   ;if negative, negate
        ADDEQ R1, R1, #1     ;if zero, add 1
        SUBGT R1, R1, #1     ;if plus, subtract 1

Note the .W extension: some instructions have both a 16-bit Thumb encoding and a 32-bit encoding, and specifying "wide" forces the 32-bit one. Otherwise, the assembler chooses. In this case, a 16-bit Thumb equivalent exists but has to be within an IT block, so the assembler might complain.

The earlier example didn't show this, just to keep it simple.
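A point that is easy to miss in the CE example is that none of the three conditional instructions has an S suffix, so all of them test the flags left by the single CMP. The Python sketch below models that behaviour; `ce_example` and `to_signed` are hypothetical helper names, and the model assumes 32-bit two's complement registers.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def ce_example(r1):
    """Model CMP R1,#0 followed by RSBLT, ADDEQ and SUBGT on the same flags."""
    r1 &= MASK
    # CMP R1, #0: subtracting zero never overflows, so V is 0 and N, Z decide.
    n = r1 >> 31
    z = int(r1 == 0)
    v = 0
    lt = n != v
    eq = z == 1
    gt = z == 0 and n == v
    # The flags are frozen from here on: no instruction below has an S suffix.
    if lt:
        r1 = (0 - r1) & MASK        # RSBLT: negate a negative value
    if eq:
        r1 = (r1 + 1) & MASK        # ADDEQ: zero becomes 1
    if gt:
        r1 = (r1 - 1) & MASK        # SUBGT: only for values positive at the CMP
    return to_signed(r1)


results = [ce_example(-5), ce_example(0), ce_example(7)]   # [5, 1, 6]
```

An input of -5 ends up as 5, not 4: after the negation the register is positive, but SUBGT still sees the flags from the original comparison and is skipped.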

IT blocks

• If-Then (IT) specifies an execution condition for the next instruction and up to three more (called an IT block)

• Conditions can be reversed per instruction

• Cannot be nested

• Can’t branch into an IT block

• Limitations on setting condition codes

IT Instruction

• IT{x{y{z}}} condition

• x, y, z can each be T or E (then or else)

• ITTEE GT is equivalent to

        IF (>)
            instruction1
            instruction2
        ELSE
            instruction3
            instruction4

IT Block Instructions

• Must specify matching conditions

• Any arrangement of T and E is allowed after the first instruction, which is always a Then

• Can be 1 to 4 instructions in block

        CMP R0, R1          ;Compare
        ITTEE GT            ;Set up IT block
        ADDGT R0, R0, #1    ;Then R0++
        MOVGT R1, #0        ;Then R1=0
        SUBLE R0, R0, #1    ;Else R0--
        MOVLE R1, R0        ;Else R1=R0
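The "matching conditions" rule can be made mechanical: the IT mnemonic and its condition determine exactly which suffix each instruction in the block must carry. The Python sketch below expands an IT mnemonic into that list and checks a block against it. `expand_it` and `block_matches` are hypothetical helpers, and the table of opposite conditions is limited to common pairs.

```python
OPPOSITE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS",
    "MI": "PL", "PL": "MI", "VS": "VC", "VC": "VS",
    "GE": "LT", "LT": "GE", "GT": "LE", "LE": "GT",
    "HI": "LS", "LS": "HI",
}


def expand_it(mnemonic, condition):
    """Return the condition each instruction in the IT block must use."""
    m, cond = mnemonic.upper(), condition.upper()
    pattern = m[2:]
    if not m.startswith("IT") or len(pattern) > 3 or any(c not in "TE" for c in pattern):
        raise ValueError("bad IT mnemonic: " + mnemonic)
    if cond not in OPPOSITE:
        raise ValueError("unknown condition: " + condition)
    conditions = [cond]                         # the first instruction is always Then
    for letter in pattern:
        conditions.append(cond if letter == "T" else OPPOSITE[cond])
    return conditions


def block_matches(mnemonic, condition, operations):
    """Check that each operation mnemonic ends with the required condition."""
    expected = expand_it(mnemonic, condition)
    if len(operations) != len(expected):
        return False
    return all(
        op.upper().split(".")[0].endswith(cond)   # ignore a .W or .N width suffix
        for op, cond in zip(operations, expected)
    )


slide_block = expand_it("ITTEE", "GT")            # ['GT', 'GT', 'LE', 'LE']
slide_ok = block_matches("ITTEE", "GT", ["ADDGT", "MOVGT", "SUBLE", "MOVLE"])
```

A plain `IT LT` expands to a single `LT`, which is the form needed for a one-instruction conditional such as a negate-if-negative.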

Try It Out

• Change asm_func.s to return absolute value

• Can use CMP to compare R0 to #0

• Can use IT or conditional execute on LT

• Discuss in pairs, ask questions

C Driver