cortex-m4 chapter architecture and asm...

48
Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming the Cortex-M4 in assembly and C will be introduced. Preference will be given to explaining code development for the Cypress FM4 S6E2CC, STM32F4 Discov- ery, and LPC4088 Quick Start. The basis for the material pre- sented in this chapter is the course notes from the ARM LiB program 1 . Overview Cortex-M4 Memory Map Cortex-M4 Memory Map Bit-band Operations Cortex-M4 Program Image and Endianness ARM Cortex-M4 Processor Instruction Set ARM and Thumb Instruction Set Cortex-M4 Instruction Set 1. LiB Low-level Embedded NXP LPC4088 Quick Start Chapter 3

Upload: buidien

Post on 08-Apr-2018

254 views

Category:

Documents


7 download

TRANSCRIPT

Page 1: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

er

Cortex-M4 Architecture and ASM ProgrammingIntroduction

In this chapter programming the Cortex-M4 in assembly and Cwill be introduced. Preference will be given to explaining codedevelopment for the Cypress FM4 S6E2CC, STM32F4 Discov-ery, and LPC4088 Quick Start. The basis for the material pre-sented in this chapter is the course notes from the ARM LiBprogram1.

Overview

• Cortex-M4 Memory Map

– Cortex-M4 Memory Map

– Bit-band Operations

– Cortex-M4 Program Image and Endianness

• ARM Cortex-M4 Processor Instruction Set

– ARM and Thumb Instruction Set

– Cortex-M4 Instruction Set

1.LiB Low-level Embedded NXP LPC4088 Quick Start

Chapt

3

Page 2: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Cortex-M4 Memory Map

• The Cortex-M4 processor has 4 GB of memory address space

– Support for bit-band operation (detailed later)

• The 4GB memory space is architecturally defined as a num-ber of regions

– Each region is given for recommended usage

– Easy for software programmer to port between differentdevices

• Nevertheless, despite of the default memory map, the actualusage of the memory map can also be flexibly defined by theuser, except some fixed memory addresses, such as internalprivate peripheral bus

3–2 ECE 5655/4655 Real-Time DSP

Page 3: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

M4 Memory Map (cont.)

M4 Memory Map (cont.)

Priv

ate

peri

pher

als

e.g.

NV

IC, S

CS

Mai

nly

used

for

exte

rnal

per

iphe

rals

e.g.

SD c

ard

Mai

nly

used

for

exte

rnal

mem

orie

se.

g. ex

tern

al D

DR

, FLA

SH, L

CD

Mai

nly

used

for

on-c

hip

peri

pher

als

e.g.

AH

B, A

PB p

erip

hera

ls

Mai

nly

used

for

data

mem

ory

e.g.

on-c

hip

SRA

M, S

DR

AM

Mai

nly

used

for

prog

ram

cod

e e.

g. on

-chi

p FL

ASH

Vend

or s

peci

ficM

emor

y

Exte

rnal

dev

ice

Exte

rnal

RA

M

Peri

pher

als

SRA

M

Cod

e

0xFF

FFFF

FF

0xE0

0000

00Pr

ivat

e Pe

riph

eral

Bus

(PPB

)0x

DFF

FFFF

F

0xA

0000

000

0x9F

FFFF

FF

0x60

0000

000x

5FFF

FFFF

0x40

0000

000x

3FFF

FFFF

0x1F

FFFF

FF0x

2000

0000

0x00

0000

00

512M

B

512M

B

512M

B

1GB

1GB

512M

B0x

E00F

FFFF

0xE0

1000

00R

eser

ved

for

othe

r pu

rpos

esRO

M t

able

Exte

rnal

PPB

Embe

dded

trac

e m

acro

cell

Trac

e po

rt in

terf

ace

unit

Res

erve

d

Syst

em C

ontr

ol S

pace

, inc

ludi

ngN

este

d Ve

ctor

ed In

terr

upt

Con

trol

ler

(NV

IC)

Res

erve

dFe

tch

patc

h an

d br

eakp

oint

uni

t

Dat

a w

atch

poin

tan

d tr

ace

unit

Inst

rum

enta

tion

trac

e m

acro

cell

Exte

rnal

PPB

Inte

rnal

PPB

ECE 5655/4655 Real-Time DSP 3–3

Page 4: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

M4 Memory Map (cont.)

• Code Region

– Primarily used to store program code

– Can also be used for data memory

– On-chip memory, such as on-chip FLASH

• SRAM Region

– Primarily used to store data, such as heaps and stacks

– Can also be used for program code

– On-chip memory; despite its name “SRAM”, the actualdevice could be SRAM, SDRAM or other types

• Peripheral Region

– Primarily used for peripherals, such as Advanced High-performance Bus (AHB) or Advanced Peripheral Bus(APB) peripherals

• External RAM Region

– Primarily used to store large data blocks, or memorycaches

– Off-chip memory, slower than on-chip SRAM region

• External Device Region

– Primarily used to map to external devices

– Off-chip devices, such as SD card

• Internal Private Peripheral Bus (PPB)

3–4 ECE 5655/4655 Real-Time DSP

Page 5: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Cortex-M4 Memory Map Example

– Used inside the processor core for internal control

– Within PPB, a special range of memory is defined as Sys-tem Control Space (SCS)

– The Nested Vectored Interrupt Controller (NVIC) is part ofSCS

Cortex-M4 Memory Map Example

AHB bus

External SRAM,FLASH External LCD SD card

Cortex-M4 PPB SCS NVICDebug Ctrl

On-chip FLASH(Code Region)

On-chip SRAM(SRAM Region) Peripheral Region

External memory interface(External RAM Region)

External device interface(External Device Region)

Timer UART GPIO

Chip Silicon

ECE 5655/4655 Real-Time DSP 3–5

Page 6: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Bit-band Operations

• Bit-band operation allows a single load/store operation toaccess a single bit in the memory, for example, to change asingle bit of one 32-bit data:

– Normal operation without bit-band (read-modify-write)

– Read the value of 32-bit data

– Modify a single bit of the 32-bit value (keep other bitsunchanged)

– Write the value back to the address

– Bit-band operation

– Directly write a single bit (0 or 1) to the “bit-band aliasaddress” of the data

• Bit-band alias address

– Each bit-band alias address is mapped to a real dataaddress

– When writing to the bit-band alias address, only a singlebit of the data will be changed

3–6 ECE 5655/4655 Real-Time DSP

Page 7: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Bit-band Operation Example

Bit-band Operation Example

• For example, in order to set bit[3] in word data in address0x20000000:

• Read-Modify-Write operation

– Read the real data address (0x20000000)

– Modify the desired bit (retain other bits unchanged)

– Write the modified data back

• Bit-band operation

– Directly set the bit by writing ‘1’ to address 0x2200000C,which is the alias address of the fourth bit of the 32-bitdata at 0x20000000

– In effect, this single instruction is mapped to 2 bus trans-fers: read data from 0x20000000 to the buffer, and thenwrite to 0x20000000 from the buffer with bit [3] set

;Read-Modify-Write Operation

LDR R1, =0x20000000 ;Setup addressLDR R0, [R1] ;ReadORR.W R0, #0x8 ;Modify bitSTR R0, [R1] ;Write back

;Bit-band Operation

LDR R1, =0x2200000C ;Setup addressMOV R0, #1 ;Load dataSTR R0, [R1] ;Write

ECE 5655/4655 Real-Time DSP 3–7

Page 8: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Bit-band Alias Address

Each bit of the 32-bit data is one-to-one mapped to the bit-bandalias address

– For example, the fourth bit (bit [3]) of the data at0x20000000 is mapped to the bit-band alias address at0x2200000C

– Hence, to set bit [3] of the data at 0x20000000, we onlyneed to write ‘1’ to address 0x2200000C

– In Cortex-M4, there are two pre-defined bit-band aliasregions: one for SRAM region, and one for peripheralsregion

0x20000000

0x20000004

0x20000008

Real 32-bit data address

0x22000000

0x22000080

0x22000100

0x2200000C

0x22000018

Bit-band alias address

3–8 ECE 5655/4655 Real-Time DSP

Page 9: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Bit-band Alias Address (cont.)

Bit-band Alias Address (cont.)

• SRAM region

– 32MB memory space (0x22000000 – 0x23FFFFFF) isused as the bit-band alias region for 1MB data(0x20000000 – 0x200FFFFF)

• Peripherals region

– 32MB memory space (0x42000000 – 0x43FFFFFF) isused as the bit-band alias region for 1MB data(0x40000000 – 0x400FFFFF)

External RAM

Peripherals

SRAM

Code

0x600000000x5FFFFFFF

0x400000000x3FFFFFFF

0x1FFFFFFF0x20000000

0x00000000

512MB

512MB

512MB

1MB Bit-band region

32MB Bit-band alias

31MB non-bit-band region

0x20000000

0x20100000

0x22000000

0x23FFFFFF

0x21FFFFFF

1MB Bit-band region

32MB Bit-band alias

31MB non-bit-band region

0x40000000

0x40100000

0x42000000

0x43FFFFFF

0x41FFFFFF

ECE 5655/4655 Real-Time DSP 3–9

Page 10: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Benefits of Bit-Band Operations

• Faster bit operations

• Fewer instructions

• Atomic operation, avoid hazards

– For example, if an interrupt is triggered and served duringthe Read-Modify-Write operations, and the interrupt ser-vice routine modifies the same data, a data conflict willoccur

Interrupt occurs

Read data at 0x00 Modify bit [1]

Read data at 0x00 Modify bit [1] Write data back

Write data back

Interrupt returns

Bit [1] modified by ISR is overwritten by the main program

Interrupt Service Routine

Main program

3–10 ECE 5655/4655 Real-Time DSP

Page 11: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Cortex-M4 Program Image

Cortex-M4 Program Image

• The program image in Cortex-M4 contains

– Vector table -- includes the starting addresses of exceptions(vectors) and the value of the main stack point (MSP);

– C start-up routine;

– Program code – application code and data;

– C library code – program codes for C library functions

0x00000000

Code region

Start-up routine &Program code &C library code

Vector table

ProgramImage

Initial MSP valueReset vectorNMI vector

Hard fault vectorMemManage fault

ReservedPendSVSysTick

External Interrupts

Bus faultUsage fault

Reserved

SVCallDebug monitor

ECE 5655/4655 Real-Time DSP 3–11

Page 12: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Cortex-M4 Program Image (cont)

• After Reset, the processor:

– First reads the initial MSP value;

– Then reads the reset vector;

– Branches to the start of the programme execution address(reset handler);

– Subsequently executes program instructions

Reset

Fetch initial value for MSP(Read address 0x00000000)

Fetch reset vector(Read address 0x00000004)

Fetch 1st instruction(Read address of reset vector)

Fetch 2nd instruction(Read subsequent instructions)

3–12 ECE 5655/4655 Real-Time DSP

Page 13: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Cortex-M4 Endianness

Cortex-M4 Endianness

• Endian refers to the order of bytes stored in memory

– Little endian: lowest byte of a word-size data is stored inbit 0 to bit 7

– Big endian: lowest byte of a word-size data is stored in bit24 to bit 31

• Cortex-M4 supports both little endian and big endian

• However, Endianness only exists in the hardware level

Byte0 Byte1 Byte2 Byte3

Byte0 Byte1 Byte2 Byte3

Byte0 Byte1 Byte2 Byte3

[7:0][15:8][23:16][31:24]

Word 1

Word 2

Word 3

Big endian 32-bit memory

Byte0Byte1Byte2Byte3

Byte0Byte1Byte2Byte3

Byte0Byte1Byte2Byte3

0x00000000

0x00000004

0x00000008

Address [7:0][15:8][23:16][31:24]

Word 1

Word 2

Word 3

Little endian 32-bit memory

ECE 5655/4655 Real-Time DSP 3–13

Page 14: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

ARM and Thumb® Instruction Set

• Early ARM instruction set

– 32-bit instruction set, called the ARM instructions

– Powerful and good performance

– Larger program memory compared to 8-bit and 16-bit pro-cessors

– Larger power consumption

• Thumb-1 instruction set

– 16-bit instruction set, first used in ARM7TDMI processorin 1995

– Provides a subset of the ARM instructions, giving bettercode density compared to 32-bit RISC architecture

– Code size is reduced by ~30%, but performance is alsoreduced by ~20%

3–14 ECE 5655/4655 Real-Time DSP

Page 15: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

ARM and Thumb Instruction Set (cont.)

ARM and Thumb Instruction Set (cont.)

• Mix of ARM and Thumb-1 Instruction sets

– Benefit from both 32-bit ARM (high performance) and 16-bit Thumb-1 (high code density)

– A multiplexer is used to switch between two states: ARMstate (32-bit) and Thumb state (16-bit), which requires aswitching overhead

• Thumb-2 instruction set

• Consists of both 32-bit Thumb instructions and original 16-bit Thumb-1 instruction sets

• Compared to 32-bit ARM instructions set, code size isreduced by ~26%, while keeping a similar performance

• Capable of handling all processing requirements in one oper-ation state

IncomingInstructions

Thumb remapto ARM

ARMInstructiondecoder

InstructionsExecuting

T bit, 0: select ARM,1: select Thumb

0

1

ECE 5655/4655 Real-Time DSP 3–15

Page 16: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Cortex-M4 Instruction Set

• Cortex-M4 processor

– ARMv7-M architecture

– Supports 32-bit Thumb-2 instructions

– Possible to handle all processing requirements in one oper-ation state (Thumb state)

– Compared with traditional ARM processors (use stateswitching), advantages include:

* No state switching overhead – both execution time and instruc-tion space are saved

* No need to separate ARM code and Thumb code source files,which makes the development and maintenance of softwareeasier

* Easier to get optimized efficiency and performance

3–16 ECE 5655/4655 Real-Time DSP

Page 17: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Cortex-M4 Instruction Set (cont.)

Cortex-M4 Instruction Set (cont.)

• ARM assembly syntax:label

mnemonic operand1,operand2, …; Comments

– Label is used as a reference to an address location;

– Mnemonic is the name of the instruction;

– Operand1 is the destination of the operation;

– Operand2 is normally the source of the operation;

– Comments are written after “ ; ”, which does not affect theprogram;

– For exampleMOVSR3, #0x11;Set register R3 to 0x11

– Note that the assembly code can be assembled by eitherARM assembler (armasm) or assembly tools from a vari-ety of vendors (e.g. GNU tool chain). When using GNUtool chain, the syntax for labels and comments is slightlydifferent

ECE 5655/4655 Real-Time DSP 3–17

Page 18: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Cortex-M4 Instruction Set Tables

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

AD

C, A

DC

S{R

d,} R

n, O

p2A

dd w

ith C

arry

N,Z

,C,V

AD

D, A

DD

S{R

d,} R

n, O

p2A

ddN

,Z,C

,V

AD

D, A

DD

W{R

d,} R

n, #

imm

12A

ddN

,Z,C

,V

AD

RR

d, la

bel

Load

PC

-rel

ativ

e A

ddre

ss

AN

D, A

ND

S{R

d,} R

n, O

p2Lo

gica

l AN

DN

,Z,C

ASR

, ASR

SR

d, R

m, <

Rs|

#n>

Ari

thm

etic

Shi

ft R

ight

N,Z

,C

Bla

bel

Bran

ch

BFC

Rd,

#ls

b, #

wid

thBi

t Fie

ld C

lear

BFI

Rd,

Rn,

#ls

b, #

wid

thBi

t Fie

ld In

sert

BIC

, BIC

S{R

d,} R

n, O

p2Bi

t Cle

arN

,Z,C

BKPT

#im

mBr

eakp

oint

BLla

bel

Bran

ch w

ith L

ink

BLX

Rm

Bran

ch in

dire

ct w

ith L

ink

BXR

mBr

anch

indi

rect

3–18 ECE 5655/4655 Real-Time DSP

Page 19: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Cortex-M4 Instruction Set Tables

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

CBN

ZR

n, la

bel

Com

pare

and

Bra

nch

if N

on Z

ero

CBZ

Rn,

labe

lC

ompa

re a

nd B

ranc

h if

Zer

o

CLR

EXC

lear

Exc

lusi

ve

CLZ

Rd,

Rm

Cou

nt L

eadi

ng Z

eros

CM

NR

n, O

p2C

ompa

re N

egat

ive

N,Z

,C,V

CM

PR

n, O

p2C

ompa

reN

,Z,C

,V

CPS

IDi

Cha

nge

Proc

esso

r St

ate,

Dis

able

Inte

rrup

ts

CPS

IEi

Cha

nge

Proc

esso

r St

ate,

Ena

ble

Inte

rrup

ts

DM

BD

ata

Mem

ory

Barr

ier

DSB

Dat

a Sy

nchr

oniz

atio

n Ba

rrie

r

EOR

, EO

RS

{Rd,

} R

n, O

p2Ex

clus

ive

OR

N,Z

,C

ISB

-In

stru

ctio

n Sy

nchr

oniz

atio

n Ba

rrie

r

ECE 5655/4655 Real-Time DSP 3–19

Page 20: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

ITIf-

The

n co

nditi

on b

lock

LDM

Rn{

!}, r

eglis

tLo

ad M

ultip

le r

egis

ters

, inc

rem

ent a

fter

LDM

DB,

LD

MEA

Rn{

!}, r

eglis

tLo

ad M

ultip

le r

egis

ters

, dec

rem

ent b

efor

e

LDM

FD, L

DM

IAR

n{!}

, reg

list

Load

Mul

tiple

reg

iste

rs, i

ncre

men

t afte

r

LDR

Rt,

[Rn,

#of

fset

]Lo

ad R

egis

ter

with

wor

d

LDR

B, L

DR

BTR

t, [R

n, #

offs

et]

Load

Reg

iste

r w

ith b

yte

LDR

DR

t, R

t2, [

Rn,

#of

fset

]Lo

ad R

egis

ter

with

tw

o by

tes

LDR

EXR

t, [R

n, #

offs

et]

Load

Reg

iste

r Ex

clus

ive

LDR

EXB

Rt,

[Rn]

Load

Reg

iste

r Ex

clus

ive

with

Byt

e

LDR

EXH

Rt,

[Rn]

Load

Reg

iste

r Ex

clus

ive

with

Hal

fwor

d

LDR

H, L

DR

HT

Rt,

[Rn,

#of

fset

]Lo

ad R

egis

ter

with

Hal

fwor

d

3–20 ECE 5655/4655 Real-Time DSP

Page 21: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Cortex-M4 Instruction Set Tables

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

LDR

SB, L

DR

SBT

Rt,

[Rn,

#of

fset

]Lo

ad R

egis

ter

with

Sig

ned

Byte

LDR

SH, L

DR

SHT

Rt,

[Rn,

#of

fset

]Lo

ad R

egis

ter

with

Sig

ned

Hal

fwor

d

LDRT

Rt,

[Rn,

#of

fset

]Lo

ad R

egis

ter

with

wor

d

LSL,

LSL

SR

d, R

m, <

Rs|

#n>

Logi

cal S

hift

Left

N,Z

,C

LSR

, LSR

SR

d, R

m, <

Rs|

#n>

Logi

cal S

hift

Rig

htN

,Z,C

MLA

Rd,

Rn,

Rm

, Ra

Mul

tiply

with

Acc

umul

ate,

32-

bit r

esul

t

MLS

Rd,

Rn,

Rm

, Ra

Mul

tiply

and

Sub

trac

t, 32

-bit

resu

lt

MO

V, M

OV

SR

d, O

p2M

ove

N,Z

,C

MO

VT

Rd,

#im

m16

Mov

e To

p

MO

VW

, MO

VR

d, #

imm

16M

ove

16-b

it co

nsta

ntN

,Z,C

MR

SR

d, s

pec_

reg

Mov

e fr

om S

peci

al R

egis

ter

to g

ener

al r

egis

ter

MSR

spec

_reg

, Rm

Mov

e fr

om g

ener

al r

egis

ter

to S

peci

al R

egis

ter

N,Z

,C,V

ECE 5655/4655 Real-Time DSP 3–21

Page 22: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

MU

L, M

ULS

{Rd,

} Rn,

Rm

Mul

tiply,

32-

bit r

esul

tN

,Z

MV

N, M

VN

SR

d, O

p2M

ove

NO

TN

,Z,C

NO

PN

o O

pera

tion

OR

N, O

RN

S{R

d,} R

n, O

p2Lo

gica

l OR

NO

TN

,Z,C

OR

R, O

RR

S{R

d,} R

n, O

p2Lo

gica

l OR

N,Z

,C

PKH

TB,

PKH

BT{R

d, }

Rn,

Rm

, Op2

Pack

Hal

fwor

d

POP

regl

ist

Pop

regi

ster

s fr

om s

tack

PUSH

regl

ist

Push

reg

iste

rs o

nto

stac

k

QA

DD

{Rd,

} R

n, R

mSa

tura

ting

doub

le a

nd A

ddQ

QA

DD

16{R

d, }

Rn,

Rm

Satu

ratin

gAdd

16

QA

DD

8{R

d, }

Rn,

Rm

Satu

ratin

gAdd

8

3–22 ECE 5655/4655 Real-Time DSP

Page 23: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Cortex-M4 Instruction Set Tables

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

QA

SX{R

d, }

Rn,

Rm

Sa

tura

ting A

dd a

nd S

ubtr

act w

ith E

xcha

nge

QD

AD

D{R

d, }

Rn,

Rm

Satu

ratin

g Add

Q

QD

SUB

{Rd,

} R

n, R

mSa

tura

ting

doub

le a

nd S

ubtr

act

Q

QSA

X{R

d, }

Rn,

Rm

Sa

tura

ting

Subt

ract

and

Add

with

Exc

hang

e

QSU

B{R

d,}

Rn,

Rm

Satu

ratin

g Su

btra

ctQ

QSU

B16

{Rd,

} R

n, R

mSa

tura

ting

Subt

ract

16

QSU

B8{R

d, }

Rn,

Rm

Satu

ratin

g Su

btra

ct 8

RBI

TR

d, R

nR

ever

se B

its

REV

Rd,

Rn

Rev

erse

byt

e or

der

in a

wor

d

REV

16R

d, R

nR

ever

se b

yte

orde

r in

eac

h ha

lfwor

d

REV

SHR

d, R

nR

ever

se b

yte

orde

r in

bot

tom

hal

fwor

dan

d si

gn e

xten

d

ROR

, RO

RS

Rd,

Rm

, <R

s|#n

>R

otat

e R

ight

N,Z

,C

ECE 5655/4655 Real-Time DSP 3–23

Page 24: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

RR

X, R

RX

SR

d, R

mR

otat

e R

ight

with

Ext

end

N,Z

,C

RSB

, RSB

S{R

d,} R

n, O

p2R

ever

se S

ubtr

act

N,Z

,C,V

SAD

D16

{Rd,

} R

n, R

mSi

gned

Add

16

GE

SAD

D8

{Rd,

} R

n, R

mSi

gned

Add

8G

E

SASX

{Rd,

}R

n, R

mSi

gned

Add

and

Sub

trac

t with

Exc

hang

eG

E

SBC

, SBC

S{R

d,} R

n, O

p2Su

btra

ct w

ith C

arry

N,Z

,C,V

SBFX

Rd,

Rn,

#ls

b, #

wid

thSi

gned

Bit

Fiel

d Ex

trac

t

SDIV

{Rd,

} Rn,

Rm

Sign

ed D

ivid

e

SEV

Send

Eve

nt

SHA

DD

16{R

d,} R

n, R

mSi

gned

Hal

ving

Add

16

SHA

DD

8{R

d,} R

n, R

mSi

gned

Hal

ving

Add

8

SHA

SX{R

d,} R

n, R

mSi

gned

Hal

ving

Add

and

Sub

trac

t with

Exc

hang

e

3–24 ECE 5655/4655 Real-Time DSP

Page 25: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Cortex-M4 Instruction Set Tables

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

SHSA

X{R

d,}

Rn,

Rm

Sign

ed H

alvi

ng S

ubtr

act a

nd A

dd w

ith E

xcha

nge

SHSU

B16

{Rd,

} Rn,

Rm

Sign

ed H

alvi

ng S

ubtr

act 1

6

SHSU

B8{R

d,}

Rn,

Rm

Sign

ed H

alvi

ng S

ubtr

act 8

SMLA

BB, S

MLA

BT, S

MLA

TB,

SM

LAT

TR

d, R

n, R

m, R

aSi

gned

Mul

tiply

Acc

umul

ate

Long

(hal

fwor

ds)

Q

SMLA

D, S

MLA

DX

Rd,

Rn,

Rm

, Ra

Sign

ed M

ultip

ly A

ccum

ulat

e D

ual

Q

SMLA

LR

dLo,

RdH

i, R

n, R

mSi

gned

Mul

tiply

with

Acc

umul

ate

(32

x 32

+ 6

4), 6

4-bi

t re

sult

SMLA

LBB,

SM

LALB

T, SM

LALT

B,

SMLA

LTT

RdL

o, R

dHi,

Rn,

Rm

Sign

ed M

ultip

ly A

ccum

ulat

e Lo

ng, h

alfw

ords

SMLA

LD, S

MLA

LDX

RdL

o, R

dHi,

Rn,

Rm

Sign

ed M

ultip

ly A

ccum

ulat

e Lo

ng D

ual

SMLA

WB,

SM

LAW

TR

d, R

n, R

m, R

aSi

gned

Mul

tiply

Acc

umul

ate,

wor

d by

hal

fwor

dQ

SMLS

DR

d, R

n, R

m, R

aSi

gned

Mul

tiply

Sub

trac

t Dua

lQ

SMLS

LDR

dLo,

RdH

i, R

n, R

mSi

gned

Mul

tiply

Sub

trac

t Lon

g D

ual

SMM

LAR

d, R

n, R

m, R

aSi

gned

Mos

t sig

nific

ant w

ord

Mul

tiply

Acc

umul

ate

ECE 5655/4655 Real-Time DSP 3–25

Page 26: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

SMM

LS, S

MM

LRR

d, R

n, R

m, R

aSi

gned

Mos

t sig

nific

ant w

ord

Mul

tiply

Sub

trac

t

SMM

UL,

SM

MU

LR{R

d,} R

n, R

mSi

gned

Mos

t sig

nific

ant w

ord

Mul

tiply

SMU

AD

{Rd,

} Rn,

Rm

Sign

ed d

ual M

ultip

ly A

ddQ

SMU

LBB,

SM

ULB

T S

MU

LTB,

SM

ULT

T{R

d,} R

n, R

mSi

gned

Mul

tiply

(hal

fwor

ds)

SMU

LLR

dLo,

RdH

i, R

n, R

mSi

gned

Mul

tiply

(32

x 32

), 64

-bit

resu

lt

SMU

LWB,

SM

ULW

T{R

d,} R

n, R

mSi

gned

Mul

tiply

wor

d by

hal

fwor

d

SMU

SD, S

MU

SDX

{Rd,

} Rn,

Rm

Sign

ed d

ual M

ultip

ly S

ubtr

act

SSAT

Rd,

#n,

Rm

{,shi

ft #s

}Si

gned

Sat

urat

eQ

SSAT

16R

d, #

n, R

mSi

gned

Sat

urat

e 16

Q

SSA

X{R

d,} R

n, R

mSi

gned

Sub

trac

t and

Add

with

Exc

hang

eG

E

SSU

B16

{Rd,

} Rn,

Rm

Sign

ed S

ubtr

act 1

6

SSU

B8{R

d,} R

n, R

mSi

gned

Sub

trac

t 8

3–26 ECE 5655/4655 Real-Time DSP

Page 27: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Cortex-M4 Instruction Set Tables

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

STM

Rn{

!}, r

eglis

tSt

ore

Mul

tiple

reg

iste

rs, i

ncre

men

t afte

r

STM

DB,

ST

MEA

Rn{

!}, r

eglis

tSt

ore

Mul

tiple

reg

iste

rs, d

ecre

men

t bef

ore

STM

FD, S

TM

IAR

n{!}

, reg

list

Stor

e M

ultip

le r

egis

ters

, inc

rem

ent a

fter

STR

Rt,

[Rn,

#of

fset

]St

ore

Reg

iste

r w

ord

STR

B, S

TR

BTR

t, [R

n, #

offs

et]

Stor

e R

egis

ter

byte

STR

DR

t, R

t2, [

Rn,

#of

fset

]St

ore

Reg

iste

r tw

o w

ords

STR

EXR

d, R

t, [R

n, #

offs

et]

Stor

e R

egis

ter

Excl

usiv

e

STR

EXB

Rd,

Rt,

[Rn]

Stor

e R

egis

ter

Excl

usiv

e By

te

STR

EXH

Rd,

Rt,

[Rn]

Stor

e R

egis

ter

Excl

usiv

e H

alfw

ord

STR

H, S

TR

HT

Rt,

[Rn,

#of

fset

]St

ore

Reg

iste

r H

alfw

ord

STRT

Rt,

[Rn,

#of

fset

]St

ore

Reg

iste

r w

ord

SUB,

SU

BS{R

d,} R

n, O

p2Su

btra

ctN

,Z,C

,V

ECE 5655/4655 Real-Time DSP 3–27

Page 28: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

SUB,

SU

BW{R

d,} R

n, #

imm

12Su

btra

ctN

,Z,C

,V

SVC

#im

mSu

perv

isor

Cal

l

SXTA

B{R

d,}

Rn,

Rm

,{,RO

R #

}Ex

tend

8 b

its t

o 32

and

add

SXTA

B16

{Rd,

} R

n, R

m,{,

ROR

#}

Dua

l ext

end

8 bi

ts t

o 16

and

add

SXTA

H{R

d,}

Rn,

Rm

,{,RO

R #

}Ex

tend

16

bits

to 3

2 an

d ad

d

SXT

B16

{Rd,

} Rm

{,RO

R #

n}Si

gned

Ext

end

Byte

16

SXT

B{R

d,} R

m{,R

OR

#n}

Sign

ext

end

a by

te

SXT

H{R

d,} R

m{,R

OR

#n}

Sign

ext

end

a ha

lfwor

d

TBB

[Rn,

Rm

]Ta

ble

Bran

ch B

yte

TBH

[Rn,

Rm

, LSL

#1]

Tabl

e Br

anch

Hal

fwor

d

TEQ

Rn,

Op2

Test

Equ

ival

ence

N,Z

,C

TST

Rn,

Op2

Test

N,Z

,C

3–28 ECE 5655/4655 Real-Time DSP

Page 29: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Cortex-M4 Instruction Set Tables

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

UA

DD

16{R

d,} R

n, R

mU

nsig

ned

Add

16

GE

UA

DD

8{R

d,} R

n, R

mU

nsig

ned

Add

8G

E

USA

X{R

d,}

Rn,

Rm

Uns

igne

d Su

btra

ct a

nd A

dd w

ith E

xcha

nge

GE

UH

AD

D16

{Rd,

} Rn,

Rm

Uns

igne

d H

alvi

ng A

dd 1

6

UH

AD

D8

{Rd,

} Rn,

Rm

Uns

igne

d H

alvi

ng A

dd 8

UH

ASX

{Rd,

} Rn,

Rm

Uns

igne

d H

alvi

ng A

dd a

nd S

ubtr

act w

ith E

xcha

nge

UH

SAX

{Rd,

} Rn,

Rm

Uns

igne

d H

alvi

ng S

ubtr

act a

nd A

dd w

ith E

xcha

nge

UH

SUB1

6{R

d,} R

n, R

mU

nsig

ned

Hal

ving

Sub

trac

t 16

UH

SUB8

{Rd,

} Rn,

Rm

Uns

igne

d H

alvi

ng S

ubtr

act 8

UBF

XR

d, R

n, #

lsb,

#w

idth

Uns

igne

d Bi

t Fie

ld E

xtra

ct

UD

IV{R

d,} R

n, R

mU

nsig

ned

Div

ide

UM

AA

LR

dLo,

RdH

i, R

n, R

mU

nsig

ned

Mul

tiply

Acc

umul

ate

Acc

umul

ate

Long

(32

x 32

+

32 +

32),

64-b

it re

sult

ECE 5655/4655 Real-Time DSP 3–29

Page 30: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

UM

LAL

RdL

o, R

dHi,

Rn,

Rm

Uns

igne

d M

ultip

ly w

ith A

ccum

ulat

e (3

2 x

32 +

64)

, 64-

bit

resu

lt

UM

ULL

RdL

o, R

dHi,

Rn,

Rm

Uns

igne

d M

ultip

ly (3

2 x

32),

64-b

it re

sult

UQ

AD

D16

{Rd,

} Rn,

Rm

Uns

igne

d Sa

tura

ting A

dd 1

6

UQ

AD

D8

{Rd,

} Rn,

Rm

Uns

igne

d Sa

tura

ting A

dd 8

UQ

ASX

{Rd,

} Rn,

Rm

Uns

igne

d Sa

tura

ting A

dd a

nd S

ubtr

act w

ith E

xcha

nge

UQ

SAX

{Rd,

} R

n, R

mU

nsig

ned

Satu

ratin

g Su

btra

ct a

nd A

dd w

ith E

xcha

nge

UQ

SUB1

6{R

d,} R

n, R

mU

nsig

ned

Satu

ratin

g Su

btra

ct 1

6

UQ

SUB8

{Rd,

} Rn,

Rm

Uns

igne

d Sa

tura

ting

Subt

ract

8

USA

D8

{Rd,

} Rn,

Rm

Uns

igne

d Su

m o

f Abs

olut

e D

iffer

ence

s

USA

DA

8{R

d,}

Rn,

Rm

, Ra

Uns

igne

d Su

m o

f Abs

olut

e D

iffer

ence

s an

d A

ccum

ulat

e

USA

TR

d, #

n, R

m{,s

hift

#s}

Uns

igne

d Sa

tura

teQ

USA

T16

Rd,

#n,

Rm

Uns

igne

d Sa

tura

te 1

6Q

3–30 ECE 5655/4655 Real-Time DSP

Page 31: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Cortex-M4 Instruction Set Tables

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

UA

SX{R

d,} R

n, R

mU

nsig

ned

Add

and

Sub

trac

t with

Exc

hang

eG

E

USU

B16

{Rd,

} Rn,

Rm

Uns

igne

d Su

btra

ct 1

6G

E

USU

B8{R

d,} R

n, R

mU

nsig

ned

Subt

ract

8G

E

UX

TAB

{Rd,

} Rn,

Rm

,{,RO

R #

}R

otat

e, e

xten

d 8

bits

to 3

2 an

d A

dd

UX

TAB1

6{R

d,} R

n, R

m,{,

ROR

#}

Rot

ate,

dua

l ext

end

8 bi

ts t

o 16

and

Add

UX

TAH

{Rd,

} Rn,

Rm

,{,RO

R #

}R

otat

e, u

nsig

ned

exte

nd a

nd A

dd H

alfw

ord

UX

TB

{Rd,

} Rm

{,RO

R #

n}Z

ero

exte

nd a

Byt

e

UX

TB1

6{R

d,} R

m {,

ROR

#n}

Uns

igne

d Ex

tend

Byt

e 16

UX

TH

{Rd,

} Rm

{,RO

R #

n}Z

ero

exte

nd a

Hal

fwor

d

VABS

.F32

Sd, S

mFl

oatin

g-po

int A

bsol

ute

VAD

D.F

32{S

d,} S

n, S

mFl

oatin

g-po

int A

dd

VC

MP.F

32Sd

, <Sm

| #0

.0>

Com

pare

two

float

ing-

poin

t reg

iste

rs, o

r on

e flo

atin

g-po

int r

egis

ter

and

zero

FPSC

R

ECE 5655/4655 Real-Time DSP 3–31

Page 32: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

VC

MPE

.F32

Sd, <

Sm |

#0.0

>C

ompa

re tw

o flo

atin

g-po

int r

egis

ters

, or

one

float

ing-

poin

t reg

iste

r an

d ze

ro w

ith In

valid

Ope

ratio

n ch

eck

FPSC

R

VC

VT.

S32.

F32

Sd, S

mC

onve

rt b

etw

een

float

ing-

poin

t and

inte

ger

VC

VT.

S16.

F32

Sd, S

d, #

fbits

Con

vert

bet

wee

n flo

atin

g-po

int a

nd fi

xed

poin

t

VC

VT

R.S

32.F

32Sd

, Sm

Con

vert

bet

wee

n flo

atin

g-po

int a

nd in

tege

r w

ith

roun

ding

VC

VT

<B|H

>.F3

2.F1

6Sd

, Sm

Con

vert

s ha

lf-pr

ecis

ion

valu

e to

sin

gle-

prec

isio

n

VC

VT

T<B

|T>.

F32.

F16

Sd, S

mC

onve

rts

sing

le-p

reci

sion

reg

iste

r to

hal

f-pre

cisi

on

VD

IV.F

32{S

d,} S

n, S

mFl

oatin

g-po

int D

ivid

e

VFM

A.F

32{S

d,} S

n, S

mFl

oatin

g-po

int F

used

Mul

tiply

Acc

umul

ate

VFN

MA

.F32

{Sd,

} Sn,

Sm

Floa

ting-

poin

t Fus

ed N

egat

e M

ultip

ly A

ccum

ulat

e

VFM

S.F3

2{S

d,} S

n, S

mFl

oatin

g-po

int F

used

Mul

tiply

Sub

trac

t

VFN

MS.

F32

{Sd,

} Sn,

Sm

Floa

ting-

poin

t Fus

ed N

egat

e M

ultip

ly S

ubtr

act

VLD

M.F

<32|

64>

Rn{

!}, li

stLo

ad M

ultip

le e

xten

sion

reg

iste

rs

3–32 ECE 5655/4655 Real-Time DSP

Page 33: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Cortex-M4 Instruction Set Tables

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

VLD

R.F

<32|

64>

<Dd|

Sd>,

[Rn]

Load

an

exte

nsio

n re

gist

er fr

om m

emor

y

VLM

A.F

32{S

d,} S

n, S

mFl

oatin

g-po

int M

ultip

ly A

ccum

ulat

e

VLM

S.F3

2{S

d,} S

n, S

mFl

oatin

g-po

int M

ultip

ly S

ubtr

act

VM

OV.

F32

Sd, #

imm

Floa

ting-

poin

t Mov

e im

med

iate

VM

OV

Sd, S

mFl

oatin

g-po

int M

ove

regi

ster

VM

OV

Sn, R

tC

opy A

RM

cor

e re

gist

er t

o si

ngle

pre

cisi

on

VM

OV

Sm, S

m1,

Rt,

Rt2

Cop

y 2

AR

M c

ore

regi

ster

s to

2 s

ingl

e pr

ecis

ion

VM

OV

Dd[

x], R

tC

opy A

RM

cor

e re

gist

er t

o sc

alar

VM

OV

Rt,

Dn[

x]C

opy

scal

ar t

o A

RM

cor

e re

gist

er

VM

RS

Rt,

FPSC

RM

ove

FPSC

R to

AR

M c

ore

regi

ster

or A

PSR

N,Z

,C,V

VM

SRFP

SCR

, Rt

Mov

e to

FPS

CR

from

AR

M C

ore

regi

ster

FPSC

R

VM

UL.

F32

{Sd,

} Sn,

Sm

Floa

ting-

poin

t Mul

tiply

ECE 5655/4655 Real-Time DSP 3–33

Page 34: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Cortex-M4 Instruction Set Tables (cont.)

Mne

mon

icO

pera

nds

Bri

ef d

escr

ipti

onFl

ags

VN

EG.F

32Sd

, Sm

Floa

ting-

poin

t Neg

ate

VN

MLA

.F32

Sd, S

n, S

mFl

oatin

g-po

int M

ultip

ly a

nd A

dd

VN

MLS

.F32

Sd, S

n, S

mFl

oatin

g-po

int M

ultip

ly a

nd S

ubtr

act

VN

MU

L{S

d,} S

n, S

mFl

oatin

g-po

int M

ultip

ly

VPO

Plis

tPo

p ex

tens

ion

regi

ster

s

VPU

SHlis

tPu

sh e

xten

sion

reg

iste

rs

VSQ

RT.F

32Sd

, Sm

Cal

cula

tes

float

ing-

poin

t Squ

are

Roo

t

VST

MR

n{!}

, list

Floa

ting-

poin

t reg

iste

r St

ore

Mul

tiple

VST

R.F

<32|

64>

Sd, [

Rn]

Stor

es a

n ex

tens

ion

regi

ster

to

mem

ory

VSU

B.F<

32|6

4>{S

d,} S

n, S

mFl

oatin

g-po

int S

ubtr

act

WFE

Wai

t Fo

r Ev

ent

WFI

Wai

t Fo

r In

terr

upt

No

te: f

ull e

xpla

natio

n o

f eac

h in

stru

ctio

n c

an b

e fo

und

in C

orte

x-M

4 D

evic

es’ G

ene

ric U

ser

Gui

de (

Re

f-4

)

3–34 ECE 5655/4655 Real-Time DSP

Page 35: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Cortex-M4 Instruction Set Tables

Cortex-M4 Instruction Set Tables (cont.)• Cortex-M4 Suffix

– Some instructions can be followed by suffixes to updateprocessor flags or execute the instruction on a certain con-dition

Suffi

x D

escr

ipti

onE

xam

ple

Exa

mpl

e ex

plan

atio

n

SU

pdat

e A

PSR

(flag

s)A

DD

SR

1,

#0x2

1A

dd0x

21 t

o R

1 an

d up

date

APS

R

EQ, N

E,C

S, C

C, M

I, PL

, VS,

VC, H

I, LS

, G

E, L

T, G

T, LE

Con

ditio

nex

ecut

ion

e.g.

EQ=

equa

l, N

E= n

ot e

qual

, LT=

less

tha

nBN

Ela

bel

Bran

ch t

o th

e la

bel i

f not

equ

al

ECE 5655/4655 Real-Time DSP 3–35

Page 36: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

C Calling Assembly

For real-time DSP applications the most common scenarioinvolving assembly code writing, if needed at all, will be C call-ing assembly. In simple terms the rules are:

• Formally, the ARM Architecture Procedure Call Standard(AAPCS) defines:

– Which registers must be saved and restored

– How to call procedures

– How to return from procedures

3–36 ECE 5655/4655 Real-Time DSP

Page 37: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

C Calling Assembly

AAPCS Register Use Conventions

• Make it easier to create modular, isolated and integrated code

• Scratch registers are not expected to be preserved uponreturning from a called subroutine

– This applies to r0–r3

• Preserved (“variable”) registers are expected to have theiroriginal values upon returning from a called subroutine

– This applies to r4–r8, r10–r11

– Use PUSH {r4,..} and POP {r4,...}

ECE 5655/4655 Real-Time DSP 3–37

Page 38: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

AAPCS Core Register Use

Reg

iste

rSy

nony

mSp

ecia

lR

ole

in t

he p

roce

dure

cal

l sta

ndar

dr1

5PC

The

Pro

gram

Cou

nter

.r1

4LR

The

Lin

k R

egis

ter.

r13

SPT

he S

tack

Poi

nter

.r1

2IP

The

Intr

a-Pr

oced

ure-

call

scra

tch

regi

ster

.r1

1v8

Vari

able

-reg

ister

8.

r10

v7Va

riab

le-r

egist

er 7

.

r9v6

,SB,

TR

Plat

form

reg

iste

r. T

he m

eani

ng o

f thi

s re

gist

er is

def

ined

by

the

pla

tform

sta

ndar

d.r8

v5Va

riab

le-r

egist

er 5

.r7

v4Va

riab

le r

egis

ter

4.r6

v3Va

riab

le r

egis

ter

3.r5

v2Va

riab

le r

egis

ter

2.r4

v1Va

riab

le r

egis

ter

1.r3

a4A

rgum

ent

/ scr

atch

reg

iste

r 4.

r2a3

Arg

umen

t / s

crat

ch r

egis

ter

3.r1

a2A

rgum

ent

/ res

ult

/ scr

atch

reg

iste

r 2.

r0a1

Arg

umen

t / r

esul

t / s

crat

ch r

egis

ter

1.

Mus

t be

sav

ed, r

esto

red

by

calle

e-pr

oced

ure

it m

ay m

odify

th

em.

Cal

ling

subr

outi

ne e

xpec

ts

thes

e to

ret

ain

thei

r va

lue.

Mus

t be

sav

ed, r

esto

red

by

calle

e-pr

oced

ure

it m

ay m

odify

th

em.

Don

’t n

eed

to b

e sa

ved.

May

be

used

for

argu

men

ts, r

esul

ts, o

r te

mpo

rary

val

ues.

3–38 ECE 5655/4655 Real-Time DSP

Page 39: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Example: Vector Norm Squared

Example: Vector Norm Squared

In this example we will be computing the squared length of avector using 16-bit (int16_t) signed numbers. In mathematicalterms we are finding

(3.1)

where

(3.2)

is an -dimensional vector (column or row vector).

• The solution will be obtained in two different ways:

– Conventional C programming

– Cortex-M assembly

• Optimization is not a concern at this point

• The focus here is to see by way of a simple example, how tocall a C routine from C (obvious), and how to call an assem-bly routine from C

A2

An2

n 1=

N

=

A A1 AN=

N

ECE 5655/4655 Real-Time DSP 3–39

Page 40: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

C Version

• We implement this simple routine in C using a declared vec-tor length N and vector contents in the array v

• The C source, which includes the called functionnorm_sq_c is given below:

/******************************************************

Vector norm-squared routine in C and Assembly

******************************************************/

#include "fm4_wm8731_init.h"#include "FM4_slider_interface.h"

// Macros from fm4_wm8731_init.c need to configure GPIO without starting // codec ISRs. PFR is port function setting register - 0 for GPIO, 1 for // peripheral function#define GET_PFR(pin_ofs) ((volatile unsigned char*) (PFR_BASE + pin_ofs))// PCR is port pull-up setting register - 0 for pull-up, 1 for no pull-up#define GET_PCR(pin_ofs) ((volatile unsigned char*) (PCR_BASE + pin_ofs))// PDDR is port direction setting register - 0 for GPIO in, 1 for GPIO out#define GET_DDR(pin_ofs) ((volatile unsigned char*) (DDR_BASE + pin_ofs))

// Create (instantiate) GUI slider data structurestruct FM4_slider_struct FM4_GUI;

// Norm squared in C prototypeint16_t norm_sq_c(int16_t* v, int16_t n);// ASM function prototypesextern uint32_t simple_sqrt(uint32_t x);extern int16_t norm_sq_asm(int16_t *x, int16_t n);

/*---------------------------------------------------------------------------- MAIN function *----------------------------------------------------------------------------*/int main(void){

int16_t x = 0;int16_t v[5] = {1,2,3,6,7};uint32_t zInt = 99;uint32_t zInt_sq;char message[50];

3–40 ECE 5655/4655 Real-Time DSP

Page 41: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Example: Vector Norm Squared

// Set up DIAGNOSTIC_PIN GPIO for timing of function calls// Follow approach used in fm4_wm8731_init(), but without starting ISR

bFM4_GPIO_ADE_AN00 = 0x00; // P10 DIAGNOSTIC_PIN *GET_PFR(DIAGNOSTIC_PIN) &= ~0u; // set pin function as GPIO *GET_DDR(DIAGNOSTIC_PIN) = 1u ; // set pin direction as output *GET_PCR(DIAGNOSTIC_PIN) &= ~0u; // set pin to have pull-up

// Initialize the slider interface by setting the baud rate (460800 or // 921600) and initial float values for each of the 6 slider parameters

init_slider_interface(&FM4_GUI,460800, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0);

// Norm squared experimentgpio_set(DIAGNOSTIC_PIN,HIGH); //pin = P10/A3x = norm_sq_c (v, 5);// call c functiongpio_set(DIAGNOSTIC_PIN,LOW);sprintf(message, "Norm squared C: The answer is %d\n", x);write_uart0(message);gpio_set(DIAGNOSTIC_PIN,HIGH); //pin = P10/A3x = norm_sq_asm (v, 5);// call assembly functiongpio_set(DIAGNOSTIC_PIN,LOW);sprintf(message, "Norm squared ASM: The answer is %d\n", x);write_uart0(message);

// uint16_t Square root experimentgpio_set( DIAGNOSTIC_PIN,HIGH); //pin = P10/A3zInt_sq = simple_sqrt(zInt);gpio_set(DIAGNOSTIC_PIN,LOW);sprintf(message, "uint16_t SQRT of %d is %d\n", zInt, zInt_sq);write_uart0(message);

while(1){// Update slider parameters//update_slider_parameters(&FM4_GUI);

}}

int16_t norm_sq_c(int16_t* v, int16_t n){

int16_t i;int16_t out = 0;for(i=0; i<n; i++){out += v[i]*v[i];

}return out;

ECE 5655/4655 Real-Time DSP 3–41

Page 42: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

}

• Notice in this code we have configured three additional out-put pins for physical code timing

– P1B = D0 is used to time norm_sq_c

– P1C = D1 is used to time norm_sq_asm

• The expected answer is

• Physical code time and cycle count timing comparison withthe assembly version, is come up next

Assembly Version

• The assembly routine is the following:; File demo_asm.s

PRESERVE8 ; Preserve 8 byte stack alignment THUMB ; indicate THUMB code is used

AREA |.text|, CODE, READONLY;Start of the CODE areaEXPORT norm_sq_asm

norm_sq_asm FUNCTION; Input array address: R0; Number of elements: R1MOVS R2, R0 ; move the address in R0 to R2MOVS R0, #0 ; initialize the result

sum_loopLDRSH R3, [R2],#0x2; load int16_t value pointed to

; by R2 into R3, then incrementMLA R0, R3, R3, R0; sq & accum in one step (faster)SUBS R1, R1, #1; R1 = R1 - 1, decrement the countCMP R1, #0 ; compare to 0 and set Z registerBNE sum_loop; branch if compare not zeroBX LR ; return R0ENDFUNCEND ; End of file

1 4 9 36 49+ + + + 99=From Terminal

3–42 ECE 5655/4655 Real-Time DSP

Page 43: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Example: Vector Norm Squared

• From just the C source it is not obvious that the function pro-totype for norm_asm is actually an assembly routine

• The answer is again 99

Performance Comparison

• In the Keil IDE debugger we set break points around thefunction to be timed:

• We take measurements using the timers available in the lowerright status bar; alternatively count cycles using the states

From Terminal

ECE 5655/4655 Real-Time DSP 3–43

Page 44: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

register:

• To interpret the cycle count as real time, consider the FM4running with a 200 MHz clock frequency or 5 ns clock period

• As a cross check physical timing is explored using the Ana-

norm_sq_c with -O0 norm_sq_c with -O3 norm_sq_asm

time = 380 ns time = 340 ns time = 220 ns

Reset when you arrive at thefunction call, then step overand read the elasped time

When debugging observe the registersand in particular note the change in statesbefore and after the function call is made

right-click to view

3–44 ECE 5655/4655 Real-Time DSP

Page 45: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Example: Vector Norm Squared

log Discovery logic analyzer:

• The physical time results are more consistent, but a bias isintroduced since time is required to set and reset the GPIOpin

• The ASM code is timed from Keil by setting the break pointsaround the GPIO pin set functions:

• The above result is much longer than the physically measuredresult; conclude that removing the GPIO bias is not obvious

C version (-O3)

C calling ASM version

470 ns vs 350 ns implies a speedup of ~25.5%

Time is 420 ns including pin set and reset, vs 220 ns without —> ~200 ns due to pins

ECE 5655/4655 Real-Time DSP 3–45

Page 46: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

Example: Unsigned Integer Square Root1

• Added to main // uint16_t Square root experiment gpio_set(PF7, HIGH); //Pin PF7 = D2 zInt_sq = simple_sqrt(zInt); gpio_set(PF7, LOW); //Pin PF7 = D2 sprintf(message, "uint16_t SQRT of %d is %d\n", zInt, zInt_sq); write_uart0(message);...

• Added to asm ; in demo_asm.s

PRESERVE8 ; Preserve 8 byte stack alignment THUMB ; indicate THUMB code is used

AREA |.text|, CODE, READONLY; Start of the CODE areaEXPORT simple_sqrt

simple_sqrt FUNCTION; Input : R0; Output : R0 (square root result)MOVW R1, #0x8000 ; R1 = 0x00008000MOVS R2, #0 ; Initialize result

simple_sqrt_loopADDS R2, R2, R1 ; M = (M + N)MUL R3, R2, R2 ; R3 = M^2CMP R3, R0 ; If M^2 > InputIT HI ; Greater ThanSUBHI R2, R2, R1 ; M = (M - N)LSRS R1, R1, #1 ; N = N >> 1BNE simple_sqrt_loopMOV R0, R2 ; Copy to R0 and returnBX LR ; Return

ENDFUNC

1.Yiu Chapter 20, p. 664.

3–46 ECE 5655/4655 Real-Time DSP

Page 47: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Example: Unsigned Integer Square Root

Function Flow Chart

Sample Results

• For an input of 64 the output is 8, as expected

Execution time = 0.60us (~600ns)

From terminal

ECE 5655/4655 Real-Time DSP 3–47

Page 48: Cortex-M4 Chapter Architecture and ASM Programmingmwickert/ece5655/lecture_notes/ARM/ece5655_c… · Cortex-M4 Architecture and ASM Programming Introduction In this chapter programming

Chapter 3 • Cortex-M4 Architecture and ASM Programming

• Physical is comparable at :

• For an input of 99 the output is 9 (81 is closest to 99), asexpected

Useful Resources

• Architecture Reference Manual:http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0403c/index.html

• Cortex-M4 Technical Reference Manual:http://infocenter.arm.com/help/topic/com.arm.doc.ddi0439d/DDI0439D_cortex_m4_proces-

sor_r0p1_trm.pdf

• Cortex-M4 Devices Generic User Guide:http://infocenter.arm.com/help/topic/com.arm.doc.dui0553a/DUI0553A_cortex_m4_dgug.pdf

710ns

From terminal

3–48 ECE 5655/4655 Real-Time DSP