the cortex-m chapter series: hardware and...
TRANSCRIPT
er
The Cortex-M Series: Hardware and SoftwareIntroduction
In this chapter the real-time DSP platform of primary focus forthe course, the Cortex M4, will be introduced and explained. interms of hardware, software, and development environments.Beginning topics include:
• ARM Architectures and Processors
– What is ARM Architecture
– ARM Processor Families
– ARM Cortex-M Series
– Cortex-M4 Processor
– ARM Processor vs. ARM Architectures
• ARM Cortex-M4 Processor
– Cortex-M4 Processor Overview
– Cortex-M4 Block Diagram
– Cortex-M4 Registers
Chapt
2
ECE 5655/4655 Real-Time DSP 2–1
Chapter 2 • The Cortex-M Series: Hardware and Software
What is ARM Architecture
• ARM architecture is a family of RISC-based processor archi-tectures
– Well-known for its power efficiency;
– Hence widely used in mobile devices, such as smart-phones and tablets
– Designed and licensed to a wide eco-system by ARM
• ARM Holdings
– The company designs ARM-based processors;
– Does not manufacture, but licenses designs to semiconduc-tor partners who add their own Intellectual Property (IP)on top of ARM’s IP, fabricate and sell to customers;
– Also offer other IP apart from processors, such as physicalIPs, interconnect IPs, graphics cores, and developmenttools
2–2 ECE 5655/4655 Real-Time DSP
ARM Processor Families
ARM Processor Families• Cortex-A series (Application)
– High performance processorscapable of full Operating Sys-tem (OS) support;
– Applications include smart-phones, digital TV, smartbooks, home gateways etc.
• Cortex-R series (Real-time)
– High performance for real-time applications;
– High reliability
– Applications include automotive braking system, power-trains etc.
• Cortex-M series (Microcontroller)
– Cost-sensitive solutions for deterministic microcontrollerapplications;
– Applications include microcontrollers, mixed signaldevices, smart sensors, automotive body electronics andairbags;
• SecurCore series
– High security applications.
• Previous classic processors: Include ARM7, ARM9, ARM11families
Cortex-A
Cortex-A57
Cortex-A53
Cortex-A15
Cortex-A9
Cortex-A8
Cortex-A7
Cortex-A5
Cortex-R7
Cortex-R5
Cortex-R4
Cortex-M4
Cortex-M3
Cortex-M1
Cortex-M0+
Cortex-M0
SC000
SC100
SC300
ARM11
ARM9
ARM7
Cortex-R
Cortex-M
SecurCore
Classic
As of Dec 2013
New!: Cortex-M7, Cortex-M33
New!: Cortex-M23 (no DSP)
As of Jan 2017
ECE 5655/4655 Real-Time DSP 2–3
Chapter 2 • The Cortex-M Series: Hardware and Software
ARM Families and Architecture Over Time1
1. J. Yiu, The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors, 3rd edi-tion, Newnes 2014.
2–4 ECE 5655/4655 Real-Time DSP
Relative Performance
Relative Performance1
Design an ARM-based SoC
• Select a set of IP cores from ARM and/or other third-party IPvendors
• Integrate IP cores into a single chip design
• Give design to semiconductor foundries for chip fabrication
1. https://www.arm.com/products/processors/cortex-m
ARM-basedSoC
ROMARM
processorRAM
System bus
Peripherals
External Interface
SoC
SoC DesignLicensable IPs
IP libraries
Cortex-A9 Cortex-R5 Cortex-M4
ARM7 ARM9 ARM11
AXI bus AHB bus APB bus
GPIO I/O blocks Timer
DRAM ctrl FLASH ctrl SRAM ctrl
ECE 5655/4655 Real-Time DSP 2–5
Chapter 2 • The Cortex-M Series: Hardware and Software
ARM Cortex-M Series
• Cortex-M series: Cortex-M0, M0+, M1, M3, M4, M7, M33.
• Energy-efficiency
– Lower energy cost, longer battery life
• Smaller code
– Lower silicon costs
• Ease of use
– Faster software development and reuse
• Embedded applications
– Smart metering, human interface devices, automotive andindustrial control systems, white goods, consumer prod-ucts and medical instrumentation
ARM Processors vs. ARM Architectures
• ARM architecture
– Describes the details of instruction set, programmer’smodel, exception model, and memory map
– Documented in the Architecture Reference Manual
• ARM processor
– Developed using one of the ARM architectures
– More implementation details, such as timing information
2–6 ECE 5655/4655 Real-Time DSP
Companies Making ARM Chips
– Documented in processor’s Technical Reference Manual
Companies Making ARM Chips
• Apple, AppliedMicro, Microchip (Atmel), Broadcom,Cypress Semiconductor, Nvidia, , Sam-sung Electronics, ST Microelectronics, and Texas Instru-ments (http://en.wikipedia.org/wiki/ARM_architecture)
• More — Xilinx (Zynq), ...
ARMv4/v4T Architecture
ARMv5/ v4E Architecture
ARMv6 Architecture
ARMv7Architecture
ARM v6-Me.g. Cortex-M0, M1
e.g. ARM7TDMI e.g. ARM9926EJ-S e.g. ARM1136
ARMv8 ArchitectureARMv7-A
e.g. Cortex-A9
ARMv7-Re.g. Cortex-R4
ARMv7-Me.g. Cortex-M4
ARMv8-Ae.g. Cortex-A53
Cortex-A57
ARMv8-R
As of Dec 2013
NXP Qualcomm
ECE 5655/4655 Real-Time DSP 2–7
Chapter 2 • The Cortex-M Series: Hardware and Software
ARM Cortex-M Series FamilyA
RM
Arc
hite
ctur
eC
ore
Arc
hite
ctur
eT
hum
b®T
hum
b®-2
Har
dwar
eM
ulti
ply
Har
dwar
eD
ivid
eSa
tura
ted
Mat
hD
SPE
xten
sion
sFl
oati
ngP
oint
AR
Mv6
-MVo
n N
eum
ann
Mos
tSu
bset
1 or
32
cycl
eN
oN
oSo
ftw
are
No
AR
Mv6
-MVo
n N
eum
ann
Mos
tSu
bset
1 or
32
cycl
eN
oN
oSo
ftw
are
No
AR
Mv6
-MVo
n N
eum
ann
Mos
tSu
bset
3 or
33
cycl
eN
oN
oSo
ftw
are
No
AR
Mv7
-MH
arva
rdEn
tire
Entir
e1
cycl
eYe
sYe
sSo
ftw
are
No
AR
Mv7
E-M
Har
vard
Entir
eEn
tire
1 cy
cle
Yes
Yes
Har
dwar
eO
ptio
nal
2–8 ECE 5655/4655 Real-Time DSP
Cortex-M4 Processor Overview
Cortex-M4 Processor Overview
• Cortex-M4 Processor
– Introduced in 2010
– Designed with a large variety of highly efficient signal pro-cessing features
– Features extended single-cycle multiply accumulateinstructions, optimized SIMD arithmetic, saturating arith-metic and an optional Floating Point Unit.
• High Performance Efficiency
– 1.25 DMIPS/MHz (Dhrystone Million Instructions PerSecond / MHz) at the order of µWatts / MHz
• Low Power Consumption
– Longer battery life – especially critical in mobile products
• Enhanced Determinism
– The critical tasks and interrupt routines can be servedquickly in a known number of cycles
ECE 5655/4655 Real-Time DSP 2–9
Chapter 2 • The Cortex-M Series: Hardware and Software
Cortex-M4 Processor Features
• 32-bit Reduced Instruction Set Computing (RISC) processor
• Harvard architecture
– Separated data bus and instruction bus
• Instruction set
– Include the entire Thumb®-1 (16-bit) and Thumb®-2 (16/32-bit) instruction sets1
• 3-stage + branch speculation pipeline
• Performance efficiency
– 1.25 – 1.95 DMIPS/MHz (Dhrystone Million InstructionsPer Second / MHz)
1. J. Yiu, The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors, 3rd edition,Newnes, 2014.
2–10 ECE 5655/4655 Real-Time DSP
Cortex-M4 Processor Features
• Supported Interrupts
– Non-maskable Interrupt (NMI) + 1 to 240 physical inter-rupts
– 8 to 256 interrupt priority levels
• Supports Sleep Modes
– Up to 240 Wake-up Interrupts
– Integrated WFI (Wait For Interrupt) and WFE (Wait ForEvent) Instructions and Sleep On Exit capability (to becovered in more detail later)
– Sleep & Deep Sleep Signals
– Optional Retention Mode with ARM Power ManagementKit
• Enhanced Instructions
– Hardware Divide (2-12 Cycles)
– Single-Cycle 16, 32-bit MAC, Single-cycle dual 16-bitMAC
– 8, 16-bit SIMD arithmetic
• Debug
– Optional JTAG & Serial-Wire Debug (SWD) Ports
– Up to 8 Breakpoints and 4 Watchpoints
• Memory Protection Unit (MPU)
– Optional 8 region MPU with sub regions and backgroundregion
ECE 5655/4655 Real-Time DSP 2–11
Chapter 2 • The Cortex-M Series: Hardware and Software
• Cortex-M4 processor is designed to meet the challenges oflow dynamic power constraints while retaining light foot-prints
– 180 nm ultra low power process –157 µW/MHz
– 90 nm low power process – 33 µW/MHz
– 40 nm G process – 8 µW/MHz
ARM Cortex-M4 Implementation Data
Process180ULL(7-track, typical 1.8v, 25C)
90LP(7-track, typical 1.2v, 25C)
40G9-track, typical 0.9v, 25C)
Dynamic Power 157 µW/MHz 33 µW/MHz 8 µW/MHz
Floorplanned Area 0.56 mm2 0.17 mm2 0.04 mm2
2–12 ECE 5655/4655 Real-Time DSP
Cortex-M4 Block Diagram
Cortex-M4 Block Diagram
Opt
iona
lW
IC
Ne
ste
d V
ect
or
Inte
rru
pt
Con
trol
ler
(NV
IC)
Op
tion
al
De
bu
gA
cce
ss P
ort
AR
M C
orte
x-M
4 M
icro
pro
cess
or
Opt
iona
l Mem
ory
pr
otec
tion
uni
t
Bu
s m
atr
ix
Co
de in
terf
ace
SR
AM
and
perip
her
al in
terf
ace
Opt
ion
al F
PU
Pro
cess
or c
ore
Op
tion
al
Em
be
dde
dT
race
Mac
roce
ll
Op
tion
al S
eria
lW
ire V
iew
er
Op
tiona
l F
lash
pa
tch
Op
tion
al
Da
taw
atc
hpo
ints
ECE 5655/4655 Real-Time DSP 2–13
Chapter 2 • The Cortex-M Series: Hardware and Software
Cortex-M4 Block Diagram (cont.)
• Processor core
– Contains internal registers, the ALU, data path, and somecontrol logic
– Registers include sixteen 32-bit registers for both generaland special usage
• Processor pipeline stages
– Three-stage pipeline: fetch, decode, and execution
– Some instructions may take multiple cycles to execute, inwhich case the pipeline will be stalled
– The pipeline will be flushed if a branch instruction is exe-cuted
– Up to two instructions can be fetched in one transfer (16-bit instructions)
• Nested Vectored Interrupt Controller (NVIC)
– Up to 240 interrupt request signals and a non-maskableinterrupt (NMI)
Fetch Decode Execute
Fetch Decode Execute
Fetch Decode Execute
Instruction 1
Instruction 2
Instruction 3
Fetch Decode ExecuteInstruction 4
Time
2–14 ECE 5655/4655 Real-Time DSP
Cortex-M4 Block Diagram (cont.)
– Automatically handles nested interrupts, such as compar-ing priorities between interrupt requests and the currentpriority level
• Wakeup Interrupt Controller (WIC)
– For low-power applications, the microcontroller can entersleep mode by shutting down most of the components.
– When an interrupt request is detected, the WIC can informthe power management unit to power up the system.
• Memory Protection Unit (optional)
– Used to protect memory content, e.g. make some memoryregions read-only or preventing user applications fromaccessing privileged application data
• Bus interconnect
– Allows data transfer to take place on different buses simul-taneously
– Provides data transfer management, e.g. a write buffer, bit-oriented operations (bit-band)
– May include bus bridges (e.g. AHB-to-APB bus bridge) toconnect different buses into a network using a singleglobal memory space
– Includes the internal bus system, the data path in the pro-cessor core, and the AHB LITE interface unit
ECE 5655/4655 Real-Time DSP 2–15
Chapter 2 • The Cortex-M Series: Hardware and Software
• Debug subsystem
– Handles debug control, program breakpoints, and datawatchpoints
– When a debug event occurs, it can put the processor core ina halted state, where developers can analyse the status ofthe processor at that point, such as register values and flags
Cortex-M4 Registers
• Processor registers
– The internal registers are used to store and process tempo-rary data within the processor core
– All registers are inside the processor core, hence they canbe accessed quickly
– Load-store architecture
– To process memory data, they have to be first loaded frommemory to registers, processed inside the processor coreusing register data only, and then written back to memoryif needed
• Cortex-M4 registers
– Register bank
– Sixteen 32-bit registers (thirteen are used for general-pur-pose);
– Special registers
2–16 ECE 5655/4655 Real-Time DSP
Cortex-M4 Registers (cont.)
Cortex-M4 Registers (cont.)
• R0 – R12: general purpose registers
– Low registers (R0 – R7) can be accessed by any instruction
– High registers (R8 – R12) sometimes cannot be accessede.g. by some Thumb (16-bit) instructions
• R13: Stack Pointer (SP)
– Records the current address of the stack
– Used for saving the context of a program while switchingbetween tasks
R0
R1
R2
R3
R4
R5
R6
R7
R8
R9
R10
R11
R12
R13(banked)
R14
R15
x PSR
Stack Pointer (SP)
Link Register (LR)
Program Counter (PC)
PRIMASK
CONTROL
Program Status Registers (PSR)
Interrupt mask register
Stack definition
Special registers
Register bank
MSP
PSP
Main Stack Pointer
Process Stack Pointer
APSR EPSR IPSR
ApplicationPSR
ExecutionPSR
InterruptPSR
LowRegisters
HighRegisters
General purposeregister
FAULTMASK
BASEPRI
ECE 5655/4655 Real-Time DSP 2–17
Chapter 2 • The Cortex-M Series: Hardware and Software
– Cortex-M4 has two SPs: Main SP, used in applications thatrequire privileged access e.g. OS kernel, and exceptionhandlers, and Process SP, used in base-level applicationcode (when not running an exception handler)
• Program Counter (PC)
– Records the address of the current instruction code
– Automatically incremented by 4 at each operation (for 32-bit instruction code), except branching operations
– A branching operation, such as function calls, will changethe PC to a specific address, meanwhile it saves the currentPC to the Link Register (LR)
Data Data
PC
SP
Address
Low
High
PUSH POP
Stack
Code
Heap
2–18 ECE 5655/4655 Real-Time DSP
Cortex-M4 Registers (cont.)
• R14: Link Register (LR)
– The LR is used to store the return address of a subroutineor a function call
– The program counter (PC) will load the value from LRafter a function is finished
• xPSR, combined Program Status Register
– Provides information about program execution and ALUflags
– Application PSR (APSR)
– Interrupt PSR (IPSR)
– Execution PSR (EPSR)
PC
LR
MainProgram
code
subroutineCurrent PC
Load PC with the address in LR to return to the main program
Current LR
Return from a subroutine to the main program
PC
LR
MainProgram
code
subroutine
Current PC
1. Save currentPC to LR
2. Load PC with the starting address of the subroutine
Call a subroutine
Code region
Code region
N Z C V
ISR number
Reserved
Reserved
ICI/IT ReservedT
Reserved ISR number
APSR
IPSR
EPSR
xPSR
bit0bit8bit16bit24bit31
Q
ICI/IT TN Z C V Q
ICI/IT
ICI/IT
ECE 5655/4655 Real-Time DSP 2–19
Chapter 2 • The Cortex-M Series: Hardware and Software
• APSR
– N: negative flag – set to one if the result from ALU is neg-ative
– Z: zero flag – set to one if the result from ALU is zero
– C: carry flag – set to one if an unsigned overflow occurs
– V: overflow flag – set to one if a signed overflow occurs
– Q: sticky saturation flag – set to one if saturation hasoccurred in saturating arithmetic instructions, or overflowhas occurred in certain multiply instructions
• IPSR
– ISR number – current executing interrupt service routinenumber
• EPSR
– T: Thumb state – always one since Cortex-M4 only sup-ports the Thumb state (more on processor states in the nextmodule)
– IC/IT: Interrupt-Continuable Instruction (ICI) bit, IF-THEN instruction status bit
• Interrupt mask registers
– 1-bit PRIMASK
– Set to one will block all the interrupts apart from nonmas-kable interrupt (NMI) and the hard fault exception
– 1-bit FAULTMASK
2–20 ECE 5655/4655 Real-Time DSP
Cortex-M4 Registers (cont.)
– Set to one will block all the interrupts apart from NMI
– 1-bit BASEPRI
– Set to one will block all interrupts of the same or lowerlevel (only allow for interrupts with higher priorities)
• CONTROL: special register
– 1-bit stack definition
– Set to one: use the process stack pointer (PSP)
– Clear to zero: use the main stack pointer (MSP)
Reserved
Reserved
BASEPRI
CONTROL
bit8bit16bit24bit31
Stack definition
ReservedFAULTMASK
ReservedPRIMASK
ECE 5655/4655 Real-Time DSP 2–21
Chapter 2 • The Cortex-M Series: Hardware and Software
Software Development Overview
• The software development flow1:
• At the file level the build process for Keil MDK is:
1. J. Yiu, The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors, 3rd edition,Newnes, 2014.
2–22 ECE 5655/4655 Real-Time DSP
Software Development Overview
• When using the GNU tool chain compilation and linking aremerged1
• A debugging process that we will follow with the limitedcapability of the STM32F4 on-board emulator, is the use of aUART
– The level shifter is not required with the USB to serial rx/tx wire device found in the lab
1. J. Yiu, The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors, 3rd edition,Newnes, 2014.
ECE 5655/4655 Real-Time DSP 2–23
Chapter 2 • The Cortex-M Series: Hardware and Software
• The preferred IDE for this first offering of the Cortex-M4version of the course, we will focus on the use of the ARMKeil integrated development environment
Keil Specific Debugging Examples
• In this section a few simple debugging examples are providedto help you get started with writing and debugging code
• The absolute starting point for working with hardware soft-ware IDE combination is Lab0 on the course Web Site
Setting Breakpoints
• First build the code in your project and launch the debugger
• The red d button is toggle for starting and stopping thedebugger
Toggle for debugging
First rebuildthe target
2–24 ECE 5655/4655 Real-Time DSP
Keil Specific Debugging Examples
• Set break points at code lines 54 and 55
• Now click the run button and verify that the debugger stops atline 54
• The State (cycle) counter and Sec (total session run timestopwatch) will advance as you move from the main entrypoint down to line 54
Following the debugger launch,the code is loaded and you arebrought to main
Clicking in the gray regionsallows you to set breakpoints
Starting debug from main
Run!
Before After
ECE 5655/4655 Real-Time DSP 2–25
Chapter 2 • The Cortex-M Series: Hardware and Software
Profiling or Code Timing
• You just saw the code run from main to the first breakpoint
• You also saw that CPU cycle count and clock stopwatch eachincrement
• Being able to count cycle and/or observe run time in secondsis very valuable in characterizing the performance of real-time code
• If you now clock the run button again, the debugger willadvance to line 55 and the cycle count and CPU time willadvance by some amount; don’t do it just yet
• The middle field of the lower right status bar of the Keil appcontains two resettable stop watches:
• Reset the t1 stop watch and now click run
• Stop watch now displays the elapsed time in running thedot_p function
• The States in the registers view has also incremented
Right-click to set/reset the time displayHover to just see the three displays
Context menuseen when youright-click
2–26 ECE 5655/4655 Real-Time DSP
Keil Specific Debugging Examples
• The debugger has knowledge of the CPU clock frequency, sothe time increment is a true estimate of the elapsed process-ing time to enter-compute-return from dot_p; NICE!
Using the Command Line
• By hovering over variables or looking at the Command Stack+ Locals view (lower right above the stop watch display), youcan look at variables that are within scope of where thedebugger has stopped
• You can expand array variables for example
1034000 to 1034127so 127 cycles elapsed
0.27 us at fclk = 168 MHz => ~45 clksWhy a difference?
ECE 5655/4655 Real-Time DSP 2–27
Chapter 2 • The Cortex-M Series: Hardware and Software
• The command line also allows you to type commands andobtain debug information
Setting the Compiler Optimization Level
• With the debugger stopped you can go to the Options for Tar-get button
•
command line entries
command lineresponse area
click hereto openOptions forTargetThen chooseC/C++
2–28 ECE 5655/4655 Real-Time DSP
Keil Specific Debugging Examples
Writing to GPIO for Real-Time Timing
• Writing to a GPIO for scope/logic analyzer timing measure-ments
ECE 5655/4655 Real-Time DSP 2–29
Chapter 2 • The Cortex-M Series: Hardware and Software
Useful Resources
• Architecture Reference Manual:http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0403c/index.html
• Cortex-M4 Technical Reference Manual:http://infocenter.arm.com/help/topic/com.arm.doc.ddi0439d/DDI0439D_cortex_m4_proces-sor_r0p1_trm.pdf
• Cortex-M4 Devices Generic User Guide:http://infocenter.arm.com/help/topic/com.arm.doc.dui0553a/DUI0553A_cortex_m4_dgug.pdf
• The Designer's Guide to the Cortex-M Processor Family: ATutorial Approach (Martin)
http://www.amazon.com/Designers-Guide-Cortex-M-Processor-Family/dp/0080982964/
ref=sr_1_4?ie=UTF8&qid=1423511335&sr=8-4&keywords=Cortex-M
• Embedded Systems: Real-Time Operating Systems for ArmCortex M Microcontrollers (Valvano)
http://www.amazon.com/Embedded-Systems-Real-Time-Operating-Microcontrollers/dp/1466468866/ref=sr_1_1?s=books&ie=UTF8&qid=1423511893&sr=1-1&keywords=Real-Time+Operating+Systems+for+Cortex-M
2–30 ECE 5655/4655 Real-Time DSP