Reti Logiche Università degli studi di Udine
CPU RISC example:
ARM
ARM
CPU families, architectures, and cores
ARM1
ARMv1
ARM1 (April 1985; ~25K transistors)
ARM2
ARMv2
ARM2 (1986; ~30K transistors)
ARMv2a
ARM250
ARM3
ARMv2a
ARM3
ARM6
ARMv3
ARM60, ARM600, ARM610
ARM7
ARMv3
ARM700, ARM710, ARM710a
The ARMv1, ARMv2, and ARMv3 architectures are obsolete.
ARM
CPU families, architectures, and cores
ARM7TDMI
ARMv4T
ARM7TDMI, ARM710T, ARM720T, ARM740T
ARM7EJ
ARMv5TEJ
ARM7EJ-S
ARM8
ARMv4
ARM810
ARM
CPU families, architectures, and cores
StrongARM
ARMv4
SA-1
ARM9TDMI
ARMv4T
ARM9TDMI, ARM920T, ARM922T, ARM940T
ARM9E
ARMv5TE
ARM946E-S, ARM966E-S, ARM968E-S, ARM996HS
ARMv5TEJ
ARM926EJ-S
ARM
CPU families, architectures, and cores
ARM10E
ARMv5TE
ARM1020E, ARM1022E
ARMv5TEJ
ARM1026EJ-S
XScale
ARMv5TE
XScale
ARM
CPU families, architectures, and cores
ARM11
ARMv6
ARM1136J-S
ARMv6T2
ARM1156T2-S
ARMv6K
ARM1176JZ-S, ARM11 MPCore
ARM
CPU families, architectures, and cores
Cortex-A
ARMv7-A (Application profile)
Cortex-A5, Cortex-A8, Cortex-A9, Cortex-A15
Cortex-R
ARMv7-R (Real-time profile)
Cortex-R4, Cortex-R5, Cortex-R7
Cortex-M
ARMv6-M and ARMv7-M (Microcontroller profile)
Cortex-M0, Cortex-M1 (ARMv6-M), Cortex-M3, Cortex-M4 (ARMv7-M)
ARM
CPU families, architectures, and cores
Apple A6
ARMv7-A
Apple A6
Qualcomm Snapdragon
ARMv7-A
Scorpion, Krait
ARM
CPU families, architectures, and cores
Cortex-A50
ARMv8-A
Cortex-A53, Cortex-A57
Apple A7
ARMv8-A
Apple A7
X-Gene
ARMv8-A
X-Gene
ARM
Examples of ARM cores applications
ARM1136J-S
Kindle DX [Freescale i.MX31]
Nokia phones (E63, E71, 5800, E51, 6700 Classic, 6120 Classic, 6210 Navigator, 6220 Classic, 6290, 6710 Navigator, 6720 Classic, E75, N97, N81) [Freescale MXC300-30]
ARM1176JZ(F)-S
Apple iPhone (original and 3G), Apple iPod touch (1st and 2nd Generation)
Motorola RIZR Z8, Motorola RIZR Z10
Nintendo 3DS
ARM
Examples of ARM cores applications
Cortex-A8
Apple iPhone (3GS and 4), Apple iPod touch (3rd and 4th Generation),
Apple iPad [cpu: Apple A4]
BeagleBoard
Motorola (Droid, Droid X, Droid 2, Droid R2D2 Edition)
Samsung (Omnia HD, Wave S8500, i9000 Galaxy S, P1000 Galaxy Tab)
Sony Ericsson (Satio, Xperia X10)
Nokia N900
Google Nexus S
Cortex-A9
Apple iPad 2 [cpu: Apple A5], Apple iPhone 4S [cpu: Apple A5]
LG Optimus 2X
Motorola (Atrix 4G, DROID BIONIC, Xoom)
PandaBoard
Architecture overview
• 32-bit architecture size
  • i.e., the size of general-purpose registers
• 7-9 processor modes (depends on extensions)
  • 1 unprivileged mode
  • 6-8 privileged modes
• 31-34 general-purpose registers (depends on extensions)
• Special registers
  • Program status, memory management, processor configuration, ...
Instruction sets
• ARM
  • 32-bit instructions
  • Default instruction set
• Thumb
  • 16-bit instructions
  • High code density
  • Reduced performance
• Thumb-2
  • Thumb extended with 32-bit instructions
  • High code density
  • Good performance
[Figure: Thumb instructions pass through a Thumb decompressor before reaching the ARM instruction decoder]
Extensions
Instruction set extensions
Thumb
• 16-bit instructions
  • No conditional execution
  • Implicit operands
  • Only 8 registers available for many instructions
• ARMv4T, ARMv5, ARMv6, ARMv7, ARMv8
Thumb-2
• Extends the Thumb instruction set
  • 32-bit instructions
  • 16-bit instructions
• ARMv6T2, ARMv7, ARMv8
Extensions
Instruction set extensions
Jazelle Extension
• Java bytecode execution (Direct Bytecode Execution)
  • Hardware execution (~95%)
  • Interpreted as a short sequence of ARM instructions
  • Emulated via SW
• Required from ARMv6 onward, but a trivial implementation is allowed
ThumbEE Extension
• Extension of the Thumb instruction set
  • Required for ARMv7-A (backward compatibility)
  • Optional for ARMv7-R
• Suitable for JIT and AOT compilation
  • Java, C#, Perl, Python
• Meant as the successor of Jazelle
• Currently, its usage is deprecated
Extensions
Instruction set extensions
VFP Extension
• VFPv1 (obsolete), VFPv2, VFPv3, VFPv4
• Optional
• Floating-point support and registers
  • Single-precision floating point
  • Double-precision floating point
Advanced SIMD Extension (NEON)
• SIMDv1, SIMDv2
• Optional
• Additional SIMD instructions and registers
  • Integer
  • Single-precision floating point
Extensions
Architecture extensions
Fast Context Switch Extension (FCSE)
• Modified address translation
  • VA → MVA → PA (virtual, modified virtual, physical address)
  • Depends on the process ID
• Deprecated
Security Extension
• 2 security states (Secure, Non-secure)
• Additional processor mode: Monitor Mode (MON)
• Restrictions on changes of execution state
• Restrictions on memory accesses
• ARMv6K, ARMv7-A
• Optional
Extensions
Architecture extensions
Multiprocessing Extension
• New instruction: PLDW
• Changes on TLB and cache behavior
• ARMv7-A, ARMv7-R, ARMv8
• Optional
Large Physical Address Extension
• Provides physical addresses up to 2^40 (40-bit physical addresses)
• Requires the multiprocessing extension
• 3 levels of page tables
• ARMv7, ARMv8
• Optional
Extensions
Virtualization Extension
Modified MMU behavior
Additional instructions
Additional processor mode: Hyp
Requires the multiprocessing extension
Requires the large physical address extension
Requires the trivial Jazelle implementation
ARMv7-A, ARMv8
Optional
Architecture extensions
Extensions
Generic Timer Extension
System timer (with low-latency access)
ARMv7-A, ARMv7-R
Optional
Performance Monitor Extension
Special registers with event counters (implem. dependent)
ARMv7
Optional (recommended)
Architecture extensions
Processor modes
• USR: User Mode (for user level code execution)
• SVC: Supervisor Mode (for kernel level code execution)
  • activated on reset and when an SVC instruction is executed
• SYS: System Mode (similar to SVC, without banked registers)
  • used to read/modify SP and LR of User Mode from kernel code
• IRQ: Normal Interrupt Mode (for interrupt handling)
  • activated when the IRQ line is asserted
• FIQ: Fast Interrupt Mode (for fast interrupt handling)
  • activated when the FIQ line is asserted
• UND: Undefined Mode
  • activated when an invalid instruction is executed
• ABT: Abort Mode (for handling memory access faults)
  • activated when
    • an instruction or data fetch is attempted from an invalid address (MMU fault): synchronous abort
    • an external exception comes from the memory subsystem (e.g., parity error, unusable address): asynchronous abort
• Hyp: Hypervisor Mode (for management of virtualized systems)
  • If virtualization extensions are present
• Monitor: Monitor Mode (for handling secure vs non-secure transitions)
  • If security extensions are present
All modes except USR are privileged modes.
General Purpose registers
• Size: 32 bit
• Names: R0 – R15
  • R0 - R12: general purpose
    • R8 - R12 banked in FIQ mode
  • R13 (or SP): stack pointer (software rule)
    • banked in all privileged modes except SYS
  • R14 (or LR): function return address
    • banked in all privileged modes except SYS
  • R15 (or PC): program counter
    • Points to current instruction + 8 (when executing ARM instructions)
    • Points to current instruction + 4 (when executing Thumb instructions)
    • Instructions can read and write PC
• Banked registers
  • duplicated copies of registers
  • available in some processor modes
Registers banks
Mode         Banked registers (every other register is shared with User mode)
User         R0 - R12, SP, LR, PC, CPSR
System       none (same registers as User mode)
Supervisor   SP_svc, LR_svc, SPSR_svc
Abort        SP_abt, LR_abt, SPSR_abt
Undefined    SP_und, LR_und, SPSR_und
IRQ          SP_irq, LR_irq, SPSR_irq
FIQ          R8_fiq - R12_fiq, SP_fiq, LR_fiq, SPSR_fiq
Hyp          SP_hyp, ELR_hyp, SPSR_hyp (with virtualization extension)
Monitor      SP_mon, LR_mon, SPSR_mon (with security extension)
Privileged modes: all modes except User.
Exception modes: all modes except User and System.
Program status register
• Current Program Status Register: CPSR
  • Condition flags
  • Special flags
  • Exception mask bits
  • Execution state bits
  • Mode bits
Program status register format
Bits     Field
31       N
30       Z
29       C
28       V
27       Q
26:25    IT[1:0]
24       J
23:20    (reserved)
19:16    GE
15:10    IT[7:2]
9        E
8        A
7        I
6        F
5        T
4:0      M[4:0]
Program status register
• Current Program Status Register: CPSR
  • Condition flags
    • accessible (read and write) in all modes
    • N: negative flag
    • Z: zero flag
    • C: carry flag
    • V: overflow flag
  • Special flags
    • accessible (read and write) in all modes
    • Q: overflow or saturation flag for saturating arithmetic
    • GE: Greater than or Equal flags for SIMD instructions
Program status register
• Current Program Status Register: CPSR
  • Exception mask bits
    • accessible (read: in all modes; write: in privileged modes)
    • A: Asynchronous abort disable
      • 0: async. abort exceptions enabled
      • 1: async. abort exceptions masked
      • async. abort: external exception from memory subsystem
        • e.g., parity error, unusable address
        • not MMU exceptions (MMU exceptions are always synchronous)
    • I: Interrupt disable
      • 0: IRQ exceptions enabled
      • 1: IRQ exceptions masked
    • F: Fast interrupt disable
      • 0: FIQ exceptions enabled
      • 1: FIQ exceptions masked
Program status register
• Current Program Status Register: CPSR
  • Execution state bits
    • IT[7:0]: If-Then (for the Thumb IT instruction)
      • not accessible (read as zero, writes are ignored or unpredictable)
    • J: Jazelle
      • not accessible (read as zero, writes are ignored or unpredictable)
    • E: Endianness
      • accessible (read: in all modes; write: in privileged modes)
      • access is deprecated
    • T: Thumb
      • not accessible (read as zero, writes are ignored or unpredictable)
Program status register
• Current Program Status Register: CPSR
  • Mode bits
    • accessible (read: in all modes; write: in privileged modes)
    • M[4:0] mode
      • 10000: USR
      • 10001: FIQ
      • 10010: IRQ
      • 10011: SVC
      • 10111: ABT
      • 11011: UND
      • 11111: SYS
      • 11010: HYP
      • 10110: MON
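
The CPSR layout and the mode encodings listed on these slides can be exercised with a small decoder. This is only an illustrative sketch in Python: the names `MODES` and `decode_cpsr` are hypothetical, and the sample value `0x600000D3` is made up for the example.

```python
# Mode encodings and bit positions come from the slides;
# MODES and decode_cpsr are hypothetical names used for illustration.
MODES = {
    0b10000: "USR", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "SVC",
    0b10110: "MON", 0b10111: "ABT", 0b11010: "HYP", 0b11011: "UND",
    0b11111: "SYS",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into some of its fields."""
    return {
        "N": (cpsr >> 31) & 1,   # condition flags
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "A": (cpsr >> 8) & 1,    # exception mask bits
        "I": (cpsr >> 7) & 1,
        "F": (cpsr >> 6) & 1,
        "T": (cpsr >> 5) & 1,    # execution state bit
        "mode": MODES.get(cpsr & 0x1F, "unknown"),
    }

fields = decode_cpsr(0x600000D3)  # made-up sample value
print(fields["mode"], fields["Z"], fields["C"], fields["I"], fields["F"])
```

With this sample value the decoder reports Supervisor mode with Z and C set and both IRQ and FIQ masked.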
Program status register
• APSR (Application Program Status Register)
  • the portion of CPSR useful in USR mode
  • E, A, I, F, M are readable in USR mode
    • the access is deprecated in USR mode
    • reading E is deprecated in all modes
  • IT, J, T are readable in USR mode
    • reads return 0 in all modes
CPSR fields as seen from USR mode
Bits     Field           Access
31:27    N, Z, C, V, Q   APSR
26:25    IT[1:0]         Read as zero
24       J               Read as zero
19:16    GE              APSR
15:10    IT[7:2]         Read as zero
9:6      E, A, I, F      Read possible but deprecated
5        T               Read as zero
4:0      M[4:0]          Read possible but deprecated
Program status register
• Saved Program Status Register: SPSR
  • only in privileged modes
  • one for each mode except SYS
    • SPSR_svc, SPSR_irq, SPSR_fiq, SPSR_und, SPSR_abt, SPSR_hyp, SPSR_mon
  • accessible (read and write)
ARM: program status register
User mode:
only changes to flags
no saved copy
examples:
MRS R0, CPSR        load R0 with the content of CPSR (the accessible bits)
MSR CPSR_f, R0      write the flag portion of CPSR
Privileged modes:
examples:
MSR CPSR_f, R0      write the flag portion of CPSR
MSR CPSR, R0        write CPSR
MSR SPSR, R0        write SPSR (if it exists)
MSR SPSR_c, R0      write the lowest byte of SPSR (if it exists)
ARM structure
[Figure: ARM datapath organization. Blocks: address register with incrementer, register bank (including PC), multiplier, barrel shifter, ALU, data-out and data-in registers, instruction decode and control; connected through the A bus, B bus, and ALU bus; address output A[31:0]]
[Figure: ARM pipeline organization. Stages: fetch (I-cache, next pc, pc+4), instruction decode (register read, immediate fields, pc+8 read as r15), execute (shift, ALU, forwarding paths), buffer/data (D-cache, load/store address, byte replication, rotate/sign extend), write-back (register write)]
ARM: exception handling
Exceptions
Reset                    External reset asserted
Undefined instruction    Undefined or invalid instruction executed
Supervisor call          SVC instruction executed
Prefetch abort           Invalid address during instruction fetch
Data abort               Invalid address during data fetch
IRQ                      External interrupt asserted
FIQ                      External fast interrupt (high priority) asserted
Exception handling
Other exceptions (not considered here)
• Secure Monitor Call
  • SMC instruction executed
  • Only if security extensions are present
• Hypervisor Call
  • HVC instruction executed
• Hyp Trap
  • privileged instruction executed in a virtual machine
• Virtual Abort
  • external asynchronous abort within a virtual machine
• Virtual IRQ
  • Virtual IRQ generated in a virtual machine
• Virtual FIQ
  • Virtual FIQ generated in a virtual machine
Hypervisor Call, Hyp Trap, Virtual Abort, Virtual IRQ, and Virtual FIQ exist only if virtualization extensions are present.
ARM: exception handling
Behavior:
1. Change processor mode
2. Save CPSR (to SPSR of the new mode)
3. Save return address in LR of the new mode
4. Mask IRQ exceptions
5. Mask other exceptions if needed (depends on the exception)
6. Jump to a fixed address (depends on the exception)
ARM: exception handling
Behavior:
Exception               PC              New mode   Additional masking
Reset                   EBASE + 0x00    SVC        asynchronous Abort and FIQ
Undefined instruction   EBASE + 0x04    UND        -
Supervisor call         EBASE + 0x08    SVC        -
Prefetch abort          EBASE + 0x0C    ABT        asynchronous Abort
Data abort              EBASE + 0x10    ABT        asynchronous Abort
IRQ                     EBASE + 0x18    IRQ        -
FIQ                     EBASE + 0x1C    FIQ        asynchronous Abort and FIQ
ARM: exception handling
Behavior:
• Exception base address
  • Bit V of special register SCTLR (System Control Register)
    • 0: EBASE = 0 (default)
    • 1: EBASE = 0xFFFF0000 (Hivecs)
• Vectored interrupt support
  • Vendor implementation dependent
  • Several IRQ and FIQ lines
  • Each line has its own priority and exception address
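
The vector offsets, the new modes, and the EBASE selection described on these slides can be put together as a short lookup. This is a sketch in Python, not a model of any specific core; the names `VECTORS` and `exception_vector` are hypothetical.

```python
# Offsets and modes are those listed on the slides;
# VECTORS and exception_vector are hypothetical names.
VECTORS = {
    "Reset":                 (0x00, "SVC"),
    "Undefined instruction": (0x04, "UND"),
    "Supervisor call":       (0x08, "SVC"),
    "Prefetch abort":        (0x0C, "ABT"),
    "Data abort":            (0x10, "ABT"),
    "IRQ":                   (0x18, "IRQ"),
    "FIQ":                   (0x1C, "FIQ"),
}

def exception_vector(exception, sctlr_v):
    """Return (new PC, new mode) for an exception, given bit V of SCTLR."""
    ebase = 0xFFFF0000 if sctlr_v else 0x00000000  # Hivecs when V == 1
    offset, mode = VECTORS[exception]
    return ebase + offset, mode

pc, mode = exception_vector("Data abort", sctlr_v=1)
print(hex(pc), mode)
```

Vectored interrupt controllers, which the slides describe as vendor dependent, are outside the scope of this sketch.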
Memory model
Virtual Memory System Architecture (VMSA)
MMU: Memory Management Unit
Address translation
Memory protection
Protected Memory System Architecture (PMSA) (not considered here)
MPU: Memory Protection Unit
Memory protection
No address translation
Virtual Memory System Architecture
• Memory areas:
  • Supersections: 16 MB (support is optional)
  • Sections: 1 MB
  • Large pages: 64 KB
  • Small pages: 4 KB
• 2-level page table
  • pointed to by special registers
    • TTBR0: Translation Table Base Register 0
    • TTBR1: Translation Table Base Register 1
Virtual Memory System Architecture
• First level table
  • Pointed to by TTBR0 or TTBR1
  • Contains first level descriptors
    • 2nd level page table address (22 bits)
    • Section base address (12 bits)
    • Supersection base address (8 bits)
• Second level table
  • Pointed to by a first level descriptor
  • Contains second level descriptors
    • Large page base address (16 bits)
    • Small page base address (20 bits)
ARM: Virtual address translation
Translation Table Base Register 1 (TTBR1) format
• bits [31:14]: translation table address (Hi)
• bits [13:0]: not part of the table address
Translation Table Base Register 0 (TTBR0) format
• bits [31:14-N]: translation table address (Hi)
• bits [13-N:0]: not part of the table address
Translation Table Control Register (TTBCR) format
• bits [2:0]: N
• N: number of most significant virtual address bits to discard in translation
ARM: Virtual address translation
First level table 0
• Physical address of first level page table: TTBR0 table address (Hi) (18+N bits) followed by 14-N zero bits
• Used if N == 0 or VA[31:32-N] == 0
First level table 1
• Physical address of first level page table: TTBR1 table address (Hi) (18 bits) followed by 14 zero bits
• Used if N != 0 and VA[31:32-N] != 0
ARM: Virtual address translation
Virtual address space
• 0 to 2^(32-N) - 1: address translation through first level table 0 (TTBR0), yielding the physical address
• 2^(32-N) to 2^32 - 1: address translation through first level table 1 (TTBR1), yielding the physical address
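
The rule for choosing between the two first-level tables can be written as a few lines of Python. This is a sketch of the selection rule only; the function name `select_ttbr` is hypothetical, and N is the 3-bit field of TTBCR.

```python
def select_ttbr(va, n):
    """Return which base register translates the 32-bit virtual address va.

    n is TTBCR.N (0..7). TTBR0 is used when N == 0 or when the N most
    significant bits of the address, VA[31:32-N], are all zero.
    """
    if n == 0:
        return "TTBR0"
    top_bits = va >> (32 - n)   # VA[31:32-N]
    return "TTBR0" if top_bits == 0 else "TTBR1"

# With N = 2, TTBR0 covers 0 .. 2^30 - 1 and TTBR1 covers the rest.
print(select_ttbr(0x3FFFFFFF, 2), select_ttbr(0x40000000, 2))
```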
Virtual Memory System Architecture
• Virtual address translation
  • Small pages
    • 12 bits (address[31:20]): first-level table index
      • ignore N MSBs when TTBR0 is used
      • N: 3 LSBs of special register TTBCR (Translation Table Control Register)
    • 8 bits (address[19:12]): second-level table index
    • 12 bits (address[11:0]): page offset
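ARM: Virtual address translation
Small page
• Virtual address fields: VA[X:20] (12-N bits), VA[19:12] (8 bits), VA[11:0] (12 bits)
  • X = 31-N when using TTBR0
  • X = 31 when using TTBR1
• Physical address of first level descriptor: table address (Hi) (18+N bits) : VA[X:20] (12-N bits) : 00
  • the first level descriptor (page table) provides the second level table address (22 bits)
• Physical address of second level descriptor: second level table address (22 bits) : VA[19:12] (8 bits) : 00
  • the second level descriptor (small page) provides the page base address (20 bits)
• Physical address: page base address (20 bits) : VA[11:0] (12 bits)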
Virtual address translation example
TTBCR: N = 0
TTBR0: translation table address (Hi) = 100000000000000000 (first level table at 0x80000000)
VA = 0x00207004
  VA[31:20] = 000000000010
  VA[19:12] = 00000111
  VA[11:0] = 000000000100
Step1: read entry 2 from first level table
  Physical address of first level descriptor: table address (18 bits) : VA[31:20] (12 bits) : 00
  Access 32-bit word at 0x80000008
  found:
    type is page table
    base is 1100111100000000000000
  second level table address: 0xCF000000
Virtual address translation example
TTBCR: N = 0
VA = 0x00207004 (VA[19:12] = 00000111)
From first level page table: second level table base is 1100111100000000000000
Step2: read entry 7 from second level table
  Physical address of second level descriptor: second level table base (22 bits) : VA[19:12] (8 bits) : 00
  Access 32-bit word at 0xCF00001C
  found:
    type is small page
    base is 00010000000000000000
  page address: 0x10000000
Virtual address translation example
TTBCR: N = 0
VA = 0x00207004 (VA[11:0] = 000000000100)
From second level page table: page base is 00010000000000000000
Step3:
  Physical address: page base (20 bits) : VA[11:0] (12 bits)
  physical address is 0x10000004
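
The three steps of the example can be replayed with a small Python model of the two-level walk for small pages. This is a sketch under stated assumptions: memory is modeled as a dictionary from physical address to 32-bit word, the names `translate_small_page` and `memory` are hypothetical, and the descriptor type bits (low bits `01` for a page table entry, bit 1 set for a small page entry) are assumed from the ARMv7 short-descriptor format, since the slides do not list them.

```python
def translate_small_page(va, ttbr0, memory, n=0):
    """Translate va through TTBR0 with a two-level walk (small pages only)."""
    # Step 1: first level descriptor at table address : VA[31-N:20] : 00
    table_base = ttbr0 & ~((1 << (14 - n)) - 1) & 0xFFFFFFFF
    index1 = (va >> 20) & ((1 << (12 - n)) - 1)
    desc1 = memory[table_base | (index1 << 2)]
    if desc1 & 0b11 != 0b01:          # assumed encoding of "page table"
        raise ValueError("first level entry is not a page table")
    l2_base = desc1 & 0xFFFFFC00      # 22-bit second level table address

    # Step 2: second level descriptor at base : VA[19:12] : 00
    index2 = (va >> 12) & 0xFF
    desc2 = memory[l2_base | (index2 << 2)]
    if not desc2 & 0b10:              # assumed encoding of "small page"
        raise ValueError("second level entry is not a small page")

    # Step 3: physical address = page base (20 bits) : VA[11:0]
    return (desc2 & 0xFFFFF000) | (va & 0xFFF)

# Values from the worked example (descriptor low bits are assumptions).
memory = {
    0x80000008: 0xCF000001,  # entry 2 of the first level table
    0xCF00001C: 0x10000002,  # entry 7 of the second level table
}
pa = translate_small_page(0x00207004, ttbr0=0x80000000, memory=memory)
print(hex(pa))
```

The walk reads the same two words as the example (0x80000008 and 0xCF00001C) and ends at the same physical address.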
Virtual Memory System Architecture
• Virtual address translation
  • Large pages
    • 12 bits (address[31:20]): first-level table index
      • ignore N MSBs when TTBR0 is used
      • N: 3 LSBs of special register TTBCR (Translation Table Control Register)
    • 8 bits (address[19:12]): second-level table index
    • 16 bits (address[15:0]): page offset
  • The index and the offset overlap: address[15:12] belongs to both
    • the 2nd level page table must have repeated entries
    • each large page entry is repeated 16 times
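ARM: Virtual address translation
Large page
• Virtual address fields: VA[X:20] (12-N bits), VA[19:12] (8 bits), VA[15:0] (16 bits)
  • X = 31-N when using TTBR0
  • X = 31 when using TTBR1
• Physical address of first level descriptor: table address (Hi) (18+N bits) : VA[X:20] (12-N bits) : 00
  • the first level descriptor (page table) provides the second level table address (22 bits)
• Physical address of second level descriptor: second level table address (22 bits) : VA[19:12] (8 bits) : 00
  • the second level descriptor (large page) provides the page base address (16 bits)
• Physical address: page base address (16 bits) : VA[15:0] (16 bits)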
Virtual Memory System Architecture
• Virtual address translation
  • Sections
    • 12 bits (address[31:20]): first-level table index
      • ignore N MSBs when TTBR0 is used
      • N: 3 LSBs of special register TTBCR (Translation Table Control Register)
    • 20 bits (address[19:0]): section offset
ARM: Virtual address translation
Section
• Virtual address fields: VA[X:20] (12-N bits), VA[19:0] (20 bits)
  • X = 31-N when using TTBR0
  • X = 31 when using TTBR1
• Physical address of first level descriptor: table address (Hi) (18+N bits) : VA[X:20] (12-N bits) : 00
  • the first level descriptor (section) provides the section base address (12 bits)
• Physical address: section base address (12 bits) : VA[19:0] (20 bits)
Virtual Memory System Architecture
• Virtual address translation
  • Supersections
    • 12 bits (address[31:20]): first-level table index
      • ignore N MSBs when TTBR0 is used
      • N: 3 LSBs of special register TTBCR (Translation Table Control Register)
    • 24 bits (address[23:0]): supersection offset
  • The index and the offset overlap: address[23:20] belongs to both
    • in the 1st level page table, Supersection entries must be repeated 16 times
ARM: Virtual address translation
Supersection
• Virtual address fields: VA[X:20] (12-N bits), VA[23:0] (24 bits)
  • X = 31-N when using TTBR0
  • X = 31 when using TTBR1
• Physical address of first level descriptor: table address (Hi) (18+N bits) : VA[X:20] (12-N bits) : 00
  • the first level descriptor (supersection) provides the supersection base address (8 bits; 16 bits if 40-bit physical addresses are supported, which is optional)
• Physical address: supersection base address : VA[23:0] (24 bits)
ARM: Virtual address translation
• First level table (32-bit entries), indexed by VA[X : 20]
  • Table address from TTBR0 or TTBR1 (right padded)
  • Page table entry: points to a second level table
  • Section entry: 1 MB memory region
  • Supersection entry (repeated 16 times): 16 MB memory region
• Second level table (32-bit entries), indexed by VA[19 : 12]
  • Large Page entry (repeated 16 times): 64 KB memory page
  • Small Page entry: 4 KB memory page
• In both tables, entries also include control bits (entry type, access permissions, etc.)
• X = 31-N when using TTBR0; X = 31 when using TTBR1
ARM instruction set
• 32-bit instructions
  • Aligned on a four-byte boundary
• 3-address instructions
• Very regular format
• Conditional execution of (almost) every instruction
• Shift and ALU operation in a single instruction
• Cannot specify a 32-bit immediate constant
  • Not enough bits in a 32-bit instruction
• Data processing instructions use 12 bits for the immediate, exploiting the barrel shifter
  • 8 bits for base constant: imm8
  • 4 bits for rotation: rot
  • immediate = rotate_right(imm8, 2 * rot)
  • Available constants
    • 0 – 255 (no rotation)
    • 256, 260, 264, …, 1020: (64, 65, …, 255) rotated right by 30
    • 1024, 1040, 1056, …, 4080: (64, 65, …, 255) rotated right by 28
    • …
Conditional execution
• Conditional instructions
  • Branches
    • Pipeline hazard
      • Pipeline stalls: several cycles lost
      • Branch prediction
      • Speculative execution
  • Data processing instructions
    • Execution proceeds in any case
    • If the condition is false, results are discarded
      • The conditional instruction behaves as a NOP
      • 1 cycle lost
Conditional execution
• A “conditional suffix” makes the ARM instruction conditional
• Example
  • Instruction:
    • add r0, r1, r2
    • Sum the content of registers r1 and r2, and store the result in r0 (r0 <= r1 + r2)
  • Conditional suffix:
    • eq
    • true if flag Z is 1
    • usually, a previous comparison (a subtraction) has produced 0 as result
  • ARM conditional instruction:
    • addeq r0, r1, r2
    • The operation takes effect only if Z = 1
    • The operation is actually always performed, but the result is stored only if Z = 1
    • eq: conditional suffix
Conditional execution
conditional suffixes
Suffix      Meaning                         Flags
EQ          EQual                           Z == 1
NE          Not Equal                       Z == 0
CS          Carry Set                       C == 1
CC          Carry Clear                     C == 0
MI          MInus, negative                 N == 1
PL          PLus, positive or zero          N == 0
VS          Overflow (V Set)                V == 1
VC          No overflow (V Clear)           V == 0
HI          Unsigned HIgher                 C == 1 and Z == 0
LS          Unsigned Lower or Same          C == 0 or Z == 1
GE          Signed Greater than or Equal    N == V
LT          Signed Less Than                N != V
GT          Signed Greater Than             Z == 0 and N == V
LE          Signed Less than or Equal       Z == 1 or N != V
None (AL)   Always                          -
HI, LS: for tests on unsigned values
GE, LT, GT, LE: for tests on signed values
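
The suffix table can be checked against concrete comparisons with a short Python model. This is a sketch, not a description of the hardware: `CONDITIONS`, `condition_passed`, and `flags_after_cmp` are hypothetical names, and the flag computation follows the usual rules for a 32-bit subtraction (C set when there is no borrow, V set on signed overflow).

```python
# Each entry mirrors one row of the suffix table: (N, Z, C, V) -> bool.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "CS": lambda n, z, c, v: c == 1,
    "CC": lambda n, z, c, v: c == 0,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "LS": lambda n, z, c, v: c == 0 or z == 1,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
    "AL": lambda n, z, c, v: True,
}

def condition_passed(suffix, n, z, c, v):
    """Evaluate a conditional suffix against the four condition flags."""
    return CONDITIONS[suffix.upper()](n, z, c, v)

def flags_after_cmp(a, b):
    """Flags (N, Z, C, V) after a 32-bit compare of a with b (a - b)."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    result = (a - b) & 0xFFFFFFFF
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                        # no borrow
    v = ((a ^ b) & (a ^ result)) >> 31     # signed overflow
    return n, z, c, v

flags = flags_after_cmp(-1, 1)   # -1 is 0xFFFFFFFF as an unsigned value
print(condition_passed("LT", *flags), condition_passed("HI", *flags))
```

The same comparison is "less than" for signed values and "higher" for unsigned ones, which is why the table keeps the two groups of suffixes separate.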
Conditional execution and flags
• Many data processing instructions do not affect flags by default
• Suffix S means: Store Flags
• Examples
  • sub r1, r2, r3
    • Subtract content of r3 from content of r2, store result in r1
    • r1 <= r2 – r3
    • Do NOT store flags in CPSR
  • subs r1, r2, r3
    • As above
    • DO store flags in CPSR
• cmp does not require S
  • cmp is only used to generate flags
  • cmp means: subtract, do not store result but store flags
  • e.g., cmp r1, r2: perform r1 - r2, discard result, store flags
Conditional execution
Example:
    if (a >= b) c = 0;
    else c = 1;
a, b: signed integers
a mapped on r1
b mapped on r2
c mapped on r3
With branches:
            cmp   r1, r2      // generate flags
            blt   else        // if a < b, branch to “else” code
            mov   r3, #0      // this is the “then” code
            b     endif
    else:
            mov   r3, #1      // this is the “else” code
    endif:
            ...
No branches:
            cmp   r1, r2      // generate flags
            movge r3, #0      // “if” assignment
            movlt r3, #1      // “else” assignment
ARM instructions
• Data processing instructions
  • Data transfer
  • Arithmetic
  • Logical
  • Comparison
  • Shift
• Control flow
• Memory access
• System instructions
ARM instructions
Data processing instruction:
OPCODE<flag suffix><conditional suffix> OPERANDS
OPCODE: operation
<flag suffix>: store flags
S: store flags
nothing: do not store flags
Some exceptions: CMP, TST, ... : always store flags
<conditional suffix>: conditional execution
Instruction is executed only if condition is true
OPERANDS: registers and immediate constants
Data processing instructions
• Only work on registers, NOT memory
• Second operand is sent to the ALU via barrel shifter
• Mostly 3-address instructions
  • First: destination register
  • Second: first operand (always a register)
  • Third: second operand (a register or a constant)
Data processing instructions
• Barrel shifter operations
  • LSL: Logical Shift Left (zeros enter from the right)
  • LSR: Logical Shift Right (zeros enter from the left)
  • ASR: Arithmetic Shift Right (the sign bit is replicated)
  • ROR: Rotate Right
  • RRX: Rotate Right Extended
    • Same as ROR, but the operand is 33-bit (the carry flag is added)
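
The five barrel shifter operations can be modeled on 32-bit values in a few lines of Python. This is an illustrative sketch: the function names simply mirror the mnemonics, shift amounts are assumed to be in the range 0 to 31, and `rrx` returns the new carry alongside the result because the carry flag is part of the 33-bit rotation.

```python
MASK = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeros enter from the right."""
    return (x << n) & MASK

def lsr(x, n):
    """Logical shift right: zeros enter from the left."""
    return (x & MASK) >> n

def asr(x, n):
    """Arithmetic shift right: the sign bit is replicated."""
    x &= MASK
    shifted = x >> n
    if x >> 31:
        shifted |= (MASK << (32 - n)) & MASK
    return shifted

def ror(x, n):
    """Rotate right by n bits."""
    n %= 32
    x &= MASK
    return ((x >> n) | (x << (32 - n))) & MASK

def rrx(x, carry):
    """Rotate right by one bit through the carry; returns (result, new carry)."""
    x &= MASK
    return ((carry << 31) | (x >> 1)) & MASK, x & 1

print(hex(asr(0x80000000, 4)), hex(lsr(0x80000000, 4)))
print(rrx(0x3, 0))
```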
Data processing instructions
• Data movement
  • MOV, MVN
• Arithmetic
  • ADD, ADC, SUB, SBC, RSB, RSC
  • MUL
  • MLA, MLS, UMULL, UMLAL, SMULL, SMLAL, UMAAL (4-address instructions)
  • Some ARM cores also have instructions for integer division: UDIV, SDIV
• Logical
  • AND, ORR, EOR, BIC, MVN (MVN is actually a NOT)
• Comparison
  • CMP, CMN (arithmetic instructions without result storing)
  • TST, TEQ (logical instructions without result storing)
• Shift
  • LSL, LSR, ASR, ROR, RRX (not true instructions, actually mov with a shift applied)
Data movement instruction
• MOV: move data to a register
  • mov rd, N
    • rd: destination register
    • N: immediate or source register (and shift)
  • Examples
    • mov r0, r2              r0 <= r2
    • mov r0, #1              r0 <= 1
    • mov r0, r1, lsl #2      r0 <= r1 << 2
    • mov r0, r1, lsl r2      r0 <= r1 << r2
• MVN: move negated data to a register
  • mvn rd, N
    • N is negated (bitwise NOT) before being stored in rd
Arithmetic instructions
• ADD: sum data
  • add rd, rn, N
    • rd: destination register
    • rn: first source register
    • N: immediate or source register (and shift)
  • Examples
    • add r0, r1, r2              r0 <= r1 + r2
    • add r0, r1, #2              r0 <= r1 + 2
    • add r0, r1, r2, lsl #2      r0 <= r1 + (r2 << 2)
    • add r0, r1, r2, lsl r3      r0 <= r1 + (r2 << r3)
• Others
  • SUB: subtract
    • e.g., sub r0, r1, r2        r0 <= r1 - r2
  • RSB: reverse subtract
    • e.g., rsb r0, r1, r2        r0 <= r2 - r1
  • ADC: add with carry
    • e.g., adc r0, r1, r2        r0 <= r1 + r2 + carry_flag
  • SBC: subtract with carry
    • e.g., sbc r0, r1, r2        r0 <= r1 - r2 - (1 - carry_flag)
  • RSC: reverse subtract with carry
    • e.g., rsc r0, r1, r2        r0 <= r2 - r1 - (1 - carry_flag)
Arithmetic instructions
• MUL: multiply
  • e.g., mul r0, r1, r2          r0 <= r1 * r2 {lowest 32 bits}
• MLA: multiply and accumulate
  • e.g., mla r0, r1, r2, r3      r0 <= r3 + r1 * r2 {lowest 32 bits}
• MLS: multiply and subtract
  • e.g., mls r0, r1, r2, r3      r0 <= r3 - r1 * r2 {lowest 32 bits}
• UMULL: unsigned multiply long
  • e.g., umull r0, r1, r2, r3    r1:r0 <= r2 * r3
• UMLAL: unsigned multiply and accumulate long
  • e.g., umlal r0, r1, r2, r3    r1:r0 <= r1:r0 + r2 * r3
• SMULL: signed multiply long
  • e.g., smull r0, r1, r2, r3    r1:r0 <= r2 * r3
• SMLAL: signed multiply and accumulate long
  • e.g., smlal r0, r1, r2, r3    r1:r0 <= r1:r0 + r2 * r3
• UMAAL: unsigned multiply and accumulate 2 long
  • e.g., umaal r0, r1, r2, r3    r1:r0 <= r1 + r0 + r2 * r3
No barrel shifter for the operands of these instructions.
Logical instructions
• AND: bitwise and
  • e.g., and r0, r1, r2              r0 <= r1 and r2
• ORR: bitwise or
  • e.g., orr r0, r1, r2, lsl #1      r0 <= r1 or (r2 << 1)
• EOR: bitwise xor
  • e.g., eor r0, r1, r2, lsl r3      r0 <= r1 xor (r2 << r3)
• BIC: bit clear
  • Clear all bits of the first operand that are set in the second operand
  • e.g., bic r0, r2, r3              r0 <= r2 and not r3
Comparison instructions
• CMP: compare
  • e.g., cmp r4, r5      r4 – r5 {do not store result, always store flags}
• CMN: compare negative
  • e.g., cmn r4, r5      r4 + r5 {do not store result, always store flags}
• TST: test
  • e.g., tst r4, r5      r4 and r5 {do not store result, always store flags}
• TEQ: test equivalence
  • e.g., teq r4, r5      r4 xor r5 {do not store result, always store flags}
Shift instructions
• LSL: logical shift left
  • e.g., lsl r0, r1, #5      r0 <= r1 << 5
    • actually: mov r0, r1, lsl #5
  • e.g., lsl r0, r1, r2      r0 <= r1 << r2
    • actually: mov r0, r1, lsl r2
• LSR: logical shift right
  • e.g., lsr r0, r1, #5      r0 <= r1 >> 5 (zero fill)
    • actually: mov r0, r1, lsr #5
• ASR: arithmetic shift right
  • e.g., asr r0, r1, #5      r0 <= r1 >> 5 (sign bit replicated)
    • actually: mov r0, r1, asr #5
• ROR: rotate right
  • e.g., ror r0, r1, #5      rotation right without carry
    • actually: mov r0, r1, ror #5
• RRX: rotate right with extend
  • only 1 bit rotation is available
  • e.g., rrx r0, r1          rotation right, 1 bit, with carry
    • actually: mov r0, r1, rrx
Data processing instructions
• Others
  • Count leading zeros
    • CLZ
  • Saturated arithmetic
    • QADD, QSUB, QDADD, QDSUB, …
  • Parallel arithmetic
    • SADD16, SSUB16, SADD8, SSUB8, …
  • Halfword multiply and multiply accumulate instructions
    • SMULWB, SMULWT, SMLABB, SMLABT, SMLATB, SMLATT, …
  • Floating-point data processing
  • Advanced SIMD instructions
Reti Logiche Università degli studi di Udine
Data processing instructions
• Notes
  • Due to immediate constant limitations, mov cannot load small negative values into registers
    • Use mvn
      • r0 <= -1    mvn r0, #0
      • r1 <= -3    mvn r1, #2
  • Fast multiplication by a small constant can be implemented by exploiting the barrel shifter
    • e.g., r4 <= r3 * 35
      add r4, r3, r3, lsl #2    ; r4 <= r3 * 5
      rsb r4, r4, r4, lsl #3    ; r4 <= r4 * 7
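Both notes can be verified with a short Python sketch. It assumes the usual ARM (A32) data-processing immediate, an 8-bit value rotated right by an even amount, as the "immediate constant limitation" referred to above; the helper names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def is_arm_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount (A32 rule)."""
    value &= MASK32
    for rot in range(0, 32, 2):
        # Undo a rotate-right by `rot` with a rotate-left by `rot`.
        rotated = ((value << rot) | (value >> (32 - rot))) & MASK32
        if rotated < 256:
            return True
    return False


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mvn(imm):
    """mvn rd, #imm: rd <= not imm."""
    return ~imm & MASK32


def times_35(r3):
    r4 = (r3 + (r3 << 2)) & MASK32  # add r4, r3, r3, lsl #2  (r3 * 5)
    r4 = ((r4 << 3) - r4) & MASK32  # rsb r4, r4, r4, lsl #3  (r4 * 7)
    return r4


print(is_arm_immediate(0xFFFFFFFD), is_arm_immediate(2))  # -3 is not encodable, 2 is
print(to_signed(mvn(0)), to_signed(mvn(2)))
print(times_35(12))
```

The pattern for -3 (0xFFFFFFFD) is not encodable, but its bitwise complement 2 is, which is why mvn works where mov does not.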
Control flow instructions
• Branch instructions
  • Conditional or unconditional
  • With or without link
    • Link: save the address of the next instruction in LR
    • For subroutine calls
  • Target address
    • Immediate constant (offset from PC)
    • Register
  • With or without instruction set change
    • Switch between ARM and Thumb execution
• Implicit branches
  • Instructions that use PC as destination register
  • Forms other than ldm are deprecated
ARM instructions
Control flow (branches):
OPCODE<conditional suffix> DESTINATION
OPCODE: operation
<conditional suffix>: conditional execution
DESTINATION: register or immediate constant
Branch instructions
• B: branch
  • e.g., b label         ; pc <= address of label
  • e.g., beq label       ; pc <= address of label if Z = 1
  • destination address is in an immediate constant
    • computed (by the assembler) as a PC-relative immediate offset
• BL: branch with link
  • e.g., bl function     ; lr <= <return address> ; pc <= function address
  • destination address is in an immediate constant
    • computed (by the assembler) as a PC-relative immediate offset
• BX: branch and exchange
  • e.g., bx lr           ; pc <= lr {change instruction set if needed}
  • destination address is in a register
• BLX: branch with link and exchange
  • e.g., blx function    ; lr <= <return address> ; pc <= function address {the immediate form always changes instruction set}
  • e.g., blx r0          ; lr <= <return address> ; pc <= r0 {change instruction set if needed}
  • destination address is in a register or in an immediate constant
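The "PC-relative immediate offset" computed by the assembler can be illustrated with a Python sketch that encodes an ARM-state B or BL instruction. It relies on two facts of the A32 encoding: the PC reads as the address of the current instruction plus 8, and the offset is stored as a signed 24-bit word count. The function name `encode_branch` and its parameters are assumptions for illustration.

```python
COND_EQ = 0x0  # condition field for "eq"
COND_AL = 0xE  # condition field for "always" (no suffix)


def encode_branch(instr_addr, target_addr, link=False, cond=COND_AL):
    """Encode an A32 `b`/`bl` located at instr_addr that jumps to target_addr."""
    offset = target_addr - (instr_addr + 8)  # PC is two instructions ahead
    if offset % 4 != 0:
        raise ValueError("target must be word aligned relative to the branch")
    imm24 = offset >> 2  # the offset is stored in words
    if not -(1 << 23) <= imm24 < (1 << 23):
        raise ValueError("target out of range for a 24-bit word offset")
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | (imm24 & 0xFFFFFF)


print(hex(encode_branch(0x8000, 0x8000)))             # branch to itself
print(hex(encode_branch(0x8000, 0x8010, link=True)))  # bl two words past PC
```

A branch to itself encodes an offset of -8 bytes, which is why the stored field is 0xFFFFFE rather than 0.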
Memory access instructions
• Single register transfers
  • Data types
    • 32-bit (word)
    • 16-bit (half-word)
    • 8-bit (byte)
  • Direction
    • Load: LDR
    • Store: STR
  • Addressing
    • Pre/post increment/decrement
    • Allows efficient array access
ARM instructions
Memory access (single data):
OPCODE<size><conditional suffix> OPERANDS
OPCODE: operation
<size>:
  B: byte
  SB: signed byte (not for STR)
  H: halfword (16-bit)
  SH: signed halfword (16-bit) (not for STR)
  Nothing: word (32-bit)
<conditional suffix>: conditional execution
OPERANDS:
  destination/source register
  address and indexing specification
Single register transfers
• Load 32-bit from memory to register
  • ldr rd, [rn]
    • Use the address stored in rn to load data from memory
    • Store the data in rd
  • Example
    • ldr r0, [r1]    ; r0 <= MEM[r1]
Single register transfers
• Load 32-bit from memory to register
  • Pre-increment/decrement
    • Pre-: (step 1) compute address, (step 2) access memory
  • ldr rd, [rn, +/- rm, shift]
    • Use address computed as rn +/- (shifted rm)
    • Example
      • ldr r0, [r1, r2, lsl #2]    ; r0 <= MEM[r1 + (r2 << 2)]
  • ldr rd, [rn, +/- #imm12]
    • Example
      • ldr r0, [r1, #12]    ; r0 <= MEM[r1 + 12]
Single register transfers
• Load 32-bit from memory to register
  • Pre-increment/decrement with pointer update
    • Pre-: (step 1) compute address, (step 2) access memory
    • Update of pointer is indicated by !
  • ldr rd, [rn, +/- rm, shift]!
    • Use address computed as rn +/- (shifted rm)
    • Update rn
    • Example
      • ldr r0, [r1, r2, lsl #2]!    ; r0 <= MEM[r1 + (r2 << 2)]
                                     ; r1 <= r1 + (r2 << 2)
  • ldr rd, [rn, +/- #imm12]!
    • Use address computed as rn +/- #imm12
    • Update rn
    • Example
      • ldr r0, [r1, #20]!    ; r0 <= MEM[r1 + 20]
                              ; r1 <= r1 + 20
Single register transfers
• Load 32-bit from memory to register
  • Post-increment/decrement (with implicit pointer update)
    • Post-: (step 1) access memory, (step 2) compute address
    • Update is implicit (otherwise the computation is meaningless): no !
  • ldr rd, [rn], +/- rm, shift
    • Use address stored in rn
    • Update rn with rn +/- (shifted rm)
    • Example
      • ldr r0, [r1], r2, lsl #2    ; r0 <= MEM[r1]
                                    ; r1 <= r1 + (r2 << 2)
  • ldr rd, [rn], +/- #imm12
    • Use address stored in rn
    • Update rn with rn +/- #imm12
    • Example
      • ldr r0, [r1], #16    ; r0 <= MEM[r1]
                             ; r1 <= r1 + 16
Single register transfers
• Load 32-bit from memory to register (summary)
  • Pre-increment
    • ldr rd, [rn, offset]
      • ldr rd, [rn, +/- rm, shift]
      • ldr rd, [rn, +/- #imm12]
  • Pre-increment with pointer update
    • ldr rd, [rn, offset]!
      • ldr rd, [rn, +/- rm, shift]!
      • ldr rd, [rn, +/- #imm12]!
  • Post-increment (with pointer update)
    • ldr rd, [rn], offset
      • ldr rd, [rn], +/- rm, shift
      • ldr rd, [rn], +/- #imm12
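The three addressing forms in the summary differ only in which address is used for the access and whether rn is updated. The Python sketch below models them with a dictionary as memory and a dictionary as register file; these data structures and the function names are assumptions for illustration, and `offset` stands for the already computed immediate or shifted-register value.

```python
def ldr_offset(mem, regs, rd, rn, offset):
    """ldr rd, [rn, offset]: access rn + offset, rn unchanged."""
    regs[rd] = mem[regs[rn] + offset]


def ldr_pre_indexed(mem, regs, rd, rn, offset):
    """ldr rd, [rn, offset]!: compute rn + offset, update rn, then access."""
    regs[rn] += offset
    regs[rd] = mem[regs[rn]]


def ldr_post_indexed(mem, regs, rd, rn, offset):
    """ldr rd, [rn], offset: access at rn, then update rn."""
    address = regs[rn]
    regs[rn] = address + offset
    regs[rd] = mem[address]


# Summing a 4-word array with post-indexed loads (ldr r0, [r1], #4 in a loop).
mem = {0x1000 + 4 * i: value for i, value in enumerate([10, 20, 30, 40])}
regs = {"r0": 0, "r1": 0x1000}
total = 0
for _ in range(4):
    ldr_post_indexed(mem, regs, "r0", "r1", 4)
    total += regs["r0"]
print(total, hex(regs["r1"]))
```

The loop shows why the slides call these modes efficient for array access: the pointer advances as a side effect of the load, with no separate add instruction.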
Single register transfers
• Store 32-bit data to memory from register
  • Similar to ldr; the data stored is the content of rt
  • Pre-increment
    • str rt, [rn, offset]
  • Pre-increment with pointer update
    • str rt, [rn, offset]!
  • Post-increment (with pointer update)
    • str rt, [rn], offset
Single register transfers
• Other sizes
  • Same address specification
  • Use the lower part of the source/destination register
  • Load
    • Load byte (unsigned): LDRB
    • Load byte (signed): LDRSB
    • Load half-word (unsigned): LDRH
    • Load half-word (signed): LDRSH
  • Store
    • Store byte: STRB
    • Store half-word: STRH
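The signed and unsigned variants differ only in how the upper bits of the destination register are filled. A minimal Python sketch follows, assuming a little-endian byte-addressed memory modelled as a `bytearray`; the function names mirror the mnemonics but are otherwise illustrative.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Extend a `bits`-wide two's complement value to a 32-bit register pattern."""
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK32


def ldrb(mem, addr):
    return mem[addr]  # upper 24 bits are zero


def ldrsb(mem, addr):
    return sign_extend(mem[addr], 8)


def ldrh(mem, addr):
    return int.from_bytes(mem[addr:addr + 2], "little")  # upper 16 bits are zero


def ldrsh(mem, addr):
    return sign_extend(ldrh(mem, addr), 16)


def strb(mem, addr, rt):
    mem[addr] = rt & 0xFF  # only the low byte of rt is stored


def strh(mem, addr, rt):
    mem[addr:addr + 2] = (rt & 0xFFFF).to_bytes(2, "little")


memory = bytearray(4)
strh(memory, 0, 0x1234FFFE)  # stores 0xFFFE
print(hex(ldrh(memory, 0)), hex(ldrsh(memory, 0)))
```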
Single register transfers
• Others
  • Double register transfers
    • LDRD, STRD
    • Load/store a pair of registers
  • Load-linked, store-conditional
    • LDREX, STREX, CLREX
    • Used for multiprocessing synchronization
    • Since ARMv6
  • Memory-register data swap
    • SWP
    • Double access
    • Deprecated since ARMv6
  • Extra load/store instructions, unprivileged
    • LDRT, LDRBT, LDRSBT, LDRHT, LDRSHT, STRT, STRBT, STRHT
Memory access instructions
• Multiple register transfers
  • Transfer a subset of registers to/from memory
  • Data types
    • 32-bit (word)
  • Direction
    • Load: LDM
    • Store: STM
  • Addressing
    • Increment/decrement before/after
    • Allows efficient stack access
ARM instructions
Memory access (multiple data):
OPCODE<conditional suffix><addressing mode> OPERANDS
OPCODE: operation
<conditional suffix>: conditional execution
<addressing mode>:
  DA: decrement after
  IA: increment after
  DB: decrement before
  IB: increment before
  Or (stack oriented addressing modes):
  FA: full ascending
  FD: full descending
  EA: empty ascending
  ED: empty descending
OPERANDS:
  pointer (register)
  list of destination/source registers and pointer update request
  Registers are transferred in order: the lowest register number is always transferred to/from the lowest memory location accessed.
Memory access (multiple data)
• Memory access (multiple data) examples:
  • ldmia r13!, {r0-r12, r14}    ; IA: increment after
      ; step 1: access memory, step 2 (After): increment address
      ; r0 <= MEM[r13]
      ; r1 <= MEM[r13 + 4]
      ; ...
      ; r12 <= MEM[r13 + 48]
      ; r14 <= MEM[r13 + 52]
      ; r13 <= r13 + 56    pointer update required (!)
  • stmib r13!, {r0, r2}         ; IB: increment before
      ; step 1 (Before): increment address, step 2: access memory
      ; MEM[r13 + 4] <= r0
      ; MEM[r13 + 8] <= r2
      ; r13 <= r13 + 8     pointer update required (!)
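The two examples above can be reproduced with a Python sketch of LDM/STM for the four addressing modes. It assumes a register file modelled as a list of 16 integers and memory as a dictionary of word addresses; the function names are illustrative. Cases the architecture leaves unpredictable (for example the base register in the list together with `!`) are not modelled.

```python
def block_addresses(base, count, mode):
    """Return (lowest address accessed, updated base) for `count` word transfers."""
    size = 4 * count
    if mode == "IA":
        return base, base + size
    if mode == "IB":
        return base + 4, base + size
    if mode == "DA":
        return base - size + 4, base - size
    if mode == "DB":
        return base - size, base - size
    raise ValueError("unknown addressing mode: " + mode)


def stm(mem, regs, mode, rn, reglist, writeback=False):
    start, new_base = block_addresses(regs[rn], len(reglist), mode)
    # Lowest register number always goes to the lowest address.
    for i, reg in enumerate(sorted(reglist)):
        mem[start + 4 * i] = regs[reg]
    if writeback:
        regs[rn] = new_base


def ldm(mem, regs, mode, rn, reglist, writeback=False):
    start, new_base = block_addresses(regs[rn], len(reglist), mode)
    for i, reg in enumerate(sorted(reglist)):
        regs[reg] = mem[start + 4 * i]
    if writeback:
        regs[rn] = new_base


# stmib r13!, {r0, r2}
regs = [100 + n for n in range(16)]
regs[13] = 0x2000
mem = {}
stm(mem, regs, "IB", 13, [0, 2], writeback=True)
print({hex(a): v for a, v in mem.items()}, hex(regs[13]))
```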
Stack
[Figure: stack diagram showing sp and the saved registers r4, r5, r11 (fp), r12 (ip), r14 (lr), r15 (pc); annotation: restore r5 from stack]
Full Descending
  Store: decrement before (STMFD = STMDB)
  Load: increment after (LDMFD = LDMIA)
Full Descending:
  Full: sp points to a location with data
  Descending: sp must be decremented when pushing into the stack
Empty Descending:
  Empty: sp points to an empty location
  Descending: sp must be decremented when pushing into the stack
Empty Ascending:
  Empty: sp points to an empty location
  Ascending: sp must be incremented when pushing into the stack
Full Ascending:
  Full: sp points to a location with data
  Ascending: sp must be incremented when pushing into the stack
Stack
Full Descending:   ST: decrement before (STMFD = STMDB); LD: increment after (LDMFD = LDMIA)
Empty Descending:  ST: decrement after; LD: increment before
Empty Ascending:   ST: increment after; LD: decrement before
Full Ascending:    ST: increment before; LD: decrement after
[Figure: one stack diagram per stack type, each showing sp before and after the transfer and the saved registers r4, r5, r11 (fp), r12 (ip), r14 (lr), r15 (pc)]
Functions must use an FD (Full Descending) stack; the other three types are other possible stacks.
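The correspondence between stack types and addressing modes can be checked with a Python sketch that pushes and pops single words. The table `STACK_MODES` restates the mapping above; the function names and the dictionary used as memory are assumptions for illustration.

```python
# Stack type -> addressing mode used by the store (push) and the load (pop).
STACK_MODES = {
    "FD": {"store": "DB", "load": "IA"},
    "ED": {"store": "DA", "load": "IB"},
    "EA": {"store": "IA", "load": "DB"},
    "FA": {"store": "IB", "load": "DA"},
}


def _step(mode):
    """+4 for increment modes, -4 for decrement modes."""
    return 4 if mode[0] == "I" else -4


def push(mem, sp, value, kind):
    """Store one word on a stack of the given kind; return the new sp."""
    mode = STACK_MODES[kind]["store"]
    if mode[1] == "B":  # Before: move sp first, then store
        sp += _step(mode)
        mem[sp] = value
    else:               # After: store first, then move sp
        mem[sp] = value
        sp += _step(mode)
    return sp


def pop(mem, sp, kind):
    """Load one word from a stack of the given kind; return (value, new sp)."""
    mode = STACK_MODES[kind]["load"]
    if mode[1] == "B":
        sp += _step(mode)
        value = mem[sp]
    else:
        value = mem[sp]
        sp += _step(mode)
    return value, sp


mem, sp = {}, 0x8000
sp = push(mem, sp, 0xAA, "FD")
sp = push(mem, sp, 0xBB, "FD")
print(hex(sp), sorted(hex(a) for a in mem))
```

For every stack type the load mode is the exact inverse of the store mode, so a pop undoes a push and sp returns to its starting value.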
System instructions
• SVC: supervisor call (also: SWI)
  • e.g., svc #0
• MCR: move to coprocessor (special register) from register
• MRC: move to register from coprocessor (special register)
• MRS: move to register from status register
  • e.g., mrs r0, CPSR
  • e.g., mrs r0, SPSR
• MSR: move to status register
  • e.g., msr CPSR, R0
  • e.g., msr SPSR, R0
  • e.g., msr CPSR_f, R0             ; write only the flags portion
  • e.g., msr CPSR_f, #0x20000000    ; set the C flag (N, Z and V are cleared)
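The effect of writing only the flags portion can be sketched in Python. The sketch assumes the standard CPSR layout (N, Z, C, V in bits 31 to 28) and that the `_f` field covers bits 31 to 24; the function names are illustrative.

```python
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28}
FLAGS_FIELD_MASK = 0xFF000000  # CPSR_f: bits 31..24


def msr_cpsr_f(cpsr, value):
    """msr CPSR_f, value: replace the flags field, keep all other CPSR bits."""
    return (cpsr & ~FLAGS_FIELD_MASK & 0xFFFFFFFF) | (value & FLAGS_FIELD_MASK)


def decode_flags(cpsr):
    return {name: (cpsr >> bit) & 1 for name, bit in FLAG_BITS.items()}


cpsr = 0xD00001D3  # hypothetical value: N, Z, V set; low bits hold mode and masks
cpsr = msr_cpsr_f(cpsr, 0x20000000)
print(hex(cpsr), decode_flags(cpsr))
```

Because the whole field is replaced, writing 0x20000000 sets C and at the same time clears N, Z and V; the mode and interrupt mask bits in the low part of the register are untouched.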
System instructions
• Others
  • Memory barriers
    • DSB, DMB, ISB
  • Other traps
    • HVC, SMC
  • Two-register core-coprocessor transfers
    • MCRR, MCRR2, MRRC, MRRC2
  • …
ARMv8
64-bit architecture
Backward compatible with 32-bit ARM architectures
ARMv7-A with:
Multiprocessing Extensions
Large Physical Address Extension
Virtualization Extensions
Security Extensions
VFPv4
SIMDv2
ARMv8
64-bit architecture
Backward compatible with 32-bit ARM architectures
2 execution states
  AArch64
    R0-R30: general purpose, 64-bit registers
    SP: 64-bit stack pointer
    PC: 64-bit program counter (not directly writable)
    V0-V31: SIMD and floating point, 128-bit registers
  AArch32
    ARMv7-A with
      A32 instruction set
        In former notation: ARM instruction set
      T32 instruction set
        In former notation: Thumb + Thumb-2 instruction sets
      No Jazelle; no ThumbEE
ARM: other info
ARM Architecture Reference Manual
ARMv7-A and ARMv7-R edition
ARMv7-M Architecture Reference Manual
ARMv8, for ARMv8-A architecture profile