RT and Embedded Systems, Wrocław 2013, Lecture 7: ARM Cores, Part 3


RT and Embedded Systems

Wrocław 2013

Lecture 7

ARM Cores, Part 3


Plan

• ARM11

• Cortex-A5

• Cortex-A9

• Cortex-A15

• big.LITTLE processing

• Cortex-A50


ARM11


ARM11

Source: [1]


ARM11

• The ARM11™ processor family powers many smartphones in production today.
• It is also widely used in consumer, home, and embedded applications.
• ARM11 delivers extremely low power consumption and a range of performance from 350 MHz in small-area designs up to 1 GHz in speed-optimized designs.
• ARM11 processor software is compatible with all previous generations of ARM processors.
• ARM11 introduces 32-bit SIMD for media processing.


ARM11 features summary

• Main features:
  – Powerful ARMv6 instruction set architecture
  – ARM Thumb® instruction set reduces memory bandwidth and size requirements by up to 35%
  – ARM Jazelle® technology for efficient embedded Java execution
  – ARM DSP extensions
  – SIMD (Single Instruction Multiple Data) media processing extensions deliver up to 2x performance for video processing
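As a rough illustration of what 32-bit SIMD means on ARMv6, the sketch below models a packed byte add in the style of the `UADD8` and `UQADD8` instructions: one 32-bit register is treated as four independent 8-bit lanes. This is a plain Python model for illustration only, not ARM code; the function names simply mirror the mnemonics, and the GE flags that the real instruction sets are left out.

```python
def uadd8(a: int, b: int) -> int:
    """Add four unsigned 8-bit lanes packed in 32-bit words; each lane wraps modulo 256."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift  # the carry never spills into the next lane
    return result


def uqadd8(a: int, b: int) -> int:
    """Same four lanes, but each sum saturates at 255 instead of wrapping."""
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= min(lane_sum, 0xFF) << shift
    return result


# Four pixel components are processed by a single call.
wrapped = uadd8(0x01FF10F0, 0x01010120)
saturated = uqadd8(0x01FF10F0, 0x01010120)
```

The saturating variant is the one usually wanted for pixel data, because a wrapped sum would turn a bright value into a dark one.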


ARM11 features summary

• Main features:
  – ARM TrustZone® technology for on-chip security foundation (ARM1176JZ-S and ARM1176JZF-S processors)
  – Low power consumption:
    • 0.6 mW/MHz (0.13 µm, 1.2 V) including cache controllers
    • Energy-saving power-down modes address static leakage currents in advanced processes
  – High-performance integer processor:
    • 8-stage integer pipeline delivers high clock frequency (9 stages for ARM1156T2(F)-S)
    • Separate load-store and arithmetic pipelines
    • Branch prediction and return stack
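The power figure above is quoted per megahertz, so a first-order estimate of dynamic power is a single multiplication. The sketch below assumes the 0.6 mW/MHz figure scales linearly with clock frequency at the quoted 0.13 µm, 1.2 V operating point; the 400 MHz clock is an example value chosen here, not a figure from the lecture, and leakage power is ignored.

```python
MW_PER_MHZ = 0.6  # from the slide: 0.13 µm process, 1.2 V, including cache controllers


def dynamic_power_mw(clock_mhz: float, mw_per_mhz: float = MW_PER_MHZ) -> float:
    """First-order dynamic power estimate in milliwatts (linear in clock frequency)."""
    return clock_mhz * mw_per_mhz


example_power = dynamic_power_mw(400)  # hypothetical 400 MHz design
```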


ARM11 features summary

• Main features:
  – Thumb-2 technology (ARM1156T2(F)-S only) for enhanced performance, energy efficiency and code density
  – High-performance memory system design:
    • Supports 4-64 KB cache sizes
    • Optional tightly coupled memories with DMA for multimedia applications
    • High-performance 64-bit memory system speeds data access for media processing and networking applications
    • ARMv6 memory system architecture accelerates OS context switches


ARM11 features summary

• Main features:
  – Vectored interrupt interface and low-interrupt-latency mode speed up interrupt response and improve real-time performance
  – Optional Vector Floating Point coprocessor for automotive/industrial controls and 3D graphics acceleration (ARM1136JF-S, ARM1176JZF-S and ARM1156T2F-S processors)


ARM11 – core types


ARM1176

• ARM1176JZ(F)-S and ARM11 MPCore™ Processors:
  – Designed for use as applications processors in consumer and wireless products.
  – Both processors feature the ARMv6 instruction set architecture, with media processing extensions, ARM Jazelle® technology, and ARM Thumb® for compact code.


ARM1176

• ARM1176JZ(F)-S and ARM11 MPCore™ Processors:
  – In the ARM11 processor family, only the ARM1176JZ(F)-S processor has ARM TrustZone™ technology. TrustZone technology provides support within the CPU and platform architecture for building the trusted computing environments required to protect critical system functions from downloaded applications, to enforce copyright protection of downloaded media, and to allow safe over-the-air system upgrades.


ARM1136

• ARM1136J(F)-S Processor:
  – Designed for use as an applications processor; includes many features of the ARM1176JZ(F)-S processor
  – Does not include the AMBA® 3 AXI™ bus or TrustZone
  – Some users implement the ARM1136J(F)-S processor for compatibility with existing AMBA AHB bus peripherals from their ARM9 processor-based SoC designs
  – AMBA AHB to AXI fabric enables simpler migration of AHB bus peripherals to ARM1176JZ(F)-S processor-based designs
  – Software-compatible migration path to the latest generation of ARM Cortex-A class processors


ARM1156

• ARM1156T2-S Processor:
  – First ARM processor to incorporate ARM Thumb-2 technology for even higher code density and instruction set efficiency.
  – Thumb-2 technology uses 31 percent less memory than pure 32-bit code to reduce system cost, while delivering up to 38 percent better performance than existing Thumb technology.
  – These processors also feature optional parity protection for caches and Tightly Coupled Memories (TCM), and non-maskable interrupts, making them ideal for embedded control applications where high reliability or high availability is paramount.


ARM1156

• ARM1156T2-S Processor:
  – The ARM1156T2-S processors feature an enhanced Memory Protection Unit (MPU) and offer an ideal upgrade path for embedded control applications currently using ARM946E-S, ARM966E-S or older 16-bit processors.
  – These processors feature AMBA 3 AXI specification interfaces, offering higher system bus bandwidth with fewer bus layers and rapid timing closure.
  – Software-compatible migration path to the latest generation of ARM Cortex-R class processors


Cortex vs ARM9 vs ARM11


Jazelle


Jazelle

• Main features:
  – Jazelle technology for acceleration of execution environments
  – Jazelle is a combined hardware and software solution:
    • the software is a full-featured multitasking Java Virtual Machine (JVM), highly optimized to take advantage of the Jazelle architecture extensions available in many ARM processor cores
    • hardware support depends on the silicon vendor
  – Jazelle architecture extensions deliver high-performance applications and games, fast start-up and application switching with a very low memory and power budget


Jazelle

• Main features:
  – High-efficiency Java bytecode execution, >1000 Caffeine Marks @ 200 MHz
  – Ultra-low Java system cost
  – Low power consumption for battery-operated wireless embedded devices
  – Single-chip MCU, DSP and Java solution
  – Integrated into a number of ARM CPU cores
  – Rapid ASIC or ASSP integration with reduced time-to-market


Jazelle


Jazelle – layer model


Jazelle – how it works

• A third instruction set: Java bytecode (besides ARM and Thumb)
• A new Java processor state (indicated by the J bit in the CPSR)
• Switching between Java state and the other states is very simple and fast
• Interrupts are handled as normal, and cause an immediate return from Java state to ARM state to run the interrupt handler. At the end of the interrupt routine, the normal return mechanism returns the processor to Java state
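The state switching described above can be sketched as a small model of the two CPSR state bits: T (bit 5) and J (bit 24). The sketch assumes ARMv6 behaviour, where exception entry saves the CPSR into the SPSR and clears both bits so the handler starts in ARM state; mode bits and interrupt masks are deliberately left out, and all function names are invented for this illustration.

```python
T_BIT = 1 << 5    # Thumb state bit in the CPSR
J_BIT = 1 << 24   # Jazelle (Java) state bit in the CPSR


def instruction_state(cpsr: int) -> str:
    """Decode the J and T bits into the name of the current instruction set state."""
    j, t = bool(cpsr & J_BIT), bool(cpsr & T_BIT)
    if j and t:
        return "reserved"  # not a valid combination on ARMv6
    if j:
        return "Jazelle"
    return "Thumb" if t else "ARM"


def take_interrupt(cpsr: int) -> tuple:
    """Model exception entry: return (new_cpsr, spsr). The handler starts in ARM state."""
    spsr = cpsr                          # old state is preserved for the return
    new_cpsr = cpsr & ~(J_BIT | T_BIT)   # clear J and T
    return new_cpsr, spsr


def return_from_interrupt(spsr: int) -> int:
    """Model the normal exception return: the saved state is restored unchanged."""
    return spsr


running_java = 0x10 | J_BIT                      # user mode bits plus J set
in_handler, saved = take_interrupt(running_java)
resumed = return_from_interrupt(saved)
```

Because the J bit travels inside the saved status register, the interrupt handler itself needs no Java-specific code.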


Jazelle – interrupts


Jazelle – registers


Vector Floating Point


VFP - architecture

• The ARM Floating Point architecture (VFP) provides hardware support for floating-point operations in half-, single- and double-precision floating-point arithmetic
• There have been three main versions of VFP to date:
  – VFPv1 is obsolete
  – VFPv2 is an optional extension to the ARM instruction set in the ARMv5TE, ARMv5TEJ and ARMv6 architectures
  – VFPv3 is an optional extension to the ARM instruction set in the ARMv7 architecture


VFP9 - coprocessor

• The VFP9-S synthesizable Vector Floating Point (VFP) coprocessor is compatible with all of the ARM9E cores
• The support code has two components:
  – a library of routines which perform unimplemented functions (such as transcendental functions)
  – some supported functions (such as division) and a set of exception handlers for processing exception conditions


VFP9 - coprocessor

• Features:
  – ARM VFPv2 ISA
  – 16 double-precision or 32 single-precision registers
  – Full IEEE 754 compliance with ARM support code
  – Run-Fast mode for near IEEE 754 compliance (hardware only)
  – Binary compatible with VFP10 and VFP11
  – Portable to any process with supporting tools and cell library
  – 100-130K gates
  – 1.3 MFLOPS/MHz
  – Area <1.0 mm² (TSMC 0.13 µm G)
  – 180-210 MHz worst case (TSMC 0.13 µm G)
  – <0.4 mW/MHz typical power consumption (TSMC 0.13 µm G)


VFP10 - coprocessor

• The VFP10-S synthesizable Vector Floating Point (VFP) coprocessor is compatible with all of the ARM10E cores
• The support code has two components:
  – a library of routines which perform unimplemented functions (such as transcendental functions)
  – some supported functions (such as division) and a set of exception handlers for processing exception conditions


VFP10 - coprocessor

• Features:
  – ISA is ARM VFPv2
  – 16 double-precision or 32 single-precision registers
  – Large independent register file with 64-bit LD/ST interface
  – Full IEEE 754 compliance with ARM support code
  – Run-Fast mode for near IEEE 754 compliance (hardware only)
  – Binary compatible with VFP9 and VFP11
  – Scalar and vector operation support (ideal for FP DSP)
  – Parallel LD/ST, FMAC, and DIV/SQRT execution engines
  – 2.0 MFLOPS/MHz
  – Area ~1.16 mm² (TSMC 0.13 µm LV)
  – Up to 325 MHz worst case (TSMC 0.13 µm LV)
  – <0.4 mW/MHz typical power consumption (TSMC 0.13 µm LV)


VFP10 - coprocessor

• VFP10 Instruction Set (VFPv2):
  – Arithmetic:
    • Add, Sub, Mult, Neg-Mult, Negate, Abs Value, Compare, Div, Square Root
  – FMAC (single and double versions):
    • Multiply-Add, Multiply-Subtract, Neg-Multiply-Add, Neg-Multiply-Subtract
  – Type conversions
  – Load/Store scalars and vectors, 64 bits per cycle
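The four FMAC-class operations listed above differ only in the signs applied to the accumulator and the product. The sketch below assumes the usual VFPv2 definitions of `FMAC`, `FNMAC`, `FMSC` and `FNMSC`, with the destination register doubling as the accumulator. It uses Python floats (IEEE 754 doubles), so it models the double-precision scalar versions only; the product is rounded before the add, which matches a non-fused multiply-accumulate.

```python
def fmac(fd: float, fn: float, fm: float) -> float:
    """Multiply-Add: Fd = Fd + (Fn * Fm)."""
    return fd + (fn * fm)


def fnmac(fd: float, fn: float, fm: float) -> float:
    """Neg-Multiply-Add: Fd = Fd - (Fn * Fm)."""
    return fd - (fn * fm)


def fmsc(fd: float, fn: float, fm: float) -> float:
    """Multiply-Subtract: Fd = -Fd + (Fn * Fm)."""
    return -fd + (fn * fm)


def fnmsc(fd: float, fn: float, fm: float) -> float:
    """Neg-Multiply-Subtract: Fd = -Fd - (Fn * Fm)."""
    return -fd - (fn * fm)


def dot(xs, ys) -> float:
    """A dot product is a chain of multiply-adds into one accumulator."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = fmac(acc, x, y)
    return acc
```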


VFP11 - coprocessor

• The VFP11 synthesizable Vector Floating Point (VFP) coprocessor is compatible with all of the ARM11 cores (VFPv2 compatible)
• The VFP11 coprocessor is optimized for:
  – high data transfer bandwidth through 64-bit split load and store buses
  – fast hardware execution of a high percentage of operations on normalized data, resulting in higher overall performance while providing full IEEE 754 standard support when required
  – hardware divide and square root operations in parallel with other arithmetic operations to reduce the impact of long-latency operations
  – near IEEE 754 standard compatibility in RunFast mode without support code assistance, providing determinable run-time calculations for all input data


VFP11 - coprocessor

• The VFP11 coprocessor has three separate instruction pipelines:
  – the Multiply and Accumulate (FMAC) pipeline
  – the Divide and Square root (DS) pipeline
  – the Load/Store (LS) pipeline
• Each pipeline can operate independently of the other pipelines and in parallel with them
• The three pipelines share the first two pipeline stages, Decode and Issue


VFP11 - coprocessor

• More than one instruction can be completed per cycle
• Instructions issued to the FMAC pipeline can complete out of order with respect to operations in the LS and DS pipelines
• Except for divide and square root operations, the pipelines support single-cycle throughput for all single-precision operations and most double-precision operations


VFP11 - coprocessor

• Double-precision multiply and multiply-accumulate operations have a two-cycle throughput
• The LS pipeline is capable of supplying two single-precision operands or one double-precision operand per cycle, balancing the data transfer capability with the operand requirements


FMAC Pipeline


VFPv3 FPU

• The VFPv3 version of the FPU is found in Cortex-A processors
• The FPU features are:
  – support for single-precision and double-precision floating-point formats
  – support for conversion between half-precision and single-precision
  – operation latencies reduced for most operations in single-precision and double-precision
  – high data transfer bandwidth through 64-bit split load and store buses
  – completion of load transfers can be performed out of order
  – normalized and denormalized data are all handled in hardware
  – trapless operation enabling fast execution
  – support for speculative execution
  – low power consumption with high-level clock gating and small die size


VFPv3 FPU

• Unlike VFPv2 implementations, the VFPv3 implementation provides:
  – fixed-point to floating-point conversion instructions and floating-point constant loads
  – IEEE half-precision and alternative half-precision format support
  – trapless exception support
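To make the two conversion features above concrete, the sketch below reproduces them in software: IEEE half-precision encoding via the `e` format code of Python's standard `struct` module, and a simple fixed-point conversion with a chosen number of fraction bits. The helper names are invented for this example; the alternative half-precision format is not modelled, saturation is omitted, and truncation toward zero in `float_to_fixed` is an assumption of this sketch.

```python
import struct


def float_to_half_bits(x: float) -> int:
    """Encode a float as a 16-bit IEEE 754 half-precision bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def half_bits_to_float(bits: int) -> float:
    """Decode a 16-bit IEEE 754 half-precision bit pattern."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def fixed_to_float(raw: int, frac_bits: int) -> float:
    """Interpret an integer as fixed-point with `frac_bits` fraction bits."""
    return raw / (1 << frac_bits)


def float_to_fixed(x: float, frac_bits: int) -> int:
    """Convert to fixed-point, truncating toward zero (no saturation modelled)."""
    return int(x * (1 << frac_bits))


half_one = float_to_half_bits(1.0)   # 1 sign bit, 5 exponent bits, 10 fraction bits
q8_value = float_to_fixed(1.5, 8)    # Q8 format: 8 fraction bits
```

Half precision is mainly a storage format here: values are widened to single precision before arithmetic, which halves memory traffic for data that tolerates the reduced range.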


TrustZone


TrustZone

• ARM TrustZone® technology is a system-wide approach to security on high-performance computing platforms for a huge array of applications, including secure payment, digital rights management (DRM), enterprise and web-based services
• TrustZone technology, tightly integrated into Cortex™-A and ARM1176 processors, extends throughout the system via the AMBA® AXI™ bus and specific TrustZone System IP blocks
• It is possible to secure peripherals such as secure memory, crypto blocks, keyboard and screen to ensure they can be protected from software attack


TrustZone - hardware

• The security of the system is achieved by partitioning all of the SoC hardware and software resources so that they exist in one of two worlds:
  – the Secure world for the security subsystem
  – the Normal world for everything else
• Hardware logic present in the TrustZone-enabled AMBA3 AXI™ bus fabric ensures that Normal world components do not access Secure world resources, enabling construction of a strong perimeter boundary between the two
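The perimeter check performed by the bus fabric can be pictured as a filter on every transaction: each access carries a flag saying which world issued it, and each address region is marked Secure or Normal. The sketch below is a simplified software model of that idea. The memory map, the region sizes and the names `Region` and `access_allowed` are hypothetical and chosen only for illustration; real systems implement the check in hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    secure: bool  # True: only Secure world transactions may access it

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


# Hypothetical memory map, for illustration only.
MEMORY_MAP = [
    Region(0x0000_0000, 0x1000_0000, secure=False),  # Normal world DRAM
    Region(0x1000_0000, 0x0010_0000, secure=True),   # Secure on-chip RAM
    Region(0x2000_0000, 0x0000_1000, secure=True),   # crypto block registers
]


def access_allowed(addr: int, non_secure: bool) -> bool:
    """Return True if a transaction to `addr` passes the perimeter check.

    `non_secure` is True for Normal world transactions. The Secure world
    may access both Secure and Normal regions in this model.
    """
    for region in MEMORY_MAP:
        if region.contains(addr):
            return not (region.secure and non_secure)
    return False  # unmapped addresses are rejected in this sketch


normal_reads_secure_ram = access_allowed(0x1000_0010, non_secure=True)
secure_reads_secure_ram = access_allowed(0x1000_0010, non_secure=False)
```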


TrustZone - hardware


TrustZone - hardware

• The TrustZone hardware architecture extensions enable a single physical processor core to execute code safely and efficiently from both the Normal world and the Secure world in a time-sliced fashion
• This removes the need for a dedicated security processor core, which saves silicon area and power
• The final aspect of the TrustZone hardware architecture is a security-aware debug infrastructure that can enable control over access to Secure world debug, without


TrustZone - software

• The implementation of a Secure world in the SoC hardware requires some secure software to run within it and to make use of the sensitive assets stored there
• There are many possible secure software architectures:
  – The most advanced is a dedicated Secure world operating system
  – The simplest is a synchronous library of code placed in the Secure world


TrustZone - software


Cortex-A


Cortex-A

Application Examples for Cortex-A Processors


Cortex-A

• The ARM Cortex™-A series of applications processors provides a full range of solutions:
  – for devices hosting a rich OS platform
  – for devices hosting user applications:
    • ultra-low-cost handsets
    • smartphones
    • mobile computing platforms
    • digital TV and set-top boxes
    • enterprise networking
    • printers
    • server solutions
    • etc.


Cortex-A

• All Cortex-A processors share a common architecture and feature set:
  – ARMv7-A architecture
  – Support for full operating systems (Symbian, Android, Ubuntu, etc.)
  – Instruction set support: ARM, Thumb-2, Thumb, Jazelle®, DSP
  – TrustZone® Security Extensions
  – Advanced single-precision and double-precision floating-point support
  – NEON™ media processing engine


Cortex-A features summary


Cortex-A5


Cortex-A5

• Main features:
  – Architecture: ARMv7-A
  – 1.57 DMIPS/MHz per core
  – Single-core or multicore versions available (1-4 cores)
  – MMU
  – ARM/Thumb/Thumb-2
  – TrustZone technology
  – Configurable L1 caches (4-64 KB)


Cortex-A5

• Main features:
  – NEON Media Processing Engine (MPE): extends the Cortex-A5 Floating Point Unit (FPU) with an additional register set supporting a rich set of SIMD operations over 8, 16, and 32-bit integer and 32-bit floating-point data types
  – VFPv4-D16
  – Jazelle
  – AXI bus (over 3x the memory bandwidth of the ARM1176JZ-S)
  – Advanced multicore technologies
  – Pipeline with dynamic branch prediction


Cortex-A5


Cortex-A5

• Configurable options:


Cortex-A5

• Multicore technologies:
  – SCU (Snoop Control Unit): the central intelligence responsible for managing:
    • interconnect
    • arbitration
    • communication
    • cache-to-cache and system memory transfers
    • cache coherence
    • other capabilities for all multicore-technology-enabled processors


Cortex-A5

• Multicore technologies:
  – ACP (Accelerator Coherence Port): an AMBA 3 AXI compatible slave interface on the SCU, providing an interconnect point for a range of system masters that, for reasons of overall system performance, power consumption or software simplification, are better interfaced directly with the Cortex-A5 MPCore processor


Cortex-A5

• Multicore technologies:
  – GIC (Generic Interrupt Controller): provides a rich and flexible approach to inter-processor communication and the routing and prioritisation of system interrupts
  – Supports up to 224 independent interrupts; under software control, each interrupt can be:
    • distributed across CPUs
    • hardware prioritised
    • routed between the operating system and the TrustZone software management layer
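The routing and prioritisation described above can be sketched as a small selection function: among the pending interrupts targeted at a given CPU, the controller forwards the one with the highest priority. The model below follows the GIC convention that a lower priority value means higher priority; breaking ties by the lowest interrupt ID is an assumption of this sketch. The class and function names are invented for illustration and are not a register-level description of the GIC.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Interrupt:
    irq_id: int
    priority: int          # lower value = higher priority (GIC convention)
    target_cpus: Set[int]  # CPUs this interrupt may be delivered to
    pending: bool = False


def highest_pending(interrupts: List[Interrupt], cpu: int) -> Optional[int]:
    """Return the ID of the interrupt that would be signalled to `cpu`, or None."""
    candidates = [i for i in interrupts if i.pending and cpu in i.target_cpus]
    if not candidates:
        return None
    # Assumption: ties in priority are broken by the lowest interrupt ID.
    best = min(candidates, key=lambda i: (i.priority, i.irq_id))
    return best.irq_id


irqs = [
    Interrupt(irq_id=40, priority=0x80, target_cpus={0}, pending=True),
    Interrupt(irq_id=41, priority=0x20, target_cpus={0, 1}, pending=True),
    Interrupt(irq_id=42, priority=0x10, target_cpus={1}, pending=False),
]
next_for_cpu0 = highest_pending(irqs, cpu=0)
```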


Cortex-A8


Cortex-A8

• Main features:
  – Architecture: ARMv7-A
  – 2.0 DMIPS/MHz per core
  – Single-core version only
  – MMU
  – ARM/Thumb/Thumb-2
  – TrustZone technology
  – NEON
  – VFPv3


Cortex-A9


Cortex-A9

• Main features:
  – Architecture: ARMv7-A
  – 2.5 DMIPS/MHz per core
  – Single-core or multicore versions available (1-4 cores)
  – MMU
  – ARM/Thumb/Thumb-2
  – Jazelle
  – DSP extension
  – Advanced multicore technologies
  – NEON MPE
  – VFPv3
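The DMIPS/MHz figures quoted for the Cortex-A5, Cortex-A8 and Cortex-A9 in this lecture can be turned into rough throughput estimates by multiplying by clock frequency and core count. The sketch below does that; the 1000 MHz clock is an example value, and linear scaling across cores is an idealisation that real workloads rarely reach.

```python
# DMIPS/MHz per core, as quoted on the feature slides of this lecture.
DMIPS_PER_MHZ = {"Cortex-A5": 1.57, "Cortex-A8": 2.0, "Cortex-A9": 2.5}


def estimated_dmips(core: str, clock_mhz: float, cores: int = 1) -> float:
    """Idealised Dhrystone estimate: per-MHz figure x clock x number of cores."""
    return DMIPS_PER_MHZ[core] * clock_mhz * cores


# Example comparison at a hypothetical 1000 MHz clock.
for name in DMIPS_PER_MHZ:
    print(name, estimated_dmips(name, 1000))
```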


Cortex-A9

• Main features:
  – Superscalar, variable-length, out-of-order pipeline with dynamic branch prediction
  – Two 64-bit AXI master interfaces, with Master 0 for the data-side bus and Master 1 for the instruction-side bus
  – Support for advanced power management with up to 3 power domains
  – Support for the Preload Engine


Cortex-A9 - options


Cortex-A

Source: [3]


Cortex-A


Cortex-A9 – PLE

• PLE (Preload Engine): loads selected regions of memory into L2
• PLE FIFO available


ARM NEON

• The ARM® NEON™ general-purpose SIMD engine efficiently processes current and future multimedia formats, enhancing the user experience.

• NEON technology can accelerate multimedia and signal processing algorithms such as video encode/decode, 2D/3D graphics, gaming, audio and speech processing, image processing, telephony, and sound synthesis, delivering at least 3x the performance of ARMv5 and at least 2x the performance of ARMv6 SIMD.


ARM NEON

Source: [2]


ARM NEON - features

• Supports a wide range of multimedia codecs:
  – Many soft codec standards: MPEG-4, H.264, On2 VP6/7/8, Real, AVS, ...
  – Ideal solution for normal-size "internet streaming" decode of various formats
  – Not just for codecs: also applicable to 2D and 3D graphics and other vector processing
  – Off-the-shelf tools, OS support, and ecosystem support


ARM NEON - features

• Fewer cycles needed than in previous versions:
  – NEON gives a 60-150% performance boost on complex video codecs
  – Individual simple DSP algorithms can show a larger performance boost (4x-8x)
  – The processor can sleep sooner, resulting in overall dynamic power saving

ARM NEON - features

• SIMD and scalar single-precision floating-point computation

• Scalar double-precision floating-point computation

• SIMD and scalar half-precision floating-point conversion

• SIMD 8, 16, 32, and 64-bit signed and unsigned integer computation

• 8 or 16-bit polynomial computation for single-bit coefficients
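
"Polynomial computation for single-bit coefficients" means arithmetic over GF(2): each bit of a lane is one coefficient, and partial products are combined with XOR instead of addition, so no carries propagate. The sketch below models this in plain Python. The names `poly_mul` and `poly_mul_lanes` are illustrative only and are not NEON mnemonics or intrinsics. The model returns the full (widened) product.

```python
def poly_mul(a: int, b: int, bits: int = 8) -> int:
    """Carry-less product of two `bits`-wide polynomials over GF(2).

    Bit i of each operand is the coefficient of x**i.
    The result can be up to 2*bits - 1 bits wide.
    """
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    result = 0
    for i in range(bits):
        if (b >> i) & 1:
            # XOR replaces addition because coefficients are single bits.
            result ^= a << i
    return result


def poly_mul_lanes(xs, ys, bits: int = 8):
    """Apply poly_mul to each pair of lanes, as a SIMD unit would."""
    return [poly_mul(x, y, bits) for x, y in zip(xs, ys)]


# (x + 1) * (x + 1) = x^2 + 1 over GF(2), because the 2x term cancels.
square = poly_mul(0b11, 0b11)
```

An ordinary integer multiply would give 3 * 3 = 9 here, while the carry-less product is 5 (binary 101). This kind of multiply is the building block of CRC and Galois-field calculations.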

ARM NEON - features

• structured data load capabilities

• large, shared register file, addressable as:

– thirty-two 32-bit S (single) registers

– thirty-two 64-bit D (double) registers

– sixteen 128-bit Q (quad) registers
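
The three register names are views onto one shared bank, not separate storage. A minimal Python model of that aliasing is sketched below; the class `NeonRegisterFile` and its method names are hypothetical and exist only for this illustration. It assumes the ARMv7 layout in which Qn overlays D(2n) and D(2n+1), and S0-S31 overlay D0-D15.

```python
import struct


class NeonRegisterFile:
    """Toy model of the shared register bank and its S/D/Q views."""

    def __init__(self):
        # Thirty-two 64-bit D registers = 256 bytes of storage in total.
        self.bank = bytearray(32 * 8)

    def write_d(self, n: int, value: int) -> None:
        if not 0 <= n < 32:
            raise IndexError("D register index out of range")
        struct.pack_into("<Q", self.bank, n * 8, value)

    def read_d(self, n: int) -> int:
        if not 0 <= n < 32:
            raise IndexError("D register index out of range")
        return struct.unpack_from("<Q", self.bank, n * 8)[0]

    def read_s(self, n: int) -> int:
        # S(2k) is the low half of D(k), S(2k+1) the high half.
        if not 0 <= n < 32:
            raise IndexError("S register index out of range")
        return struct.unpack_from("<I", self.bank, n * 4)[0]

    def read_q(self, n: int) -> int:
        # Q(n) is D(2n) in the low 64 bits and D(2n+1) in the high 64 bits.
        if not 0 <= n < 16:
            raise IndexError("Q register index out of range")
        low, high = struct.unpack_from("<QQ", self.bank, n * 16)
        return (high << 64) | low


rf = NeonRegisterFile()
rf.write_d(0, 0x1111111122222222)
rf.write_d(1, 0x3333333344444444)
s0, s1 = rf.read_s(0), rf.read_s(1)
q0 = rf.read_q(0)
```

Writing D0 and D1 changes what S0-S3 and Q0 read back, which is why code that mixes views has to track which registers overlap.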

ARM NEON - Operations

• Data operations include:

– addition and subtraction

– multiplication with optional accumulation

– maximum or minimum value driven lane selection operations

– inverse square-root approximation

– comprehensive data-structure load instructions, including register-bank-resident table lookup
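
Several of the operations listed above can be pictured as the same function applied to every lane of a vector at once. The sketch below models a few of them on Python lists. All function names are hypothetical, lanes are assumed to be unsigned, and arithmetic is assumed to wrap around at the lane width (saturating variants are not modelled). The table lookup returns 0 for an out-of-range index, which is how the NEON VTBL instruction behaves.

```python
def lanes_add(a, b, bits: int = 8):
    """Lane-wise addition, wrapping at the lane width."""
    mask = (1 << bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


def lanes_mla(acc, a, b, bits: int = 16):
    """Lane-wise multiply with accumulation: acc + a * b."""
    mask = (1 << bits) - 1
    return [(c + x * y) & mask for c, x, y in zip(acc, a, b)]


def lanes_max(a, b):
    """Select the larger value in each lane."""
    return [max(x, y) for x, y in zip(a, b)]


def lanes_min(a, b):
    """Select the smaller value in each lane."""
    return [min(x, y) for x, y in zip(a, b)]


def table_lookup(table, indices):
    """Look up each index in a table held in registers; out of range gives 0."""
    return [table[i] if 0 <= i < len(table) else 0 for i in indices]


wrapped = lanes_add([250, 1], [10, 2])          # 260 wraps to 4 in 8 bits
accumulated = lanes_mla([1, 2], [3, 4], [5, 6])
looked_up = table_lookup([10, 20, 30], [2, 0, 7])
```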

ARM NEON - features

• Other performance boosting features:

– Aligned and unaligned data access allows for efficient vectorization of SIMD operations.

– Clean instruction set architecture designed for autovectorizing compilers and hand coding.

– Efficient access to packed arrays such as ARGB or xyz coordinates

– Support for both integer and floating point operations ensures adaptability to a broad range of applications, from codecs to High Performance Computing to 3D graphics.
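
Efficient access to packed arrays comes from structured loads, which split interleaved elements into one register per component (for example, NEON's VLD4 de-interleaves four-element structures). The sketch below shows the same data movement on Python `bytes`. The helper names `deinterleave` and `interleave` are illustrative and not part of any ARM API.

```python
def deinterleave(packed: bytes, stride: int):
    """Split interleaved data into `stride` separate planes."""
    if len(packed) % stride:
        raise ValueError("length must be a multiple of the stride")
    # Plane i takes every stride-th byte, starting at offset i.
    return [packed[i::stride] for i in range(stride)]


def interleave(planes) -> bytes:
    """Inverse operation: merge planes back into packed order."""
    return bytes(value for group in zip(*planes) for value in group)


# Two ARGB pixels stored as A, R, G, B, A, R, G, B.
pixels = bytes([255, 1, 2, 3, 255, 4, 5, 6])
alpha, red, green, blue = deinterleave(pixels, 4)
restored = interleave([alpha, red, green, blue])
```

After the split, each plane holds one colour channel, so a single lane-wise operation can process the same channel of many pixels together.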

ARM NEON - features

• Other performance boosting features:

– Tight coupling to the ARM processor provides a single instruction stream and a unified view of memory, presenting a single development platform target with a simpler tool flow.

– The large NEON register file with its dual 128-bit/64-bit views enables efficient handling of data and minimizes access to memory, enhancing data throughput.

Cortex-A15

Cortex-A15

• Main features:

– Architecture: ARMv7-A

– 1-4x SMP within a single processor cluster

– Multiple coherent SMP processor clusters through AMBA® 4 technology

– MMU

– ARM/Thumb-2

– DSP & SIMD extensions - increased performance

– Advanced Multicore Technologies

– NEON Advanced SIMD - increased performance

– VFPv4

Cortex-A15

• Main features:

– TrustZone technology

– Hardware virtualization support - highly efficient hardware support for data management and arbitration, whereby multiple software environments and their applications are able to simultaneously access the system capabilities

– Large Physical Address Extensions (LPAE) - enables the processor to access up to 1 TB of memory
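
The 1 TB figure follows from the width of the physical address: LPAE extends it from 32 to 40 bits. A short arithmetic check is sketched below; the helper name `addressable_bytes` is made up for this example.

```python
GIB = 1 << 30  # bytes in one gibibyte
TIB = 1 << 40  # bytes in one tebibyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


without_lpae = addressable_bytes(32) // GIB  # size in GiB
with_lpae = addressable_bytes(40) // TIB     # size in TiB
```

Eight extra address bits multiply the reachable physical memory by 256, from 4 GB to 1 TB.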

big.LITTLE processing

big.LITTLE processing

• Main features:

– Combines a "big" processor (Cortex-A15) with a "LITTLE", power-efficient processor (Cortex-A7)

– This combination makes it easier to pair high device performance (in a smartphone, for example) with long battery life

– Both processors can have 1-4 cores, and each implements a single AMBA® 4 coherent interface

big.LITTLE processing

• Cortex-A7:

– Cortex-A7 is an in-order, non-symmetric dual-issue processor with a pipeline length of between 8 and 10 stages

big.LITTLE processing

• Cortex-A15:

– Cortex-A15 is an out-of-order, sustained triple-issue processor with a pipeline length of between 15 and 24 stages

big.LITTLE processing

• Performance & Energy comparison

– The energy consumed by the execution of an instruction is partially related to the number of pipeline stages it must traverse

big.LITTLE processing

• Interconnection – System architecture

big.LITTLE processing

• Task migration
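
One way to picture task migration is a load-threshold policy with hysteresis: work moves to the big cluster when demand is high and back to the LITTLE cluster when demand drops. The sketch below is a toy model under that assumption and does not describe ARM's actual switching software. The function names and the threshold values 0.85 and 0.30 are hypothetical.

```python
def choose_cluster(current: str, load: float,
                   up: float = 0.85, down: float = 0.30) -> str:
    """Pick the cluster for the next interval (hypothetical thresholds)."""
    if current == "LITTLE" and load > up:
        return "big"      # demand exceeds what the small cores should handle
    if current == "big" and load < down:
        return "LITTLE"   # demand is low enough to save energy
    return current        # between thresholds: stay put to avoid ping-pong


def simulate(loads, start: str = "LITTLE"):
    """Return the cluster chosen for each load sample in turn."""
    cluster = start
    trace = []
    for load in loads:
        cluster = choose_cluster(cluster, load)
        trace.append(cluster)
    return trace


trace = simulate([0.2, 0.9, 0.5, 0.2, 0.4])
```

The gap between the two thresholds keeps a moderate load (0.5 in the sample) from bouncing the task between clusters on every interval.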

Cortex-A50 series

Cortex-A50 Series

• The Cortex-A50 Series is the latest range of processors based on the ARMv8 architecture

• The series includes support for a new, energy-efficient 64-bit execution state (AArch64) that operates alongside an enhanced version of ARM's existing 32-bit execution state

• The Cortex-A50 Series comprises the A53 and A57 processors

Cortex-A50 Series

• Cortex-A50 series processors are 32-bit processors with 64-bit capability

• They deliver more performance for ARMv7 32-bit code in the AArch32 execution state, and offer support for 64-bit data and a larger virtual address space in the AArch64 execution state

• Clean interworking between 32-bit and 64-bit code is supported

Cortex-A50 Series – Why 64-bit?

• An obvious reason for 64-bit is support for more than 4 GB of physical memory

• In server and desktop applications, the OS and application software are frequently 64-bit today

• Support for 64-bit in ARMv8 will enable ARM processors to become more broadly deployed in server and desktop applications, and will provide future-proof support for the eventual migration of 64-bit operating systems to mobile applications

Cortex-A50 Series – processors

• The series currently consists of two models: A53 and A57

• Both processors can operate independently or be combined into a big.LITTLE configuration

• Both processors are fully compatible with extensive ARM software assets

Cortex-A53 series

Cortex-A53

• The Cortex-A53 processor is ARM's most efficient application processor ever

• This processor can deliver the compute power of today's high-end smartphone in the lowest power and area footprint, enabling all-day battery life for typical device uses

• Cortex-A53 efficiently runs legacy ARM 32-bit applications

Cortex-A53

• Cortex-A53 features cache coherent interoperability with ARM Mali™ family graphics processing units (GPUs)

• Cortex-A53 connects seamlessly to ARM interconnect in configurations of up to 16 cores, with more in the future

• Cortex-A53 offers optional reliability and scalability features for high-performance enterprise applications

Cortex-A53 - performance

Cortex-A57 series

Cortex-A57

• The Cortex-A57 processor is ARM's most advanced, high-performance application processor

• The Cortex-A57 processor efficiently runs legacy ARM 32-bit applications

• Optional reliability and scalability features for high-performance enterprise applications

Cortex-A57

• The Cortex-A57 processor features interoperability with ARM Mali™ family graphics processing units (GPUs) for GPU compute applications

• The Cortex-A57 processor connects seamlessly to ARM interconnect in configurations of up to 16 cores, with more in the future

Cortex-A57 - performance

Thank you for your attention

References

[1] ARM11 core documentation; www.arm.com

[2] www.arm.com

[3] ARM9 family documentation; www.arm.com