building the future of 64-bit computing with armv8-a · pdf filebuilding the future of 64-bit...

31
1 Building the future of 64-bit computing with ARMv8-A Ian Smythe June 2014

Upload: lehanh

Post on 11-Feb-2018

214 views

Category:

Documents


1 download

TRANSCRIPT

1

Building the future of 64-bit computing

with ARMv8-A

Ian Smythe

June 2014

2

Introduction to ARMv8- A Architecture and Instruction Set Enhancements

Designing efficient systems with ARMv8-A

Driving success by building on the ARM® software ecosystem

Agenda:

3

ARM’s business model has fostered a wave of innovation in mobile devices

Advanced personal computers are becoming affordable to all

Datacentre and network operators are turning to ARM solutions to drive efficiency

Smart Mobile Device Shipments

(Smartphones and Tablets)

ARM and Gartner Estimates, CAGR figures based on 2013

Vo

lum

e in m

illio

ns

Entry-level Mid-range Premium

At The Heart Of Modern Computing

0

500

1000

1500

2000

2500

2013 2015 2018

<$150 >$400

0

10

20

30

40

50

2012 2020

Global Data Creation (Zetabytes)

Computer Science Group 2013

4

ARM7TDMI ARM1176 Cortex®-A9 Cortex-A50 series ARM926EJ

ARMv4

Increasing SoC complexity

Increasing OS complexity

Increasing choice of HW and SW

Architecture Evolution

1995 2005 2015

Virtualization

5

ARMv8-A Design Requirements

Extend OS capabilities to sub-$100 devices

Performance apps

Enhanced multimedia processing

64-bit memory addressing Virtualisation High bandwidth Enable innovation for hyperscale operators

Entry-level

Computing

High-end

Enterprise

‘Desktop Class’

Computing

6

ARMv8-A Instruction Set Enhancements

ARMv8-A is one of the most significant architecture changes in ARM’s history

ARMv8-A ARMv7-A

ARMv7-A Compatible

A32+T32 A64

CRYPTO

Scalar FP

Advanced SIMD

AArch32 AArch64

Applications

and software

ARMv8 extends ARM’s 32-bit code, whilst

being fully compatible with ARMv7

Addressing emerging software trends

AARCH32: Evolution of 32-bit

Ideal for concurrent programming

C11, C++11 Java5

More efficient, high-performance thread-safe software

Enhanced security and encryption

AARCH64: Efficient 64-bit execution Clean instruction set

Modern compiler friendly

Reduced complexity for operating systems, hypervisors

Designed to maximize reuse of existing hardware

7

ARMv8-A Architecture Designed for Efficiency Enhancement Why it Matters

64-bit architecture

Increased number and size of

general purpose registers

Efficient access to large datasets

Gains in performance and code efficiency

Large Virtual Address Space Applications not limited to 4GB memory

Large memory mapped files handled efficiently

Efficient 32-bit/64-bit architecture Common software architecture (phone, tablet, clamshell)

A single software model across the entire portfolio

Double the number and size of

NEON™ registers

Enhanced capacity of SIMD multimedia engine

Cryptography support

Over10x software encryption performance

New security models for consumer and enterprise

8

Rapidly Evolving Consumer Requirements

Secure Digital World

Business & Productivity

Natural Language

Sleeker and Slimmer form factors

9

Tablets Have Changed Computing

0

100

200

300

400

500

600

2010 2011 2012 2013 2014

Mo

bile C

om

pu

tin

g V

olu

me (

Mu

)

ARM Tablets RoW

ARM Tablets China SoC

x86 Laptops

10

ARMv8-A Networking Opportunity

Mobile

Subscriber

Office /

Home User

Industrial /

Internet of

Things

n x Cortex-A57/A53

Cortex-R/M

+ Partner IP

n x Cortex-A57

+ Partner IP

n x Cortex-A57

n x Cortex-A53

+ Partner IP

“C”-Programmable,

Software Defined,

Heterogeneous Compute

Aggregation Network

Wid

e R

ange

of D

esi

gn P

oin

ts

Core Networking & Server

Access Network

11

Enhanced Privacy, Security And Personalization

ARM security framework with TrustZone® is available in all ARMv7-A and ARMv8-A processors.

ARM security and virtualization framework is available in ARMv8-A and ARMv7-A processors launched since 2010

ARM7TDMI ARM1176 Cortex-A9 Cortex-A50 series ARM926EJ

ARMv4

Virtualization

12

Premium 64b/32b

performance

Industry leading premium

performance cores aimed at

top range 32b/64b SoCs

Mid range 32-bit mobile Premier performance for

midrange cost and power in

smartphones and consumer

devices

High Efficiency

CPUs Delivering industry

leading power efficiency

enabling low cost, low

power, and maximum

efficiency for standalone and

big.LITTLE combinations

Cortex-A7*

ARM big.LITTLETM processing solutions portfolio

ARMv8

13+ stage out-of-order

Triple Decode, Wide Multi-issue

ARMv8

8 stage in-order

Dual issue

Cortex-A53

ARMv7

Out-of-order

ARMv7

8 stage in-order

Partial Dual issue

Cortex-A7

2015 2014

Cortex-A15

Cortex-A7

Cortex-A57

Cortex-A53

Cortex-A17

Cortex-A7

ARMv7

8 stage in-order

Partial Dual issue

Cortex-A7

ARMv7

8 stage in-order

Partial Dual issue

Market Drivers

ARMv8

8 stage in-order

Dual issue

Cortex-A53

13

Highest single-threaded performance

Out-of-order, multi-issue pipeline tuned for modern workloads

Streamlined 64b architecture w/ improved 32b performance

10x speedup in encryption with new cryptography extensions

Premium Mobile feature set

Maximum performance in smart phone power envelope

big.LITTLE compatible for extended dynamic range of operation

Multicore scalability via AMBA® 4 ACE

Enterprise class feature set

64 bit , RAS/ECC, Virtualization support

High Throughput floating point unit

Multicore scalability via AMBA 5 CHI

ARM Cortex-A57: Highest Performance ARMv8-A core

14

Highest Efficiency Core with ARM V8 Architecture

In-order pipeline, balanced design

10x performance improvement for encryption and SIMD

Hardware controlled power retention

Versatile Efficient Core

Most optimized dual issue in order architecture for power performance balance

Numerous configuration options adopting to different application needs

Enhanced for Enterprise Applications

Area scales from 6.4mm2 – 0.7mm2

Enhanced high bandwidth memory subsystem

RAS support for networking work load

ARM Cortex-A53: Delivering More For Less

ARM® Cortex® -A53 MPCore

4

3

2

128-bit AMBA®4 ACE or

AMBA5 CHI Coherent Bus Interface

SCU

Cortex-A53 Core

ARMv8

32b/64b Core

NEON

SIMD engine

Floating

Point Unit

8-64K I-Cache,

Optional Parity

8-64K D-Cache,

Optional ECC

Core 1

L2 Cache w/ Optional ECC (128KB – 2MB)

CoreSight™ Multicore Debug and Trace

SPECint2000

Octane

Bbench

Cortex-A53 Cortex-A7

Cortex-A53 Performance Improvement over Cortex-A7

@ same frequency

15

0.0x

0.5x

1.0x

1.5x

2.0x

Cortex-A7 Cortex-A53 Cortex-A53

0.0x

0.5x

1.0x

1.5x

2.0x

Cortex-A15 Cortex-A57 Cortex-A57

ARMv8-A AArch64 Performance Improvements

ARMv8-A processors are fully compatible with ARMv7-A software

Existing ARMv7-A 32-bit software runs faster on today’s ARMv8-A processors

Browsing-related workloads

Same process

technology node

Same process

technology node

Target process

technology node Target process

technology node

Rela

tive p

erf

orm

an

ce

16

Building the next generation mobile computing devices

Mali-T72x

GPU L2

DMC+DDR

L2

Cortex-A53

Cortex-A57

CCI-400 Cache Coherent Interconnect

Superphone

Premium Computing Systems

Notebook performance for any screen size

2.5K tablets, 4K monitors

Virtualization and TrustZone support enable a secure,

multi-profile device

Always on user experience with improved battery life

Mass Market Computing Systems

big.LITTLE 2.4 and 4.4 configurations

1080p driving 4K, compatibility with 4K camera

Standalone quad and octa Cortex-53

platforms possible based on segment

Process technology: 28nm down to 14nm

DMC+DDR

Tablet/Clamshell

CCI-400 Cache Coherent Interconnect

MaliTM-T76x

GPU L2

Cortex-A57

L2

Cortex-A53

17

Premium Mobile System Design

GIC-500

I/O Coherent

Masters

Cortex-A57 Cortex-A53

Memory System LPDDR3 Peripherals

Media Subsystem

MMU-500

MMU-500

DRAM

VP-500DP-500

NIC-400

NIC-400

Mali T760

GPU

CoreLink CCI-400

TZC-400

18

Mid-range Mobile System Design

CCI-400

Mali T720

GPU

GIC-400

I/O Coherent

Masters

Cortex-A17 Cortex-A7

Memory System LPDDR3 Peripherals

Media Subsystem

MMU-500

MMU-500

DRAM

Mali-

V500

Mali-

DP500

NIC-400

NIC-400

TZC-400

19

Entry-Level Mobile 2015 Example System

Cortex-A53 based heterogeneous

system

2x Cortex-A53 Clusters

1x Tuned for maximum performance

point

1 x Tuned for highest efficiency

Mid-range Area Optimized GPU

4 Shader cores

CCI-400

Mali-T720

GPU

GIC-400

I/O Coherent

Masters

Cortex-A53* Cortex-A53

Memory System LPDDR3 Peripherals

Media Subsystem

DRAM

Mali-

V500

Mali-

DP500

NIC-400

NIC-400

TZC-400

20

CPU Sub-Systems in Set Top Box and OTT devices

Dual Cortex-A9

40nm, 600Mhz-1.2Ghz Quad Cortex-A7

• Single cluster design

• Low cost (die, power)

big.LITTLE Octa Cortex A15/Cortex-A7

• Dual cluster design with CCI-400

• Well suited for mixed workloads

• System coherency

Quad Cortex-A12/17

• Performance leadership in the mid-range

• Most area and cost-efficient solution

• 60% performance uplift over Cortex-A9

ARMv8 (Cortex A53/57 & ISA)

• Single or multi-cluster design

• 64-bit,32bit

• 30% increase in performance

2012 2013 2014 2015

Quad Cortex-A9

28nm, 1.4Ghz -1.2Ghz

Octa core : big.LITTLE

Cortex-A15/Cortex-A7

ARM v7 big.LITTLE

Cortex A17/ Cortex-A7

Cortex-A15/Cortex-A7

Quad Cortex-A17

28nm, 1.4Ghz -1.2Ghz

Cost effective mass production

Dual/Quad Cortex-A7

Ideal for entry-level

smart TV

ARMv8, big.LITTLE

Cortex-A57/Cortex-A53

advanced interconnect

Quad Cortex-A15

1.2-1.5Ghz

21

Infrastructure System Design

DSPDSP

ACE

NIC-400

Flash GPIO

NIC-400

USB

Interrupt Control

CoreLink GIC-500

CoreLink

DMC-520

AHB

PCIe

10-40

GbE

DPI Crypto

CoreLink™ CCN-508 Cache Coherent Network

DSP SATA

CoreLink

DMC-520

CoreLink

DMC-520

CoreLink

DMC-520

PCIe

DPI

I/O Virtualisation CoreLink MMU-500

Cortex-A57 Cortex-A57 Cortex-A57 Cortex-A57

Cortex-A53 Cortex-A53 Cortex-A53 Cortex-A53

NIC-400

Memory System DDR4/3

DRAM

22

ARMv8-A for Software and System Developers

ARM Compiler 6 for ARMv8-A

ARM Fast Models Open Source Tools SW Evolution

DS-5 Ultimate Edition

Full suite of

professional software

development tools

Full ARMv8-A support

Validated by lead

partners for > 2 years

SW Evolution

Software partners

driving more 64bit

optimizations

Test silicon available

Server Base System

Architecture

Linux Kernel and tools

Open source tools and

compilers

Linux kernel support

Custom virtual

platforms

Platform for early

software development

Reduce

time-to-market

Bring-up silicon in days

Today

23

2014 2013 2012

AOSP on 64-bit platforms History

32-bit AOSP on

an AArch64 Kernel

64-bit AOSP - VM Prototype

AArch64 Webkit

64-bit AOSP

(Jellybean)

64-bit AOSP 64bit on HW

(Jellybean)

64-bit ART Booting (KitKat)

First 64-bit Kernel patches Start of upstreaming to AOSP

Collaboration with Google on ART Begins

AArch64 Google V8 merged Code Published

Achievements

24

Enabling the developer ecosystem to build on ARM

Linux & Android

OS awareness

System-wide performance

analysis and optimization

Userspace app optimization

Kernel profiling

CPU & MCU

Mali GPU (OpenCL & OpenGL ES)

Real-world energy consumption

Comprehensive

device debug support

Off-the-shelf dev boards

Custom devices

big.LITTLE & multicore

Debug over USB & TCP/IP

Making software as efficient as the hardware it’s running on

Process & thread activity

Native app debug

Runs on ARM FVPs

Reversible debugging

25

Building on the ARM ecosystem for Android

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

7/1/2013 8/1/2013 9/1/2013 10/1/2013 11/1/2013 12/1/2013 1/1/2014 2/1/2014 3/1/2014 4/1/2014 5/1/2014

ARM Native Only ARM & x86 Native Bytecode Only Unknown

26

x86 Android App Compatibility - China

Chinese Android app ecosystem is predominately optimized for ARM.

Source: http://www.igao7.com/x86.html

27

200+ China Ecosystem Partners

Enabling the ARM software ecosystem

28

Enabling the next generation of mobile experience

Software for professionals demanding compute intensive tasks:

for e.g. modelling tools (AutoCAD)& RAW image editing

Enterprise productivity suite for word processing, data

crunching, database management and editing on the go.

Productivity and Compute Tools Tools for Professionals

Graphics capability becoming a key factor in consumer

purchasing decisions

Driving 4K video/ Digital life experience

Combine ARM Cortex and Mali processors into an efficient

unified computing subsystem

Realistic, Immersive gaming

29

The ARMv8-A architecture is a major step

forward for mobile computing

ARMv8-A processors and systems are being

deployed today

In production and in development

Bringing high end capabilities to mobile systems

and beyond

The ARMv8-A software ecosystem provides

the broadest access to devices

With uncompromising performance

A Premium Experience For All Devices

30

ARMv8-A Everywhere From entry-level smartphones to high-end servers

31

Thank you

The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the EU

and/or elsewhere. All rights reserved. Any other marks featured may be trademarks of their respective owners