Unleash the DSP performance of Arm Cortex processors
Lionel Belnet | Senior Product Manager
Arm Tech Symposia 2017
© 2017 Arm Limited



Agenda: Unleash the DSP performance of Cortex processors

1. Introducing Arm Cortex technology for DSP applications
2. Selecting the right Cortex processor for your algorithms
3. Understanding NEON acceleration for current and emerging use cases
4. Benefiting from a wide ecosystem for all Cortex processors

The most widely deployed processing platform: gaining traction in DSP applications

Addressing a wide range of performance points

Increasing DSP performance from Cortex-M through Cortex-R to Cortex-A:

• Cortex-M: designed for discrete processing and microcontrollers. Optimized DSP extensions (8-bit, 16-bit SIMD capability).
• Cortex-R: designed for high performance, hard real-time applications. Optimized DSP extensions (8-bit, 16-bit SIMD capability), NEON.
• Cortex-A: designed for high-level operating systems. Optimized DSP extensions (8-bit, 16-bit SIMD capability), NEON, SVE.
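As an illustration of the 16-bit SIMD capability listed for the DSP extensions, the sketch below models in Python what a dual 16-bit multiply-accumulate, in the style of the SMLAD instruction, computes: two signed 16-bit products taken from one pair of 32-bit words and added to a 32-bit accumulator. It is a behavioural model under simplifying assumptions (the result wraps to 32 bits and the overflow flag is not modelled). All helper names are hypothetical and not part of any Arm library.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def to_signed32(x):
    """Wrap x to 32 bits and interpret it as a signed value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def pack16x2(lo, hi):
    """Pack two signed 16-bit values into one 32-bit word (lo in bits 0-15)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def dual_mac16(a, b, acc):
    """acc + a.lo * b.lo + a.hi * b.hi, wrapped to 32 bits (assumed model)."""
    lo = to_signed16(a) * to_signed16(b)
    hi = to_signed16(a >> 16) * to_signed16(b >> 16)
    return to_signed32(acc + lo + hi)


def dot16(x, h):
    """Dot product of two even-length lists, consuming two samples per step."""
    acc = 0
    for i in range(0, len(x), 2):
        acc = dual_mac16(pack16x2(x[i], x[i + 1]), pack16x2(h[i], h[i + 1]), acc)
    return acc
```

For example, `dot16([1000, -2000, 3000, 4000], [2, 3, -1, 5])` processes four samples in two steps instead of four, which is the kind of saving the packed 16-bit operations are meant to provide.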

Pick the right CPU for your DSP algorithm: example use case

Dolby Digital+, Dolby Audio Processing and AC4 are often run on dedicated DSPs. What if you could run them on a CPU?

Benefits of a CPU with DSP capabilities:

• Reduced software development costs
• Simplified toolchain
• Reduced system-level complexity
• Development and BoM cost savings

Select the most efficient CPU for your DSP use case: Dolby® Digital Plus

[Bar chart: Dolby Digital Plus, required MHz (lower is better), for Cortex-A57, Cortex-A53 and Cortex-M7. Y axis: 0 to 200 MHz. Workloads: 5.1-ch single decode; 5.1-ch single decode to 2-ch downmix; 7.1-ch single decode; 5.1-ch dual decode; 7.1-ch main + 5.1-ch assoc dual decode.]

Cortex-A57 @ 1.1 GHz and Cortex-A53 @ 850 MHz measured on Arm Juno board. Cortex-M7 @ 25 MHz measured on MPS2.
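The "required MHz" metric in the chart can be read as a cycle budget: the number of CPU cycles spent per second of decoded audio, expressed in millions. The sketch below shows that arithmetic and the resulting CPU load at a given clock. The cycle count in the example is a made-up figure for illustration, not a value from the measurements, and the function names are hypothetical.

```python
def required_mhz(cycles_used, audio_seconds):
    """Millions of CPU cycles needed per second of decoded audio."""
    return cycles_used / audio_seconds / 1e6


def cpu_load(required, clock_mhz):
    """Fraction of the core consumed when it runs at clock_mhz."""
    return required / clock_mhz


# Hypothetical example: 450 million cycles to decode 10 s of audio.
need = required_mhz(450_000_000, 10.0)
print(f"required: {need:.1f} MHz")
print(f"load at 850 MHz: {cpu_load(need, 850.0):.1%}")
```

Expressing the cost in MHz rather than in percent makes results from cores clocked differently easier to compare, which is presumably why the slide uses it.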

Run the latest advanced audio codec on your CPU: Dolby® performance requirements

[Bar chart: Dolby® AC-4 performance requirements on Armv8-A processors, in MHz (lower is better), for Cortex-A53 and Cortex-A57. Y axis: 0 to 250 MHz. Workloads: main decoding w/o rand memory; associated audio decoding w/o rand memory.]

Measured on Arm Juno board: Cortex-A57 @ 1.1 GHz, Cortex-A53 @ 850 MHz.

Arm CPUs can handle demanding DSP workloads

Increasing performance requirements: Dolby Digital Plus, Dolby Audio Processing, Dolby AC4

✓ Cortex-A + NEON
✓ Cortex-M + DSP extension

• Address new markets and applications
• Simpler systems and faster time to market
• Innovation through collaboration

Extending NEON computing to new use cases

Armv7-A/R NEON including:

• 32 x 64-bit registers
• 8-bit to 64-bit integer support
• FP32 support

Armv8.0-A NEON including:

• AArch32 and AArch64
• Optional cryptography
• 32 x 128-bit registers in AArch64
• FP64 support

Armv8.2-A NEON including:

• FP16 support
• 8-bit dot product instructions
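To make the 8-bit dot product entry concrete, the sketch below models in Python the per-lane behaviour of a signed 8-bit dot product on a 128-bit vector, in the style of the SDOT instruction: each of the four 32-bit accumulator lanes receives the sum of four 8-bit by 8-bit products. This is a behavioural model only; the function names are hypothetical, and wrapping the accumulator to 32 bits is a modelling assumption.

```python
def wrap32(x):
    """Wrap x to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def dot_product_8bit(acc, a, b):
    """acc: 4 int32 lanes; a, b: 16 int8 elements each (one 128-bit vector)."""
    if len(acc) != 4 or len(a) != 16 or len(b) != 16:
        raise ValueError("expected 4 accumulator lanes and 16 elements per input")
    if any(not -128 <= v <= 127 for v in list(a) + list(b)):
        raise ValueError("inputs must fit in signed 8 bits")
    out = []
    for lane in range(4):
        # Four 8-bit products are summed into one 32-bit lane.
        partial = sum(a[4 * lane + j] * b[4 * lane + j] for j in range(4))
        out.append(wrap32(acc[lane] + partial))
    return out


# 16 multiply-accumulates in a single modelled operation.
result = dot_product_8bit([10, 0, 0, 0], list(range(16)), [1] * 16)
```

Because the products are accumulated directly into 32-bit lanes, no separate widening step is needed between the multiply and the add, which is what makes this form attractive for quantized matrix arithmetic.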

The right processors for your DSP application: Multimedia

FFMPEG, required MHz relative to Cortex-A53 (lower is better):

Cortex-A7     1.5
Cortex-A35    1.3
Cortex-A53    1
Cortex-A55    0.9
Cortex-A73    0.6
Cortex-A75    0.5

Enhanced architecture for emerging use cases

Computer Vision: Harris Corners, time relative to Cortex-A53 (lower is better)

Cortex-A53 (FP32)    1.00
Cortex-A55 (FP32)    0.90
Cortex-A55 (FP16)    0.70
Cortex-A73 (FP32)    0.59
Cortex-A75 (FP32)    0.44
Cortex-A75 (FP16)    0.35

Machine Learning: General Matrix Multiply, MAC/cycle

Cortex-A53 (FP32)                   1
Cortex-A55 (FP32)                   1.2
Cortex-A55 (FP16)                   2.5
Cortex-A55 (8-bit dot product)      5.5
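One way to read the General Matrix Multiply figures is against the number of elements that fit in a 128-bit NEON register: halving the element width doubles the lane count. The sketch below compares that ideal scaling with the ratios derived from the Cortex-A55 MAC/cycle values on the slide (1.2, 2.5 and 5.5). Attributing the gain to lane count alone is an assumption made for illustration; the slide does not break the speedup down.

```python
REGISTER_BITS = 128  # width of an AArch64 NEON vector register


def lanes(element_bits):
    """Number of elements of the given width in one vector register."""
    return REGISTER_BITS // element_bits


# (element width in bits, MAC/cycle) for Cortex-A55, as read from the slide.
measured = {"FP32": (32, 1.2), "FP16": (16, 2.5), "8-bit": (8, 5.5)}


def scaling(table):
    """Return {name: (ideal lane ratio, observed MAC/cycle ratio)} vs FP32."""
    base_bits, base_rate = table["FP32"]
    rows = {}
    for name, (bits, rate) in table.items():
        ideal = lanes(bits) / lanes(base_bits)
        observed = round(rate / base_rate, 2)
        rows[name] = (ideal, observed)
    return rows


for name, (ideal, observed) in scaling(measured).items():
    print(f"{name}: ideal x{ideal:.0f}, observed x{observed:.2f}")
```

The FP16 ratio lands close to the 2x that the lane count suggests. The 8-bit ratio comes out above 4x, which is consistent with the dot product instructions doing more work per instruction than a plain lane-count argument predicts.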

Pick the right Cortex-M for your DSP algorithm

[Bar chart: DSP performance per MHz relative to Cortex-M4. Y axis: 0 to 2. Benchmarks: CFFT Q31, RFFT Q31, CFFT F32, RFFT F32, FIR Q31, FIR F32.]

*Estimate for Cortex-M33; all based on CMSIS-DSP library.
Cortex-M7 total DSP performance ~2x of Cortex-M4 (due to higher max frequency).

• Cortex-M4: mainstream applications
• Cortex-M7: high DSP performance, SP + DP FPU
• Cortex-M33: security in DSP applications, co-processor interface, TrustZone
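The Q31 benchmarks in the chart use 32-bit fixed-point data, where an integer v represents the value v / 2^31. The sketch below is a plain Python model of a Q31 FIR filter: products are accumulated at full width, then shifted back to Q31 and saturated. It illustrates the arithmetic only and is not the CMSIS-DSP implementation; all names are hypothetical.

```python
Q31_ONE = 1 << 31
Q31_MAX = Q31_ONE - 1
Q31_MIN = -Q31_ONE


def saturate_q31(v):
    """Clamp an integer to the signed 32-bit range."""
    return max(Q31_MIN, min(Q31_MAX, v))


def to_q31(x):
    """Convert a float in [-1, 1) to Q31, saturating at the range limits."""
    return saturate_q31(int(round(x * Q31_ONE)))


def from_q31(v):
    """Convert a Q31 integer back to a float."""
    return v / Q31_ONE


def fir_q31(coeffs, samples):
    """Direct-form FIR on Q31 data with a wide accumulator."""
    state = [0] * len(coeffs)  # delay line, newest sample first
    out = []
    for s in samples:
        state = [s] + state[:-1]
        acc = sum(c * x for c, x in zip(coeffs, state))  # Q62 products
        out.append(saturate_q31(acc >> 31))  # back to Q31
    return out


# 4-tap moving average applied to a constant input of 0.5.
taps = [to_q31(0.25)] * 4
y = [from_q31(v) for v in fir_q31(taps, [to_q31(0.5)] * 5)]
```

The output ramps up while the delay line fills and then settles at the input level, as expected for a moving average. Keeping the accumulator wider than the data is what lets the filter sum many products before any precision is discarded.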

A versatile DSP ecosystem for Cortex-M

Fundamental DSP functions on Cortex-M, available for free: the CMSIS-DSP library

• Transforms
• Matrix functions
• Statistical functions
• Controller functions
• Support functions
• Interpolator functions
• Complex math functions
• Filters
• Basic math functions
• Fast math functions

Examples of ecosystem solutions and partners

• Voice codecs
• Image processing
• Audio codecs
• Keyword spotting
• Sensor fusion
• Motor control
• Audio enhancement
• Connectivity
• Simulation tools

Use your CPU to unleash DSP to new markets: foster innovation with partnerships in the world’s #1 ecosystem

Standardized architecture, proven in many markets and DSP applications

Simplifies software portability across different device solutions

Largest ecosystem of silicon vendors, compilers, tools, libraries and software

Save development and BOM cost by using a homogeneous system

[Diagram: the developer surrounded by the ecosystem: silicon vendors, compiler & tools, software, operating system]


Summary

Cortex processors address a wide range of DSP performance points

• From high-end Cortex-A to efficient Cortex-M

Comprehensive ecosystem and library support for DSP applications

• Simplifies and accelerates new use cases and applications

Continued investment in SIMD capabilities

• Strong roadmap for demanding future applications


Thank You


The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners.

www.arm.com/company/policies/trademarks
