© Imagination Technologies p1 www.imgtec.com

Doug Watt

May 21, 2013

Cool Compute: Strategies for Managing Power vs Performance in Mobile Devices

So what’s today’s problem?

First it was transistors…

There just were never enough to do everything we wanted, but Moore’s law has been our friend

Then it was bandwidth…

Which is still an issue but less so, and actually using it burns power, which never did scale with process

Now it’s thermal shutdown

The last couple of process generations broke the link between geometry and power scaling

Performance is now limited by thermal envelope of the device

Keeping all those transistors on long enough to enjoy their performance is the issue

It’s everyone’s problem

…and that’s just the CPU

But some have it worse than others so there are clearly ways to combat this

Public domain screen captures from http://www.youtube.com/watch?v=f4qu915Wj1U

How does this affect performance?

Once the thermal limit is reached, power management kicks in

Thermal shutdown of GPU: >25% performance hit

Data courtesy of Anandtech.com
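
The throttling behaviour described on this slide can be illustrated with a toy model. This is only a sketch: the function name `simulate`, the first-order heating model, the 5-degree recovery margin and every constant are hypothetical choices for illustration, not values taken from the Anandtech data.

```python
def simulate(steps, limit_c=80.0, ambient_c=30.0, full_power_w=4.0,
             throttled_power_w=2.0, heat_per_watt=1.5, cooling=0.1):
    """Toy first-order thermal model with a simple throttling governor.

    Returns a list of (temperature_c, relative_performance), one per step.
    All constants are illustrative assumptions, not measured values.
    """
    temp = ambient_c
    throttled = False
    trace = []
    for _ in range(steps):
        # Governor: throttle at the limit, recover 5 degrees below it.
        if temp >= limit_c:
            throttled = True
        elif temp <= limit_c - 5.0:
            throttled = False
        power = throttled_power_w if throttled else full_power_w
        # Heating is proportional to power; cooling is proportional to
        # the temperature difference from ambient.
        temp += heat_per_watt * power - cooling * (temp - ambient_c)
        # Assumption: performance scales linearly with power.
        trace.append((temp, power / full_power_w))
    return trace


trace = simulate(200)
average_performance = sum(perf for _, perf in trace) / len(trace)
```

With these made-up constants the device starts at full speed, reaches the limit, and then alternates between throttled and unthrottled periods, so `average_performance` ends up below 1.0.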

Best way to get Maximum Performance?

(results of an actual experiment carried out by our friends at Futuremark!)

Looking for some solutions

Go Parallel

VLIW

SIMD

Multiple threads

Multiple cores

But you can’t just keep adding more cores…

They will just get shut down if they are power hogs

Brute force is a losing strategy

Go Parallel

Looking for some solutions

Modern SoCs are heterogeneous

Integrate CPU, GPU, DSP, ISP, video decoders, I/O interfaces

Different blocks optimized for different types of computation

Different SoCs provide different balances of heterogeneous resources for different classes of product

Each application task can be targeted at the hardware block which can execute it most efficiently

Granularity of tasks determined by architecture

Go Heterogeneous
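
Targeting each task at the block that executes it most efficiently can be sketched as a lookup over per-block energy costs. The task names, block names and joule figures below are all hypothetical placeholders; a real system would need measured costs and would also have to respect the task granularity the architecture allows.

```python
# Hypothetical energy cost (joules per task) of each task type on each block.
# The numbers are invented for illustration only.
ENERGY_J = {
    "ui_logic": {"CPU": 0.02, "GPU": 0.08},
    "image_filter": {"CPU": 0.30, "GPU": 0.06, "DSP": 0.10},
    "audio_beamform": {"CPU": 0.12, "DSP": 0.03},
    "video_decode": {"CPU": 0.90, "GPU": 0.40, "VIDEO_DECODER": 0.05},
}


def pick_block(task):
    """Return the hardware block with the lowest energy cost for `task`."""
    costs = ENERGY_J[task]
    return min(costs, key=costs.get)


def schedule(tasks):
    """Map each task name to its most energy-efficient block."""
    return {task: pick_block(task) for task in tasks}


plan = schedule(["ui_logic", "image_filter", "audio_beamform", "video_decode"])
```

In this sketch the control-heavy task stays on the CPU while the throughput-heavy tasks move to the GPU, DSP or fixed-function decoder.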

Looking for some solutions

GPUs now employ the same ‘parallel features’ as CPUs

SIMD and Threads (SIMT)

Multiple cores

GPUs were special-purpose (shader) compute engines dedicated to graphics

But are now evolving into more general-purpose programmable devices

Enable area-efficient heterogeneous SoC designs that provide both

CPU for control and I/O (efficient latency processing)

GP-GPU for graphics and data crunching (efficient throughput processing)

Use the appropriate compute unit for each task – including the GPU

Heterogeneous CPU/GPU architectures

[Diagram: CPU with two cores (few threads each) and a large cache; GPU with control logic and two SIMD processing elements (many threads each, each with a small cache); execution queues; unified system memory]

Mobile GPU compute must be “practical”

Use Case: Example

Augmented reality: Augmented reality shopping app

Computational Photography: Post-processing effects (HDR, panoramic stitching); lens correction

Computer Vision: Product recognition (display in webstore); automotive ADAS; defence/security systems

Audio: Multi-microphone beamforming (noise reduction)

Video: HEVC decoder; real-time camera preview window effects

Risk Mitigation: Kishonti and Google benchmarks

Mobile GPU compute must be “practical”

Practical compute is not…

Game physics (for the most part)

High-performance scientific computing

For many practical use cases, programmability is a requirement

Algorithms are constantly changing and improving

Standards take time to be ratified

GP-GPU architectures optimized for practical use cases provide the most efficient performance-power profiles

Mobile GPU compute must be “practical”

[Figure: HPC use cases (Full Profile) set against mobile and embedded use cases (Embedded Profile), with the optimal GPU design parameters marked. Features shown: 3D images; 64-bit floating point; BGRA channel order image formats; image sharing with OpenGL ES; built-in atomic functions; high-precision rounding support]

GPU Compute is a POWER play!

Using available heterogeneous resources saves energy

Trial run on TI OMAP 4 ‘Panda’ board

Free running suite of image enhancement functions, written in three versions

Single and dual threaded CPU-only versions allowed to saturate CPU

GPU version in OpenCL 1.0 EP, with minimal CPU loading

[Chart: energy per processed frame for the Single Thread, Two Threads and GPU versions; axis ticks 0, 0.1, 0.2]
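
The metric on this chart, energy per processed frame, is average power divided by frame rate. The sketch below shows the arithmetic only; the function name and the two example configurations are hypothetical, and the numbers are placeholders rather than the measurements from the trial.

```python
def energy_per_frame(avg_power_w, frames_per_second):
    """Energy in joules spent on each processed frame."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return avg_power_w / frames_per_second


# Placeholder inputs: (average power in watts, frames per second).
# These are NOT the measured values from the trial.
example_runs = {
    "cpu_only_example": (1.0, 5.0),
    "gpu_example": (0.9, 30.0),
}

results = {
    name: energy_per_frame(power, fps)
    for name, (power, fps) in example_runs.items()
}
```

The point of the metric is that a block drawing similar power can still use far less energy per frame if it finishes each frame sooner.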

Heterogeneous System Architecture

We are on an evolution path towards processor designs that combine…

Efficient latency processing (CPU)

Efficient throughput processing (GPU)

Tighter integration of all heterogeneous components (CPU, GPU, DSP, ISP, …)

Need to analyse performance of emerging use cases to determine the right balance of hardware resources for each new processor design, for example:

Local memory – quantity and type (registers, SRAM)

ALU – quantity, type (int, float), precision (rtn, rtz)

ISA – instructions that enable efficient compilation of application code

Heterogeneous System Architecture

CPU-GPU coherency on mobile processors today is one-way

GPU can set flags on memory accesses, indicating data it’s fetching may already be within CPU cache

Infrastructure will snoop the CPU cache before looking in system memory

A few use cases for graphics exist, but not compute

Today’s Mobile CPU+GPU designs are loosely integrated
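
The one-way scheme on this slide can be modelled in a few lines. This is a toy sketch under stated assumptions: the class and method names are hypothetical, the CPU cache is assumed to be write-back, and the GPU's own caches are left out entirely.

```python
class OneWayCoherentMemory:
    """Toy model: GPU reads may snoop the CPU cache, never the reverse."""

    def __init__(self):
        self.system_memory = {}
        self.cpu_cache = {}  # address -> value, possibly newer than memory

    def cpu_write(self, addr, value):
        # Assumed write-back behaviour: the write stays in the CPU cache.
        self.cpu_cache[addr] = value

    def cpu_flush(self):
        # Push all cached lines out to system memory.
        self.system_memory.update(self.cpu_cache)
        self.cpu_cache.clear()

    def gpu_read(self, addr, coherent=False):
        # With the flag set, the infrastructure snoops the CPU cache
        # before falling back to system memory.
        if coherent and addr in self.cpu_cache:
            return self.cpu_cache[addr]
        return self.system_memory.get(addr)


mem = OneWayCoherentMemory()
mem.system_memory[0x10] = 1
mem.cpu_write(0x10, 2)
stale = mem.gpu_read(0x10)                  # flag not set: reads memory
fresh = mem.gpu_read(0x10, coherent=True)   # flag set: snoops CPU cache
```

Without the flag the GPU sees the old value until the CPU flushes its cache, which is the extra synchronisation a loosely integrated design pushes onto software.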

Heterogeneous System Architecture

HSA should reduce overall system power

Infrastructure adds area and power

Zero-copy throughout entire system reduces bandwidth

Tomorrow’s Mobile CPU+GPU designs will be more tightly integrated
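
The bandwidth argument for zero-copy can be made concrete with a rough traffic estimate. The cost model here is an assumption for illustration: each stage reads and writes the buffer once, and without zero-copy every hand-off between stages adds one more read and one more write. The function name and the example frame size are hypothetical.

```python
def bytes_moved(buffer_bytes, stages, zero_copy):
    """Estimate memory traffic for a buffer passed through `stages` blocks.

    Assumed cost model: each stage reads its input once and writes its
    output once. Without zero-copy, each hand-off to the next stage costs
    one extra read and one extra write (a copy between private buffers).
    """
    per_stage = 2 * buffer_bytes  # read input + write output
    handoffs = max(stages - 1, 0)
    copy_cost = 0 if zero_copy else handoffs * 2 * buffer_bytes
    return stages * per_stage + copy_cost


# Example: a 1920x1080 frame at 4 bytes per pixel through three stages.
frame_bytes = 1920 * 1080 * 4
with_copies = bytes_moved(frame_bytes, stages=3, zero_copy=False)
without_copies = bytes_moved(frame_bytes, stages=3, zero_copy=True)
```

Under this model the three-stage pipeline moves ten buffer-sized transfers with copies and six without; since using bandwidth burns power, the saving counts against the area and power the coherency infrastructure adds.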

Heterogeneous System Architecture

Simplifies effective use of heterogeneous computing

Designed for C99, C++ 2011, Java, Renderscript, OpenCL, C++ AMP

HSAIL is a virtual ISA for parallel programs

Finalized to ISA by a JIT compiler or “Finalizer”

ISA independent by design for CPU & GPU

Explicitly parallel

Designed for data parallel programming

[Diagram: HSA software stack in three layers (Application SW, Drivers, Differentiated HW). Labels: Application; Domain Specific Libs (Bolt, OpenCV™, … many others); HSA Runtime; Renderscript / OpenCL Runtimes; OpenGL-ES Runtime; Dalvik Runtime; Other Runtime; HSAIL; HSA Finalizer; GPU ISA; HSA Software; Kernel Driver; Legacy Driver; Ctl; CPU(s); GPU(s); Other Accelerators]

Heterogeneous System Architecture

[Chart: segments for 3D Graphics, CPU Use Cases, Augmented Reality, Photography, Computer Vision, Audio Processing and Video Processing, shown against IP Vendor, Silicon Vendor, OEM and App Writer. Segment types and sizes illustrative only]

Summary

Thermal envelope has become the limiting factor in performance

It is necessary to use all the tools at our disposal to address this

Macro- and micro-architecture, dynamic resource management and application partitioning all play a role

There is a move towards simplifying and standardising the use of all computing resources on board an SoC

What works on the desktop does not always work for embedded devices

It is still necessary to use resources wisely: going for maximum precision or maximum dynamic range is never a no-brainer

The future is increasingly heterogeneous
