Post on 28-Dec-2015
Codeplay CEO
© Copyright 2012 Codeplay Software Ltd
45 York Place, Edinburgh
EH1 3HP, United Kingdom
Visit us at www.codeplay.com
The unique challenges of producing compilers for GPUs
Andrew Richards
Growth of the GPU in HPC
Source: NVIDIA, http://blogs.nvidia.com/2011/11/gpu-supercomputers-show-exponential-growth-in-top500-list/
GPU Computing taking over Supercomputing conference floor
The growth of the GPU in mobile:
Apple’s A4-A6X
Source: Chipworks, http://www.chipworks.com/en/technical-competitive-analysis/resources/recent-teardowns/2012/03/the-apple-a5x-versus-the-a5-and-a4-%E2%80%93-big-is-beautiful/
[Die photos: Apple A4, A5, A5X, A6 and A6X, with CPU and GPU blocks labelled; the GPU occupies a growing share of each successive die]
What is all this power being used for?
• Motion blur
• Depth of field
• Bloom
1920 x 1080 x 60fps x 3 (RGB) x 4x4 (samples) x 4 (flops) ≈ 24 GFLOPS and ≈ 24GB/s
This is just a simple example!
Source: Guerrilla Games, Killzone 2
Why is this happening?
1. Because once software is parallel, it might as well be very parallel
   – The ease-of-programming reason
2. Because GPUs run existing graphics software much faster, whereas CPUs only run existing parallel software faster
   – The business reason
3. Because of power consumption
History of Power consumption
[Chart: console/PC power consumption, 1983–2009, log scale from 1W to 1,000W; platforms include PlayStation, Xbox, Nintendo, x86, Amiga]
[Chart: CPU clock frequency, 1983–2009, log scale; platforms include PlayStation, Xbox, Nintendo, x86, Amiga, Sega]
We have probably hit peak power consumption with the current console generation; the next console generation is unlikely to launch at more than 180W. We have also hit peak clock frequency: increases beyond 3.2GHz will come slowly. Therefore, all future increases in performance will come from parallelism.
Power consumption over time; increase in CPU clock frequency over time
How do we keep GPU power efficiency high?
• Cost of data movement is much higher than computation cost
• GPUs control data movement distances carefully
• Preserve locality explicitly instead of caching
Source: NVIDIA: Bill Dally’s presentation at SC10
What does this mean for the compiler developer?
CPUs
• Widely understood and standardized
• Can test by running existing software
• Instruction sets only add new instructions
• Separated from hardware by OS
• Only data movement the compiler needs to handle is register/memory

GPUs
• New technologies and standards every year
• Need to write new test software for new features
• New GPUs completely change ISAs
• Compilers, drivers and OS tightly integrated and developed rapidly
• Need to handle data movement explicitly
New Technologies and Standards
• New graphics standards need to be implemented very fast to be competitive
• Need to write new front-ends, libraries and runtimes very quickly
• OpenCL / OpenGL
• DirectX / C++ AMP / HLSL / DirectCompute
• Renderscript
• Proprietary graphics technologies
Need to write new tests for new features
• When writing a compiler for an existing language, you can run existing software as tests
• With a new standard, you need to write new tests
• GPUs have varying accuracy specifications, so testing needs to show whether results are 'good enough'
• Tests need to cover the full graphics pipeline as well as compute capability, so they are not purely compiler tests
• Graphics and compiler test processes are very different
New GPUs completely change ISAs
• GPUs are programmed in high-level languages, or in virtual ISAs
  – So the ISA can change and old software still runs
  – But correctness is a critical problem
• Need to write GPU back-ends very fast (1–2 years, instead of 1–20 years of CPU back-ends…)
• GPU back-ends are complex because of the extent of optimizations for power and area
Compilers, drivers & OS tightly integrated
• We have not standardized the interface between GPU compilers and the OS or drivers
  – Instead, we standardize the API, compiler and driver as a whole
• CPU compilers can be written independently of the OS (mostly) and with little to no runtime API
  – But GPU compilers must be written in tandem with the runtime API, driver and OS
Need to handle data movement explicitly
• Register allocation in a GPU compiler is complex because of trade-offs for power and area
  – Typically there are multiple register files with different rules
• Memory handling is more complex
  – Typically there are multiple memory spaces with different instructions
  – Affects both the compiler front-end and back-end
What problems is Codeplay working on?
• Higher-level C++ programming model for GPUs
  – Generic programming: parallel reduce algorithms
  – Abstracting details of GPU hardware: memory sizes, tile sizes, execution models
  – Data structures shareable between host and device
  – Performance portability
  – Standardization
Conclusions
GPU compilers are little understood but critical to future innovation and performance
Don’t forget that GPUs are mostly for graphics!