
HETEROGENEOUS SYSTEM ARCHITECTURE

Mike Houston, AMD Fellow

Platform of the Future


2 | XLDB - Stanford | Sept. 12, 2012

A NEW ERA OF PROCESSOR PERFORMANCE

[Figure: three performance-over-time charts, each marked "we are here"]

Single-Core Era
Chart: single-thread performance vs. time
Enabled by: Moore’s Law, voltage scaling
Constrained by: power, complexity

Multi-Core Era
Chart: throughput performance vs. time (# of processors)
Enabled by: Moore’s Law, SMP architecture
Constrained by: power, parallel SW scalability

Heterogeneous Systems Era
Chart: modern application performance vs. time (data-parallel exploitation)
Enabled by: abundant data parallelism, power-efficient GPUs
Temporarily constrained by: programming models, comm. overhead

Assembly, C/C++, Java … pthreads, OpenMP / TBB … Shader, CUDA, OpenCL !!!


Most parallel code runs on CPUs designed for scalar workloads


HETEROGENEOUS SYSTEM ARCHITECTURE – AN OPEN PLATFORM

Open Architecture, published specifications

HSAIL virtual ISA

HSA memory model

HSA dispatch

ISA agnostic for both CPU and GPU

Inviting partners to join us, in all areas

Hardware companies

Operating Systems

Tools and Middleware

Applications


www.hsafoundation.com

… to define the next generation of computing platforms for all devices


GOALS

Make the unprecedented processing capability of the APU as accessible to programmers as the CPU is today

Dramatically expand the APU software ecosystem in client and server

Enable immersive applications whether hosted locally or in the cloud


APU: ACCELERATED PROCESSING UNIT

The APU has arrived and it is a great advance over previous platforms

Combines scalar processing on CPU with parallel processing on the GPU and high bandwidth access to memory

How do we make it even better going forward?

Easier to program

Easier to optimize

Easier to load balance

Higher performance

Lower power


HETEROGENEOUS SYSTEM ARCHITECTURE ROADMAP


ACCELERATING MEMCACHED

CLOUD SERVER WORKLOAD


DATACENTER WORKLOAD

Memcached is generally used for short-term storage and caching, handling requests that would otherwise require database or file system accesses

Used by Facebook, YouTube, Twitter, Wikipedia, Flickr, and others

Effectively a large distributed hash table

Responds to store and get requests received over the network

Conceptually:

store(key, object)

object = get(key)
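
The conceptual interface above can be sketched in a few lines. This is a toy, in-process illustration and not memcached's implementation: the class name `KeyValueCache` and the helper `node_for` are hypothetical, and the hash-modulo placement of keys on nodes is an assumption chosen only to show what "distributed hash table" means here.

```python
import zlib


class KeyValueCache:
    """Toy stand-in for the store/get interface of a key-value cache."""

    def __init__(self):
        self._table = {}  # a Python dict is itself a hash table

    def store(self, key, obj):
        self._table[key] = obj

    def get(self, key):
        # A miss returns None, so the caller can fall back to the database.
        return self._table.get(key)


def node_for(key, nodes):
    """Pick the node responsible for a key by hashing it (assumed scheme)."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


cache = KeyValueCache()
cache.store("user:42", {"name": "Ada"})
hit = cache.get("user:42")      # the stored object
miss = cache.get("user:99")     # None: not cached
```

The key lookup inside `get` is the part of the workload that the following slides consider offloading to the GPU.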


OFFLOADING MEMCACHED KEY LOOKUP TO THE GPU

[Figure: key look-up performance and execution breakdown (data transfer vs. execution) for a multithreaded CPU, Radeon HD 5870, “Trinity” A10-5800K, and Zacate E-350]

T. H. Hetherington, T. G. Rogers, L. Hsu, M. O’Connor, and T. M. Aamodt, “Characterizing and Evaluating a Key-Value Store Application on Heterogeneous CPU-GPU Systems,” Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2012), April 2012.

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6189209


ACCELERATING JAVA

GOING BEYOND NATIVE LANGUAGES


JAVA ENABLEMENT BY APARAPI

Developer creates Java™ source

Source compiled to class files (bytecode) using standard compiler (javac)

Classes packaged and deployed using established Java™ tool chain

Aparapi = Runtime capable of converting Java™ bytecode to OpenCL™

For execution on any OpenCL™ 1.1+ capable device

OR execute via a thread pool if OpenCL™ is not available
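
The fallback behaviour described above (run on an accelerator when one is available, otherwise on a thread pool) can be sketched as follows. This is not Aparapi's API: `run_kernel` is a hypothetical helper, and `accelerator` stands for an assumed object exposing a `dispatch(kernel, n)` method. Only the thread-pool path is exercised here.

```python
from concurrent.futures import ThreadPoolExecutor


def run_kernel(kernel, global_size, accelerator=None):
    """Call kernel(i) for every i in range(global_size).

    `accelerator` is a hypothetical device wrapper; when it is None the
    work items are executed by a thread pool instead.
    """
    if accelerator is not None:
        return accelerator.dispatch(kernel, global_size)
    with ThreadPoolExecutor() as pool:
        # list() waits for completion and re-raises any worker exception.
        list(pool.map(kernel, range(global_size)))


def square_all(values):
    out = [0] * len(values)

    def kernel(i):
        # Each work item writes only its own slot, so no locking is needed.
        out[i] = values[i] * values[i]

    run_kernel(kernel, len(values))
    return out


squares = square_all([1, 2, 3, 4])
```

Writing the kernel as a function of a single index is what lets the same source be dispatched to either backend.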


JAVA AND APARAPI HSA ENABLEMENT ROADMAP

[Diagram: four Java software stacks]

JVM, Application, Aparapi, OpenCL™; targets: GPU, CPU
JVM, Application, Aparapi, HSAIL, HSA Finalizer (CPU ISA, GPU ISA); targets: HSA GPU, HSA CPU
JVM, Application, Aparapi, LLVM Optimizer, IR, HSAIL, HSA Runtime, HSA Finalizer (CPU ISA, GPU ISA); targets: HSA GPU, HSA CPU
HSA-Enabled JVM, Application, HSAIL, HSA Finalizer (CPU ISA, GPU ISA); targets: HSA GPU, HSA CPU


HSA SOFTWARE STACKS


INTRODUCING HSA BOLT – PARALLEL PRIMITIVES LIBRARY FOR HSA

Easily leverage the inherent power efficiency of GPU computing

Common routines such as scan, sort, reduce, transform

More advanced routines like heterogeneous pipelines

Bolt library works with OpenCL or C++ AMP

Enjoy the unique advantages of the HSA platform

Move the computation not the data

Finally a single source code base for the CPU and GPU!

Developers can focus on core algorithms

See Ben Sander’s session tomorrow for a deep dive on HSA Bolt!
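
For readers unfamiliar with the primitives named above, here is a minimal sequential sketch of what transform, scan, reduce, and sort compute. It is written in Python with the standard library only and is not Bolt's interface; the function names `transform`, `inclusive_scan`, and `reduce_all` are illustrative assumptions.

```python
from functools import reduce
from itertools import accumulate
import operator


def transform(values, fn):
    """Apply fn to every element independently."""
    return [fn(v) for v in values]


def inclusive_scan(values, op=operator.add):
    """Running combination: element i holds op over values[0..i]."""
    return list(accumulate(values, op))


def reduce_all(values, op=operator.add, init=0):
    """Combine all elements into a single value."""
    return reduce(op, values, init)


data = [3, 1, 4, 1, 5]
squares = transform(data, lambda v: v * v)  # [9, 1, 16, 1, 25]
prefix = inclusive_scan(data)               # [3, 4, 8, 9, 14]
total = reduce_all(data)                    # 14
ordered = sorted(data)                      # [1, 1, 3, 4, 5]
```

Each of these has a well-known data-parallel formulation, which is why a library can offer them with one source code base for CPU and GPU.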


HSA SOLUTION STACK

[Diagram: layered solution stack]

Layers: Application SW, Drivers, Differentiated HW
Components: Application; Domain Specific Libs (Bolt, OpenCV™, … many others); OpenCL™ Runtime; DirectX Runtime; Other Runtime; HSA Runtime; HSA Finalizer; Legacy Drivers; Knl Driver; Ctl; HSAIL; GPU ISA
Hardware: CPU(s), GPU(s), Other Accelerators
Legend: HSA Software


AMD’S OPEN SOURCE COMMITMENT TO HSA

Component Name                  AMD Specific   Rationale
HSA Bolt Library                No             Enable understanding and debug
OpenCL HSAIL Code Generator     No             Enable research
LLVM Contributions              No             Industry and academic collaboration
HSA Assembler                   No             Enable understanding and debug
HSA Runtime                     No             Standardize on a single runtime
HSA Finalizer                   Yes            Enable research and debug
HSA Kernel Driver               Yes            For inclusion in Linux distros

We will open source our Linux execution and compilation stack

Jump start the ecosystem

Allow a single shared implementation where appropriate

Enable university research in all areas


SEAMICRO

HIGH DENSITY COMPUTING


THE SM15K PRODUCT FAMILY

10 Rack Units

64 Server Cards

1.28 Terabit fabric interconnect

Up to 160GbE Uplink (16 x 10GbE or 64 x 1 GbE)

0-64 Internal 2.5” SAS/SATA HDD/SSD

Up to 1344 External 3.5” SAS/SATA HDD/SSD

Up to 16 x4 3Gbps SAS interfaces for External Storage

Hardware RAID module w/ RAID 1, 5, 6, and 10

Hot-swappable modules with in-service upgrades

Runs off-the-shelf OS and hypervisors

Redundant Power 100-208V AC, 48V DC

3.0 to 3.5 KW Power Consumption (25-85% Util)


AMD SERVER BLADES: SHIPS Q4 2012

64 Opteron EE-4365 Servers per 10RU

512 cores in 10 RU; 2,048 cores in a rack

64GB ECC DRAM/Server (4TB per 10RU, 16TB per rack)

8 x 1GbE per server

AMD Opteron Blade

1 Octal Core 2.0/2.3/2.8GHz Opteron EE-4365 processor per server blade

SM15K-OP
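
The density figures on this slide follow from simple multiplication, worked through below. The value of four chassis per rack is an assumption inferred from the slide's own ratio (2,048 / 512); the other inputs are taken directly from the slide.

```python
SERVERS_PER_CHASSIS = 64   # 64 Opteron servers per 10 RU
CORES_PER_SERVER = 8       # one octal-core processor per server blade
DRAM_GB_PER_SERVER = 64
CHASSIS_PER_RACK = 4       # assumption: implied by 2,048 / 512

cores_per_chassis = SERVERS_PER_CHASSIS * CORES_PER_SERVER               # 512
cores_per_rack = cores_per_chassis * CHASSIS_PER_RACK                    # 2048
dram_tb_per_chassis = SERVERS_PER_CHASSIS * DRAM_GB_PER_SERVER // 1024   # 4
dram_tb_per_rack = dram_tb_per_chassis * CHASSIS_PER_RACK                # 16
```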


SEAMICRO FABRIC STORAGE ENCLOSURE FAMILY

                             FS 5084-L              FS 2012-L          FS 2024-S
Positioning                  High Capacity          Low Upfront Cost   Performance Optimized
Height (RU)                  5RU                    2RU                2RU
Disk Count                   84                     12                 24
Disk Types Supported         3.5” / 2.5” SAS/SATA   3.5” SAS/SATA      2.5” SAS/SATA
Controller                   Dual HA Storage Bridge Bay (SBB) 2.0 Compatible controllers
Interfaces                   Three x4 6Gb mini-SAS connectors per controller
Max Storage per Enclosure*   336 TB                 48 TB              24 TB
Max Storage per SM15K*       5,376 TB (5.3 PB)      768 TB             384 TB

*Based on 4TB 3.5” and 1TB 2.5” HDD

[Diagram: SM15K with 16 mini-SAS connectors]


FABRIC ENABLES FLEXIBLE COMPUTE, NETWORK AND STORAGE RATIOS

Hot plug up to 16 x 10GbE

Freedom ASICs create 1.28Tbps bandwidth & 0.5 to 6µs fabric

Hot plug up to 5.3 petabytes storage

Freedom Fabric enables any server to access any uplink or storage


SM15000: 512 OPTERON “PILEDRIVER” CORES IN 10 RU

OPTERON PROCESSORS BASED ON THE NEW PILEDRIVER CORE

64 sockets, each with a new Octal core Opteron processor: 64-bit, x86, 2.0/2.3/2.8 GHz

512 cores in 10 RU; 2,048 cores in a rack

DRAM: 64GB/socket; 4 terabytes/system

– Industry leading DRAM density: 400 GB/RU

Freedom supercompute Fabric

10 GigE bandwidth to each socket

16 x 10GbE uplinks

Supports 1,408 drives, linking up to 5 petabytes of fabric storage

Runs standard OS including Windows, Linux and VMware and Citrix hypervisors


SM15000 LEADS THE INDUSTRY IN STORAGE CAPACITY

5 PETABYTE CLUSTER COMPARISON

6 Racks

112 2 RU Dual Socket Octal Core “Sandy Bridge” Servers each w/12 3.5” SATA/SAS Disks

224 OS/Big Data SW Licenses

12 10GbE Switches

6 Terminal Servers

224 Power Cables, 248 Networking cables

40 KW

2 Racks (1/3 the space)

1 SM 15000 + 16 Freedom Fabric Storage Enclosures

64 OS/Big Data SW Licenses

38 power cords, 32 Fabric Extender Cables

20 KW

1/2 the Price
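
The two configurations can be checked for equal raw capacity with a short calculation. Two inputs are assumptions rather than statements from this slide: the 4 TB disk size (taken from the footnote on the storage enclosure slide) and the use of 84-disk FS 5084-L enclosures for the SM15000 configuration.

```python
from fractions import Fraction

DISK_TB = 4  # assumption: 4 TB 3.5" HDD, as in the enclosure footnote

# Conventional cluster: 112 servers with 12 disks each
conventional_disks = 112 * 12                       # 1344
conventional_tb = conventional_disks * DISK_TB      # 5376

# SM15000: 16 enclosures, assumed to be the 84-disk FS 5084-L
fabric_disks = 16 * 84                              # 1344
fabric_tb = fabric_disks * DISK_TB                  # 5376

rack_ratio = Fraction(2, 6)      # 2 racks vs. 6 racks -> 1/3
power_ratio = Fraction(20, 40)   # 20 KW vs. 40 KW -> 1/2
```

Under these assumptions both sides come to 1,344 disks and 5,376 TB, which matches the roughly 5 petabyte figure in the slide title.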


Disclaimer & Attribution

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.

NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AMD, the AMD arrow logo, the HSA logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. OpenCL™ is a trademark of Apple Corp. which is licensed to the Khronos Organization. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.

© 2012 Advanced Micro Devices, Inc.