technology impacts from the new wave of architectures for media-rich workloads

22
1 VLSI Technology Symposium | June 2011 | Public TECHNOLOGY IMPACTS FROM THE NEW WAVE OF ARCHITECTURES FOR MEDIA-RICH WORKLOADS Samuel Naffziger AMD Corporate Fellow June 14 th , 2011 VLSI TECHNOLOGY SYMPOSIUM 2011

Upload: boone

Post on 23-Feb-2016

22 views

Category:

Documents


0 download

DESCRIPTION

VLSI Technology Symposium 2011. Technology Impacts from the New Wave of Architectures for Media-rich Workloads. Samuel Naffziger AMD Corporate Fellow June 14 th , 2011. Outline. Introduction The new workloads and demands on computation Characteristics of serial and parallel computation - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

1VLSI Technology Symposium | June 2011 | Public

TECHNOLOGY IMPACTS FROM THE NEW WAVE OF ARCHITECTURES FOR MEDIA-RICH WORKLOADS

Samuel NaffzigerAMD Corporate Fellow

June 14th, 2011

VLSI TECHNOLOGY SYMPOSIUM 2011

Page 2: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

2VLSI Technology Symposium | June 2011 | Public

Introduction

The new workloads and demands on computation

Characteristics of serial and parallel computation

The Accelerated Processing Unit (APU) architecture

APU architecture implications for technology

Summary

Outline

Page 3: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

3VLSI Technology Symposium| June 2011 | Public

Now: Parallel/Data-Dense

16:9 @ 7 megapixels

HD video flipcams, phones, webcams (1GB)

3D Internet apps and HD video online, social networking w/HD files

3D Blu-ray HD

Multi-touch, facial/gesture/voice recognition + mouse & keyboard

All day computing (8+ Hours)

Immersive and interactive performance

Wor

kloa

ds

The Big Experience/Small Form Factor ParadoxMid 2000s4:3 @ 1.2 megapixelsDigital cameras, SD webcams (1-5 MB files)WWW and streaming SD video

DVDs

Mouse & keyboard

3-4 HoursStandard-definition

Internet

Technology Mid 1990s

Display 4:3 @ 0.5 megapixel

Content Email, film & scanners

Online Text and low res photos

Multimedia CD-ROM

Interface Mouse & keyboard

Battery Life* 1-2 Hours

Form

Fact

ors

Early Internet and Multimedia Experiences

*Resting battery life as measured with industry standard tests.

Page 4: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

4VLSI Technology Symposium| June 2011 | Public

Focusing on the experiences that matter

EmailWeb browsing

Office productivityListen to music

Online chatWatching online video

Photo editingPersonal finances

Taking notesOnline web-based

gamesSocial networking

Calendar managementLocally installed games

Educational appsVideo editing

Internet phone

Consumer PC Usage

0% 20% 40% 60% 80% 100%

New Experiences

ImmersiveGaming

Simplified Content Management

Accelerated Internet and HD Video

Source: IDC's 2009 Consumer PC Buyer Survey

Page 5: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

5VLSI Technology Symposium | June 2011 | Public

People Prefer Visual Communications

Visual PerceptionVerbal Perception

Words are processedat only 150 wordsper minute

Pictures and videoare processed 400 to

2000 times faster

Augmenting Today’s Content:

Rich visual experiences Multiple content sources Multi-Display Stereo 3D

Page 6: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

6VLSI Technology Symposium | June 2011 | Public

Communicating• IM, Email, Facebook• Video Chat, NetMeeting

Gaming• Mainstream Games• 3D games

The Emerging World of New Data Rich Applications

ArcSoft TotalMedia® 

Theatre 5

ArcSoft Media

Converter® 7

CyberLink Media

Espresso 6

CyberLink Power

Director 9

Corel VideoStudio Pro

Corel Digital Studio 2010

Internet Explorer 9

Microsoft® PowerPoint® 2010 

Windows Live

Essentials

CodemastersF1 2010Nuvixa

Be Present

ViVu Desktop

TelepresenceViewdle

Uploader

Using photos• Viewing& Sharing• Search, Recognition, Labeling? • Advanced Editing

Using video• DVD, BLU-RAY™, HD • Search, Recognition, Labeling • Advanced Editing & Mixing

The Ultimate Visual Experience™

Fast Rich Web content, favorite HD Movies, games with realistic

graphics

Music• Listening and Sharing• Editing and Mixing• Composing and compositing

Page 7: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

7| 2011 VLSI Symposium| June 2011 | Public

New Workload Examples: Changing Consumer Behavior

24 hoursof video

uploaded to YouTubeevery minute

50 million +digital media files

added to personal content librariesevery day

Approximately

9 billionvideo files owned are

high-definition

1000 images

are uploaded to Facebookevery second

Page 8: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

8VLSI Technology Symposium | June 2011 | Public

What Are the Implications for Computation?Insatiable demand for high bandwidth processing–Visual image processing–Natural user interfaces–Massive data mining for

associative searches, recognition

Some of these compute needs can be offloaded to servers, some must be done on the mobile device– Similar compute needs and

massive growth in both spaces

How must CPU architecture change to deal with these trends?

Page 9: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

9VLSI Technology Symposium | June 2011 | Public

Parallel and Serial Computation

Transistors(thousands)

Single-threadPerformance(SpecINT)

Frequency(MHz)

Typical Power(Watts)

Number ofCores

Original data collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond and C. Batten

35 Years of Microprocessor Trend Data

i=0i++

load x(i)fmulstore

cmp i (16)bc

Loops, branches and conditional evaluation

Serial Code

Conditionalbranches

0

1000

2000

3000

4000

5000

6000

7000

8000

2005 2006 2007 2008 2009 2010 2011 2012 2013 2014

Peak

GFl

ops (

SPFP

)

Years

GFLOPs Trend

GPU

CPU

AMD projections

i=0i++

load x(i)fmulstore

cmp i (1000000)bc

……

……

i,j=0i++j++

load x(i,j)fmulstore

cmp j (100000)bc

cmp i (100000)bc

2D array representingvery large

dataset

Loop 1M times for 1M pieces

of data

DataParallel Code

Page 10: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

10VLSI Technology Symposium | June 2011 | Public

GPU/CPU Design Differences

Lots of instructions little data

• Out of order exec, Branch prediction

• Few hardware threads

Weak performance gains through density

Maximize speed with fast devices

Few instructions lots of data

• Single Instruction Multiple Data• Extensive fine-threading capability

Nearly linear performance gains with density

Maximize density with cool devices

CPU (Serial compute) GPU (parallel compute)

Page 11: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

11VLSI Technology Symposium | June 2011 | Public

Three Eras of Processor Performance

Single-Core Era

Sin

gle-

thre

ad P

erfo

rman

ce

?

Time

we arehere

o

Enabled by: Moore’s Law Voltage & Process Scaling Micro Architecture

Constrained by:PowerComplexity

Multi-Core Era

Thro

ughp

ut P

erfo

rman

ce

Time(# of Processors)

we arehere

o

Enabled by: Moore’s Law Desire for Throughput 20 years of SMP arch

Constrained by:PowerParallel SW availabilityScalability

HeterogeneousSystems Era

Targ

eted

App

licat

ion

Per

form

ance

Time(Data-parallel exploitation)

we arehere

o

Enabled by: Moore’s Law Abundant data parallelism Power efficient GPUs

Temporarily constrained by:Programming modelsCommunication overheadsWorkloads

Page 12: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

12VLSI Technology Symposium | June 2011 | Public

Heterogeneous Computing with an APU Architecture

CPU Cores

GPU UVD

SB Functions

~7 GB/sec

~17 GB/sec

UNB

MC

~17 GB/sec

DDR3 DIMMMemory

CPU Chip

FCH Chip

PCIe®

Bandwidth pinch points and latency hold back the GPU capabilities

Integration Provides Improvement Eliminate power and latency of extra chip

crossing 3X bandwidth between GPU and Memory! Same sized GPU is substantially more effective Power efficient, advanced technology for both

CPU and GPU

Graphics requires memory BW to bring full capabilities

to life~27 GB/sec

~27 GB/sec

DDR3 DIMMMemory

APU Chip

PCIe

2010 IGP-based (“Danube”) Platform 2011 APU-based (“Llano”) Platform

GPU

CPU Cores

UVD

UN

B / M

C

GPU

Optional

Page 13: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

13VLSI Technology Symposium | June 2011 | Public

The Challenges of Integration

Thick, fast metalBig devices

Dense, thin metal, small devices

Per

form

ance

CPU flop area = 2.14

GPU flop area = 1.0

CPU GPU

Flop count for 4 Llano CPU cores=0.66M

Flop count for Llano GPU =3.5M

Density

Page 14: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

14VLSI Technology Symposium | June 2011 | Public

With the 20nm node, even local metal will be seeing large RC increase compromises more difficult

How to Balance the Metal Stack?

Cu Resistivity without barrierWith barrier

1.51.61.71.81.9

22.12.22.32.42.5

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1Line Width (um)

Res

istiv

ity (u

ohm

-cm

)

Add metal layers? Thin, dense layers for the

GPU Thick, low resistance layers

for the CPU Cost issues? Via resistance?Technology improvements in

BEOL are required

Per

form

ance

CPU GPU

Density

Page 15: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

15VLSI Technology Symposium | June 2011 | Public

Device OptimizationP

erfo

rman

ce

CPU GPU

To achieve breakthrough APU performance, the Llano GPU has ~5X the flops and ~5X the device count of the CPUs

0

0.2

0.4

0.6

0.8

1

1.2

Llano CPU Llano GPU

APU Vt Mix

LC-HVT

HVT

RVT

LC-RVT

LVTDevice Ioff

Broader span of devices required

0

50

100

150

200

250

175 50 20 4.3 2.7 0.5 0.4

FPG

Ioff (nA/um room temp)

FPG vs. Ioff

CPU

GPU

desired device range

Speed vs. Leakage

RO

spe

ed

LVT LC-RVTRVT HVT LC-HVT

A broader device suite is required

Page 16: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

16VLSI Technology Symposium | June 2011 | Public

85.0

90.0

95.0

100.0

105.0

110.0

Balanced workload

85.0

90.0

95.0

100.0

105.0

110.0

GPU-centric data parallel workload

85.0

90.0

95.0

100.0

105.0

110.0

CPU-centric serial workload

Power Transfers

GPU

UVD

UNB / MC

CPU

Core+

cache

CPU

Core+

cache

DDR IO

PCIe IO

CPU

Core+

cache

CPU

Core+

cache

Tem

pera

ture

Voltage range is critical to enabling the efficient power transfers that make for compelling APU performance

Page 17: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

17VLSI Technology Symposium | June 2011 | Public

Operating Voltage Range

Operating voltage requirements:

Low voltage necessary for power efficiency

High voltage necessary for a snappy user experience enabled by turbo mode

0

0.5

1

1.5

2

2.5

0.7V 0.8V 0.9V 1.0V 1.1V 1.2V 1.3V

E/op vs. V

Page 18: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

18VLSI Technology Symposium | June 2011 | Public

0.952V

0.915V

0.886V

0.805V

0.700V

0.750V

0.800V

0.850V

0.900V

0.950V

1.000V

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

40nm 28nm 20nm 14nm

Nom

inal

Vol

tage

Power Density Limited GPU40nm to 14nm

Frequency

Power Density

Voltage

Operating Voltage Challenges

To maintain cost effective performance growth with technology node, the GPU must:

– Hold power density constant

– Exploit density gains to add compute units

This necessarily drives operating voltage down

This would be good for energy efficiency except …

– Variation impacts are much greater at low voltage

0MHz

100MHz

200MHz

300MHz

400MHz

500MHz

600MHz

700MHz

800MHz

900MHz

1000MHz

0.85V 0.90V 0.95V 1.00V 1.10V 1.15V

Juniper Frequency Data

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

Frequency spread increases at low voltage

40nm GPU Frequency Data

Page 19: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

19VLSI Technology Symposium | June 2011 | Public

The Operating Voltage Challenge

Many barriers to maintaining both high and low voltage as technology scales

TDDB vs. SCE control ULK breakdown vs. denser pitches Variation control

85.0

90.0

95.0

100.0

105.0

110.0

BOX

Poly

Fin Curre

nt Fl

ow ->

S

D

FD devices should enable maintaining the functional range for a generation or two

Will turbo modes be too compromised?

What’s next?

Page 20: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

20VLSI Technology Symposium | June 2011 | Public

3D Integration to the Rescue?

ThroughSilicon Vias

(TSVs) CPU DieMetal Layers

GPU DieMetal Layers

Analog Die (SB, Power)Metal Layers

Metal Layers

TIM (Thermal Interface Material)

Heat Sink

DRAMMicro-bumps

Package Substrate

GPU

UVD

UN

B / M

C

CPU Core+cache

CPU Core+cache

DDR IO

PCIe IO

CPU Core+cache

CPU Core+cache

DRAM

South Bridge

Stacking offers many attractive benefitsHigher bandwidth to local memory

Enables parallel and serial compute die to be in their own separate optimized technology – interconnect speed vs. density, device optimization etc.

Allows IO and southbridge content to remain in older, more analog-friendly technology

Page 21: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

21VLSI Technology Symposium | June 2011 | Public

3D Integration Challenges Economical 3D stacking in high volume manufacturing presents many challenges

Benefits must exceed the additional costs of TSVs, and yield fallout

Logistics of testing and assembling die from multiple sources can be immense

Countless mechanical and thermal issues to solve in high volume mfg

Clearly 3D provides compelling solutions to many problems, but the barriers to entry mean heavy R&D $$ and partnerships required

ThroughSilicon

Vias(TSVs)

CPU DieMetal Layers

GPU DieMetal Layers

Analog Die (SB, Power)Metal Layers

Metal Layers

TIM (Thermal I nterface Material)

Heat Sink

DRAMDie to Die Vias

Package Substrate

DRAM

South Bridge

Page 22: Technology Impacts from the New Wave of Architectures for Media-rich Workloads

22VLSI Technology Symposium | June 2011 | Public

Summary

Insatiable demand for high bandwidth computation–Visual image processing–Natural user interfaces–Massive data mining for associate searches, recognition

Some of these compute needs can be offloaded to servers, some must be done on the mobile device–Similar compute needs and massive growth in both spaces–Combined serial and parallel computation architectures are

key in both spacesHuge technology challenges to meeting this opportunity

–Interconnect scaling is hitting a wall that must be overcome–A broad device suite is necessary that operates efficiently at

low voltage while enabling high speed for response time–3D integration offers a promising long term solution