IBM Research © 2008 Feeding the Multicore Beast: It’s All About the Data! Michael Perrone IBM Master Inventor Mgr, Cell Solutions Dept.



Outline

History: Data challenge

Motivation for multicore

Implications for programmers

How Cell addresses these implications

Examples

• 2D/3D FFT – Medical Imaging, Petroleum, general HPC…

• Green’s Functions – Seismic Imaging (Petroleum)

• String Matching – Network Processing: DPI & Intrusion Detection

• Neural Networks – Finance


Chapter 1:

The Beast is Hungry!


The Hungry Beast

Diagram: Data (“food”) → Data Pipe → Processor (“beast”)

Pipe too small = starved beast

Pipe big enough = well-fed beast

Pipe too big = wasted resources


If flops grow faster than pipe capacity…

… the beast gets hungrier!
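The pipe metaphor can be made concrete with a simple bound: sustained throughput cannot exceed the smaller of the processor's peak rate and the rate at which the pipe delivers operands. The sketch below is illustrative only. The function name `attainable_gflops`, the `flops_per_byte` parameter and all the numbers are hypothetical and do not come from the talk.

```python
def attainable_gflops(peak_gflops, pipe_gbytes_per_s, flops_per_byte):
    """Upper bound on sustained GFLOPS for a kernel.

    flops_per_byte is how much arithmetic the kernel does per byte it
    pulls through the pipe (hypothetical parameter for illustration).
    """
    fed_rate = pipe_gbytes_per_s * flops_per_byte  # what the pipe can sustain
    return min(peak_gflops, fed_rate)


# Hypothetical numbers: the pipe stays fixed while peak flops keep doubling.
pipe = 10.0        # GB/s
intensity = 2.0    # flops per byte moved
for peak in (10.0, 20.0, 40.0, 80.0):
    rate = attainable_gflops(peak, pipe, intensity)
    print(f"peak {peak:5.1f} GFLOPS -> attainable {rate:5.1f} ({rate / peak:.0%} of peak)")
```

Once the peak passes what the pipe can feed, every further doubling of flops only lowers the fraction of peak that is reachable, which is the "hungrier beast" in numbers.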


Move the food closer

Example: Intel Tulsa

– Xeon MP 7100 series

– 65nm, 349 mm², 2 cores

– 3.4 GHz @ 150W

– ~54.4 SP GFlops

– http://www.intel.com/products/processor/xeon/index.htm

Large cache on chip

– ~50% of area

– Keeps data close for efficient access

If the data is local, the beast is happy!

– True for many algorithms


What happens if the beast is still hungry?

Data

Cache

If the data set doesn’t fit in cache

– Cache misses

– Memory latency exposed

– Performance degraded

Several important application classes don’t fit

– Graph searching algorithms

– Network security

– Natural language processing

– Bioinformatics

– Many HPC workloads


Make the food bowl larger

Data

Cache

Cache size steadily increasing

Implications

– Chip real estate reserved for cache

– Less space on chip for computes

– More power required for fewer FLOPS


But…

– Important application working sets are growing faster

– Multicore even more demanding on cache than uni-core


Chapter 2:

The Beast Has Babies


Power Density – The fundamental problem

Plot: power density (W/cm², log scale from 1 to 1000) versus feature size (1.5 down to 0.07), with points for i386, i486, Pentium®, Pentium Pro®, Pentium II® and Pentium III®, and reference levels for a hot plate and a nuclear reactor.

Source: Fred Pollack, Intel. New Microprocessor Challenges in the Coming Generations of CMOS Technologies, Micro32


What’s causing the problem?

Gate dielectric approaching a fundamental limit (a few atomic layers)

Plot: power density (W/cm², log scale from 0.001 to 1000) versus gate length (microns, 1 down to 0.01), with the 65 nm point marked. Inset: 10S gate stack, Tox = 11 Å.

Power, signal jitter, etc...


Diminishing Returns on Frequency

In a power-constrained environment, chip clock speed yields diminishing returns. The industry has moved to lower-frequency multicore architectures.

Plot: clock speed (MHz, log scale from 10² to 10⁴) versus year (1990 to 2010), with the frequency-driven design points marked.


Power vs Performance Trade Offs

Plot: relative power (axis from 0 to 5) versus relative performance, with data labels 1, 1.45, 1.3, .85 and 1.7.

We need to adapt our algorithms to get performance out of multicore


Implications of Multicore

There are more mouths to feed

– Data movement will take center stage

Complexity of cores will stop increasing

… and has started to decrease in some cases

Complexity increases will center around communication

Assumption

– Achieving a significant % of peak performance is important


Chapter 3:

The Proper Care and Feeding of Hungry Beasts


Cell/B.E. Processor: 200GFLOPS (SP) @ ~70W


Feeding the Cell Processor

8 SPEs each with

– LS

– MFC

– SXU

PPE

– OS functions

– Disk IO

– Network IO

Block diagram: the PPE (PPU with PXU, L1 and L2; 64-bit Power Architecture with VMX), the eight SPEs (each an SPU with SXU, LS and MFC), the MIC (dual XDR™ memory) and the BIC (FlexIO™) all attach to the EIB (up to 96B/cycle). Link labels: 16B/cycle for each SPE, the PPE and the MIC, 16B/cycle (2x) for the BIC, and 32B/cycle at the L2.


Cell Approach: Feed the beast more efficiently

Explicitly “orchestrate” the data flow between main memory and each SPE’s local store

– Use SPE’s DMA engine to gather & scatter data between main memory and local store

– Enables detailed programmer control of data flow

• Get/Put data when & where you want it

• Hides latency: Simultaneous reads, writes & computes

– Avoids restrictive HW cache management

• Unlikely to determine optimal data flow

• Potentially very inefficient

– Allows more efficient use of the existing bandwidth
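The "simultaneous reads, writes & computes" bullet is the double-buffering pattern: start the transfer of the next chunk, then compute on the chunk that has already arrived. The sketch below only illustrates that ordering in plain Python. It does not use the Cell SDK: `dma_get` is a hypothetical stand-in for a DMA get, and a one-worker thread pool plays the role of the DMA engine.

```python
from concurrent.futures import ThreadPoolExecutor


def dma_get(memory, start, size):
    """Hypothetical stand-in for a DMA get: copy one chunk from 'main memory'."""
    return memory[start:start + size]


def process_double_buffered(memory, chunk_size, compute):
    """Compute on chunk k while the fetch of chunk k+1 is in flight."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma_engine:
        pending = None
        for start in range(0, len(memory), chunk_size):
            # Kick off the next transfer before touching the current buffer.
            fetch = dma_engine.submit(dma_get, memory, start, chunk_size)
            if pending is not None:
                results.append(compute(pending.result()))
            pending = fetch
        if pending is not None:
            results.append(compute(pending.result()))  # drain the last buffer
    return results


main_memory = list(range(16))
sums = process_double_buffered(main_memory, chunk_size=4, compute=sum)
print(sums)  # [6, 22, 38, 54]
```

On real hardware the two buffers would live in the local store and the wait would be on a DMA tag rather than a future. The point of the sketch is only the ordering: issue the next get first, then compute.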


BOTTOM LINE:

It’s all about the data!


Cell Comparison: ~4x the FLOPS @ ~½ the power

Both 65nm technology

(to scale)
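The headline comparison can be checked against the figures quoted elsewhere in the deck (Tulsa at ~54.4 SP GFlops and 150W, Cell/B.E. at 200 GFLOPS and ~70W). The short calculation below uses only those numbers; the variable names are illustrative.

```python
# Figures quoted on the slides.
tulsa_gflops, tulsa_watts = 54.4, 150.0   # Intel Tulsa (Xeon MP 7100 series)
cell_gflops, cell_watts = 200.0, 70.0     # Cell/B.E.

flops_ratio = cell_gflops / tulsa_gflops
power_ratio = cell_watts / tulsa_watts
efficiency_ratio = (cell_gflops / cell_watts) / (tulsa_gflops / tulsa_watts)

print(f"FLOPS ratio: {flops_ratio:.2f}x")
print(f"Power ratio: {power_ratio:.2f}x")
print(f"GFLOPS per watt ratio: {efficiency_ratio:.1f}x")
```

The ratios come out at roughly 3.7 times the FLOPS at a little under half the power, which matches the rounded "~4x" and "~½" in the slide title and works out to nearly 8 times the GFLOPS per watt.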


Memory Managing Processor vs. Traditional General Purpose Processor

Figure labels: IBM, AMD, Intel, Cell BE


Examples of Feeding Cell

2D and 3D FFTs

Seismic Imaging

String Matching

Neural Networks (function approximation)


Feeding FFTs to Cell

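Diagram: a Buffer taken from the Input Image is split into Tiles; each Tile becomes a Transposed Tile, and the resulting Transposed Buffer is written into the Transposed Image.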

SIMDized data

DMAs double buffered

Pass 1: For each buffer

• DMA Get buffer

• Do four 1D FFTs in SIMD

• Transpose tiles

• DMA Put buffer

Pass 2: For each buffer

• DMA Get buffer

• Do four 1D FFTs in SIMD

• Transpose tiles

• DMA Put buffer
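The two passes are identical because transforming the rows and then transposing turns the columns into rows for the second pass, and the second transpose restores the original layout. The sketch below checks that identity in plain Python. It is a simplification under stated assumptions: a naive `dft` stands in for the SIMD 1D FFT kernel, and tiling, DMA and double buffering are left out. All function names are illustrative.

```python
import cmath


def dft(row):
    """Naive 1D DFT, standing in for an optimized 1D FFT kernel."""
    n = len(row)
    return [sum(row[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def one_pass(image):
    """One pass: transform every row, then transpose the result."""
    return transpose([dft(row) for row in image])


def fft2d(image):
    """Two identical passes give the 2D transform in the original layout."""
    return one_pass(one_pass(image))


def dft2d_direct(image):
    """Direct double sum, used only to check the two-pass result."""
    rows, cols = len(image), len(image[0])
    return [[sum(image[r][c] * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                 for r in range(rows) for c in range(cols))
             for v in range(cols)]
            for u in range(rows)]


image = [[float((3 * r + 5 * c) % 7) for c in range(4)] for r in range(4)]
fast = fft2d(image)
ref = dft2d_direct(image)
max_err = max(abs(fast[u][v] - ref[u][v]) for u in range(4) for v in range(4))
print(max_err < 1e-9)
```

Because both passes only ever read rows, every access in the transform step is stride 1; the strided work is confined to the transpose.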


3D FFTs

Long stride thrashes cache

Cell DMA allows prefetch
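A small calculation shows why the long strides hurt. In a row-major N×N×N array, walking along the three axes steps through memory by 1, N and N² elements. The sketch below counts how many distinct cache lines one 1D pencil of N elements touches along each axis. The line size (16 elements per line, for example a 128-byte line holding 8-byte complex values) and N = 64 are hypothetical choices for illustration.

```python
def linear_index(i, j, k, n):
    """Row-major offset of element (i, j, k) in an n x n x n array."""
    return (i * n + j) * n + k


def lines_touched(n, axis, elems_per_line):
    """Count distinct cache lines touched by one 1D pencil along 'axis'."""
    lines = set()
    for t in range(n):
        idx = [0, 0, 0]
        idx[axis] = t
        lines.add(linear_index(idx[0], idx[1], idx[2], n) // elems_per_line)
    return len(lines)


n = 64
elems_per_line = 16   # hypothetical line size, in elements
for axis, stride in ((2, 1), (1, n), (0, n * n)):
    count = lines_touched(n, axis, elems_per_line)
    print(f"stride {stride:5d}: {count} lines for {n} elements")
```

With these numbers the stride-1 pencil uses every element of the 4 lines it touches, while the stride-N and stride-N² pencils each pull in 64 lines to use one element from each. Explicit DMA lets the program request just those elements ahead of time instead of paying for whole lines on demand.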

Diagram labels: single element, data envelope; strides of 1, N and N²


Feeding Seismic Imaging to Cell

(X,Y)

New G at each (x,y)

Radial symmetry of G reduces BW requirements

Data

Green’s Function

$$\sum_{i,j} D(x+i,\, y+j)\, G(x, y, i, j)$$


Feeding Seismic Imaging to Cell

Diagram: the Data is divided among SPE 0, SPE 1, SPE 2, SPE 3, SPE 4, SPE 5, SPE 6 and SPE 7



Feeding Seismic Imaging to Cell

For each X

– Load next column of data

– Load next column of indices

– For each Y

• Load Green’s functions

• SIMDize Green’s functions

• Compute convolution at (X,Y)

– Cycle buffers
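The loop above can be sketched as a sliding window of 2R+1 data columns: each new X loads one column, the oldest column drops out, and every Y in the column gets its own kernel. This is a minimal sketch under stated assumptions, not the talk's implementation. It assumes the sum runs over D(x+i, y+j)·G(x, y, i, j) with i and j from −R to R, it passes the Green's function in as a callable instead of looking it up through an index buffer, and `toy_green` is a hypothetical radially symmetric kernel.

```python
from collections import deque


def image_all(columns, green, radius):
    """columns: iterable yielding one data column (H values) per X."""
    window = deque(maxlen=2 * radius + 1)   # the (2R+1)-column data buffer
    out = {}
    for x_loaded, column in enumerate(columns):
        window.append(column)               # load next column; oldest drops out
        if len(window) < window.maxlen:
            continue                        # buffer not yet full
        x = x_loaded - radius               # the window is centered on this X
        for y in range(radius, len(column) - radius):
            total = 0.0
            for i in range(-radius, radius + 1):
                for j in range(-radius, radius + 1):
                    # window[i + radius] holds data column x + i
                    total += window[i + radius][y + j] * green(x, y, i, j)
            out[(x, y)] = total
    return out


def toy_green(x, y, i, j):
    # Hypothetical kernel: depends only on the squared distance i*i + j*j.
    return 1.0 / (1.0 + i * i + j * j)


data = [[1.0] * 4 for _ in range(4)]
result = image_all(iter(data), toy_green, radius=1)
print(sorted(result))
```

Each data column is read from main memory once and then reused for 2R+1 values of X, which is what keeps the bandwidth requirement down.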

Diagram: the Data buffer (height H, width 2R+1) and the Green’s Index buffer, centered on (X,Y) with radius R


Feeding String Matching to Cell

Find (lots of) substrings in (long) string

Build graph of words & represent as DFA

Problem: Graph doesn’t fit in LS

Sample Word List:

“the”

“that”

“math”
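For the sample word list, one common way to turn the word graph into a DFA is the Aho-Corasick construction; the talk does not say which construction it uses, so treat that choice as an assumption. The sketch below builds a dense transition table in which every state has an entry for every character of the alphabet, so scanning costs one table lookup per input character. The names `build_dfa` and `scan` are illustrative.

```python
from collections import deque


def build_dfa(words):
    """Build a dense DFA for multi-pattern matching (Aho-Corasick style)."""
    alphabet = sorted(set("".join(words)))
    trie = [{}]            # trie[state][char] -> next state
    matches = [set()]      # words recognized on entering each state
    for word in words:
        state = 0
        for ch in word:
            if ch not in trie[state]:
                trie.append({})
                matches.append(set())
                trie[state][ch] = len(trie) - 1
            state = trie[state][ch]
        matches[state].add(word)

    # Breadth-first pass fills in every missing transition from the
    # state reached by the longest proper suffix (the failure state).
    fail = [0] * len(trie)
    dfa = [dict() for _ in trie]
    queue = deque()
    for ch in alphabet:
        nxt = trie[0].get(ch, 0)
        dfa[0][ch] = nxt
        if nxt:
            queue.append(nxt)
    while queue:
        state = queue.popleft()
        matches[state] |= matches[fail[state]]
        for ch in alphabet:
            if ch in trie[state]:
                nxt = trie[state][ch]
                fail[nxt] = dfa[fail[state]][ch]
                dfa[state][ch] = nxt
                queue.append(nxt)
            else:
                dfa[state][ch] = dfa[fail[state]][ch]
    return dfa, matches


def scan(text, dfa, matches):
    """Return (end_index, word) for every occurrence of a word in text."""
    hits, state = [], 0
    for pos, ch in enumerate(text):
        state = dfa[state].get(ch, 0)   # characters outside the alphabet reset
        for word in sorted(matches[state]):
            hits.append((pos, word))
    return hits


dfa, matches = build_dfa(["the", "that", "math"])
hits = scan("mathematics, then that", dfa, matches)
print(hits)  # [(3, 'math'), (4, 'the'), (15, 'the'), (21, 'that')]
```

The table has one row per state and one column per symbol, so it grows with the total length of the word list. A realistic dictionary quickly exceeds a small local store, which is the problem the slide points to.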


Hiding Main Memory Latency


Software Multithreading


Feeding Neural Networks to Cell

Neural net function F(X)

– RBF, MLP, KNN, etc.

If too big for LS, BW Bound

N Basis functions: dot product + nonlinearity

D Input dimensions

DxN Matrix of parameters

Diagram: input X → N basis functions → output F


Convert BW Bound to Compute Bound

Split function over multiple SPEs

Avoids unnecessary memory traffic

Reduce compute time per SPE

Minimal merge overhead
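The split can be sketched as follows: each worker keeps its own slice of the parameter matrix resident, evaluates only its basis functions, and the merge adds one partial result per worker. This is a sketch under assumptions the slides do not state: the output is taken to be a plain sum of the basis outputs, `tanh` stands in for the nonlinearity, and the workers run one after another here rather than on separate SPEs. All names are illustrative.

```python
import math


def basis_sum(params, x):
    """Sum of tanh(dot(column, x)) over the given basis functions."""
    return sum(math.tanh(sum(w * xi for w, xi in zip(column, x)))
               for column in params)


def split(params, n_workers):
    """Give each worker a contiguous slice of the basis functions."""
    step = -(-len(params) // n_workers)   # ceiling division
    return [params[i:i + step] for i in range(0, len(params), step)]


def evaluate_split(params, x, n_workers):
    # Each worker holds only its own slice, so the parameters do not have
    # to be streamed in again for every new input vector.
    partials = [basis_sum(chunk, x) for chunk in split(params, n_workers)]
    return sum(partials)   # the merge step: one add per worker


# D = 3 input dimensions, N = 16 basis functions (one column each).
params = [[math.sin(n + d) for d in range(3)] for n in range(16)]
x = [0.5, -1.0, 2.0]
whole = basis_sum(params, x)
merged = evaluate_split(params, x, n_workers=8)
print(abs(whole - merged) < 1e-12)
```

With 8 workers each slice is one eighth of the D×N matrix, so a matrix that is too big for one local store may fit once divided, and only the input vector and the partial sums need to move.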

Merge


Moral of the Story: It’s All About the Data!

The data problem is growing: multicore

Intelligent software prefetching

– Use DMA engines

– Don’t rely on HW prefetching

Efficient data management

– Multibuffering: Hide the latency!

– BW utilization: Make every byte count!

– SIMDization: Make every vector count!

– Problem/data partitioning: Make every core work!

– Software multithreading: Keep every core busy!


Backup


Abstract

Technological obstacles have prevented the microprocessor industry from achieving increased performance through increased chip clock speeds. In reaction to these restrictions, the industry has chosen the multicore processor path. Multicore processors promise tremendous GFLOPS performance but raise the challenge of how one programs them. In this talk, I will discuss the motivation for multicore, the implications for programmers and how the Cell/B.E. processor’s design addresses these challenges. As an example, I will review one or two applications that highlight the strengths of Cell.