Department of Computer Science

MapReduce for the Cell B.E. Architecture

Marc de Kruijf
University of Wisconsin−Madison

Advised by Professor Sankaralingam

MapReduce
- A model for parallel programming, proposed by Google
- Large-scale distributed systems (1,000-node clusters)
- Applications: distributed sort, distributed grep, indexing
- Simple, high-level interface
- Runtime handles parallelization, scheduling, synchronization, and communication

Cell B.E. Architecture
- A heterogeneous computing platform: 1 PPE, 8 SPEs
- Programming is hard
  - Multi-threading is explicit
  - SPE local memories are software-managed
- The Cell is like a "cluster-on-a-chip"

Motivation
- MapReduce: scalable parallel model, simple interface
- Cell B.E.: complex parallel architecture, hard to program
- Result: MapReduce for the Cell B.E. Architecture

Overview
- Motivation
  - MapReduce
  - Cell B.E. Architecture
- MapReduce Example
- Design
- Evaluation
  - Workload Characterization
  - Application Performance
- Conclusions and Future Work

MapReduce Example
Counting word occurrences in a set of documents
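The slide's illustration did not survive in the transcript. As a rough stand-in, here is a minimal single-threaded sketch in C of word counting expressed as a Map step and a Reduce step. All names (`Pair`, `wc_map`, `wc_reduce`, `word_count`) and the fixed buffer sizes are assumptions made for illustration; this is not the Cell runtime's API and it does no parallel work.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAIRS 256   /* capacity of the intermediate buffer (assumed) */
#define MAX_WORD  32    /* longest stored word, including the terminator (assumed) */

/* One intermediate or final (key, value) pair: a word and a count. */
typedef struct { char key[MAX_WORD]; int value; } Pair;

/* Map: emit (word, 1) for every word in the document. Returns pairs emitted. */
static size_t wc_map(const char *doc, Pair *out, size_t cap) {
    size_t n = 0;
    while (*doc != '\0') {
        while (*doc != '\0' && !isalpha((unsigned char)*doc)) doc++;  /* skip separators */
        size_t len = 0;
        while (isalpha((unsigned char)doc[len])) len++;
        if (len == 0) break;
        assert(n < cap);
        size_t copy = len < MAX_WORD - 1 ? len : MAX_WORD - 1;
        for (size_t i = 0; i < copy; i++)
            out[n].key[i] = (char)tolower((unsigned char)doc[i]);
        out[n].key[copy] = '\0';
        out[n].value = 1;
        n++;
        doc += len;
    }
    return n;
}

static int by_key(const void *a, const void *b) {
    return strcmp(((const Pair *)a)->key, ((const Pair *)b)->key);
}

/* Group equal keys by sorting, then Reduce: sum the values of each run of equal keys. */
static size_t wc_reduce(Pair *pairs, size_t n, Pair *out) {
    qsort(pairs, n, sizeof(Pair), by_key);
    size_t m = 0;
    for (size_t i = 0; i < n; ) {
        out[m] = pairs[i];
        size_t j = i + 1;
        while (j < n && strcmp(pairs[j].key, pairs[i].key) == 0) {
            out[m].value += pairs[j].value;
            j++;
        }
        m++;
        i = j;
    }
    return m;
}

/* Driver: run Map over each document, then Reduce once over all emitted pairs. */
static size_t word_count(const char **docs, size_t ndocs, Pair *out) {
    static Pair inter[MAX_PAIRS];
    size_t n = 0;
    for (size_t d = 0; d < ndocs; d++)
        n += wc_map(docs[d], inter + n, MAX_PAIRS - n);
    return wc_reduce(inter, n, out);
}
```

Calling `word_count` with an array of document strings and an output array of at least `MAX_PAIRS` entries yields one `(word, count)` pair per distinct word, ordered by word.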


Design: Flow of Execution

Five stages: Map, Partition, Quick-sort, Merge-sort, Reduce

1. Map streams key/value pairs

Key grouping implemented as:
  2. Partition: hash and distribute
  3. Quick-sort
  4. Merge-sort
  (stages 3 and 4 together form a two-phase external sort)

5. Reduce "reduces" key/list-of-values pairs to key/value pairs
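The five stages above can be sketched on a single thread in C. This is only an illustration under stated assumptions: the `KV` type, the constants `NPART`, `RUN`, and `CAP`, and the function names are hypothetical, the modulo hash is a placeholder, and the standard library's `qsort` stands in for the quick-sort stage. The sketch counts occurrences of each integer in the input, in the spirit of the histogram application.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NPART 4   /* number of partitions (assumed) */
#define RUN   4   /* pairs per sorted run, standing in for what fits in local store (assumed) */
#define CAP   64  /* capacity of each partition buffer in this sketch */

typedef struct { int key; int value; } KV;

static int cmp_kv(const void *a, const void *b) {
    int x = ((const KV *)a)->key, y = ((const KV *)b)->key;
    return (x > y) - (x < y);
}

/* Merge two adjacent sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi). */
static void merge_runs(const KV *src, KV *dst, size_t lo, size_t mid, size_t hi) {
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[j].key < src[i].key) ? src[j++] : src[i++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Runs all five stages over `in`; writes (key, count) pairs to `out`, returns how many. */
static size_t histogram_mapreduce(const int *in, size_t n, KV *out) {
    static KV part[NPART][CAP], tmp[CAP];
    size_t len[NPART] = {0}, m = 0;

    for (size_t i = 0; i < n; i++) {
        KV kv = { in[i], 1 };                          /* 1. Map: emit (value, 1) */
        size_t p = (unsigned)kv.key % NPART;           /* 2. Partition: hash and distribute */
        assert(len[p] < CAP);
        part[p][len[p]++] = kv;
    }
    for (size_t p = 0; p < NPART; p++) {
        KV *a = part[p], *b = tmp;
        for (size_t lo = 0; lo < len[p]; lo += RUN) {  /* 3. Quick-sort each fixed-size run */
            size_t cnt = len[p] - lo < RUN ? len[p] - lo : RUN;
            qsort(a + lo, cnt, sizeof(KV), cmp_kv);
        }
        for (size_t w = RUN; w < len[p]; w *= 2) {     /* 4. Merge-sort: merge runs pairwise */
            for (size_t lo = 0; lo < len[p]; lo += 2 * w) {
                size_t mid = lo + w < len[p] ? lo + w : len[p];
                size_t hi  = lo + 2 * w < len[p] ? lo + 2 * w : len[p];
                merge_runs(a, b, lo, mid, hi);
            }
            KV *t = a; a = b; b = t;                   /* merged data now lives in `a` */
        }
        for (size_t i = 0; i < len[p]; ) {             /* 5. Reduce: sum each key's values */
            KV r = a[i++];
            while (i < len[p] && a[i].key == r.key) r.value += a[i++].value;
            out[m++] = r;
        }
    }
    return m;
}
```

Output is ordered partition by partition, and by key within each partition. Sorting in bounded runs and then merging mirrors the two-phase external sort named on the slide; the run size here is arbitrary.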

Evaluation Methodology

MapReduce Model Characterization
- Synthetic micro-benchmark with six parameters
- Run on a 3.2 GHz Cell Blade
- Measured effect of each parameter on execution time

Application Performance Comparison
- Six full applications
- MapReduce versions run on 3.2 GHz Cell Blade
- Single-threaded versions run on 2.4 GHz Core 2 Duo

Evaluation
- Measured speedup by comparing execution times
- Measured overheads on the Cell by monitoring SPE idle time
- Measured ideal speedup assuming no Cell overheads

MapReduce Model Characterization

Model Characteristics

Characteristic     Description
Map intensity      Execution cycles per input byte to Map
Reduce intensity   Execution cycles per input byte to Reduce
Map fan-out        Ratio of input size to output size in Map
Reduce fan-in      Number of values per key in Reduce
Partitions         Number of partitions
Input size         Input size in bytes

[Figure: Effect on Execution Time]

Application Performance

Applications
- histogram: counts bitmap RGB occurrences
- kmeans: clustering algorithm
- linearReg: least-squares linear regression
- wordCount: word count
- NAS_EP: EP benchmark from NAS suite
- distSort: distributed sort

Speedup Over Core 2 Duo

Runtime Overheads

Conclusions and Future Work

Conclusions
- Programmability benefits
- High performance on computationally intensive workloads
- Not applicable to all application types

Future Work
- Additional performance tuning
- Extend for clusters of Cell processors
  - Hierarchical MapReduce

Questions?

Backup Slides

MapReduce API

void MapReduce_exec(MapReduceSpecification specification);

The exec function initializes the MapReduce runtime and executes MapReduce according to the user specification.

void MapReduce_emitIntermediate(void **key, void **value);
void MapReduce_emit(void **value);

These two functions are called by the user-defined Map and Reduce functions, respectively. They take references to pointers as arguments and modify each referenced pointer to point to pre-allocated storage. It is then the responsibility of the application to populate this storage.
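The pointer-to-pointer calling convention described above may be easier to follow with a small mock. The sketch below is not the real runtime: it reuses the `MapReduce_emitIntermediate` signature from the slide but backs it with a static array, and the slot sizes (`KEY_SIZE`, `MAX_PAIRS`), the `int` value type, and the user function `my_map` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the runtime's pre-allocated Map output region (sizes are assumptions). */
#define KEY_SIZE  16
#define MAX_PAIRS 64

static char   key_region[MAX_PAIRS][KEY_SIZE];
static int    value_region[MAX_PAIRS];
static size_t pairs_emitted = 0;

/* Mock with the signature from the slide: it points *key and *value at the
   next free pre-allocated slots instead of copying any data itself. */
void MapReduce_emitIntermediate(void **key, void **value) {
    assert(pairs_emitted < MAX_PAIRS);
    *key   = key_region[pairs_emitted];
    *value = &value_region[pairs_emitted];
    pairs_emitted++;
}

/* Hypothetical user Map function: requests storage, then writes the
   (word, 1) pair directly into the slots it was handed. */
void my_map(const char *word) {
    void *key, *value;
    MapReduce_emitIntermediate(&key, &value);
    strncpy((char *)key, word, KEY_SIZE - 1);
    ((char *)key)[KEY_SIZE - 1] = '\0';
    *(int *)value = 1;
}
```

The point of the convention is that the user function never allocates or copies output buffers: it asks for a slot and writes in place, which fits the pre-allocated output regions listed among the optimizations.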

Optimizations

1) Priority work queue
   - Distributes load
   - Avoids serialization
   - Pipelined execution maximizes concurrency
2) Double-buffering
3) Application support
   - Map only
   - Map with sorted output
   - Chaining invocations
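Double-buffering (optimization 2 above) is commonly structured as two local buffers that swap roles: while one is being computed on, the other is being filled. The following C sketch shows only that buffer rotation. The names `fetch_chunk` and `sum_double_buffered` and the `CHUNK` size are hypothetical, and `memcpy` stands in for what would be an asynchronous transfer into SPE local store, so nothing actually overlaps here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* elements per local buffer (assumed; stands in for a local-store buffer) */

/* Stand-in for starting a transfer from main memory into a local buffer.
   On real hardware this would be asynchronous; memcpy completes immediately. */
static size_t fetch_chunk(int *local, const int *src, size_t n, size_t offset) {
    assert(local != NULL && src != NULL);
    if (offset >= n) return 0;                    /* nothing left to fetch */
    size_t cnt = n - offset < CHUNK ? n - offset : CHUNK;
    memcpy(local, src + offset, cnt * sizeof(int));
    return cnt;
}

/* Sums `src` chunk by chunk, always requesting chunk i+1 before computing on chunk i. */
static long sum_double_buffered(const int *src, size_t n) {
    int buf[2][CHUNK];
    size_t cnt[2];
    int cur = 0;
    long total = 0;

    cnt[cur] = fetch_chunk(buf[cur], src, n, 0);  /* prime the first buffer */
    for (size_t offset = 0; cnt[cur] > 0; offset += CHUNK) {
        int next = 1 - cur;
        /* Start fetching the next chunk into the idle buffer. */
        cnt[next] = fetch_chunk(buf[next], src, n, offset + CHUNK);
        /* Compute on the current buffer while the next one is (notionally) in flight. */
        for (size_t i = 0; i < cnt[cur]; i++)
            total += buf[cur][i];
        cur = next;                               /* swap buffer roles */
    }
    return total;
}
```

With a truly asynchronous transfer, a wait for completion would be needed just before the role swap; that step is omitted because `memcpy` is synchronous.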

Optimizations

4) Balanced merge (n / log(n) better bandwidth utilization as n → ∞)
5) Map and Reduce output regions pre-allocated
   - Optimal memory alignment
   - Bulk memory transfers
   - No user memory management
   - No dynamic allocation overhead