Teaching “Think Parallel”


Page 1: Teaching “Think Parallel”

Teaching “Think Parallel”

Four positive trends toward parallel programming, including advances in teaching/learning

James Reinders, Intel
April 2013

© 2013, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.


Page 3: Teaching “Think Parallel”

• Better Tools for Parallel Programming
• Better Parallel Models
• Wildly more Hardware Parallelism
• Better Educated Programmers

Page 4: Teaching “Think Parallel”

Parallel Programming is IMPORTANT. These FOUR factors combine to help parallel programming rise more quickly.

Page 5: Teaching “Think Parallel”

• Industry-leading performance from advanced compilers

• Comprehensive libraries

• Parallel programming models

• Insightful analysis tools

Page 6: Teaching “Think Parallel”

Intel® Advisor XE

Page 7: Teaching “Think Parallel”

Intel® Composer XE

• Intel® C/C++ Compiler
• Intel® Fortran Compiler
• Intel® Math Kernel Library
• Intel® Integrated Performance Primitives

Page 8: Teaching “Think Parallel”

Intel® Inspector XE

Page 9: Teaching “Think Parallel”

Intel® VTune™ Amplifier XE

Page 10: Teaching “Think Parallel”

Parallel Programming is IMPORTANT. Programming models are improving and becoming more productive.

Page 11: Teaching “Think Parallel”

OpenMP* (Open Multi-Processing)

• Standard used by many parallel applications
  – Supported by every major compiler for Fortran and C
• OpenMP 4.0 standard – new mid-2013

    !$omp parallel do
    do i=1,10
       A(i) = B(i) * C(i)
    enddo
    !$omp end parallel do

Page 12: Teaching “Think Parallel”

SIMD directives: Intel innovation

• Support in Intel Compilers since 2011
• OpenMP 4.0 standard – new mid-2013
  – Expect to see in all OpenMP compliant compilers!

    #pragma omp simd reduction(+:val) reduction(+:val2)
    for (int pos = 0; pos < RAND_N; pos++) {
        float callValue = expectedCall(Sval, Xval, MuByT, VBySqrtT, l_Random[pos]);
        val  += callValue;
        val2 += callValue * callValue;
    }

Page 13: Teaching “Think Parallel”

SIMD vs. prior methods (you can now choose!)

• OLDER methods (like IVDEP directives, restrict keyword, etc.):
  – Keep adding directives, keywords, compile-time switches, hints, etc., hoping your code will vectorize.
  – Pro: you start with working code, and it continues to work at each step of the way.
  – Pro: if your algorithm, as written, cannot safely vectorize, the compiler will never vectorize it (most programmers find this hard to tell apart from the compiler being too conservative).
  – Con: you are trying to guess what additional information the compiler needs before it is comfortable vectorizing.

Page 14: Teaching “Think Parallel”

• NEW (SIMD directives) method:
  – Pro: Add the directive, and the compiler WILL vectorize the code. No time is spent fussing with the compiler and worrying that it is too conservative.
  – Con: right or wrong, it is vectorized. If it is wrong, you have debugging work to do to discover how to restructure your algorithm so that it vectorizes without changing results. The compiler no longer “helps” by noticing real problems with changing your algorithm to use vector instructions.
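The two approaches can be put side by side in a small sketch. The function names `scaled_add_hinted` and `scaled_add_simd` are hypothetical and do not come from the slides. `__restrict` is a widely supported compiler extension rather than standard C++, and `#pragma omp simd` is the OpenMP 4.0 directive, which a compiler ignores unless its OpenMP option is enabled.

```cpp
#include <cassert>
#include <cstddef>

// OLDER style: hint that the arrays do not alias and hope the compiler
// decides on its own that the loop is safe to vectorize.
// __restrict is a compiler extension (assumed available here).
void scaled_add_hinted(float* __restrict out, const float* __restrict a,
                       const float* __restrict b, float s, std::size_t n) {
    assert(n == 0 || (out && a && b));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + s * b[i];
    }
}

// NEW style: instruct the compiler to vectorize. The programmer, not the
// compiler, now vouches that the iterations can safely run in SIMD lanes.
void scaled_add_simd(float* out, const float* a, const float* b,
                     float s, std::size_t n) {
    assert(n == 0 || (out && a && b));
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + s * b[i];
    }
}
```

Both functions compute the same result for non-overlapping arrays. If `out` overlapped `a` or `b`, the second version would still be vectorized, which is the "Con" described above.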

Page 15: Teaching “Think Parallel”

OpenMP 4.0 offers industry convergence – for a true standard; Intel first to support!

Feature (columns: OpenACC, LEO, Desired Standard)

Support for C and C++, Fortran ✔ ✔ ✔
Support single code base of hetero-machine ✔ ✔ ✔
Overlap communication and computation ✔ ✔ ✔
Interoperate with MPI ✔ ✔ ✔
Interoperate with OpenMP ✔ ✔
Offload to GPU ✔ ✔
Offload to MIC Co-processor ✔ ✔
Ability to support all accelerators ✔
Ability to support all GPUs ✔
Ability to support all co-processors ✔
Proof of performance portability ✔
Support for nested parallelism ✔ ✔
User-managed memory consistency ✔ ✔ ✔
Multiple vendor support ✔ ✔
Restrict clause support ✔
Support for dynamic dispatch ✔ ✔
Parallel on/off separate from offload ✔ ✔
PGI, CAPS compiler support 2012 ✔
Cray compiler support soon ✔
Intel compiler support 2010* ✔
Broad standards body approval ✔
OpenMP 4.0 (late 2012) maybe

* public product availability was 2012

Page 16: Teaching “Think Parallel”

TBB for C++ scaling

Most popular solution for C++ parallel programming

threadingbuildingblocks.org

Page 17: Teaching “Think Parallel”

TBB has a “sister”, Cilk™ Plus:

• Help for C programmers
• Involve compiler
• Vectorization support

cilkplus.org

Page 18: Teaching “Think Parallel”

Intel® Cilk™ Plus

• Tasking
  – Cilk Keywords
  – Hyperobjects
• Vectorization
  – Array Notation
  – SIMD Annotation
  – Elemental Functions

cilkplus.org

Page 19: Teaching “Think Parallel”

Intel® Cilk™ Plus: Intel products, plus gcc and LLVM branches available

Page 20: Teaching “Think Parallel”

Intel® Cilk™ Plus: two of its features are marked “adopted by OpenMP 4.0 (mid-2013)”.

Page 21: Teaching “Think Parallel”

Parallel Programming Model abstractions that yield portability, performance, productivity, usability, maintainability.

Page 22: Teaching “Think Parallel”

Parallel Programming is IMPORTANT. These FOUR factors combine to help parallel programming rise more quickly.

Page 23: Teaching “Think Parallel”


Page 24: Teaching “Think Parallel”

Intel® Xeon Phi™ Coprocessors: Highly-parallel Processing for Unparalleled Discovery

Groundbreaking differences:

• Up to 61 IA cores / 1.1 GHz / 244 threads
• Up to 8 GB memory with up to 352 GB/s bandwidth
• 512-bit SIMD instructions
• Linux operating system, IP addressable
• Standard programming languages and tools

Leading to groundbreaking results:

• Up to 1 TeraFlop/s double precision peak performance (note 1)
• Up to 2.2x higher memory bandwidth than on an Intel® Xeon® processor E5 family-based server (note 2)
• Up to 4x more performance per watt than with an Intel® Xeon® processor E5 family-based server (note 3)
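The thread count and the peak figure can be sanity-checked from the listed specifications. This is a rough estimate only: it assumes each core completes one 512-bit fused multiply-add per cycle, which the slide does not state.

```latex
\begin{align*}
\text{threads} &= 61 \text{ cores} \times 4 \text{ threads/core} = 244 \\
\text{double-precision lanes} &= 512 \text{ bits} \,/\, 64 \text{ bits} = 8 \\
\text{peak} &\approx 61 \times 1.1\,\text{GHz} \times 8 \text{ lanes} \times 2 \text{ flops (multiply + add)} \\
            &\approx 1074\ \text{GFlop/s} \approx 1\ \text{TeraFlop/s}
\end{align*}
```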

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance Notes 1, 2 & 3, see backup for system configuration details.

Page 25: Teaching “Think Parallel”

Intel® Xeon Phi™ Coprocessors: They’re So Much More

General-purpose IA hardware leads to less idle time for your investment.

It’s a supercomputer on a chip.

[Chart (Source: Intel Estimates): the Intel® Xeon Phi™ Coprocessor compared with custom HW acceleration (GPU, ASIC, FPGA: restrictive architectures) across these capabilities: operate as a compute node, run a full OS, program to MPI, run x86 code, run restricted code, run offloaded code.]

Restrictive architectures limit the ability of applications to use arbitrary nested parallelism, function calls, and threading models.

Page 26: Teaching “Think Parallel”

Vision: span from few cores to many cores with consistent models, languages, tools, and techniques.

Page 27: Teaching “Think Parallel”

[Diagram: Source; Compilers, Libraries, Parallel Models; targets: Multicore CPU, and Multicore CPU with Intel® MIC architecture coprocessor.]

Page 28: Teaching “Think Parallel”

Game Changer

“Unparalleled productivity… most of this software does not run on a GPU” - Robert Harrison, NICS, ORNL

Source: R. Harrison, “Opportunities and Challenges Posed by Exascale Computing - ORNL's Plans and Perspectives”, National Institute of Computational Sciences, Nov 2011.

Intel® Trace Analyzer and Collector

Intel® MPI Library

Page 29: Teaching “Think Parallel”

Intel® Parallel Studio XE

• Intel® Inspector XE, Intel® VTune™ Amplifier XE, Intel® Advisor
• Intel® C/C++ and Fortran Compilers w/OpenMP
• Intel® MKL, Intel® Cilk Plus, Intel® TBB, and Intel® IPP

+ Intel® Trace Analyzer and Collector
+ Intel® MPI Library

Page 30: Teaching “Think Parallel”


Page 31: Teaching “Think Parallel”


SMP on a chip…

Page 32: Teaching “Think Parallel”

Intel® Xeon Phi™ Coprocessor: Increases Application Performance up to 10x

Application Performance Examples

Customer | Application | Performance increase (note 1) vs. 2S Xeon*
Los Alamos | Molecular Dynamics | Up to 2.52x
Acceleware | 8th order isotropic variable velocity | Up to 2.05x
Jefferson Labs | Lattice QCD | Up to 2.27x
Financial Services | BlackScholes SP | Up to 7x
Financial Services | Monte Carlo SP | Up to 10.75x
Sinopec | Seismic Imaging | Up to 2.53x (note 2)
Sandia Labs | miniFE (Finite Element Solver) | Up to 2x (note 3)
Intel Labs | Ray Tracing (incoherent rays) | Up to 1.88x (note 4)

* Xeon = Intel® Xeon® processor; * Xeon Phi = Intel® Xeon Phi™ coprocessor

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Customer measured results as of October 22, 2012. Configuration details: please reference slide speaker notes. For more information go to http://www.intel.com/performance

Notes:
1. 2S Xeon* vs. 1 Xeon Phi* (preproduction HW/SW & application running 100% on coprocessor unless otherwise noted)
2. 2S Xeon* vs. 2S Xeon* + 2 Xeon Phi* (offload)
3. 8 node cluster, each node with 2S Xeon* (comparison is cluster performance with and without 1 Xeon Phi* per node) (Hetero)
4. Intel measured Oct. 2012

• Intel® Xeon Phi™ coprocessor accelerates highly parallel & vectorizable applications. (graph above)
• Table provides examples of such applications

Page 33: Teaching “Think Parallel”

Synthetic Benchmark Summary (Intel® MKL) (5110P)

Benchmark | 2S Intel® Xeon® processor | 1 Intel® Xeon Phi™ coprocessor | Speedup | Efficiency
SGEMM (GF/s) | 640 | 1,729 | Up to 2.7X | 85% efficient
DGEMM (GF/s) | 309 | 833 | Up to 2.7X | 82% efficient
SMP Linpack (GF/s) | 303 | 722 | Up to 2.3X | 71% efficient
STREAM Triad (GB/s) | 78 | 159 (ECC on), 171 (ECC off) | Up to 2.1X |

Higher is better in all four charts.

Coprocessor results: Benchmark run 100% on coprocessor, no help from Intel® Xeon® processor host (aka native)

Notes
1. Intel® Xeon® Processor E5-2670 used for all: SGEMM Matrix = 13824 x 13824, DGEMM Matrix 7936 x 7936, SMP Linpack Matrix 30720 x 30720
2. Intel® Xeon Phi™ coprocessor 5110P (ECC on) with “Gold Release Candidate” SW stack: SGEMM Matrix = 11264 x 11264, DGEMM Matrix 7680 x 7680, SMP Linpack Matrix 26872 x 28672

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Intel measured results as of October 26, 2012. Configuration details: please reference slide speaker notes. For more information go to http://www.intel.com/performance

Page 34: Teaching “Think Parallel”

Lots of Parallelism is a big deal

Page 35: Teaching “Think Parallel”

Picture worth many words

Page 36: Teaching “Think Parallel”

http://tinyurl.com/inteljames
twitter @jamesreinders
http://intel.com/software/mic

Page 37: Teaching “Think Parallel”

Picture worth many words

Page 38: Teaching “Think Parallel”


Books to Help “Think Parallel”

Page 39: Teaching “Think Parallel”

Intel® Xeon Phi™ Coprocessor High Performance Programming, Jim Jeffers, James Reinders, (c) 2013, publisher: Morgan Kaufmann

It all comes down to PARALLEL PROGRAMMING! (applicable to processors and Intel® Xeon Phi™ coprocessors both)

Foreword, Preface

Chapters:
1. Introduction
2. High Performance Closed Track Test Drive!
3. A Friendly Country Road Race
4. Driving Around Town: Optimizing A Real-World Code Example
5. Lots of Data (Vectors)
6. Lots of Tasks (not Threads)
7. Offload
8. Coprocessor Architecture
9. Coprocessor System Software
10. Linux on the Coprocessor
11. Math Library
12. MPI
13. Profiling and Timing
14. Summary

Glossary, Index

Available since February 2013.

“This book belongs on the bookshelf of every HPC professional. Not only does it successfully and accessibly teach us how to use and obtain high performance on the Intel MIC architecture, it is about much more than that. It takes us back to the universal fundamentals of high-performance computing including how to think and reason about the performance of algorithms mapped to modern architectures, and it puts into your hands powerful tools that will be useful for years to come.”
- Robert J. Harrison, Institute for Advanced Computational Science, Stony Brook University

Learn more about this book: lotsofcores.com

© 2013, James Reinders & Jim Jeffers, book image used with permission

Page 40: Teaching “Think Parallel”
Page 41: Teaching “Think Parallel”
Page 42: Teaching “Think Parallel”
Page 43: Teaching “Think Parallel”

Picture worth many words

Page 44: Teaching “Think Parallel”
Page 45: Teaching “Think Parallel”
Page 46: Teaching “Think Parallel”
Page 47: Teaching “Think Parallel”
Page 48: Teaching “Think Parallel”


Page 49: Teaching “Think Parallel”

Structured Parallel Programming, Michael McCool, Arch Robison, James Reinders (c) 2012, publisher: Morgan Kaufmann

Teaches parallel programming using a new pattern-based approach. Extensive examples in Cilk Plus and TBB. Not about any specific hardware, but relevant to all. It’s about effective parallel programming. Great for teaching!

“This is a really great book… I've been dreaming for a while of a modern accessible book that I could recommend to my threading-deprived colleagues and assorted enquirers to get them up to speed with the core concepts of multithreading as well as something that covers all the major current interesting implementations. Finally I have that book.”
- Martin Watt, Principal Engineer, Dreamworks Animation

Learn more about this book: parallelbook.com

Available since July 2012.

© 2012, Michael McCool, Arch Robison, James Reinders, book image used with permission

Page 50: Teaching “Think Parallel”

Structured Parallel Programming: available since July 2012 in English, February 2013 in Japanese.

Page 51: Teaching “Think Parallel”

Teaching Parallelism

• Patterns & our parallel programming tools
• Map, Reduce:
  – Dot product, Cilk Plus
• Stencil, Recurrence:
  – Forward seismic simulation, Cilk Plus
• Pipeline:
  – Compression, Cilk Plus and TBB
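As a rough illustration of the first pairing, a dot product is a map (multiply matching elements) fused with a reduce (sum the products). The slides use Cilk Plus for this example. The sketch below swaps in an OpenMP 4.0 `simd` reduction in standard C++ instead, and the function name `dot` is a hypothetical choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product as map + reduce. The pragma asks an OpenMP 4.0 compiler to
// vectorize the loop and combine per-lane partial sums; a compiler without
// OpenMP enabled ignores it and runs the loop serially.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    double sum = 0.0;
    #pragma omp simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];  // map: a[i] * b[i]; reduce: accumulate into sum
    }
    return sum;
}
```

For instance, `dot({1, 2, 3}, {4, 5, 6})` evaluates 1·4 + 2·5 + 3·6.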

Page 52: Teaching “Think Parallel”

Structured Programming with Patterns

• Patterns are “best practices” for solving specific problems.
• Patterns can be used to organize your code, leading to algorithms that are more scalable and maintainable.
• A pattern supports a particular algorithmic structure with an efficient implementation.
• Intel’s tools support a set of useful parallel patterns with low-overhead implementations.

Page 53: Teaching “Think Parallel”

Structured Serial Patterns

The following patterns are the basis of “structured programming” for serial computation:

• Sequence
• Selection
• Iteration
• Nesting
• Functions
• Recursion
• Random read
• Random write
• Stack allocation
• Heap allocation
• Objects
• Closures

Compositions of structured serial control flow patterns can be used in place of unstructured mechanisms such as “goto.” Using these patterns, “goto” can (mostly) be eliminated and the maintainability of software improved.

Page 54: Teaching “Think Parallel”

Structured Parallel Patterns

The following additional parallel patterns can be used for “structured parallel programming”:

• Superscalar sequence
• Speculative selection
• Map
• Recurrence
• Scan
• Reduce
• Pack/expand
• Fork/join
• Pipeline
• Partition
• Segmentation
• Stencil
• Search/match
• Gather
• Merge scatter
• Priority scatter
• *Permutation scatter
• !Atomic scatter

Using these patterns, threads and vector intrinsics can (mostly) be eliminated and the maintainability of software improved.

Page 55: Teaching “Think Parallel”

© 2013, Intel Corporation. All righ ts reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.

Map

• Map invokes a function on every element of an index set.
• The index set may be abstract or associated with the elements of an array.
• Corresponds to a “parallel loop” whose iterations are independent.

Examples: gamma correction and thresholding in images; color space conversions; Monte Carlo sampling; ray tracing.
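A minimal sketch of the map pattern, using gamma correction from the examples above. It uses only the C++ standard library and runs serially. The names `gamma_correct` and `map_gamma` are hypothetical, chosen for illustration. The point is the shape of the computation: one elemental function applied to each element, with no iteration touching another iteration's data.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Elemental function: depends only on its own input value.
inline float gamma_correct(float v, float gamma) {
    return std::pow(v, 1.0f / gamma);
}

// Map: apply the elemental function to every element of the input.
// Each output element is computed from exactly one input element, so
// the iterations are independent and could run in any order.
std::vector<float> map_gamma(const std::vector<float>& in, float gamma) {
    assert(gamma > 0.0f);  // assumption of this sketch: gamma is positive
    std::vector<float> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
                   [gamma](float v) { return gamma_correct(v, gamma); });
    return out;
}
```

Because the iterations are independent, the same loop body could in principle be handed to a parallel loop construct without changing the result.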

Page 56: Teaching “Think Parallel”


Reduce

• Reduce combines every element in a collection into one using an associative operator: x+(y+z) = (x+y)+z
• For example: reduce can be used to find the sum or maximum of an array.
• Vectorization may require that the operator also be commutative: x+y = y+x

Examples: averaging of Monte Carlo samples; convergence testing; image comparison metrics; matrix operations.
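A minimal sketch of why associativity matters for reduce. The input is split into chunks, each chunk is reduced on its own, and the partial results are then combined. The function name `chunked_sum` and the chunking scheme are assumptions made for illustration. The code runs serially, but the per-chunk reductions do not depend on each other.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Two-phase reduce. Phase 1 reduces each chunk independently (these
// partial reductions could run in parallel). Phase 2 combines the
// partial results. Regrouping the additions like this gives the same
// answer only because integer + is associative.
long long chunked_sum(const std::vector<int>& data, std::size_t chunks) {
    assert(chunks > 0);
    const std::size_t n = data.size();
    std::vector<long long> partial(chunks, 0);  // 0 is the identity for +
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = c * n / chunks;
        const std::size_t end = (c + 1) * n / chunks;
        partial[c] = std::accumulate(data.begin() + begin,
                                     data.begin() + end, 0LL);
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Floating-point addition is not exactly associative, so a regrouped floating-point reduction can differ in its last bits from the serial left-to-right sum.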

Page 57: Teaching “Think Parallel”


Stencil

• Stencil applies a function to neighbourhoods of an array.
• Neighbourhoods are given by a set of relative offsets.
• Boundary conditions need to be considered.

Examples: image filtering, including convolution, median, and anisotropic diffusion.
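A minimal sketch of a stencil: a one-dimensional three-point average. The neighbourhood is the set of relative offsets {-1, 0, +1}. The boundary condition used here (clamping to the nearest valid element) is only one possible choice and is an assumption of this sketch, as is the name `stencil_average3`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Three-point averaging stencil over a 1D array.
// Reads come from `in` and writes go to `out`, so every output element
// can be computed independently of the others.
void stencil_average3(const std::vector<double>& in,
                      std::vector<double>& out) {
    assert(&in != &out);  // in-place update would create dependencies
    const std::size_t n = in.size();
    out.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Boundary condition: clamp out-of-range neighbours to the edge.
        const double left = in[i == 0 ? 0 : i - 1];
        const double right = in[i + 1 == n ? i : i + 1];
        out[i] = (left + in[i] + right) / 3.0;
    }
}
```

Writing to a separate output array is what keeps this a map over neighbourhoods. Updating the array in place would make later iterations read values written by earlier ones.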

Page 58: Teaching “Think Parallel”


Recurrence

• Recurrence results from loop nests with both input and output dependencies between iterations.
• Can also result from iterated stencils.

Examples: simulation, including fluid flow, electromagnetic, and financial PDE solvers; lattice QCD; sequence alignment and pattern matching.
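A minimal sketch of a recurrence, using edit distance as a simple stand-in for the sequence alignment example above. The function name `edit_distance` and the flat table layout are assumptions made for illustration. Each cell of the table reads cells that earlier iterations of the loop nest wrote, so the iterations are not independent the way they are in a map.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Edit distance as a two-level loop nest over a (rows x cols) table.
// Cell (i, j) depends on cells (i-1, j), (i, j-1) and (i-1, j-1).
std::size_t edit_distance(const std::string& a, const std::string& b) {
    const std::size_t rows = a.size() + 1;
    const std::size_t cols = b.size() + 1;
    std::vector<std::size_t> d(rows * cols);
    for (std::size_t i = 0; i < rows; ++i) d[i * cols] = i;  // first column
    for (std::size_t j = 0; j < cols; ++j) d[j] = j;         // first row
    for (std::size_t i = 1; i < rows; ++i) {
        for (std::size_t j = 1; j < cols; ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            const std::size_t subst = d[(i - 1) * cols + (j - 1)] + cost;
            const std::size_t del = d[(i - 1) * cols + j] + 1;
            const std::size_t ins = d[i * cols + (j - 1)] + 1;
            d[i * cols + j] = std::min({subst, del, ins});
        }
    }
    return d[rows * cols - 1];
}
```

With this dependency shape, cells on the same anti-diagonal (equal i + j) do not depend on each other, which is where parallelism can be found in such a loop nest.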

Page 59: Teaching “Think Parallel”


Pipeline

• Pipeline uses a sequence of stages that transform a flow of data.
• Some stages may retain state.
• Data can be consumed and produced incrementally: “online”.

Examples: image filtering, data compression and decompression, signal processing.
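A minimal sketch of the pipeline structure in standard C++, run serially. It is not the TBB or Cilk Plus API. The names `TokenSource`, `length_of`, `TotalSink`, and `run_pipeline` are hypothetical. It shows a stateful source stage, a stateless middle stage, and a stateful sink stage, with items flowing through one at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Stage 1 (stateful source): yields one whitespace-separated token per
// call and returns false when the input is exhausted.
class TokenSource {
public:
    explicit TokenSource(const std::string& text) : in_(text) {}
    bool next(std::string& tok) { return static_cast<bool>(in_ >> tok); }
private:
    std::istringstream in_;
};

// Stage 2 (stateless transform): depends only on its own item, so a
// parallel runtime could process several items at once.
inline std::size_t length_of(const std::string& tok) { return tok.size(); }

// Stage 3 (stateful sink): retains a running total across items.
class TotalSink {
public:
    void consume(std::size_t n) { total_ += n; ++count_; }
    std::size_t total() const { return total_; }
    std::size_t count() const { return count_; }
private:
    std::size_t total_ = 0;
    std::size_t count_ = 0;
};

// Drive the pipeline: each item passes through all stages as soon as
// the source produces it.
TotalSink run_pipeline(const std::string& text) {
    TokenSource source(text);
    TotalSink sink;
    std::string tok;
    while (source.next(tok)) sink.consume(length_of(tok));
    return sink;
}
```

The stages that retain state (source and sink) are the ones that would need to stay serial in a parallel pipeline, while the stateless middle stage is the natural candidate for parallel execution.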

Page 60: Teaching “Think Parallel”


Pipeline: Cilk Plus and TBB

Intel® TBB

    parallel_pipeline(
        ntoken,                                   // maximum number of items in flight
        make_filter<void,T>(
            filter::serial_in_order,              // input stage: one item at a time, in order
            [&]( flow_control& fc ) -> T {
                T item = f();
                if( !item ) fc.stop();            // no more input: end the pipeline
                return item;
            } ) &
        make_filter<T,U>( filter::parallel, g ) & // middle stage: may run on many items at once
        make_filter<U,void>( filter::serial_in_order, h ) );  // output stage: in order

Intel® Cilk™ Plus (special case)

    S s;
    reducer_consume<S,U> sink( &s, h );
    ...
    void Stage2( T x ) { sink.consume(g(x)); }
    ...
    while( T x = f() )
        cilk_spawn Stage2(x);
    cilk_sync;

Page 61: Teaching “Think Parallel”


This is a really great book…

I've been dreaming for a while of a modern accessible book that I could recommend to my threading-deprived colleagues and assorted enquirers to get them up to speed with the core concepts of multithreading as well as something that covers all the major current interesting implementations. Finally I have that book.

- Martin Watt, Principal Engineer, DreamWorks Animation

Structured Parallel Programming, Michael McCool, Arch Robison, James Reinders (c) 2012, publisher: Morgan Kaufmann

Teaches parallel programming using a new pattern-based approach. Extensive examples in Cilk Plus and TBB. Not about any specific hardware, but relevant to all. It’s about effective parallel programming. Great for teaching!

Learn more about this book: parallelbook.com

Available since July 2012.

© 2012, Michael McCool, Arch Robison, James Reinders, book image used with permission

Page 62: Teaching “Think Parallel”


http://intel.com/software/mic

Page 63: Teaching “Think Parallel”


parallel programming from few to many cores with consistent models, languages, tools, and techniques

http://intel.com/software/mic

http://tinyurl.com/inteljames twitter @jamesreinders


Page 64: Teaching “Think Parallel”
Page 65: Teaching “Think Parallel”


INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Copyright © 2013, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.

Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Legal Disclaimer & Optimization Notice

