Computer Architecture Essentials
James Reinders
ATPESC (Argonne Training Program on Extreme-Scale Computing)
August 1, 2016, Pheasant Run, St Charles, IL, 09:30-10:15

© 2016, James Reinders. All rights reserved. Intel, the Intel logo, Intel Inside, Cilk, VTune, Xeon, and Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.

Page 2: 20160801 0930 ATPESC Argonne Reinders

Knights Landing Clustering and Memory Modes, use and implications on the future of architecture and memory configurations.

Vectorization, current state of the art thinking, use and implications on the future of data parallelism through threading + SIMD instructions.

Page 3: 20160801 0930 ATPESC Argonne Reinders

I have been fortunate, and I like to share. :)

Page 4: 20160801 0930 ATPESC Argonne Reinders

2016: KNL die. ENZO cosmology simulation of the intergalactic “cosmic web” of gravitational filaments that link galaxies, define the structure of the universe, and indicate dark matter.

Page 9: 20160801 0930 ATPESC Argonne Reinders

2016: KNL die. ENZO cosmology simulation of the intergalactic “cosmic web” of gravitational filaments that link galaxies, define the structure of the universe, and indicate dark matter.

2015: SGI machine (Cambridge). 3D visualization of statistical fluctuations in the Cosmic Microwave Background, the remnant of the first visible light after the Big Bang. CMB data is from the Planck satellite and is the topic of Chapter 10, providing insights into the new physics and how the universe evolved.

2014: First KNC machine (TACC).

2013: KNC dies (close-in wafer shot). Artist added the cosmic scene.

Page 10: 20160801 0930 ATPESC Argonne Reinders

I have found... example after example getting performance and performance portability with “just parallel programming.”

Page 11: 20160801 0930 ATPESC Argonne Reinders

< My daughter’s favorite movie.

Page 12: 20160801 0930 ATPESC Argonne Reinders

Computer Architecture is FUN AGAIN

we need to make sure software is not collateral damage.

Performance and Performance Portability should be a requirement.

Page 13: 20160801 0930 ATPESC Argonne Reinders

Interesting

Shared vs. Discrete Memory Spaces / Memory System Design

Integration of combinations vs. Discrete Building Blocks

Fully Capable Programming Support vs. Restrictive Programming

Hardware Configurability

Power Consumption

Scalability

Page 14: 20160801 0930 ATPESC Argonne Reinders

KEEP CALM AND BE INQUISITIVE ENGINEERS

Page 15: 20160801 0930 ATPESC Argonne Reinders

OUR LINE OF INQUIRY TODAY: COMPUTER ARCHITECTURE

IS IT FUN AGAIN OR OUT OF CONTROL?

Page 16: 20160801 0930 ATPESC Argonne Reinders

Three debates changing our lives

Scale: “many but lower performance each” vs. “higher performance each but fewer”

Capabilities: “high performance but more niche” vs. “more general but less performance”

Besides the compute (memory / communication / data movement): Oh so many things to consider. We will brush the surface.

Page 19: 20160801 0930 ATPESC Argonne Reinders

Three needs affecting the debate

Power: 1 MW drawn continuously costs ~$1M/year in electricity

Portability: Coding is expensive; doing it more is more so.

Performance portability: You can start a bar fight just by trying to define this. *

* if the bar patrons are HPC programmers
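
The power figure above is easy to sanity-check. The following is a minimal sketch, assuming a machine that draws 1 MW continuously all year and a hypothetical electricity price of $0.11 per kWh; the price and the function name are illustrative assumptions, not figures from the talk.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_electricity_cost(power_mw, price_per_kwh):
    """Dollars spent drawing `power_mw` megawatts continuously for one year."""
    kwh_per_year = power_mw * 1000 * HOURS_PER_YEAR  # MW -> kW, then kW * hours
    return kwh_per_year * price_per_kwh


# Hypothetical price of $0.11/kWh; real rates vary by site and contract.
cost = annual_electricity_cost(power_mw=1.0, price_per_kwh=0.11)
print(f"1 MW for one year: ${cost:,.0f}")  # roughly $0.96 million
```

Under that assumed price, one megawatt-year lands close to the ~$1M rule of thumb; a cheaper or more expensive contract scales the result linearly.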

Page 22: 20160801 0930 ATPESC Argonne Reinders

Why Multicore?

The “Free Lunch” is over, really. But Moore’s Law continues!

Page 23: 20160801 0930 ATPESC Argonne Reinders

© 2016, James Reinders, used with permission. http://lotsofcores.com

Processor Clock Rate over Time: growth halted around 2005

Figure 1.1 (KNL book)

Clock rates over time:
1973: 1 MHz; 2003: 1 GHz
2004: 3 GHz, still today…

power wall + ILP wall + memory wall
solve: parallel hardware + explicit parallel software + software memory optimization
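
To put the two endpoints quoted above (1 MHz in 1973, 1 GHz in 2003) in perspective, the sketch below computes the implied doubling time and annual growth rate. It assumes steady exponential growth between those two points, which is a simplification; the function names are illustrative.

```python
import math


def doubling_time_years(start_hz, end_hz, years):
    """Years per doubling, assuming steady exponential growth over the interval."""
    doublings = math.log2(end_hz / start_hz)
    return years / doublings


def annual_growth_rate(start_hz, end_hz, years):
    """Compound annual growth rate as a fraction (0.26 means 26% per year)."""
    return (end_hz / start_hz) ** (1.0 / years) - 1.0


# Data points quoted on the slide: 1 MHz in 1973, 1 GHz in 2003.
print(doubling_time_years(1e6, 1e9, 2003 - 1973))  # about 3 years per doubling
print(annual_growth_rate(1e6, 1e9, 2003 - 1973))   # about 0.26, i.e. ~26% per year
```

Against that backdrop, a clock rate that has stayed near 3 GHz since 2004 is a growth rate of roughly zero, which is the "free lunch is over" point.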

Page 24: 20160801 0930 ATPESC Argonne Reinders

Transistors per Processor over Time: continues to grow exponentially (Moore’s Law)

Figure 1.4 (KNL book)

Page 25: 20160801 0930 ATPESC Argonne Reinders

Core and Thread Counts

Single core, single thread ruled for decades.
Multithread: grow die area by a small % for additional hardware thread(s) sharing resources.
Multicore/Many Core: 100% die area for each additional hardware thread, without sharing.

Width

Data parallelism: handling more data at once: multibyte, multiword, many words.

Figures 1.2 and 1.3 (KNL book)

Page 28: 20160801 0930 ATPESC Argonne Reinders

CPU

[Diagram: CPU, Memory]

These were simpler times.

Page 29: 20160801 0930 ATPESC Argonne Reinders

CPU + cache

[Diagram: CPU, Cache, Memory]

Memories got “further away” (meaning: CPU speed increased faster than memory speeds).

A closer “cache” for frequently used data helps performance when memory is no longer a single clock cycle away.

Page 30: 20160801 0930 ATPESC Argonne Reinders

CPU + caches

[Diagram: CPU, (L1) Cache, L2 Cache, Memory]

Memories keep getting “further away” (this trend continues today).

More “caches” help even more (with temporal reuse of data).
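
One standard way to quantify why extra cache levels help is the average memory access time: each level costs its hit time plus its miss rate times the cost of whatever sits behind it. The sketch below applies that formula; every latency and miss rate in it is a made-up illustrative value, not a measurement of any processor.

```python
def average_access_time(levels, memory_cycles):
    """Average memory access time in cycles.

    `levels` is a list of (hit_time_cycles, miss_rate) pairs ordered from the
    cache closest to the CPU outward; `memory_cycles` is main-memory latency.
    """
    penalty = memory_cycles
    # Work from the outermost cache inward: each level's miss penalty is the
    # average access time of everything behind it.
    for hit_time, miss_rate in reversed(levels):
        penalty = hit_time + miss_rate * penalty
    return penalty


# Hypothetical latencies and miss rates, chosen only for illustration.
MEMORY = 200
no_cache = average_access_time([], MEMORY)
l1_only = average_access_time([(4, 0.10)], MEMORY)
l1_l2 = average_access_time([(4, 0.10), (12, 0.25)], MEMORY)
print(no_cache, l1_only, l1_l2)  # 200 cycles, then 24, then roughly 10.2
```

The miss rates stand in for temporal reuse: the more often data is reused while still cached, the lower the miss rate and the closer the average gets to the L1 hit time.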

Page 31: 20160801 0930 ATPESC Argonne Reinders

CPU with caches

[Diagram: CPU with integrated L1 and L2, Memory]

As transistor density increased (Moore’s Law), cache capabilities were integrated onto CPUs. Higher performance external (discrete) caches persisted for some time while integrated cache capabilities increased.

Page 32: 20160801 0930 ATPESC Argonne Reinders

CPU / Coprocessors

[Diagram: FP coprocessor, CPU with L1 and L2, Memory]

Coprocessors, first appearing in the 1970s, were FP accelerators for CPUs without FP capabilities.

Page 33: 20160801 0930 ATPESC Argonne Reinders

CPU / Coprocessors

[Diagram: CPU with integrated FP, L1, and L2; Memory]

As transistor density increased (Moore’s Law), FP capabilities were integrated onto CPUs. Higher performance discrete FP “accelerators” persisted for a little while as integrated FP capabilities increased.

Page 34: 20160801 0930 ATPESC Argonne Reinders

CPU / Coprocessors

[Diagram, early design: CPU, GPU (card), Display, Memory]

Interest in hardware support for displays increased as use of graphics grew (games being a key driver). This led to graphics processing units (GPUs) attached to CPUs to create video displays.

Page 35: 20160801 0930 ATPESC Argonne Reinders

CPU / Coprocessors

[Diagram: CPU with FP, L1, and L2; GPU (card); Display; Memory]

GPU speeds and CPU speeds increase faster than memory speeds. Direct connection to memory is best done via caches (on the CPU).

Page 37: 20160801 0930 ATPESC Argonne Reinders

CPU / Coprocessors

[Diagram: CPU with FP, L1, L2, and integrated GPU; Display; Memory]

As transistor density increased (Moore’s Law), GPU capabilities were integrated onto CPUs. Higher performance external (discrete) GPUs persist while integrated GPU capabilities increase.

Page 38: 20160801 0930 ATPESC Argonne Reinders

CPU / Coprocessors

[Diagram: CPU with FP, L1, and L2; many core coprocessor (card); Memory]

A many core coprocessor (Intel® Xeon Phi™) appears, purpose built for accelerating technical computing.

Page 39: 20160801 0930 ATPESC Argonne Reinders

CPU / Coprocessors

[Diagram: many core CPU with FP, L1, and L2; Memory]

As transistor density increased (Moore’s Law), many core capabilities will be integrated to create a many core CPU (“Knights Landing”).

Page 40: 20160801 0930 ATPESC Argonne Reinders

Nodes

“Nodes” are building blocks for clusters. With or without GPUs. Displays not needed.

[Diagram: four example nodes, each with Memory: CPU with integrated GPU; CPU with GPU (card); CPU alone; CPU with many core coprocessor (card)]

Page 41: 20160801 0930 ATPESC Argonne Reinders

Clusters

Clusters are made by connecting nodes, regardless of “Nodes” type.

[Diagram: nine Node + NIC blocks]

Page 42: 20160801 0930 ATPESC Argonne Reinders

NIC (Network Interface Controller) integration

As transistor density increased (Moore’s Law), NIC capabilities will be integrated onto CPUs.

[Diagram: three example nodes with integrated NIC, each with Memory: CPU; CPU with GPU; many core CPU]

Page 43: 20160801 0930 ATPESC Argonne Reinders

Why Intel® Xeon Phi™ Processors?

Page 44: 20160801 0930 ATPESC Argonne Reinders

If you were plowing a field, which would you rather use… two strong oxen, or 1024 chickens?

Page 45: 20160801 0930 ATPESC Argonne Reinders

Design Question - Best for Computing?

A few powerful vs. Many less powerful.

Diagrams for discussion purposes only, not a precise representation of any product of any company.
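
One common way to reason about "a few powerful vs. many less powerful" is Amdahl's law, which the slide does not name; it is used here only as an illustrative lens. The sketch below compares two hypothetical machines. The per-worker speeds and the parallel fractions are invented for illustration and describe no real product.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup over one worker when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def throughput(parallel_fraction, workers, per_worker_speed):
    """Relative throughput: per-worker speed times the Amdahl speedup."""
    return per_worker_speed * amdahl_speedup(parallel_fraction, workers)


# Hypothetical machines: 2 "oxen" at speed 1.0 each versus 1024 "chickens" at
# speed 0.05 each. The speeds are made up for illustration.
for p in (0.50, 0.99, 0.9999):
    oxen = throughput(p, workers=2, per_worker_speed=1.0)
    chickens = throughput(p, workers=1024, per_worker_speed=0.05)
    print(f"parallel fraction {p}: oxen {oxen:.2f}, chickens {chickens:.2f}")
```

With these assumed numbers the few strong workers win when half the work is serial, and the many weak workers win once the work is almost entirely parallel, so the answer depends on the software as much as on the hardware.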

Page 46: 20160801 0930 ATPESC Argonne Reinders

Diagrams for discussion purposes only, not a precise representation of any product of any company.

Same programming models, languages, optimizations and tools.

A few powerful vs. Many less powerful.

Page 47: 20160801 0930 ATPESC Argonne Reinders

Figure 1.5 (KNL book)

Page 51: 20160801 0930 ATPESC Argonne Reinders

Vision: span from few cores to many cores with consistent models, languages, tools, and techniques

Page 52: 20160801 0930 ATPESC Argonne Reinders

Knights Landing: 2nd Generation Intel® Xeon Phi™

ALL ABOUT PARALLEL PROGRAMMING
Threading, Vectorization, Data Locality
Fortran, C, C++ (plus a little Python)
OpenMP, MPI, TBB

New with Knights Landing: AVX-512, High Bandwidth Memory (MCDRAM), Cluster Mode, Omni-Path

Page 54: 20160801 0930 ATPESC Argonne Reinders

2nd Generation Intel® Xeon Phi™ Products

Codename: Knights Landing

Page 59: 20160801 0930 ATPESC Argonne Reinders

AVX-512, High Bandwidth Memory, Cluster Mode, Omni-Path

Page 61: 20160801 0930 ATPESC Argonne Reinders

I say… “Wow!”

Page 62: 20160801 0930 ATPESC Argonne Reinders

512-bit vectors via AVX-512 means:
High performance and binary compatibility
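
What a 512-bit vector buys is easiest to see as a lane count: how many elements one SIMD instruction operates on at once. The sketch below does that arithmetic for the 128-bit (SSE), 256-bit (AVX/AVX2), and 512-bit (AVX-512) register widths; the helper name `lanes` is illustrative.

```python
def lanes(vector_bits, element_bits):
    """Number of elements that fit in one SIMD register."""
    if vector_bits % element_bits != 0:
        raise ValueError("element size must divide the vector width")
    return vector_bits // element_bits


# Register widths of SSE (128), AVX/AVX2 (256), and AVX-512 (512).
for width in (128, 256, 512):
    print(width, "bits:", lanes(width, 64), "doubles,", lanes(width, 32), "floats")
```

A 512-bit register therefore holds 8 double-precision or 16 single-precision values, twice what a 256-bit register holds, which is the upper bound on the per-instruction gain from wider vectors.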

Page 63: 20160801 0930 ATPESC Argonne Reinders

Support of vectors under 512 bits in size means:
Binary compatibility

Figure 2.1 (KNL book)

Page 64: 20160801 0930 ATPESC Argonne Reinders

16 GB on-package DRAM means:
Performance with options

Cache Mode: 16 GB MCDRAM acts as a cache for DDR; the physical address space is DDR only.
Flat Mode: the physical address space holds both 16 GB MCDRAM and DDR.
Hybrid Mode: 4 or 8 GB MCDRAM acts as a cache; the physical address space holds the other 8 or 12 GB MCDRAM plus DDR.

Figure 4.13 (KNL book)
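
The three memory modes differ in how much memory software can address directly and how much MCDRAM acts as a cache. The sketch below tabulates that split. It is not a real configuration API: `memory_layout`, its parameters, and the 96 GB DDR size are hypothetical, chosen only to make the arithmetic concrete.

```python
MCDRAM_GB = 16  # on-package memory capacity quoted on the slide


def memory_layout(mode, ddr_gb, hybrid_cache_gb=8):
    """Return (addressable_gb, mcdram_cache_gb) for a memory mode.

    Illustrative only: the real mode is chosen at boot time, not by a call
    like this one.
    """
    if mode == "flat":
        # All MCDRAM appears in the physical address space next to DDR.
        return ddr_gb + MCDRAM_GB, 0
    if mode == "cache":
        if ddr_gb <= 0:
            raise ValueError("cache mode is only available when there is DDR")
        # All MCDRAM is a cache; only DDR is addressable.
        return ddr_gb, MCDRAM_GB
    if mode == "hybrid":
        if hybrid_cache_gb not in (4, 8):
            raise ValueError("hybrid mode uses 4 or 8 GB of MCDRAM as cache")
        # Part of MCDRAM is a cache, the rest is addressable alongside DDR.
        return ddr_gb + (MCDRAM_GB - hybrid_cache_gb), hybrid_cache_gb
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical node with 96 GB of DDR.
for mode in ("flat", "cache", "hybrid"):
    print(mode, memory_layout(mode, ddr_gb=96))
```

Seen this way, the choice trades directly addressable capacity against transparent caching, which is why the slide frames it as performance with options.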

Page 65: 20160801 0930 ATPESC Argonne Reinders

Figure 3.24 (KNL book)

Cache mode only available when there is DDR

Page 66: 20160801 0930 ATPESC Argonne Reinders

Figures 3.24 and 3.25 (KNL book)

Only available when Memory Mode = Hybrid

Cache mode only available when there is DDR

Page 67: 20160801 0930 ATPESC Argonne Reinders


Standard “NUMA” node recognition by BIOS, OS, and applications.

Systems will eventually have this as a “norm.”
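Because the memory is recognized through the standard NUMA mechanism, ordinary OS interfaces are enough to see it. A minimal sketch, assuming a Linux system that exposes nodes as `nodeN` directories under `/sys/devices/system/node`; the helper name `list_numa_nodes` is hypothetical, and which node number (if any) corresponds to MCDRAM depends on the boot-time configuration and is not determined here.

```python
import os
import re


def list_numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> list:
    """Return the NUMA node IDs the OS exposes, or [] if sysfs is unavailable."""
    try:
        entries = os.listdir(sysfs_root)
    except OSError:
        return []
    ids = []
    for entry in entries:
        # Only directories named node<N> are NUMA nodes; skip other sysfs files.
        match = re.fullmatch(r"node(\d+)", entry)
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


print(list_numa_nodes())
```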

AVX-512, High Bandwidth Memory, Cluster Mode, Omni-Path

Page 68: 20160801 0930 ATPESC Argonne Reinders


C/C++ “high bandwidth” malloc or new

Fortran “high bandwidth” allocatables

AVX-512, High Bandwidth Memory, Cluster Mode, Omni-Path

Page 69: 20160801 0930 ATPESC Argonne Reinders


Mesh interconnect means:

Higher performance with options

AVX-512, High Bandwidth Memory, Cluster Mode, Omni-Path

Page 70: 20160801 0930 ATPESC Argonne Reinders


Tweaks

In the end, selecting a memory mode is a PERFORMANCE TWEAK.

Code changes may be needed – unless the working set “fits” or can easily be moved by rules.

Previously, such changes in policies were only decided when the machine was designed.

NEW: Intel made this configurable at BOOT time. Unprecedented configurability.

Does it matter enough to bother?

My opinion: Probably – but time will tell.

Page 71: 20160801 0930 ATPESC Argonne Reinders


Figure 3.23 (KNL book)

Page 72: 20160801 0930 ATPESC Argonne Reinders


Cluster Mode: ALL TO ALL

(only used if DRAM is unbalanced)

Figure 4.5 (KNL book)

Page 73: 20160801 0930 ATPESC Argonne Reinders


Cluster Mode: ALL TO ALL

(only used if DRAM is unbalanced)

Figure 4.5 (KNL book)

(1) L2 cache miss

Page 74: 20160801 0930 ATPESC Argonne Reinders


Cluster Mode: ALL TO ALL

(only used if DRAM is unbalanced)

Figure 4.5 (KNL book)

(2) Request to the distributed directory - the “responsible” portion happens to be here

Page 75: 20160801 0930 ATPESC Argonne Reinders


Cluster Mode: ALL TO ALL

(only used if DRAM is unbalanced)

Figure 4.5 (KNL book)

(3) Directory miss sends request to a memory – in this case in the MCDRAM (a “hit” would have sent request to another portion of the distributed L2)

Page 76: 20160801 0930 ATPESC Argonne Reinders


Cluster Mode: ALL TO ALL

(only used if DRAM is unbalanced)

Figure 4.5 (KNL book)

(4) Memory sends data back to the requestor
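The four steps of an L2 miss can be traced with a small toy model. Everything here is illustrative: `trace_l2_miss` is a hypothetical helper, the directory is a plain dictionary, and the final step on the directory-hit path is an assumption, since the slides only animate the miss path.

```python
def trace_l2_miss(address, directory):
    """Return the ordered messages for one L2 miss (toy model).

    directory maps an address to the tile whose L2 holds the line;
    a missing key models a directory miss.
    """
    steps = [
        "(1) L2 cache miss in the requesting tile",
        "(2) request goes to the directory portion responsible for the address",
    ]
    owner = directory.get(address)
    if owner is None:
        steps.append("(3) directory miss: request forwarded to memory")
        steps.append("(4) memory sends data back to the requestor")
    else:
        steps.append(f"(3) directory hit: request forwarded to the L2 of tile {owner}")
        # Assumption: the slides do not show what follows a directory hit.
        steps.append(f"(4) tile {owner} sends data back to the requestor")
    return steps


for line in trace_l2_miss(0x1000, directory={}):
    print(line)
```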

Page 77: 20160801 0930 ATPESC Argonne Reinders


Cluster Mode: QUADRANT

Figure 4.6 (KNL book)

Page 78: 20160801 0930 ATPESC Argonne Reinders


Cluster Mode: QUADRANT

Figure 4.6 (KNL book)

Tags and Memory are affinitized

Page 79: 20160801 0930 ATPESC Argonne Reinders


Cluster Mode: SNC-4

Figure 4.7 (KNL book)

Page 80: 20160801 0930 ATPESC Argonne Reinders


Cluster Mode: SNC-4

Figure 4.7 (KNL book)

Tags and Memory and NUMA nodes (close memory) are affinitized

ALL memory is available COHERENTLY to ALL cores. Only change: some is close and some is far (hence, NUMA)
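What each cluster mode keeps close together can be summarized as a small lookup table. This is only a restatement of the slides in code form; the dictionary keys and function names are hypothetical, and listing ALL TO ALL as having no affinity is inferred from the contrast with the other two modes.

```python
# What each cluster mode keeps close together (toy summary of the slides).
CLUSTER_MODE_AFFINITY = {
    "all-to-all": {"tags_with_memory": False, "numa_nodes_exposed": False},
    "quadrant": {"tags_with_memory": True, "numa_nodes_exposed": False},
    "snc-4": {"tags_with_memory": True, "numa_nodes_exposed": True},
}


def benefits_from_numa_aware_placement(mode: str) -> bool:
    """True when software can see, and so should respect, close vs. far memory."""
    return CLUSTER_MODE_AFFINITY[mode]["numa_nodes_exposed"]


def all_memory_coherent(mode: str) -> bool:
    """Every mode keeps all memory coherently available to all cores."""
    if mode not in CLUSTER_MODE_AFFINITY:
        raise KeyError(mode)
    return True


for mode in CLUSTER_MODE_AFFINITY:
    print(mode, benefits_from_numa_aware_placement(mode), all_memory_coherent(mode))
```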

Page 81: 20160801 0930 ATPESC Argonne Reinders


Tweaks

In the end, selecting a cluster mode is a PERFORMANCE TWEAK.

Code changes should not be needed – assuming your MPI code is written so you can parameterize #ranks and placement of ranks.

Previously, such changes in policies were only decided when the machine was designed.

NEW: Intel made this configurable at BOOT time. Unprecedented configurability.

Does it matter enough to bother?

My opinion: Probably – but time will tell.

Page 82: 20160801 0930 ATPESC Argonne Reinders


Advice

Default to CACHE (memory mode) + QUADRANT (cluster mode)

Use CACHE mode, unless you have cache-unfriendly code and you will program to explicitly use FLAT memory.

Try SNC if you will divide your tiles neatly into 4x (or 2x) ranks (MPI).
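The advice above can be written down as a small decision helper. This is a sketch of the rules of thumb only, not a tuning tool: `suggest_modes` and its parameters are hypothetical, the result is a starting point to try rather than a guaranteed best choice, and the name "snc-2" for the two-way variant is an assumption drawn from the "(or 2x)" remark.

```python
def suggest_modes(cache_friendly: bool, will_program_flat: bool, mpi_ranks: int):
    """Return (memory_mode, cluster_mode_to_try) following the slide's advice."""
    # Default: CACHE memory mode + QUADRANT cluster mode.
    memory_mode = "cache"
    cluster_mode = "quadrant"

    # Leave cache mode only for cache-unfriendly code that will be
    # programmed to use flat memory explicitly.
    if not cache_friendly and will_program_flat:
        memory_mode = "flat"

    # SNC is worth trying when ranks divide neatly by 4 (or by 2).
    if mpi_ranks > 0 and mpi_ranks % 4 == 0:
        cluster_mode = "snc-4"
    elif mpi_ranks > 0 and mpi_ranks % 2 == 0:
        cluster_mode = "snc-2"  # assumed name for the 2x variant

    return memory_mode, cluster_mode


print(suggest_modes(cache_friendly=True, will_program_flat=False, mpi_ranks=1))
```

Because the cluster mode is a boot-time setting and the rank count is a launch-time parameter, code written with a parameterized number of ranks can be retried under each suggestion without source changes.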

Page 83: 20160801 0930 ATPESC Argonne Reinders


Integrated Fabric means:

Lower cost, power; higher density; plus the advancements of Omni-Path in latency, bandwidth & scaling

AVX-512, High Bandwidth Memory, Cluster Mode, Omni-Path

Page 84: 20160801 0930 ATPESC Argonne Reinders


Page 85: 20160801 0930 ATPESC Argonne Reinders


Page 86: 20160801 0930 ATPESC Argonne Reinders


Page 87: 20160801 0930 ATPESC Argonne Reinders


[Figure: grid of tiles numbered 1 through 38]

Page 88: 20160801 0930 ATPESC Argonne Reinders


KEEP CALM AND ASK SOME QUESTIONS

Page 89: 20160801 0930 ATPESC Argonne Reinders


Page 90: 20160801 0930 ATPESC Argonne Reinders


Page 91: 20160801 0930 ATPESC Argonne Reinders


Page 92: 20160801 0930 ATPESC Argonne Reinders


James Reinders. Parallel Programming Enthusiast

James has been involved in multiple engineering, research and educational efforts to increase use of parallel programming throughout the industry. James worked 10,001 days as an Intel employee (1989-2016) and contributed to numerous projects, including the world's first TeraFLOP/s supercomputer (ASCI Red), the first 3 TeraFLOP/s supercomputer (ASCI Red upgrade), the world's first TeraFLOP/s microprocessor (Intel® Xeon Phi™ coprocessor) and the world's first 3 TeraFLOP/s microprocessor (Intel® Xeon Phi™ Processor).

James has been an author on numerous technical books, including VTune™ Performance Analyzer Essentials (Intel Press, 2005), Intel® Threading Building Blocks (O'Reilly Media, 2007), Structured Parallel Programming (Morgan Kaufmann, 2012), Intel® Xeon Phi™ Coprocessor High Performance Programming (Morgan Kaufmann, 2013), Multithreading for Visual Effects (A K Peters/CRC Press, 2014), High Performance Parallelism Pearls Volume 1 (Morgan Kaufmann, Nov. 2014), High Performance Parallelism Pearls Volume 2 (Morgan Kaufmann, Aug. 2015), and Intel® Xeon Phi™ Processor High Performance Programming - Knights Landing Edition (Morgan Kaufmann, 2016).