Xeon Phi: Architecture

General information

Philipp Bartels, Thomas Lange

TIANHE-2

32,000 CPUs: Xeon E5-2692 v2

48,000 accelerators: Xeon Phi 31S1P

Theoretical peak: 54,902.4 TFlop/s (double precision)

Linpack performance: 33,862.7 TFlop/s
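As a quick check on the two figures above, the Linpack efficiency (measured performance divided by theoretical peak) can be worked out directly. The symbols R_max and R_peak follow the usual TOP500 naming and do not appear on the slide.

```latex
\[
\frac{R_{\max}}{R_{\text{peak}}}
  = \frac{33{,}862.7\ \text{TFlop/s}}{54{,}902.4\ \text{TFlop/s}}
  \approx 0.617
\]
```

So roughly 62% of the theoretical peak is reached in the Linpack run.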

- Accelerator / coprocessor

- 57 to 61 general-purpose cores

- Embedded Linux

More characteristics

Can be assigned its own IP address

x86-64 instruction set

Extension: Initial Many Core Instructions (IMCI)

4-way Hyper-Threading (four hardware threads per core)

512-bit vector registers

Xeon Phi 7120D

RCP: $4,235 (Amazon.com: $3,507.82)

61 cores (1.238 GHz each)

30.5 MB L2 cache in total

Main memory: 16 GB GDDR5

TDP: 300 W
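The core count, clock rate and 512-bit vector width together determine the theoretical peak rate. The following is a minimal sketch of that arithmetic; the function names are hypothetical, and it assumes (not stated on the slides) that each core completes one fused multiply-add per vector lane per cycle, counted as two floating-point operations.

```c
/* Peak floating-point rate in GFlop/s for a many-core vector processor.
 * Assumption (not on the slides): each core completes one fused
 * multiply-add per vector lane per cycle, i.e. 2 operations per lane. */
double peak_gflops(int cores, double clock_ghz, int vector_bits, int element_bits)
{
    int lanes = vector_bits / element_bits;   /* e.g. 512 / 64 = 8 doubles */
    return cores * clock_ghz * lanes * 2.0;
}

/* L2 cache available per core in KB, given the total in MB. */
double l2_per_core_kb(double total_mb, int cores)
{
    return total_mb * 1024.0 / cores;
}
```

With the slide's figures, `peak_gflops(61, 1.238, 512, 64)` gives about 1208 GFlop/s in double precision, and `l2_per_core_kb(30.5, 61)` gives 512 KB of L2 cache per core.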

Parallel programming models

OpenMP

OpenACC

Intel Cilk Plus

Intel TBB

OpenCL

Pragma example

#pragma offload target(mic) in(...) inout(...)
{
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        c[i] = 2 * a[i] + b[i];
    }
}
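The offload pragma above is specific to Intel's compiler. For comparison, here is a minimal sketch of the same loop using the standard OpenMP 4.0 `target` construct instead, which is a swapped-in technique and not what the slide shows. The function name `scale_add` and the choice of `float` are assumptions for illustration.

```c
/* Computes c[i] = 2 * a[i] + b[i] for i in [0, n).
 * The target construct asks the OpenMP runtime to run the block on an
 * attached device: a and b are copied to the device, c is copied back.
 * Assumption: the three arrays do not overlap. */
void scale_add(int n, const float *a, const float *b, float *c)
{
    #pragma omp target map(to: a[0:n], b[0:n]) map(from: c[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        c[i] = 2 * a[i] + b[i];
    }
}
```

When the code is built without OpenMP support the pragmas are ignored, and when no offload device is present the runtime typically falls back to the host, so the function still produces the same result.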

Intel Cilk Plus

3 simple keywords: cilk_for, cilk_spawn, cilk_sync

Array notation

SIMD-enabled functions

#pragma simd

Example

cilk_for (int i = 0; i < 8; ++i)
{
    do_work(i);
}

int fib(int n)
{
    if (n < 2)
        return n;
    int x = cilk_spawn fib(n-1);
    int y = fib(n-2);
    cilk_sync;
    return x + y;
}
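A useful property of the three keywords is that removing them leaves a valid sequential program with the same result (the "serial elision"). The sketch below relies on that so it compiles with any C compiler, assuming no Cilk Plus support is available. The macro definitions are written here for illustration, and `fib_table` is a hypothetical stand-in for the `do_work` loop.

```c
/* Serial elision: with these definitions the Cilk Plus keywords vanish
 * and the code runs sequentially. With a Cilk Plus compiler one would
 * include <cilk/cilk.h> instead of defining the macros by hand. */
#define cilk_spawn
#define cilk_sync
#define cilk_for for

int fib(int n)
{
    if (n < 2)
        return n;
    int x = cilk_spawn fib(n - 1);  /* may run in parallel with the next call */
    int y = fib(n - 2);
    cilk_sync;                      /* wait for the spawned call before reading x */
    return x + y;
}

/* Fills out[0..count-1] with Fibonacci numbers. Each iteration writes only
 * its own element, so the iterations are independent of each other. */
void fib_table(int *out, int count)
{
    cilk_for (int i = 0; i < count; ++i) {
        out[i] = fib(i);
    }
}
```

Writing each result to its own array element, rather than accumulating into a shared variable, keeps the loop free of data races once the keywords are active again.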

Vectorization

Perform the same operation on multiple data elements in a single instruction

#pragma omp simd
for (i = 0; i < 1024; i++)
    C[i] = A[i]*B[i];

// array notation in Intel Cilk Plus: array[start:length]
for (i = 0; i < 1024; i += 4)
    C[i:4] = A[i:4]*B[i:4];
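To make the two variants concrete, here is a minimal sketch in standard C. The first function is the `#pragma omp simd` loop wrapped in a complete function. The second spells out in plain loops what processing the arrays in sections of four elements means, since Cilk Plus array notation is not available in most current compilers. The function names, `N` and the `float` element type are assumptions.

```c
#define N 1024

/* Element-wise product with an explicit SIMD hint (OpenMP 4.0).
 * Without OpenMP support the pragma is ignored and the loop runs as is. */
void multiply_simd(const float *A, const float *B, float *C)
{
    #pragma omp simd
    for (int i = 0; i < N; i++)
        C[i] = A[i] * B[i];
}

/* The same computation in blocks of 4 consecutive elements, written as
 * plain C: the inner loop plays the role of one 4-element array section. */
void multiply_chunks4(const float *A, const float *B, float *C)
{
    for (int i = 0; i < N; i += 4)
        for (int k = 0; k < 4; k++)
            C[i + k] = A[i + k] * B[i + k];
}
```

Both functions compute the same result; they differ only in how the independence of the element operations is expressed to the compiler.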

Vectorization of a loop

Autovectorization

Execute more than one iteration of the loop at the same time

Requirements:

- Straight-line code (no branching inside the loop)
- Number of iterations must be known when the loop starts
- No loop-carried dependencies
- No special operators
- Must be the inner loop
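The requirements listed above can be illustrated with three small loops. This is a sketch with hypothetical function names; whether a given compiler actually vectorizes the first loop depends on its options and on what it can prove about the pointers.

```c
#include <stddef.h>

/* Meets the requirements: the trip count n is known on entry, the body is
 * straight-line code, and each iteration touches only its own elements. */
void add_arrays(size_t n, const double *x, const double *y, double *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = x[i] + y[i];
}

/* Loop-carried dependency: iteration i reads the value that iteration
 * i - 1 has just written, so iterations cannot simply run side by side. */
void running_sum(size_t n, double *x)
{
    for (size_t i = 1; i < n; i++)
        x[i] = x[i] + x[i - 1];
}

/* Unknown trip count: the exit depends on the data, so the number of
 * iterations is not known when the loop starts. */
size_t count_until_negative(size_t n, const double *x)
{
    size_t i = 0;
    while (i < n && x[i] >= 0.0)
        i++;
    return i;
}
```

Only the first loop is a straightforward candidate for autovectorization; the other two each violate one of the listed requirements.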

Example

Can be vectorized by the compiler (a[i-1] is always computed before it is used)

for (i=1; i<MAX; i++) {
    a[i] = b[i] + c[i];
    d[i] = e[i] - a[i-1];
}

Cannot be vectorized by the compiler (a[i-1] may be needed before it has been computed)

for (i=1; i<MAX; i++) {
    d[i] = e[i] - a[i-1];
    a[i] = b[i] + c[i];
}
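The second loop can be brought into a vectorizable form by hand. The sketch below uses loop distribution (splitting the loop into two), which gives the same result as long as the arrays do not overlap; that non-overlap condition and the function names are assumptions, not part of the slide.

```c
/* Statement order of the second loop above: d[i] needs the a[i-1]
 * written by the previous iteration. */
void update_original(int max, double *a, const double *b, const double *c,
                     double *d, const double *e)
{
    for (int i = 1; i < max; i++) {
        d[i] = e[i] - a[i - 1];
        a[i] = b[i] + c[i];
    }
}

/* Loop distribution: first finish all writes to a, then compute d.
 * Each of the two loops has independent iterations.
 * Assumption: a, b, c, d and e do not overlap in memory. */
void update_distributed(int max, double *a, const double *b, const double *c,
                        double *d, const double *e)
{
    for (int i = 1; i < max; i++)
        a[i] = b[i] + c[i];
    for (int i = 1; i < max; i++)
        d[i] = e[i] - a[i - 1];
}
```

Note that a[0] is never written, so d[1] sees the original a[0] in both versions, while every later d[i] sees the freshly computed a[i-1].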

Comparison with Tesla K40

[Chart: Xeon Phi 7120A vs. Tesla K40. Compared quantities: price ($), number of cores, base core clock (MHz), single-precision GFlops, double-precision GFlops, main memory size, memory bandwidth, TDP]

Who did what

Thomas Lange: slides 9 to 17

Philipp Bartels: slides 1 to 8 and 18
