HPC Technology Track: Foundations of Computational Science Lecture 2 Dr. Greg Wettstein, Ph.D. Research Support Group Leader Division of Information Technology Adjunct Professor Department of Computer Science North Dakota State University

Post on 28-Dec-2015

HPC Technology Track: Foundations of Computational Science

Lecture 2

Dr. Greg Wettstein, Ph.D.

Research Support Group Leader, Division of Information Technology

Adjunct Professor, Department of Computer Science

North Dakota State University

What is High Performance Computing?

Definition:

The solution of problems involving high degrees of computational complexity or data analysis which require specialized hardware and software systems.

What is Parallel Computing?

Definition:

A strategy of decreasing the time to solution of a computational problem by carrying out multiple elements of the computation at the same time.

Does HPC imply Parallel Computing?

Typically, but not always. HPC solutions may require specialized systems due to memory and/or I/O performance issues.

Conversely, parallel computing does not necessarily imply high performance computing.

Flynn's Taxonomy: Classification Strategy for Concurrent Execution

SISD Single Instruction, Single Data

MISD Multiple Instruction, Single Data

SIMD * Single Instruction, Multiple Data

MIMD * Multiple Instruction, Multiple Data

* = Relevant to HPC

SIMD: The Origin of HPC

Architectural model at the heart of 'vector processors'.

Performance enhancement in machines at the origin of HPC: CDC STAR-100 and Cray-1.

Utility predicated on the fact that mathematical operations on vectors or vector spaces are at the heart of linear algebra.

Vector Processing Diagram

[Diagram: two rows of vector elements, vector length = 8 'words', combined element by element with parallel mathematical operations (+, -, *, /).]
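The operation in the diagram can be written as a plain C loop. This is a minimal sketch, not code from the lecture: the names `VECTOR_LENGTH`, `vec_op_t`, and `vector_apply` are hypothetical, and the loop is scalar, so it only describes the result that a vector processor would produce for all eight elements together.

```c
#include <assert.h>
#include <stddef.h>

#define VECTOR_LENGTH 8 /* eight 'words', as in the diagram */

/* Hypothetical selector for the four element-wise operations. */
typedef enum { VEC_ADD, VEC_SUB, VEC_MUL, VEC_DIV } vec_op_t;

/* Apply one operation to each pair of corresponding elements.
 * Every iteration is independent of the others, which is what lets
 * vector hardware perform them at the same time. */
void vector_apply(vec_op_t op, const double *a, const double *b, double *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < VECTOR_LENGTH; i++) {
        switch (op) {
        case VEC_ADD: out[i] = a[i] + b[i]; break;
        case VEC_SUB: out[i] = a[i] - b[i]; break;
        case VEC_MUL: out[i] = a[i] * b[i]; break;
        case VEC_DIV: out[i] = a[i] / b[i]; break;
        }
    }
}
```

No iteration reads a value written by another iteration, so the eight results could be computed in any order or all at once.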

Current SIMD Examples

Embedded in modern x86 and x86_64 architectures.

Primarily focused on graphics/signal processing: MMX, PNI, SSE2-4, AVX.

Foundation for the current trend in 'GPGPU computing': NVIDIA Tesla architecture.

Component of the Larrabee architecture.

SSE Implementation

[Diagram: two rows of vector elements, each held in a 128 bit XMM register. Labels: stride length; parallel operations: 100+ (SSE4).]
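A minimal sketch of the same idea with SSE2 intrinsics follows. It is an illustration under stated assumptions, not the lecture's code: the function name `sse_add_i32` and the constant `STRIDE` are hypothetical, and 'stride length' is taken to mean the number of elements that fit in one 128 bit XMM register, which is an interpretation of the slide's label. When the compiler does not target SSE2, the sketch falls back to a scalar loop with the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

#define STRIDE 4 /* 128 bit register / 32 bit element = 4 elements */

/* Add two int32 vectors; n must be a multiple of STRIDE.
 * Returns the number of loop iterations that were executed. */
size_t sse_add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t iterations = 0;
    assert(n % STRIDE == 0);
    for (size_t i = 0; i < n; i += STRIDE) {
#if defined(__SSE2__)
        /* Unaligned loads fill one XMM register from each input. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* One instruction adds all four 32 bit lanes. */
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
#else
        /* Portable fallback: same result, one element at a time. */
        for (size_t j = 0; j < STRIDE; j++)
            out[i + j] = a[i + j] + b[i + j];
#endif
        iterations++;
    }
    return iterations;
}
```

With a vector length of 8 and four elements per register, the loop body runs twice instead of eight times.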

MIMD: Multiple Instruction, Multiple Data

Characterized by multiple execution threads operating on separate data elements.

Threads may operate in shared or disjoint (distributed) memory configurations.

Implementation example: SMP (Symmetric Multi-Processing).

SPMD: The Basis for Modern HPC

SPMD (Single Program, Multiple Data) is defined as multiple processes executing a single common program, each potentially at a different point in that program.

Different from SIMD in that execution is not in lockstep format.

Common implementations:

Shared memory: OpenMP, Pthreads

Distributed memory: MPI
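The SPMD idea can be sketched without any parallel library: every process runs the same function and uses its own rank to select the data it works on. The names `range_t`, `block_range`, and `partial_sum` are hypothetical and not from the lecture. In an MPI program the rank and process count would normally come from `MPI_Comm_rank` and `MPI_Comm_size`; here they are plain parameters so the sketch stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [start, end) of elements owned by one process. */
typedef struct { size_t start; size_t end; } range_t;

/* Split n elements as evenly as possible across nprocs processes.
 * The first (n % nprocs) ranks receive one extra element. */
range_t block_range(size_t rank, size_t nprocs, size_t n)
{
    assert(nprocs > 0 && rank < nprocs);
    size_t base = n / nprocs;
    size_t extra = n % nprocs;
    range_t r;
    r.start = rank * base + (rank < extra ? rank : extra);
    r.end = r.start + base + (rank < extra ? 1 : 0);
    return r;
}

/* The common program: each rank sums only the elements it owns. */
long partial_sum(const int *data, size_t n, size_t rank, size_t nprocs)
{
    range_t r = block_range(rank, nprocs, n);
    long sum = 0;
    for (size_t i = r.start; i < r.end; i++)
        sum += data[i];
    return sum;
}
```

Each rank executes identical code but touches a disjoint slice of the data, and no rank has to be at the same instruction as any other. Combining the partial sums into a total is the step that a distributed memory implementation would perform with message passing.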

Characteristics of MD Models

MIMD/SPMD requires active participation by programmer to implement 'orthogonalization'.

SIMD requires active participation by the compiler with consideration by the programmer to support orthogonalization.

Orthogonalization defn: The isolation of a problem into discrete elements capable of being independently resolved.

The Real World - A Continuum

Practical programs do not exhibit strict model partitioning.

A more pragmatic model is to consider the 'dimensions' of parallelism available to a program.

Currently a total of four dimensions of parallelism are exploitable.

Dimensions of Parallelism

First dimension: standard sequential programming with processor-supplied ILP (Instruction Level Parallelism). Referred to as 'free' or 'invisible' parallelism.

Second dimension: SIMD or OpenMP loop parallelism. Characterized by isolation of the problem into a single system image. Primarily supported by the programming language or compiler.
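A minimal sketch of second-dimension loop parallelism using an OpenMP directive follows. The function name `dot_product` is hypothetical and the example is not from the lecture. The programmer only marks the loop; the compiler and runtime distribute the iterations across threads on one system image. If the code is built without OpenMP support, the pragma is ignored and the loop runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product with the loop marked for OpenMP.
 * Built without OpenMP support, the pragma is ignored and the loop
 * runs sequentially. */
double dot_product(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    assert(x != NULL && y != NULL);
    /* reduction(+:sum) gives each thread a private partial sum and
     * combines the partial sums when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

With GCC or Clang the directive is typically enabled with the `-fopenmp` flag. Because floating point addition is not associative, a parallel reduction may differ from the sequential result in the last bits for general inputs.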

Dimensions of Parallelism - cont.

Third dimension: two subtypes.

Use of MPI to partition the problem into orthogonal elements. Partitioning is frequently implemented on multiple system images.

MIMD threading on a single system image. Separate threads are dispatched to handle separate tasks which can execute asynchronously. A common HPC example is to 'thread' computation and Input/Output (I/O).

Dimensions of Parallelism - cont.

Fourth dimension: partitioning of the problem into orthogonal elements which can be dispatched to a heterogeneous instruction architecture.

Examples: GPGPU/CUDA, PowerXCell SPU, FPGA

Depth of Parallelism

Measure of the complexity of parallelism implemented.

Simplest metric is the count of the number of programmer implemented dimensions of parallelism on a single system image.

Example: MPI implementation with SIMD loop vectorization on each node. Parallelism depth is two.

Parallelism Analysis Example

Process-based MIMD application. Depth = 1

MPI simulation with OpenMP loop vectorization. Depth = 2

MPI partitioning with CUDA PTree offload and SIMD loop vectorization. Depth = 3

Escalation of Complexity

[Diagram: complexity plotted against two axes, Dimension (1 to 4) and Depth (1 to N), ranging from least to most.]

Architectural decisions must be based on cost benefit analysis of performance returns.

Exercise

Verify you have the changeset which adds experimental code for SSE/SIMD-based boolean PTree operators.

Study the class methods implementing the AND and OR operators.

Review and understand how vector and stride length affect the number of times a loop needs to be executed.
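As a starting point for the last item, the relationship can be sketched in plain C. This is not the PTree code from the changeset, which is not reproduced here: `loop_iterations` and `bitvector_and` are hypothetical names, bit vectors are modeled as arrays of 32 bit words, and the inner loop stands in for a single SIMD instruction that would process one stride of words at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Loop iterations needed to cover vector_length words when each
 * iteration handles stride words (rounded up for a partial tail). */
size_t loop_iterations(size_t vector_length, size_t stride)
{
    assert(stride > 0);
    return (vector_length + stride - 1) / stride;
}

/* Bitwise AND of two bit vectors stored as 32 bit words.
 * Returns the number of outer loop iterations executed. */
size_t bitvector_and(const uint32_t *a, const uint32_t *b, uint32_t *out,
                     size_t vector_length, size_t stride)
{
    size_t iterations = 0;
    assert(stride > 0);
    for (size_t i = 0; i < vector_length; i += stride) {
        size_t remaining = vector_length - i;
        size_t limit = remaining < stride ? remaining : stride;
        /* Models one SIMD operation over up to stride words. */
        for (size_t j = 0; j < limit; j++)
            out[i + j] = a[i + j] & b[i + j];
        iterations++;
    }
    return iterations;
}
```

For a vector of 8 words, a stride of 1 (scalar code) needs 8 iterations, while a stride of 4 (four 32 bit words per 128 bit XMM register) needs 2. An OR operator would differ only in replacing `&` with `|`.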

goto skills_lecture1;