Presentation - Programming a Heterogeneous Computing Cluster

Programming a Heterogeneous Computing Cluster PRESENTED BY AASHRITH H. GOVINDRAJ


TRANSCRIPT

Page 1: Presentation - Programming a Heterogeneous Computing Cluster

Programming a Heterogeneous Computing Cluster
PRESENTED BY AASHRITH H. GOVINDRAJ

Page 2: Presentation - Programming a Heterogeneous Computing Cluster

We’ll discuss the following today:
• Background of Heterogeneous Computing
• Message Passing Interface (MPI)
• Vector Addition Example (MPI Implementation)
• More implementation details of MPI

Page 3: Presentation - Programming a Heterogeneous Computing Cluster

Background
• Heterogeneous Computing System (HCS)
• High Performance Computing & its uses
• Supercomputer vs. HCS
• Why use heterogeneous computers in an HCS?
• MPI is the predominant message passing system for clusters

Page 4: Presentation - Programming a Heterogeneous Computing Cluster

Introduction to MPI
• MPI stands for Message Passing Interface
• Predominant message-passing API
• Runs on virtually any hardware platform
• Programming model: Distributed Memory Model
• Supports explicit parallelism
• Multiple languages supported

Page 5: Presentation - Programming a Heterogeneous Computing Cluster

Reasons for using MPI
• Standardization
• Portability
• Performance opportunities
• Functionality
• Availability

Page 6: Presentation - Programming a Heterogeneous Computing Cluster

MPI Model
• Flat view of the cluster to the programmer
• SPMD programming model
• No global memory
• Inter-process communication is possible & required
• Process synchronization primitives

Page 7: Presentation - Programming a Heterogeneous Computing Cluster

MPI Program Structure

• Required header file
  • C: mpi.h
  • Fortran: mpif.h
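As a minimal sketch of this structure in C (the printed message is only an illustration, not from the slides):

#include <mpi.h>      /* required header for the C bindings */
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                  /* initialize the MPI environment */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                          /* shut the MPI environment down */
    return 0;
}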

Page 8: Presentation - Programming a Heterogeneous Computing Cluster

MPI Thread Support

• Level 0: MPI_THREAD_SINGLE (only one thread)
• Level 1: MPI_THREAD_FUNNELED (only the main thread makes MPI calls)
• Level 2: MPI_THREAD_SERIALIZED (one thread at a time makes MPI calls)
• Level 3: MPI_THREAD_MULTIPLE (any thread may make MPI calls at any time)
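These levels correspond to the thread-support values a program can request at startup with MPI_Init_thread instead of MPI_Init. A minimal sketch (the requested level here is only an illustration):

#include <mpi.h>

int main(int argc, char *argv[])
{
    int provided;

    /* request Level 1 (MPI_THREAD_FUNNELED); the library reports the level it actually provides */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    /* ... MPI calls, possibly alongside other threads depending on 'provided' ... */

    MPI_Finalize();
    return 0;
}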

Page 9: Presentation - Programming a Heterogeneous Computing Cluster

Format of MPI Calls
• Case sensitivity
  • C: yes
  • Fortran: no
• Name restrictions
  • MPI_*
  • PMPI_* (profiling interface)
• Error handling
  • Handled via the return parameter

Page 10: Presentation - Programming a Heterogeneous Computing Cluster

Groups & Communicators
• Group: ordered set of processes
• Communicator: handle to a group of processes
• Most MPI routines require a communicator as an argument
• MPI_COMM_WORLD: predefined communicator that includes all processes
• Rank: unique ID of each process within a communicator

Page 11: Presentation - Programming a Heterogeneous Computing Cluster

Environment Management Routines

• MPI_Init (&argc, &argv)
• MPI_Comm_size (comm, &size)
• MPI_Comm_rank (comm, &rank)
• MPI_Abort (comm, errorcode)
• MPI_Get_processor_name (&name, &resultlength)

Page 12: Presentation - Programming a Heterogeneous Computing Cluster

Environment Management Routines (contd.)

• MPI_Get_version (&version, &subversion)
• MPI_Initialized (&flag)
• MPI_Wtime ()
• MPI_Wtick ()
• MPI_Finalize ()
• Fortran: extra ierr parameter in all routines except the time functions
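A short sketch, not from the slides, showing how a few of these query and timing routines combine in practice:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int version, subversion, flag, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];
    double t0, t1;

    MPI_Init(&argc, &argv);

    MPI_Initialized(&flag);                    /* true once MPI_Init has been called */
    MPI_Get_version(&version, &subversion);    /* e.g. 3 and 1 for an MPI 3.1 library */
    MPI_Get_processor_name(name, &namelen);

    t0 = MPI_Wtime();                          /* wall-clock time in seconds */
    /* ... work to be timed ... */
    t1 = MPI_Wtime();

    printf("%s: MPI %d.%d, elapsed %f s (timer resolution %g s)\n",
           name, version, subversion, t1 - t0, MPI_Wtick());

    MPI_Finalize();
    return 0;
}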

Page 13: Presentation - Programming a Heterogeneous Computing Cluster

Vector Addition Example

Page 14: Presentation - Programming a Heterogeneous Computing Cluster

Vector Addition Example (contd.)

Page 15: Presentation - Programming a Heterogeneous Computing Cluster

MPI Sending Data
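The body of this slide is an image that is missing from the transcript. For reference, the basic blocking send it presumably covers has the C prototype:

int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm);

/* buf/count/datatype: what to send; dest: rank of the receiver;
   tag: message label; comm: communicator (e.g. MPI_COMM_WORLD) */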

Page 16: Presentation - Programming a Heterogeneous Computing Cluster

MPI Receiving Data
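This slide body is likewise an image. The blocking receive prototype is:

int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm, MPI_Status *status);

A minimal self-contained send/receive pair between ranks 0 and 1 (a sketch with an illustrative value and tag, not the slide's code):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);            /* to rank 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);   /* from rank 0, tag 0 */
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}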

Page 17: Presentation - Programming a Heterogeneous Computing Cluster

Vector Addition Example (contd.)

Page 18: Presentation - Programming a Heterogeneous Computing Cluster

Vector Addition Example (contd.)

Page 19: Presentation - Programming a Heterogeneous Computing Cluster

Vector Addition Example (contd.)

Page 20: Presentation - Programming a Heterogeneous Computing Cluster

MPI Barriers
• int MPI_Barrier (comm)
  • comm: communicator
• Very similar to barrier synchronization in CUDA: __syncthreads()
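A small sketch, not from the slides, of a typical barrier use: forcing all ranks to line up before and after a timed phase:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    MPI_Barrier(MPI_COMM_WORLD);          /* every rank waits here, like __syncthreads() within a CUDA block */
    double t0 = MPI_Wtime();

    /* ... work phase measured on all ranks ... */

    MPI_Barrier(MPI_COMM_WORLD);          /* all ranks finished before the clock is read again */
    double t1 = MPI_Wtime();

    printf("phase took %f s\n", t1 - t0);

    MPI_Finalize();
    return 0;
}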

Page 21: Presentation - Programming a Heterogeneous Computing Cluster

Vector Addition Example (contd.)

Page 22: Presentation - Programming a Heterogeneous Computing Cluster

Vector Addition Example (contd.)

Page 23: Presentation - Programming a Heterogeneous Computing Cluster

Vector Addition Example (contd.)

Page 24: Presentation - Programming a Heterogeneous Computing Cluster

Vector Addition Example (contd.)
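The code for this running example exists only as images in the transcript. As a hedged reconstruction of the general pattern the slides walk through (the vector length, data values, and the assumption that N divides evenly among the ranks are all illustrative, not the original code), a point-to-point vector addition might look like:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1024   /* illustrative length, assumed divisible by the number of ranks */

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int chunk = N / size;
    float *a = malloc(chunk * sizeof(float));
    float *b = malloc(chunk * sizeof(float));
    float *c = malloc(chunk * sizeof(float));

    if (rank == 0) {
        /* rank 0 owns the full vectors and hands one chunk of A and B to every other rank */
        float *A = malloc(N * sizeof(float));
        float *B = malloc(N * sizeof(float));
        float *C = malloc(N * sizeof(float));
        for (int i = 0; i < N; i++) { A[i] = i; B[i] = 2.0f * i; }

        for (int r = 1; r < size; r++) {
            MPI_Send(A + r * chunk, chunk, MPI_FLOAT, r, 0, MPI_COMM_WORLD);
            MPI_Send(B + r * chunk, chunk, MPI_FLOAT, r, 1, MPI_COMM_WORLD);
        }

        for (int i = 0; i < chunk; i++) C[i] = A[i] + B[i];   /* rank 0 adds its own slice */

        for (int r = 1; r < size; r++)                        /* collect the partial results */
            MPI_Recv(C + r * chunk, chunk, MPI_FLOAT, r, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("C[N-1] = %f\n", C[N - 1]);
        free(A); free(B); free(C);
    } else {
        MPI_Recv(a, chunk, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(b, chunk, MPI_FLOAT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (int i = 0; i < chunk; i++) c[i] = a[i] + b[i];   /* local work */
        MPI_Send(c, chunk, MPI_FLOAT, 0, 2, MPI_COMM_WORLD);
    }

    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}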

Page 25: Presentation - Programming a Heterogeneous Computing Cluster

Point-to-Point Operations
• Typically involve two, and only two, different MPI tasks
• Different types of send and receive routines:
  • Synchronous send
  • Blocking send / blocking receive
  • Non-blocking send / non-blocking receive
  • Buffered send
  • Combined send/receive (see the sketch below)
  • "Ready" send
• Send/receive routines are not tightly coupled: any send type can be paired with any receive type
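As one concrete illustration of the combined send/receive flavour (a sketch, not taken from the slides), MPI_Sendrecv shifts a value around a ring without risking deadlock:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, sendval, recvval;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;          /* neighbour to send to */
    int left  = (rank - 1 + size) % size;   /* neighbour to receive from */
    sendval = rank;

    /* one call posts the send and the matching receive together */
    MPI_Sendrecv(&sendval, 1, MPI_INT, right, 0,
                 &recvval, 1, MPI_INT, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d got %d from rank %d\n", rank, recvval, left);

    MPI_Finalize();
    return 0;
}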

Page 26: Presentation - Programming a Heterogeneous Computing Cluster

Buffering
• Why is buffering required?
• It is implementation dependent
• Opaque to the programmer and managed by the MPI library
• Advantages
  • Can exist on the sending side, the receiving side, or both
  • Improves program performance
• Disadvantages
  • A finite resource that can be easy to exhaust
  • Often mysterious and not well documented

Page 27: Presentation - Programming a Heterogeneous Computing Cluster

Blocking vs. Non-blocking

Blocking
• Send will only return after it is safe to modify the application buffer
• Receive returns after the data has arrived and is ready for use by the application
• Synchronous communication is possible
• Asynchronous communication is also possible

Non-blocking
• Send/receive return almost immediately
• Unsafe to modify our variables until we know the operation has completed
• Only asynchronous communication is possible
• Primarily used to overlap computation with communication for a performance gain
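A sketch, not from the slides, of the overlap pattern described in the non-blocking column: post the calls, compute on unrelated data, then wait before touching the buffers:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, sendval, recvval;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;
    sendval = rank;

    MPI_Irecv(&recvval, 1, MPI_INT, left,  0, MPI_COMM_WORLD, &reqs[0]);   /* returns immediately */
    MPI_Isend(&sendval, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... independent computation can overlap with the communication here ... */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);   /* only now are the buffers safe to read/reuse */
    printf("rank %d received %d\n", rank, recvval);

    MPI_Finalize();
    return 0;
}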

Page 28: Presentation - Programming a Heterogeneous Computing Cluster

Order and Fairness
• Order
  • MPI guarantees that messages will not overtake each other
  • Order rules do not apply if there are multiple threads participating in the communication operations
• Fairness
  • MPI does not guarantee fairness; it's up to the programmer to prevent "operation starvation"

Page 29: Presentation - Programming a Heterogeneous Computing Cluster

Types of Collective Communication Routines
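The body of this slide is a figure that is missing from the transcript; collective routines are conventionally grouped into synchronization (e.g. MPI_Barrier), data movement (broadcast, scatter, gather, all-to-all), and collective computation (reductions). A small sketch of two of these, with illustrative values:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, n = 0, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) n = 10;
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);    /* data movement: root's n reaches every rank */

    int local = rank * n;                            /* some per-rank contribution */
    MPI_Reduce(&local, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);   /* collective computation */

    if (rank == 0) printf("sum = %d\n", sum);

    MPI_Finalize();
    return 0;
}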

Page 30: Presentation - Programming a Heterogeneous Computing Cluster

Collective Communication Routines (contd.)
• Scope
  • Must involve all processes within the scope of a communicator
  • Unexpected behavior, including program failure, can occur if even one task in the communicator doesn't participate
  • It is the programmer's responsibility to ensure that all processes within a communicator participate in any collective operation
• Collective communication functions are highly optimized

Page 31: Presentation - Programming a Heterogeneous Computing Cluster

Groups & Communicators (additional details)
• Group
  • Represented within system memory as an object
  • Only accessible as a handle
  • Always associated with a communicator object
• Communicator
  • Represented within system memory as an object
  • In the simplest sense, the communicator is an extra "tag" that must be included with MPI calls
  • Inter-group and intra-group communicators are available
• From the programmer's perspective, a group and a communicator are one

Page 32: Presentation - Programming a Heterogeneous Computing Cluster

Primary Purposes of Group and Communicator Objects
1. Allow you to organize tasks, based upon function, into task groups
2. Enable collective communication operations across a subset of related tasks
3. Provide a basis for implementing user-defined virtual topologies
4. Provide for safe communications

Page 33: Presentation - Programming a Heterogeneous Computing Cluster

Programming Considerations and Restrictions
• Groups/communicators are dynamic
• Processes may be in more than one group/communicator
• MPI provides over 40 routines related to groups, communicators, and virtual topologies
• Typical usage (sketched in code below):
  1. Extract the handle of the global group from MPI_COMM_WORLD using MPI_Comm_group
  2. Form a new group as a subset of the global group using MPI_Group_incl
  3. Create a new communicator for the new group using MPI_Comm_create
  4. Determine the new rank in the new communicator using MPI_Comm_rank
  5. Conduct communications using any MPI message passing routine
  6. When finished, free up the new communicator and group (optional) using MPI_Comm_free and MPI_Group_free
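A sketch of the typical-usage steps above; splitting off the even-numbered ranks is only an illustration:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int world_rank, world_size;
    MPI_Group world_group, even_group;
    MPI_Comm even_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    MPI_Comm_group(MPI_COMM_WORLD, &world_group);             /* 1. handle of the global group */

    int n_even = (world_size + 1) / 2;                        /* 2. subset: the even-numbered ranks */
    int *ranks = malloc(n_even * sizeof(int));
    for (int i = 0; i < n_even; i++) ranks[i] = 2 * i;
    MPI_Group_incl(world_group, n_even, ranks, &even_group);

    MPI_Comm_create(MPI_COMM_WORLD, even_group, &even_comm);  /* 3. new communicator (MPI_COMM_NULL elsewhere) */

    if (even_comm != MPI_COMM_NULL) {
        int new_rank;
        MPI_Comm_rank(even_comm, &new_rank);                  /* 4. rank inside the new communicator */
        printf("world rank %d is rank %d among the even ranks\n", world_rank, new_rank);
        /* 5. ...communicate within even_comm as usual... */
        MPI_Comm_free(&even_comm);                            /* 6. clean up */
    }
    MPI_Group_free(&even_group);
    MPI_Group_free(&world_group);
    free(ranks);

    MPI_Finalize();
    return 0;
}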

Page 34: Presentation - Programming a Heterogeneous Computing Cluster

Virtual Topologies
• Mapping/ordering of MPI processes into a geometric "shape"
• Similar to the CUDA grid/block 2D/3D structure
• They are only virtual
• Two main types
  • Cartesian (grid)
  • Graph
• Virtual topologies are built upon MPI communicators and groups
• Must be "programmed" by the application developer
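A sketch, not from the slides, of building a 2-D Cartesian virtual topology; letting MPI_Dims_create pick the grid shape and using non-periodic edges are illustrative choices:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int size, rank, coords[2];
    int dims[2]    = {0, 0};   /* let MPI choose a balanced 2-D factorization */
    int periods[2] = {0, 0};   /* no wrap-around in either dimension */
    MPI_Comm cart;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Dims_create(size, 2, dims);
    /* map MPI_COMM_WORLD onto a dims[0] x dims[1] grid; reorder=1 lets MPI re-rank for locality */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);

    MPI_Comm_rank(cart, &rank);
    MPI_Cart_coords(cart, rank, 2, coords);
    printf("rank %d sits at position (%d, %d) of a %d x %d grid\n",
           rank, coords[0], coords[1], dims[0], dims[1]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}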

Page 35: Presentation - Programming a Heterogeneous Computing Cluster

Why use Virtual Topologies?
• Convenience
  • Useful for applications with specific communication patterns
• Communication efficiency
  • Penalty avoided on some hardware architectures for communication between distant nodes
  • Process mapping may be optimized based on physical characteristics of the machine
• The MPI implementation decides whether the virtual topology is ignored or not

Page 36: Presentation - Programming a Heterogeneous Computing Cluster

Phew! … All done! Thank you! ANY QUESTIONS?