
Page 1: Course Outline

Introduction in algorithms and applications

Parallel machines and architectures

Programming methods, languages, and environments

Message passing (SR, MPI, Java)

Higher-level languages (HPF)

Applications

N-body problems, search algorithms

Many-core programming (CUDA)

Page 2: Approaches to Parallel Programming

Sequential language + library: MPI, PVM

Extend a sequential language: C/Linda, Concurrent C++, HPF

New languages designed for parallel or distributed programming: SR, occam, Ada, Orca

Page 3: Paradigms for Parallel Programming

Paradigm                               Example in this course
Processes + shared variables           --
Processes + message passing            SR and MPI
Concurrent object-oriented languages   Java
Concurrent functional languages        --
Concurrent logic languages             --
Data-parallelism (SPMD model)          HPF, CUDA

Page 4: Interprocess Communication and Synchronization based on Message Passing

Henri Bal

Page 5: Overview

Message passing:
Naming the sender and receiver
Explicit or implicit receipt of messages
Synchronous versus asynchronous messages

Nondeterminism:
Select statement

Example language: SR (Synchronizing Resources)

Example library: MPI (Message Passing Interface)

Page 6: Point-to-point Message Passing

Basic primitives: send & receive

As library routines:

send(destination, &MsgBuffer)

receive(source, &MsgBuffer)

As language constructs:

send MsgName(arguments) to destination

receive MsgName(arguments) from source
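
As a concrete illustration of the library-routine form, here is a minimal MPI sketch (MPI is the example library used later in these slides); the ranks, tag 0, and the message value are illustrative choices, not part of the slide's primitives:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, msg = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* send(destination, &MsgBuffer): destination is rank 1 */
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* receive(source, &MsgBuffer): source is rank 0 */
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d\n", msg);
    }
    MPI_Finalize();
    return 0;
}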

Page 7: Direct naming

Sender and receiver directly name each other:
S: send M to R
R: receive M from S

Asymmetric direct naming (more flexible):
S: send M to R
R: receive M

Direct naming is easy to implement:
The destination of a message is known in advance
The implementation just maps logical names to machine addresses

Page 8: Indirect naming

Indirect naming uses an extra indirection level:
S: send M to P    -- P is a port name
R: receive M from P

Sender and receiver need not know each other
Port names can be passed around (e.g., in a message):
send ReplyPort(P) to U    -- P is the name of a reply port

Most languages allow only a single process at a time to receive from any given port
Some languages allow multiple receivers that service messages on demand -> called a mailbox
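
MPI (used later in these slides) has no first-class port names, but as a loose, partial analogy a message tag can play a similar role: the receiver names a tag and accepts from any sender. REQ_PORT and serve_request below are illustrative names, not MPI API:

#include <mpi.h>

#define REQ_PORT 7    /* tag standing in for a port name (illustrative) */

void serve_request(void)
{
    int req;
    MPI_Status st;
    /* Receive on the "port": the receiver does not name a sender.
       (The sender must still name a destination rank, so the analogy
       is only partial.) */
    MPI_Recv(&req, 1, MPI_INT, MPI_ANY_SOURCE, REQ_PORT,
             MPI_COMM_WORLD, &st);
    /* Reply directly to whoever sent the request. */
    MPI_Send(&req, 1, MPI_INT, st.MPI_SOURCE, REQ_PORT, MPI_COMM_WORLD);
}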

Page 9: Explicit Message Receipt

Explicit receive by an existing process
The receiving process only handles a message when it is willing to do so

process main( )
{
    // regular computation here

    receive M( ... );    // explicit message receipt
    // code to handle the message

    // more regular computations
    ...
}

Page 10: Implicit message receipt

Receipt by a new thread of control, created for handling the incoming message

int X;

process main( )
{
    // just regular computations; this code can access X
}

message-handler M( )    // created whenever a message M arrives
{
    // code to handle the message; can also access X
}
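
A minimal sketch of this idea in C with POSIX threads, assuming a pipe stands in for the network and one handler thread is created per incoming message; handle_M, listener, and the pipe are all illustrative choices:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

int X = 0;                              /* shared global, like X above */
pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

static void *handle_M(void *arg)        /* one new thread per message M */
{
    int m = (int)(long)arg;
    pthread_mutex_lock(&x_lock);
    X += m;                             /* the handler can access X */
    pthread_mutex_unlock(&x_lock);
    return NULL;
}

static void *listener(void *arg)        /* waits for messages, forks handlers */
{
    int *fd = arg, m;
    while (read(fd[0], &m, sizeof m) == (ssize_t)sizeof m) {
        pthread_t t;
        pthread_create(&t, NULL, handle_M, (void *)(long)m);
        pthread_detach(t);
    }
    return NULL;
}

int main(void)
{
    int fd[2];
    pthread_t l;
    pipe(fd);                           /* stands in for the network */
    pthread_create(&l, NULL, listener, fd);

    for (int m = 1; m <= 3; m++)        /* "sender": inject three messages */
        write(fd[1], &m, sizeof m);

    sleep(1);                           /* crude: let the handlers finish */
    printf("X = %d\n", X);              /* main can also access X */
    return 0;
}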

Page 11: Threads

Threads run in (pseudo-)parallel on the same node

Each thread has its own program counter and local variables

Threads share global variables

[Figure: time lines of the main thread and two M handler threads on one node, all sharing variable X]

Page 12: Differences (1)

Implicit receipt is used if it's unknown when a message will arrive; example: the global bound in TSP

Implicit receipt:

int Minimum;

process main( )
{
    // regular computations
}

message-handler Update(m: int)
{
    if (m < Minimum)
        Minimum = m;
}

Explicit receipt (polling):

process main( )
{
    int Minimum;
    while (true) {
        if (there is a message Update) {
            receive Update(m);
            if (m < Minimum)
                Minimum = m;
        }
        // regular computations
    }
}

Page 13: Differences (2)

Explicit receive gives more control over when to accept which messages; e.g., SR allows:

receive ReadFile(file, offset, NrBytes) by NrBytes
// sorts messages by (increasing) 3rd parameter, i.e., small reads go first

MPI has explicit receive (+ polling for implicit receive); see the sketch below

Java has implicit receive: Remote Method Invocation (RMI)

SR has both
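
A hedged sketch of that polling style in MPI, approximating the TSP bound update of the previous slide; UPDATE_TAG, poll_updates, and the global minimum are illustrative, only the MPI_* calls are real API:

#include <mpi.h>
#include <limits.h>

#define UPDATE_TAG 1                   /* illustrative tag */

int minimum = INT_MAX;                 /* global bound, as in the slide */

void poll_updates(void)                /* call between regular computations */
{
    int flag, m;
    MPI_Status st;
    /* Drain all pending Update messages without blocking. */
    MPI_Iprobe(MPI_ANY_SOURCE, UPDATE_TAG, MPI_COMM_WORLD, &flag, &st);
    while (flag) {
        MPI_Recv(&m, 1, MPI_INT, st.MPI_SOURCE, UPDATE_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (m < minimum)
            minimum = m;
        MPI_Iprobe(MPI_ANY_SOURCE, UPDATE_TAG, MPI_COMM_WORLD, &flag, &st);
    }
}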

Page 14: Synchronous vs. asynchronous Message Passing

Synchronous message passing:
The sender is blocked until the receiver has accepted the message
Too restrictive for many parallel applications

Asynchronous message passing:
The sender continues immediately
More efficient
Ordering problems
Buffering problems
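
Both flavors exist in MPI; a minimal sketch (peer, buf, and tag 0 are illustrative):

#include <mpi.h>

void send_both_ways(int peer, int *buf, int n)
{
    /* Synchronous: blocks until the receiver has started to accept. */
    MPI_Ssend(buf, n, MPI_INT, peer, 0, MPI_COMM_WORLD);

    /* Asynchronous: returns immediately; the sender continues. */
    MPI_Request req;
    MPI_Isend(buf, n, MPI_INT, peer, 0, MPI_COMM_WORLD, &req);
    /* ... other work; buf must not be reused until completion ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}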

Page 15: Ordering with asynchronous message passing

SENDER:              RECEIVER:
send message(1)      receive message(N); print N
send message(2)      receive message(M); print M

Messages may be received in any order, depending on the protocol

[Figure: message ordering: message(1) and message(2) in transit may overtake each other]

Page 16: Example: AT&T crash

[Figure: message sequence between P1 and P2]

"Are you still alive?"
P1 crashes; P2 concludes: "P1 is dead"
P1 comes back up: "I'm back"
A regular message from P1 arrives; P2 concludes "Something's wrong, I'd better crash!"; P1 then concludes: "P2 is dead"

Page 17: Message buffering

Keep messages in a buffer until the receive( ) is done

What if the buffer overflows?
Continue, but delete some messages (e.g., the oldest one), or
Use flow control: block the sender temporarily

Flow control changes the semantics, since it introduces synchronization:

S: send zillion messages to R; receive messages
R: send zillion messages to S; receive messages
-> deadlock!
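
A hedged MPI sketch of this deadlock: if both ranks send first and the messages are too large for the system to buffer, each blocking send waits forever for the other side's receive. BIG and the function name are illustrative:

#include <mpi.h>

#define BIG (1 << 24)                  /* 16M ints: unlikely to be buffered */

void symmetric_exchange(int peer, int *sendbuf, int *recvbuf)
{
    /* Both ranks send first, then receive: this deadlocks once the
       MPI implementation stops buffering and blocks the sender. */
    MPI_Send(sendbuf, BIG, MPI_INT, peer, 0, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, BIG, MPI_INT, peer, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);
    /* A safe alternative is MPI_Sendrecv, which pairs the two internally. */
}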

Page 18: Nondeterminism

Interactions may depend on run-time conditions
e.g.: wait for a message from either A or B, whichever comes first

Need to express and control nondeterminism
Specify when to accept which message

Example (bounded buffer):

do simultaneously
  when buffer not full: accept request to store message
  when buffer not empty: accept request to fetch message

Page 19: Select statement

Several alternatives of the form:

WHEN condition => RECEIVE message DO statement

Each alternative may:
succeed, if condition = true & a message is available
fail, if condition = false
suspend, if condition = true & no message is available yet

The entire select statement may:
succeed, if any alternative succeeds -> pick one nondeterministically
fail, if all alternatives fail
suspend, if some alternatives suspend and none succeeds yet

Page 20: Example: bounded buffer

select
  when not FULL(BUFFER) =>
    receive STORE_ITEM(X: INTEGER) do
      'store X in buffer'
    end;
or
  when not EMPTY(BUFFER) =>
    receive FETCH_ITEM(X: out INTEGER) do
      X := 'first item from buffer'
    end;
end select;
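
Guarded select has no direct MPI counterpart, but a rough approximation polls each kind of message whose guard holds and accepts whichever is available. STORE_TAG, FETCH_TAG, and the buffer helpers below are assumptions, not MPI or SR API:

#include <mpi.h>

#define STORE_TAG 1                    /* illustrative tags */
#define FETCH_TAG 2

extern int  buffer_full(void);         /* assumed buffer helpers */
extern int  buffer_empty(void);
extern void buffer_put(int item);
extern void buffer_get(int *item);

void serve_one(void)
{
    int flag, x;
    MPI_Status st;

    if (!buffer_full()) {              /* guard: when not FULL(BUFFER) */
        MPI_Iprobe(MPI_ANY_SOURCE, STORE_TAG, MPI_COMM_WORLD, &flag, &st);
        if (flag) {
            MPI_Recv(&x, 1, MPI_INT, st.MPI_SOURCE, STORE_TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            buffer_put(x);             /* 'store X in buffer' */
            return;
        }
    }
    if (!buffer_empty()) {             /* guard: when not EMPTY(BUFFER) */
        MPI_Iprobe(MPI_ANY_SOURCE, FETCH_TAG, MPI_COMM_WORLD, &flag, &st);
        if (flag) {
            MPI_Recv(&x, 1, MPI_INT, st.MPI_SOURCE, FETCH_TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            buffer_get(&x);            /* X := 'first item from buffer' */
            MPI_Send(&x, 1, MPI_INT, st.MPI_SOURCE, FETCH_TAG,
                     MPI_COMM_WORLD);  /* reply with the fetched item */
            return;
        }
    }
    /* Neither alternative ready: effectively "suspend"; caller retries. */
}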

Page 21: Synchronizing Resources (SR)

Developed at the University of Arizona

Goals of SR:

Expressiveness
  Many message passing primitives

Ease of use
  Minimize the number of underlying concepts
  Clean integration of language constructs

Efficiency
  Each primitive must be efficient

Page 22: Overview of SR

Multiple forms of message passing:
Asynchronous message passing
Rendezvous (synchronous send, explicit receipt)
Remote Procedure Call (synchronous send, implicit receipt)
Multicast (many receivers)

Powerful receive statement:
Conditional & ordered receive, based on contents of message
Select statement

Resource = module run on 1 node (uni/multiprocessor)
Contains multiple threads that share variables

Page 23: Orthogonality in SR

The send and receive primitives can be combined in all 4 possible ways:

                   Asynchronous send                Synchronous call
Explicit receive   1. asynchronous message passing  3. rendezvous
Implicit receive   2. fork                          4. RPC

Page 24: Example

body S    # sender
  send R.m1    # asynchronous message passing
  send R.m2    # fork
  call R.m1    # rendezvous
  call R.m2    # RPC
end S

body R    # receiver
  proc m2( )    # implicit receipt
    # code to handle m2
  end

  initial    # main process of R
    do true ->    # infinite loop
      in m1( ) ->    # explicit receive
        # code to handle m1
      ni
    od
  end
end R