
Page 1:

1

In-Situ Model Checking of MPI Parallel Programs

Ganesh Gopalakrishnan

Joint work with Salman Pervez, Michael DeLisi,

Sarvani Vakkalanka, Subodh Sharma, Yu Yang, Robert Palmer, Mike Kirby, Guodong Li

(http://www.cs.utah.edu/formal_verification)

School of Computing, University of Utah

Supported by: Microsoft HPC Institutes

NSF CNS 0509379

Page 2:

2

MPI is the de-facto standard for programming cluster machines

Our focus: Eliminate Concurrency Bugs from HPC Programs !

(BlueGene/L - Image courtesy of IBM / LLNL) (Image courtesy of Steve Parker, CSAFE, Utah)

Page 3:

3

Reason for our interest in MPI verification

Widely felt need – MPI is used on expensive machines for critical simulations

Potential for wider impact – MPI is a success as a standard
– What's good for MPI may be good for OpenMP, CUDA, Shmem, …

Working in a less crowded but important area – funding in HW verification is decreasing
» We are still continuing two efforts: verifying hierarchical cache coherence protocols, and refinement of cache coherence protocol models to HW implementations

– SW verification in "threading / shared memory" is crowded
» whereas HPC offers LIBRARY-BASED concurrent software creation as an unexplored challenge!

Page 4:

4

A highly simplistic view of MPI

Many MPI programs compute something like f ∘ g ∘ h(x) in a distributed manner

(think of maps on separate data domains, and later combinations thereof)

Compute h(x) on P1

Start g() on P2

Fire up f on P1

Use sends, receives, barriers, etc., to maximize computational speed

This view may help compare it against PThread programs, for instance
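To make this picture concrete, here is a minimal sketch of the pattern (the placeholder functions h, g, f and the two-rank split are assumptions of this sketch, not something from the talk):

#include <mpi.h>
#include <stdio.h>

/* Hypothetical placeholder stages of the pipeline f o g o h */
double h(double x) { return x + 1.0; }
double g(double x) { return 2.0 * x; }
double f(double x) { return x * x; }

int main(int argc, char **argv) {
  int rank;
  double x = 3.0, y, z;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 1) {
    y = g(h(x));                         /* compute h, then g, on one rank */
    MPI_Send(&y, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
  } else if (rank == 0) {
    MPI_Recv(&y, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    z = f(y);                            /* fire up f on the other rank */
    printf("f(g(h(x))) = %g\n", z);
  }
  MPI_Finalize();
  return 0;
}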

Page 5:

5

Some high-level features of MPI

Organized as a large library (API)
– Over 300 functions in MPI-2 (was 128 in MPI-1)

Most MPI programs use about a dozen

Usually a different dozen for each program

Page 6:

6

MPI programming and optimization

MPI includes message passing, shared memory, and I/O
– We consider C++ MPI programs, largely focusing on message passing

MPI programs are usually written by hand
– Automated generation has been proposed and still seems attractive

Source-to-source optimizations of MPI programs are attractive
– Break up communications and overlap them with computations (ASPHALT)

Many important MPI programs do evolve
– Re-tuning after porting to a new cluster, etc.

Correctness expectations vary
– Some are throw-away programs; others are long-lasting libraries
– Code correctness – not model fidelity – is our emphasis

Page 7:

7

Why MPI is Complex: Collision of features

– Send

– Receive

– Send / Receive

– Send / Receive / Replace

– Broadcast

– Barrier

– Reduce

– Rendezvous mode

– Blocking mode

– Non-blocking mode

– Reliance on system buffering

– User-attached buffering

– Restarts/Cancels of MPI Operations

– Non Wildcard receives

– Wildcard receives

– Tag matching

– Communication spaces

An MPI program is an interesting (and legal) combination of elements from these spaces
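For instance, the following sketch (not from the talk; it assumes exactly 3 ranks) legally combines a non-blocking synchronous-mode send, an ordinary send that may rely on system buffering, tag matching, and a wildcard receive:

#include <mpi.h>

int main(int argc, char **argv) {
  int rank, x = 7, y = 8, got;
  MPI_Request req;
  MPI_Status st;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 1) {
    MPI_Issend(&x, 1, MPI_INT, 0, 10, MPI_COMM_WORLD, &req);  /* synchronous mode, non-blocking, tag 10 */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
  } else if (rank == 2) {
    MPI_Send(&y, 1, MPI_INT, 0, 20, MPI_COMM_WORLD);          /* may rely on system buffering, tag 20 */
  } else if (rank == 0) {
    MPI_Recv(&got, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);  /* wildcard receive */
    MPI_Recv(&got, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
  }
  MPI_Finalize();
  return 0;
}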

Page 8:

8

Shared memory "escape" features of MPI

MPI has shared memory (called "one-sided")

Nodes open a shared region through a "collective"

One process manages the region ("owner")
– Ensures serial access of the window

Within a lock/unlock, a process does puts/gets
– There are more functions, such as "accumulate", besides puts / gets

The puts/gets are not program-ordered !
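As a hedged illustration of such a lock/unlock access epoch (a sketch, not code from the talk; it assumes at least 2 ranks), note that the put and the get below are only guaranteed to have completed at the unlock, and are not ordered with respect to each other inside the epoch:

#include <mpi.h>

int main(int argc, char **argv) {
  int rank, buf[2] = {0, 0}, one = 1, probe;
  MPI_Win win;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  /* every rank exposes a 2-int window; rank 0's window is the shared region here */
  MPI_Win_create(buf, 2 * sizeof(int), sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);
  if (rank == 1) {
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);       /* begin access epoch on rank 0's window */
    MPI_Put(&one, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
    MPI_Get(&probe, 1, MPI_INT, 0, 1, 1, MPI_INT, win);
    MPI_Win_unlock(0, win);                            /* both operations complete only here */
  }
  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}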

Page 9:

9

A Simple Example of Msg Passing MPI

Programmer expectation: integration of a region

// Add up integrals calculated by each process
if (my_rank == 0) {
  total = integral;
  for (source = 0; source < p; source++) {
    MPI_Recv(&integral, 1, MPI_FLOAT, source,
             tag, MPI_COMM_WORLD, &status);
    total = total + integral;
  }
} else {
  MPI_Send(&integral, 1, MPI_FLOAT, dest,
           tag, MPI_COMM_WORLD);
}

Page 10:

10

A Simple Example of Msg Passing MPI

Bug! Mismatched send/recv causes deadlock

// Add up integrals calculated by each process
if (my_rank == 0) {
  total = integral;
  for (source = 0; source < p; source++) {
    MPI_Recv(&integral, 1, MPI_FLOAT, source,
             tag, MPI_COMM_WORLD, &status);
    total = total + integral;
  }
} else {
  MPI_Send(&integral, 1, MPI_FLOAT, dest,
           tag, MPI_COMM_WORLD);
}

Sends:     p1: to 0    p2: to 0    p3: to 0
Receives:  p0: fr 0    p0: fr 1    p0: fr 2

Page 11:

11

if (my_rank == 0) {
  ...
  for (source = 1; source < p; source++) {
    MPI_Recv(..)
    ..
} else {
  MPI_Send(..)
}

Runtime Considerations

Does the system provide buffering? What progress engine does MPI have? How does it schedule?

MPI run-time; there is no separate thread for it…

• Does the system provide buffering?
• If not, a rendezvous behavior is enforced! (see the sketch after this slide)

• When does the runtime actually process events?
• Whenever an MPI operation is issued
• Whenever some operation that "pokes" the progress engine is issued
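The buffering question can be made concrete with a classic sketch (not from the talk; it assumes exactly 2 ranks): whether it completes or hangs depends on whether the library buffers the sends.

#include <mpi.h>

int main(int argc, char **argv) {
  int rank, out = 42, in;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  int peer = 1 - rank;                 /* assumes exactly 2 ranks */
  /* With eager/buffered sends, both MPI_Send calls return and the receives match.
     If the library enforces rendezvous (no buffering), both ranks block in MPI_Send. */
  MPI_Send(&out, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
  MPI_Recv(&in, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  MPI_Finalize();
  return 0;
}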

Page 12:

12

Differences between MPI and Shared Memory / Thread Parallel programs

MPI processes with local state communicate by copying
– Threads share global state and the heap
– Threads synchronize using locks, signals, notifies, waits

Not much dynamic process creation in MPI
– PThread programs may spawn children dynamically

In MPI, control / data dependencies are well confined (often to rank variables and such)
– In threaded code there is pervasive decoding of "data" (e.g. through heap storage)

MPI programs have simple aliasing
– In threaded code, aliasing relations may flow considerably across pointer chains and across procedure calls

Page 13:

13

Conventional debugging of MPI

Inspection
– Difficult to carry out on MPI programs (low-level notation)

Simulation based
– Run the given program with manually selected inputs
– Can give poor coverage in practice

Simulation with runtime heuristics to find bugs
– Marmot: timeout-based deadlock detection, random executions
– Intel Trace Collector: similar checks, with data checking
– TotalView: better trace viewing – still no "model checking" (?)
– We don't know if any formal coverage metrics are offered

Page 14:

14

What should one verify?

The overall computation achieves some "f ∘ g ∘ h"

Symbolic execution of MPI programs may work (Siegel et al.)

Symbolic execution has its limits
– Finding out the "plumbing" of f, g, and h is non-trivial for optimized MPI programs

So why not look for reactive bugs introduced in the process of erecting this plumbing?

A common concern: "my code hangs" (the first case is sketched after this slide):
– ISends without wait / test
– Assuming that the system provides buffering for Sends
– Wildcard receive non-determinism is unexpected
– Incorrect collective semantics assumed (e.g. for barriers)

ISP currently checks for deadlocks (not all procs reach MPI_Finalize). In future, we may check local assertions.
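A minimal sketch of the first concern (hypothetical code; assumes 2 ranks): the Isend's request is never completed with MPI_Wait or MPI_Test before MPI_Finalize, which the MPI standard requires, and the send buffer may not legally be reused in the meantime.

#include <mpi.h>

int main(int argc, char **argv) {
  int rank, data = 1;
  MPI_Request req;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) {
    MPI_Isend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    /* BUG: missing MPI_Wait(&req, MPI_STATUS_IGNORE); the request is never completed */
  } else if (rank == 1) {
    MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  }
  MPI_Finalize();
  return 0;
}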

Page 15:

15

Static Analysis for violated usages of the API

Model Checking for Concurrency Bugs

Instrumentation and Trace Checking

Static Analysis to Support Model Checking
– Loop transformations
– Strength reduction of code
– …

But, …who gives us the formal models to check !?

What approaches are cost-effective?

Some candidate approaches to MPI verification

Page 16:

16

Will look at C++ MPI programs
– Gotta do C++, alas; C won't do

Not ask the user to hand-build Promela / Zing models
– Do In-Situ Model Checking – run the actual code

May need to simplify code before running
– OK, so complementary static analysis methods are needed

LOTS of interleavings that do not matter!
– Process memory is not shared!

When can we commute two actions?
– Need a formal basis for Partial Order Reduction

» Need Formal Semantics for MPI

» Need to Formulate “Independence”

» Need viable model-checking approach

Our initial choices .. and consequences

Page 17:

17

POR

With 3 processes, each having 2 transitions (3 local states), the interleaved state space has 3 × 3 × 3 = 27 states

Partial-order reduction explores representative sequences from each equivalence class

Delays the execution of independent transitions

In this example, it is possible to "get away" with 7 states (one interleaving)

Page 18:

18

Possible savings in one example

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

• These are the dependent operations
• 504 interleavings without POR in this example
• 2 interleavings with POR !!

Page 19:

19

We developed a formal semantics of MPI, both for understanding MPI and to design a POR algorithm…

[Diagram: the MPI 1.1 API, broken into Point-to-Point Operations, Collective Operations, Requests, Communicator (Collective, Context, Group), and Constants]

Page 20:

20

Simplified Semantics of MPI_Wait

Page 21:

21

TLA+ Spec of MPI_Wait (Slide 1/2)

Page 22:

22

TLA+ Spec of MPI_Wait (Slide 2/2)

Page 23:

23

Executable Formal Specification can help validate our understanding of MPI …

[Diagram: Verification Environment — Visual Studio 2005 / Phoenix Compiler, a TLA+ program model checked by the TLC model checker against the TLA+ MPI library model, and an MPIC program model in the MPIC IR checked by the MPIC model checker; FMICS 07, PADTAD 07]

Page 24:

24

Even 5-line MPI programs may confound! Hence a litmus-test outcome calculator based on formal semantics is quite handy

p0: { Irecv(rcvbuf1, from p1); Irecv(rcvbuf2, from p1); … }

p1: { sendbuf1 = 6; sendbuf2 = 7; Issend(sendbuf1, to p0); Isend (sendbuf2, to p0); … }

• In-order message delivery (rcvbuf1 == 6)

• Can access the buffers only after a later wait / test

• The second receive may complete before the first

• When Issend (synch.) is posted, all that is guaranteed is that Irecv(rcvbuf1,…) has been posted
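A compilable version of this litmus test might look as follows (a sketch; the buffer declarations and the trailing MPI_Waitall calls are additions of this sketch, and it assumes exactly 2 ranks):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  int rank, rcvbuf1 = 0, rcvbuf2 = 0, sendbuf1 = 6, sendbuf2 = 7;
  MPI_Request req[2];
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) {
    MPI_Irecv(&rcvbuf1, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&rcvbuf2, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req[1]);
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);                    /* only now may the buffers be read */
    printf("rcvbuf1 = %d, rcvbuf2 = %d\n", rcvbuf1, rcvbuf2);    /* in-order matching: rcvbuf1 == 6 */
  } else if (rank == 1) {
    MPI_Issend(&sendbuf1, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(&sendbuf2, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req[1]);
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
  }
  MPI_Finalize();
  return 0;
}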

Page 25:

25

The Histrionics of FV for HPC (1)

Page 26:

26

The Histrionics of FV for HPC (2)

Page 27:

27

Error-trace Visualization in Visual Studio

Page 28:

28

Alas, MPI's dependence is not static

Dependencies may not be fully known JUST by looking at enabled actions

Conservative assumptions could be made (as in Siegel's Urgent Algorithm)

The same problem exists with other "dynamic situations"
– e.g. MPI_Cancel

Proc P:          Proc Q:           Proc R:
Send(to Q)       Recv(from *)      Some Stmt
                                   Send(to Q)

Page 29:

29

Dynamic Dependence due to MPI Wildcard Communication…

Illustration of a missed dependency that would have been detected, had Proc R been scheduled first…

Proc P:          Proc Q:           Proc R:
Send(to Q)       Recv(from *)      Some Stmt
                                   Send(to Q)

Page 30:

30

Dependence in MPI (partial results)

• Wildcard receives and the sends targeting them are dependent
• Each send potentially provides a different value to the receive

• For Isend and Irecv, the dependency is induced by wait / test that help complete these operations

• Barrier entry order does not matter

• MPI Win_lock (owner) and Win_unlock (non-owner)

• Need to characterize more MPI ops (future)


Page 31:

31

Situation is similar to that discussed in Flanagan/Godefroid POPL 05 (DPOR)

Two concurrent actions:   a[j]++     a[k]--

• Action Dependence Determines COMMUTABILITY (POR theory is really detailed; it is more than commutability, but let’s pretend it is …)

• Depends on j == k, in this example

• Can be very difficult to determine statically

• Can determine dynamically
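A tiny concretization of this example (hypothetical C / Pthreads code, in the spirit of the programs Inspect explores): the two updates commute exactly when the indices differ, and whether j == k may be known only at run time.

#include <pthread.h>

int a[4];
int j = 1, k = 1;                     /* in general, known only at run time */

void *t1(void *arg) { a[j]++; return 0; }
void *t2(void *arg) { a[k]--; return 0; }

int main(void) {                      /* compile with -pthread */
  pthread_t p1, p2;
  pthread_create(&p1, 0, t1, 0);
  pthread_create(&p2, 0, t2, 0);
  pthread_join(p1, 0);
  pthread_join(p2, 0);
  return 0;  /* if j != k, both interleavings of the updates reach the same state */
}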

Page 32:

32

Hence we turn to their DPOR algorithm

Ample determined using "local" criteria

Current State

Next move of Red process

Nearest Dependent Transition Looking Back

Add Red Process to "Backtrack Set"

This builds the “Ample set” incrementally based on observed dependencies

Blue is in “Done” set

{ BT }, { Done }

Page 33:

33

How to instrument?
– MPI provides the PMPI mechanism
– For MPI_Send, we have a PMPI_Send that does the same thing
  » Over-ride MPI_Send
  » Do instrumentation within it
  » Launch PMPI_Send when necessary
  (a generic sketch of such a wrapper follows this slide)

How to orchestrate the schedule?
– MPI processes communicate with the scheduler through TCP sockets
– MPI processes send MPI envelopes to the scheduler
– The scheduler lets whoever it thinks must go
– Execute up to MPI_Finalize
  » Naturally an acyclic state space !!
– Replay by restarting the MPI system
  » Ouch !! but wait, … the Chinese Postman to the rescue ?

How to make DPOR work for MPI ? (I)
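A generic sketch of such a PMPI wrapper (this is just the standard profiling-interface pattern, not ISP's actual code; the scheduler hand-off is only indicated by a comment, and the const-qualified buffer follows the MPI-3 prototype):

#include <mpi.h>
#include <stdio.h>

/* Our MPI_Send intercepts the call, could report an envelope to an external
   scheduler here, and then forwards to the real implementation via PMPI_Send. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm) {
  int rank;
  PMPI_Comm_rank(comm, &rank);
  fprintf(stderr, "[rank %d] intercepted MPI_Send to %d (tag %d)\n", rank, dest, tag);
  /* ... send the envelope to the scheduler over a TCP socket and wait for a go-ahead ... */
  return PMPI_Send(buf, count, datatype, dest, tag, comm);
}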

Page 34:

34

How to not get wedged inside the MPI progress engine?
– Understand MPI's progress engine
– If in doubt, "poke it" through commands that are known to enter the progress engine
  » Some of this has been demonstrated w.r.t. MPI one-sided

How to deal with system resource issues?
– If the system provides buffering for 'send', how do we schedule?
  » We schedule Sends as soon as they arrive
– If not, then how?
  » We schedule Sends only as soon as the matching Receives arrive
  (a sketch of this decision follows this slide)

How to make DPOR work for MPI ? (II)
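A hedged sketch of that send-scheduling decision (the envelope fields and the function name are hypothetical, not ISP's code):

/* With buffering, a Send envelope is released immediately (eager); without
   buffering, the Send is held until a matching Receive envelope has arrived. */
typedef struct { int src, dest, tag; } Envelope;   /* -1 means wildcard in receive envelopes */

int can_issue_send(const Envelope *send, int system_buffers_sends,
                   const Envelope *posted_recvs, int num_recvs) {
  if (system_buffers_sends)
    return 1;                                      /* eager: let the Send go at once */
  for (int i = 0; i < num_recvs; i++) {            /* rendezvous: need a matching Recv */
    const Envelope *r = &posted_recvs[i];
    if (r->dest == send->dest &&
        (r->src == send->src || r->src == -1) &&
        (r->tag == send->tag || r->tag == -1))
      return 1;
  }
  return 0;
}

int main(void) {
  Envelope send = {1, 0, 10};
  Envelope recvs[1] = {{-1, 0, -1}};               /* a wildcard receive posted at rank 0 */
  return !can_issue_send(&send, 0, recvs, 1);      /* exits 0: the send may be issued */
}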

Page 35:

35

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

Page 36:

36

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

Record that owner has acquired window access

Page 37:

37

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

Treat non-owner’s win_lock as a no-op

Page 38:

38

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

Perform Accumulate from P1

Page 39:

39

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

Perform Accumulate from P0

Page 40:

40

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

P1 issues Win_unlock; the scheduler traps it and notes that P0 has locked the window

Page 41:

41

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

So the scheduler records P1 to be in a "blocked" state; but we do allow P1 to launch its PMPI_Win_unlock (there is nothing else that could be done!)

Page 42:

42

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

Now, P0 issues win_unlock

Page 43:

43

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

• Recall that P1's PMPI_Win_unlock has been launched
• But P1 has not reported back to the scheduler yet…

Page 44:

44

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

• To keep things simple, the scheduler works in "phases"
• When Pi…Pj have been "let go" in one phase of the scheduler, no other Pk is "let go" till Pi…Pj have reported back

Page 45:

45

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

• If we now allow P0's PMPI_Win_unlock to issue, it may zip through the progress engine and miss P1's PMPI_Win_unlock
• But we HAVE to allow P0 to launch, or else P1 won't get access to the window!

Page 46:

46

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

• So P1 will likely be stuck in the progress engine
• But P0 next enters the progress engine only at Barrier
• But we don't schedule P0 till P1 has reported back
• But P1 won't report back (stuck inside the progress engine)

Page 47:

47

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

Deadlock inside scheduler !

Page 48:

48

Simple 1-sided Example…will show advancing computation by Blue marching

P0 (owner of window)        P1 (non-owner of window)
0: MPI_Init                 0: MPI_Init
1: MPI_Win_lock             1: MPI_Win_lock
2: MPI_Accumulate           2: MPI_Accumulate
3: MPI_Win_unlock           3: MPI_Win_unlock
4: MPI_Barrier              4: MPI_Barrier
5: MPI_Finalize             5: MPI_Finalize

Solution: when P0 comes to the scheduler, we do not give it a 'go-ahead'; so it keeps poking the progress engine; this causes P1 to come back to the scheduler; then we let P0's PMPI_Win_unlock issue

Page 49:

49

P0's code to handle MPI_Win_unlock
(in general, this is how every MPI_SomeFunc is structured…)

MPI_Win_unlock(arg1, arg2, ..., argN) {
  sendToSocket(pID, Win_unlock, arg1, ..., argN);
  while (recvFromSocket(pID) != go-ahead)
    MPI_Iprobe(MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, ...);
  return PMPI_Win_unlock(arg1, arg2, ..., argN);
}

An innocuous Progress-Engine "Poker"

Page 50:

50

Assessment of Solution to forward-progress

• Solutions may be MPI-library specific

• This is OK so long as we know exactly how the progress engine of the MPI library works

• This needs to be advertised by MPI library designers

• Better still: if they can provide more “hooks”, ISP can be made more successful


Page 51:

51

So how well does ISP work?

• Trapezoidal integration deadlock
  • Found in seconds
  • Total 33 interleavings in 9 seconds after the fix
  • 8.4 seconds spent restarting the MPI system

• Monte-Carlo computation of Pi
  • Found three deadlocks we did not know about, in seconds
  • No modeling effort whatsoever
  • After fixing, took 3,427 interleavings taking 15.5 mins
  • About 15 mins spent restarting the MPI system

• For Byte-Range Locking using 1-sided
  • Deadlock was found by us in previous work
  • Found again by ISP in 62 interleavings
  • After the fix, 11,000 interleavings… no end in sight

Page 52:

52

How to improve the performance of ISP?

• Minimize restart overhead
  • Maybe we don't need to reset all data before restarting

• Implemented a Chinese-Postman-like tour
  • "Collective goto" to the initial state, just before MPI_Finalize
  • Trapezoidal finishes in 0.3 seconds (was 9 seconds before)
  • Monte-Carlo finishes in 63 seconds (was 15 mins)
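(That is roughly a 30× improvement for the trapezoidal example — 9 s down to 0.3 s — and about a 14× improvement for Monte-Carlo — roughly 900 s down to 63 s.)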


Page 53:

53

Vary the DPOR dependence matrix during the search

Eliminate computations that don't affect control
– Static analysis to remove blocks that won't deadlock

Loop peeling transformations
– Many MPI calls are within loops
– Do not interleave all of them (let some happen w/o trapping)
– This sampling should not confuse our scheduler

Insert barriers to confine the search
– Analysis to infer concurrent cuts (incomparable clock vectors)

Other ideas to improve ISP (TBD)

Page 54:

54

Summary: Motivations for the In-Situ DPOR (ISP) Approach

Building verification models of MPI programs is not straightforward – The bug may be in the code that looks innocuous

– The bug may be in the MPI library function itself

– The final production code may be hand-tuned

Complementary approaches are possible – Model check using other tools to weed out concurrency errors

– Use static analysis to detect bugs

– Use automated synthesis to guarantee correctness by construction

Page 55:

55

Related work on FV for MPI programs

Main related work is that by Siegel and Avrunin

Provide synchronous channel theorems for blocking and non-blocking MPI constructs
– Deadlocks are caught iff they are caught using synchronous channels

Provide a state-machine model for MPI calls
– Have built a tool called MPI_Spin that uses C extensions to Promela to encode the MPI state machine

Provide a symbolic execution approach to check computational results of MPI programs

Define the "Urgent Algorithm," which is a static POR algorithm
– Schedules processes in a canonical order
– Schedules sends when receives are posted – sync channel effect
– Wildcard receives handled through over-approximation

Page 56:

56

Initial implementation
– Rajeev Thakur (Argonne) proposed the ISP idea – instrument + play
– Salman Pervez's MS thesis – wrote our first ISP
– Robert Palmer provided lots of help / inspiration
– We have a EuroPVM / MPI 2007 paper coauthored by Salman Pervez, Robert Palmer, myself, Mike Kirby – and Rajeev Thakur, Bill Gropp of Argonne National Lab
– Salman moves to Purdue for a PhD; Sarvani takes over ISP

Sarvani's ISP implementation
– A TOTAL rewrite
– Modular for experimentation
– Have collected lots of data

Subodh looking into static analysis support for ISP
– Make it easier to do ISP on a given program

Credits for ISP Algorithm

Page 57:

57

Quick demo

Page 58:

58

Overview of Distributed DPOR in Inspect (a tool for PThread Verification – SPIN 07)

Page 59:

59

We first built a sequential DPOR explorer for C / Pthreads programs, called "Inspect"

[Diagram: a multithreaded C/C++ program is instrumented, combined with a thread library wrapper, and compiled into an executable; at run time, thread 1 … thread n exchange request/permit messages with a central scheduler]

Page 60:

60

We then devised a work-distribution scheme…

[Diagram: worker a and worker b interact with a load balancer — an idle node reports its id, request unloading assigns it a work description, and it reports its result back]

Page 61:

61

Speedup on aget

Page 62:

62

Speedup on bbuf

Page 63:

63

DPOR-based search is quite promising
– Tool built for MPI exploration is ISP

– Tool built for PThreads exploration is Inspect

– Distributed Inspect helps obtain linear speed-up

– Distributed ISP is within easy reach

– More understanding of Forward Progress and other implementation issues

– More examples

– Any more properties than deadlocks ?

Will improve efficiency of search

Will couple static analysis with DPOR to improve performance

Handling concurrent software written using large APIs remains an important challenge to meet
– Need more people to be working on this

Conclusions