02 progpar thread
TRANSCRIPT
-
8/14/2019 02 Progpar Thread
1/76
Outlines
How to Efficiently ProgramHigh Performance Architectures ?
Arnaud LEGRAND, CR CNRS, LIG/INRIA/MescalJean-Louis ROCH, MCF ENSIMAG, LIG/INRIA/Moais
Vincent DANJEAN, MCF UJF, LIG/INRIA/MoaisDerick KONDO, CR INRIA, LIG/INRIA/Mescal
Jean-Franois MHAUT, PR UJF, LIG/INRIA/MescalBruno RAFFIN, CR INRIA, LIG/INRIA/MoaisAlexandre TERMIER, MCF UJF, LIG/Hadas
Some slides come from Samuel THIBAULT, MCF Bordeaux, LaBRI/Runtime
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
2/76
-
8/14/2019 02 Progpar Thread
3/76
Outlines
Part I: High Performance ArchitecturesPart II: Parallelism and ThreadsPart III: SynchronisationPart IV: Multithreading and Networking
High Performance Architectures
1 Parallel Machines with Shared MemoryILP and multi-cores
Symmetric Multi Processors
2 Parallel Machines with Distributed MemoryClusters
Grids
3 Current Architectures in HPC
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
4/76
Outlines
Part I: High Performance ArchitecturesPart II: Parallelism and ThreadsPart III: SynchronisationPart IV: Multithreading and Networking
Parallelism and Threads
4 Introduction to Threads
5 Kinds of threadsUser threads
Kernel threadsMixed models
6 User Threads and Blocking System CallsScheduler Activations
7 Thread Programming InterfacePOSIX ThreadsLinux POSIX Threads LibrariesBasic POSIX Thread API
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
5/76
Outlines
Part I: High Performance ArchitecturesPart II: Parallelism and ThreadsPart III: SynchronisationPart IV: Multithreading and Networking
Synchronisation
8 Hardware Support
9 Busy-waiting Synchronisation
10 High-level Synchronisation PrimitivesSemaphoresMonitors
11 Some examples with LinuxOld Linux libpthreadNew POSIX Thread Library
P I Hi h P f A hi
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
6/76
Outlines
Part I: High Performance ArchitecturesPart II: Parallelism and ThreadsPart III: SynchronisationPart IV: Multithreading and Networking
Multithreading and Networking
12 A Brief Overview of MPI
13 Mixing Threads and Communication in HPCProblems arises
Discussion about solution
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
7/76
Parallel Machines with Shared MemoryParallel Machines with Distributed Memory
Current Architectures in HPC
Part I
High Performance Architectures
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
8/76
Parallel Machines with Shared MemoryParallel Machines with Distributed Memory
Current Architectures in HPC
Outlines: High Performance Architectures
1 Parallel Machines with Shared MemoryILP and multi-cores
Symmetric Multi Processors
2 Parallel Machines with Distributed MemoryClustersGrids
3 Current Architectures in HPC
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
9/76
Parallel Machines with Shared MemoryParallel Machines with Distributed Memory
Current Architectures in HPC
Parallel Architectures
Two main kindsArchitectures with shared memory and architectures withdistributed memory.
Multiprocessors
P
P
P
PMem
Clusters
Fast networkP Mem P Mem P Mem
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
10/76
Parallel Machines with Shared MemoryParallel Machines with Distributed Memory
Current Architectures in HPC
ILP and multi-coresSymmetric Multi Processors
Why several processors/cores ?
Limits for monocore processors
superscalar processors: instruction level parallelism
frequency
electrical power
What to do with place available on chips ?
caches (bigger and quicker)
several series of registers (hyperthreaded processors)
several series of cores (multi-core processors)
all of that
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
11/76
P ll l M hi i h Sh d M
-
8/14/2019 02 Progpar Thread
12/76
Parallel Machines with Shared MemoryParallel Machines with Distributed Memory
Current Architectures in HPC
ILP and multi-coresSymmetric Multi Processors
Towards more and morehierarchical computers
P P M P P M
P P M P P M
P P M P P M
M
M
M
SMT(HyperThreading)
Multi-core
NUMA
P ll l M hi ith Sh d M
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
13/76
Parallel Machines with Shared MemoryParallel Machines with Distributed Memory
Current Architectures in HPC
ILP and multi-coresSymmetric Multi Processors
AMD Quad-Core
M
P PL2 $
P PL2 $ L2 $ L2 $
L3 $
P P L2
P P L2 L2 L2
L3
...
Shared L3 cache
NUMA factor ~1.1-1.5
M
P PL2 $
P PL2 $ L2 $ L2 $
L3 $M
P PL2 $
P PL2 $ L2 $ L2 $
L3 $M
Parallel Machines with Shared Memory
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
14/76
Parallel Machines with Shared MemoryParallel Machines with Distributed Memory
Current Architectures in HPC
ILP and multi-coresSymmetric Multi Processors
Intel Quad-Core
P PL2 $
P PL2 $
MP P
L2 $
P PL2 $
...
Hierarchical cache levelsP P
L2 $
P PL2 $
Parallel Machines with Shared Memory
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
15/76
Parallel Machines with Shared MemoryParallel Machines with Distributed Memory
Current Architectures in HPC
ILP and multi-coresSymmetric Multi Processors
5
dual-quad-core
M
P1 P5 P3 P7
P0 P2P4 P6
Intel
Hierarchical cache levels
Parallel Machines with Shared Memory
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
16/76
Parallel Machines with Shared MemoryParallel Machines with Distributed Memory
Current Architectures in HPC
ClustersGrids
Clusters
Composed of a few to hundreds of machines
often homogeneoussame processor, memory, etc.
often linked with a high speed, low latency networkMyrinet, InfinityBand, Quadrix, etc.
Biggest clusters can be split in several parts
computing nodesI/O nodes
front (interactive) node
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
17/76
Parallel Machines with Shared Memory
-
8/14/2019 02 Progpar Thread
18/76
a a e ac es t S a ed e o yParallel Machines with Distributed Memory
Current Architectures in HPC
Current Architectures in HPC
Hierarchical ArchitecturesHT technology
multi-core processor
multi processors machine
cluster of machines
grid of clusters and individual machines
Even more complexity
computing on GPUrequire specialized codes but hardware far more powerful
FPGAhardware can be specialized on demand
still lots of work on interface programming here
Introduction to ThreadsKinds of threads
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
19/76
Kinds of threadsUser Threads and Blocking System Calls
Thread Programming Interface
Part II
Parallelism and Threads
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
20/76
Introduction to ThreadsKinds of threads
-
8/14/2019 02 Progpar Thread
21/76
Kinds of threadsUser Threads and Blocking System Calls
Thread Programming Interface
Programming on Shared Memory Parallel Machines
Using process
Processors
Operating system Resources management(files, memory, CPU, network, etc)
Process
Mem Mem Mem Mem
Introduction to ThreadsKinds of threads
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
22/76
Kinds of threadsUser Threads and Blocking System Calls
Thread Programming Interface
Programming on Shared Memory Parallel Machines
Using threads
Processors
Operating system Resources management(files, memory, CPU, network, etc)
Process
Multithreaded process
Memory
Introduction to ThreadsKinds of threads
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
23/76
Kinds of threadsUser Threads and Blocking System Calls
Thread Programming Interface
Introduction to Threads
Why threads ?To take profit from shared memory parallel architectures
SMP, hyperthreaded, multi-core, NUMA, etc. processorsfuture Intel processors: several hundreds cores
To describe the parallelism within the applicationsindependent tasks, I/O overlap, etc.
What will use threads ?User application codes
directly (with thread libraries)POSIX API (IEEE POSIX 1003.1C norm) in C, C++, . . .
with high-level programming languages (Ada, OpenMP, . . . )
Middleware programming environments
demonized tasks (garbage collector, . . . ), . . .
Introduction to ThreadsKinds of threads
User threadsK l th d
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
24/76
User Threads and Blocking System CallsThread Programming Interface
Kernel threadsMixed models
User threads
by the kernelEntities managed
Operating system
Processors
Multithreaded process
threads
Kernel scheduler
User level
User scheduler
Efficiency + Flexibility + SMP - Blocking syscalls -
Introduction to ThreadsKinds of threads
User threadsKernel threads
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
25/76
User Threads and Blocking System CallsThread Programming Interface
Kernel threadsMixed models
Kernel threads
by the kernelEntities managed
Operating system
Processors
Multithreaded process
threads
Kernel scheduler
Kernel level
Efficiency - Flexibility - SMP + Blocking syscalls +
Introduction to ThreadsKinds of threads
User threadsKernel threads
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
26/76
User Threads and Blocking System CallsThread Programming Interface
Kernel threadsMixed models
Mixed models
by the kernelEntities managed
Operating system
Processors
Multithreaded process
threads
Kernel scheduler
User level
User scheduler
Efficiency + Flexibility + SMP + Blocking syscalls limited
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
27/76
Introduction to ThreadsKinds of threads
S CScheduler Activations
-
8/14/2019 02 Progpar Thread
28/76
User Threads and Blocking System CallsThread Programming Interface
Scheduler Activations
User Threads and Blocking System Calls
User level library
Kernel scheduler
User scheduler
Mixed library
Kernel scheduler
User scheduler
Introduction to ThreadsKinds of threads
U Th d d Bl ki S t C llScheduler Activations
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
29/76
User Threads and Blocking System CallsThread Programming Interface
Sc edu e ct at o s
Scheduler Activations
Idea proposed by Anderson et al. (91)Dialogue (and not monologue) between the user and kernelschedulers
the user scheduler uses system calls
the kernel scheduler uses upcalls
Upcalls
Notify the application of scheduling kernel events
Activationsa new structure to support upcallsa kinf of kernel thread or virtual processor
creating and destruction managed by the kernel
Introduction to ThreadsKinds of threads
User Threads and Blocking System CallsScheduler Activations
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
30/76
User Threads and Blocking System CallsThread Programming Interface
Scheduler Activations
Instead of:
InterruptionExternal request
spaceUser
Kernel Time
Introduction to ThreadsKinds of threads
User Threads and Blocking System CallsScheduler Activations
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
31/76
User Threads and Blocking System CallsThread Programming Interface
Scheduler Activations
Instead of:
InterruptionExternal request
spaceUser
Kernel Time
...better use the following schema:
Interruption
Act. A
External request
spaceUser
Kernel Time
upcall(unblocked, preempted, new)
Act. CAct. A (next)
Act. B
upcall(blocked, new)
Introduction to ThreadsKinds of threads
User Threads and Blocking System CallsScheduler Activations
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
32/76
User Threads and Blocking System CallsThread Programming Interface
Working principle
managed by the kernel
User-level
Operating system
Processors
Multithreaded process
Activations
Kernel scheduler
threads
User scheduler
Introduction to ThreadsKinds of threads
User Threads and Blocking System CallsScheduler Activations
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
33/76
User Threads and Blocking System CallsThread Programming Interface
Working principle
managed by the kernel
User-level
Operating system
Processors
Multithreaded process
Activations
Kernel scheduler
threads
Upcall(New)
Introduction to ThreadsKinds of threads
User Threads and Blocking System CallsScheduler Activations
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
34/76
User Threads and Blocking System CallsThread Programming Interface
Working principle
managed by the kernel
User-level
Operating system
Processors
Multithreaded process
Activations
Kernel scheduler
threads
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
35/76
Introduction to ThreadsKinds of threads
User Threads and Blocking System CallsScheduler Activations
-
8/14/2019 02 Progpar Thread
36/76
g yThread Programming Interface
Working principle
managed by the kernel
User-level
Operating system
Processors
Multithreaded process
Activations
Kernel scheduler
systemcall
Blocking
threads
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
37/76
Introduction to ThreadsKinds of threadsUser Threads and Blocking System Calls
Scheduler Activations
-
8/14/2019 02 Progpar Thread
38/76
Thread Programming Interface
Working principle
managed by the kernel
User-level
Operating system
Processors
Multithreaded process
Activations
Kernel scheduler
systemcall
Blocking
threads
Introduction to ThreadsKinds of threadsUser Threads and Blocking System Calls
Scheduler Activations
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
39/76
Thread Programming Interface
Working principle
managed by the kernel
User-level
Operating system
Processors
Multithreaded process
Activations
Kernel scheduler
systemcall
Blocking
threads
Introduction to ThreadsKinds of threadsUser Threads and Blocking System Calls
Th d P i I t f
Scheduler Activations
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
40/76
Thread Programming Interface
Working principle
managed by the kernel
User-level
Operating system
Processors
Multithreaded process
Activations
Kernel scheduler
threads
Introduction to ThreadsKinds of threadsUser Threads and Blocking System Calls
Thread Programming Interface
Scheduler Activations
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
41/76
Thread Programming Interface
Working principle
managed by the kernel
User-level
Operating system
Processors
Multithreaded process
Activations
Kernel scheduler
threads
Upcall(Unblocked,Preempted,New)
Introduction to ThreadsKinds of threadsUser Threads and Blocking System Calls
Thread Programming Interface
POSIX ThreadsLinux POSIX Threads LibrariesBasic POSIX Thread API
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
42/76
Thread Programming Interface
Normalisation of the thread interface
Before the normeach Unix had its (slightly) incompatible interface
but same kinds of features was present
POSIX normalisationIEEE POSIX 1003.1C norm (also called POSIX threadsnorm)Only the API is normalised (not the ABI)
POSIX thread libraries can easily be switched at source
level but not at runtimePOSIX threads own
processor registers, stack, etc.signal mask
POSIX threads can be of any kind (user, kernel, etc.)
Introduction to ThreadsKinds of threadsUser Threads and Blocking System Calls
Thread Programming Interface
POSIX ThreadsLinux POSIX Threads LibrariesBasic POSIX Thread API
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
43/76
Thread Programming Interface
Linux POSIX Threads Libraries
LinuxThread (1996) : kernel level, Linux standard threadlibrary for a long time, not fully POSIX compliant
GNU-Pth (1999) : user level, portable, POSIXNGPT (2002) : mixed, based on GNU-Pth, POSIX, notdeveloped anymore
NPTL (2002) : kernel level, POSIX, current Linuxstandard thread library
PM2/Marcel (2001) : mixed, POSIX compliant, lots ofextensions for HPC (scheduling control, etc.)
Introduction to ThreadsKinds of threadsUser Threads and Blocking System Calls
Thread Programming Interface
POSIX ThreadsLinux POSIX Threads LibrariesBasic POSIX Thread API
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
44/76
Thread Programming Interface
Basic POSIX Thread API
Creation/destructionint pthread_create(pthread_t *thread, const
pthread_attr_t *attr, void
*(*start_routine)(void*), void *arg)
void pthread_exit(void*
value_ptr)
int pthread_join(pthread_t thread, void
**value_ptr)
Synchronisation (semaphores)
int sem_init(sem_t *sem, int pshared, unsignedint value)
int sem_wait(sem_t *sem)
int sem_post(sem_t *sem)
int sem_destroy(sem_t*
sem)
Introduction to ThreadsKinds of threadsUser Threads and Blocking System Calls
Thread Programming Interface
POSIX ThreadsLinux POSIX Threads LibrariesBasic POSIX Thread API
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
45/76
Thread Programming Interface
Basic POSIX Thread API (2)
Synchronisation (mutex)int pthread_mutex_init(pthread_mutex_t *mutex,
const pthread_mutexattr_t *attr)
int pthread_mutex_lock(pthread_mutex_t *mutex)
int pthread_mutex_unlock(pthread_mutex_t
*mutex)
int pthread_mutex_destroy(pthread_mutex_t
*mutex)
Synchronisation (conditions)int pthread_cond_init(pthread_cond_t *cond,
const pthread_condattr_t *attr)
int pthread_cond_wait(pthread_cond_t *cond,
pthread_mutex_t *mutex)
int pthread_cond_signal(pthread_cond_t *cond)
Introduction to ThreadsKinds of threadsUser Threads and Blocking System Calls
Thread Programming Interface
POSIX ThreadsLinux POSIX Threads LibrariesBasic POSIX Thread API
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
46/76
g g
Basic POSIX Thread API (3)
Per thread dataint pthread_key_create(pthread_key_t *key, void
(*destr_function) (void*))
int pthread_key_delete(pthread_key_t key)
int pthread_setspecific(pthread_key_t key,
const void *pointer)
void * pthread_getspecific(pthread_key_t key)
Introduction to ThreadsKinds of threadsUser Threads and Blocking System Calls
Thread Programming Interface
POSIX ThreadsLinux POSIX Threads LibrariesBasic POSIX Thread API
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
47/76
g g
Basic POSIX Thread API (3)
Per thread dataint pthread_key_create(pthread_key_t *key, void
(*destr_function) (void*))
int pthread_key_delete(pthread_key_t key)
int pthread_setspecific(pthread_key_t key,
const void *pointer)
void * pthread_getspecific(pthread_key_t key)
The new __thread C keyword
used for a global per-thread variableneed support from the compiler and the linker at compiletime and execute time
libraries can have efficient per-thread variables without
disturbing the application Hardware SupportBusy-waiting Synchronisation
High-level Synchronisation PrimitivesSome examples with Linux
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
48/76
Part III
Synchronisation
Hardware SupportBusy-waiting SynchronisationHigh-level Synchronisation Primitives
Some examples with Linux
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
49/76
Outlines: Synchronisation
8 Hardware Support
9 Busy-waiting Synchronisation
10 High-level Synchronisation PrimitivesSemaphoresMonitors
11 Some examples with LinuxOld Linux libpthreadNew POSIX Thread Library
Hardware SupportBusy-waiting SynchronisationHigh-level Synchronisation Primitives
Some examples with Linux
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
50/76
Hardware Support
What happens with incrementations in parallel?
var++; var++;
Hardware SupportBusy-waiting SynchronisationHigh-level Synchronisation Primitives
Some examples with Linux
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
51/76
Hardware Support
What happens with incrementations in parallel?for (i=0; i
-
8/14/2019 02 Progpar Thread
52/76
Hardware SupportBusy-waiting SynchronisationHigh-level Synchronisation Primitives
Some examples with Linux
-
8/14/2019 02 Progpar Thread
53/76
Critical section with busy waiting
Example of code
while (! TAS(&var))
;
/* in critical section */
var=0;
Hardware SupportBusy-waiting Synchronisation
High-level Synchronisation PrimitivesSome examples with Linux
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
54/76
Critical section with busy waiting
Example of code
while (! TAS(&var))
while (var) ;
/* in critical section */
var=0;
Hardware SupportBusy-waiting Synchronisation
High-level Synchronisation PrimitivesSome examples with Linux
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
55/76
Critical section with busy waiting
Example of code
while (! TAS(&var))
while (var) ;
/* in critical section */
var=0;
Busy waiting
+ very reactive
+ no OS or lib support required
- use a processor while not doing anything
- does not scale if there are lots of waiters
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
56/76
Hardware SupportBusy-waiting Synchronisation
High-level Synchronisation PrimitivesSome examples with Linux
SemaphoresMonitors
M i
-
8/14/2019 02 Progpar Thread
57/76
Monitors
MutexTwo states: locked or not
Two methods:lock(m) take the mutex
unlock(m) release the mutex (must be done by thethread owning the mutex)
Conditions
waiting thread list (conditions are not related with tests)
Three methods:wait(c, m) sleep on the condition. The mutex is released
atomically during the wait.signal(c) one sleeping thread is wake up
broadcast(c) all sleeping threads are wake up Hardware Support
Busy-waiting SynchronisationHigh-level Synchronisation Primitives
Some examples with Linux
Old Linux libpthreadNew POSIX Thread Library
Old Li lib th d
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
58/76
Old Linux libpthread
First Linux kernel thread library
limited kernel support available
provides POSIX primitives (mutexes, conditions,
semaphores, etc.)
All internal synchronisation built on signals
lots of play with signal masks
one special (manager) thread used internally to managethread state and synchronisation
race conditions not always handled (not enough kernelsupport)
Hardware SupportBusy-waiting Synchronisation
High-level Synchronisation PrimitivesSome examples with Linux
Old Linux libpthreadNew POSIX Thread Library
NPTL N POSIX Th d Lib
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
59/76
NPTL: New POSIX Thread Library
New Linux kernel thread library
requires new kernel support (available from Linux 2.6)
specific support in the libc
a lot more efficientfully POSIX compliant
Internal synchronisation based on futex
new kernel objectmutex/condition/semaphore can be fully handled in userspace unless there is contention
A Brief Overview of MPIMixing Threads and Communication in HPC
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
60/76
Part IV
Multithreading and Networking
A Brief Overview of MPIMixing Threads and Communication in HPC
Outlines: Multithreading and Networking
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
61/76
Outlines: Multithreading and Networking
12 A Brief Overview of MPI
13 Mixing Threads and Communication in HPCProblems arisesDiscussion about solution
A Brief Overview of MPIMixing Threads and Communication in HPC
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
62/76
Standard in industry
frequently used by engineers, physicians, etc.MPI2 begin to be available
See MPI presentation
A Brief Overview of MPIMixing Threads and Communication in HPC
Problems arisesDiscussion about solution
Example of problems with MPI
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
63/76
Example of problems with MPI
Token circulation while com-puting on 4 nodes
if (mynode!=0)
MPI_Recv();
req=MPI_Isend(next);
Work(); /* about 1s */
MPI_Wait(req);
if (mynode==0)
MPI_Recv();
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
64/76
A Brief Overview of MPIMixing Threads and Communication in HPC
Problems arisesDiscussion about solution
Example of problems with MPI
-
8/14/2019 02 Progpar Thread
65/76
Example of problems with MPI
Recv0 1 2
Isend
Calculus
req
Recv
Calculus
Wait
Isend
data
Recv
Calculus
Wait
Isend
data
req
ack
ack
req
Ignored messageNo reactivity
Token circulation while com-puting on 4 nodes
if (mynode!=0)
MPI_Recv();
req=MPI_Isend(next);
Work(); /* about 1s */
MPI_Wait(req);
if (mynode==0)
MPI_Recv();
expected time: ~ 1 s
observed time: ~ 4 s
A Brief Overview of MPIMixing Threads and Communication in HPC
Problems arisesDiscussion about solution
Improving MPI reactivity
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
66/76
Improving MPI reactivity
Possible solutionsadd calls to MPI_test() in the code
using a multithreaded MPI version+ parallelism, communication progression independent from
computations- busy waiting synchronisation less efficient
- scrutation must be managed
A Brief Overview of MPIMixing Threads and Communication in HPC
Problems arisesDiscussion about solution
Integrate a scrutation server into the scheduler
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
67/76
Integrate a scrutation server into the scheduler
Scheduler: required for optimal behavioursystem is known by the scheduler
it can choose the best strategy to use
Efficient and reactive scrutationless context switchesguarantee frequency
independent with respect to the number of threads in theapplication
instead of
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
68/76
A Brief Overview of MPIMixing Threads and Communication in HPC
Problems arisesDiscussion about solution
Running the scrutation server
-
8/14/2019 02 Progpar Thread
69/76
Running the scrutation server
}
{
poll}
{
group
Callback functions:
Frequency- poll- group
init_server()
A
MPI call
threads schedulerUser level
Application
library
MPI
A Brief Overview of MPIMixing Threads and Communication in HPC
Problems arisesDiscussion about solution
Running the scrutation server
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
70/76
Running the scrutation server
}
{
poll}
{
group
Req2
ev_wait()
ev_wait()
BC
Req1
A
MPI call
threads schedulerUser level
Application
library
MPI
A Brief Overview of MPIMixing Threads and Communication in HPC
Problems arisesDiscussion about solution
Running the scrutation server
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
71/76
Running the scrutation server
}
{
poll}
{
group
Req2
BC
Req1
21
requestsAggregated
A
MPI call
threads schedulerUser level
Application
library
MPI
A Brief Overview of MPIMixing Threads and Communication in HPC
Problems arisesDiscussion about solution
Running the scrutation server
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
72/76
u g t e sc utat o se e
}
{
poll}
{
group
Req2
BC
Req1
21
requestsAggregated
A
MPI call
threads schedulerUser level
Application
library
MPI
A Brief Overview of MPIMixing Threads and Communication in HPC
Problems arisesDiscussion about solution
Running the scrutation server
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
73/76
g
}
{
poll}
{
group
BC
Req1
1
requestsAggregated
A
MPI call
threads schedulerUser level
Application
library
MPI
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
74/76
Part V
Conclusion
Outlines: Conclusion
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
75/76
Conclusion
http://find/http://goback/ -
8/14/2019 02 Progpar Thread
76/76
Multi-threading
cannot be avoided in current HPC
directly or through languages/middlewaresdifficulties to get a efficient scheduling
no perfect universal schedulerthreads must be scheduled with respect to memory (NUMA)
threads and communications must be scheduled together
http://find/http://goback/