02 progpar thread

Upload: aatrish2001

Post on 30-May-2018


  • 8/14/2019 02 Progpar Thread

    1/76

    Outlines

    How to Efficiently Program High Performance Architectures?

    Arnaud LEGRAND, CR CNRS, LIG/INRIA/Mescal
    Jean-Louis ROCH, MCF ENSIMAG, LIG/INRIA/Moais
    Vincent DANJEAN, MCF UJF, LIG/INRIA/Moais
    Derick KONDO, CR INRIA, LIG/INRIA/Mescal
    Jean-François MÉHAUT, PR UJF, LIG/INRIA/Mescal
    Bruno RAFFIN, CR INRIA, LIG/INRIA/Moais
    Alexandre TERMIER, MCF UJF, LIG/Hadas

    Some slides come from Samuel THIBAULT, MCF Bordeaux, LaBRI/Runtime


    Outlines

    Part I: High Performance Architectures
    Part II: Parallelism and Threads
    Part III: Synchronisation
    Part IV: Multithreading and Networking

    High Performance Architectures

    1 Parallel Machines with Shared Memory
        ILP and multi-cores
        Symmetric Multi Processors
    2 Parallel Machines with Distributed Memory
        Clusters
        Grids
    3 Current Architectures in HPC


    Parallelism and Threads

    4 Introduction to Threads
    5 Kinds of threads
        User threads
        Kernel threads
        Mixed models
    6 User Threads and Blocking System Calls
        Scheduler Activations
    7 Thread Programming Interface
        POSIX Threads
        Linux POSIX Threads Libraries
        Basic POSIX Thread API


    Synchronisation

    8 Hardware Support
    9 Busy-waiting Synchronisation
    10 High-level Synchronisation Primitives
        Semaphores
        Monitors
    11 Some examples with Linux
        Old Linux libpthread
        NPTL (Native POSIX Thread Library)


    Multithreading and Networking

    12 A Brief Overview of MPI
    13 Mixing Threads and Communication in HPC
        Problems that arise
        Discussion about solutions


    Part I

    High Performance Architectures


    Parallel Architectures

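    Two main kinds
    Architectures with shared memory and architectures with distributed memory.

    Multiprocessors
    [Figure: several processors (P) attached to one shared memory (Mem)]

    Clusters
    [Figure: several nodes, each with its own processor (P) and memory (Mem), linked by a fast network]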


    Why several processors/cores?

    Limits for single-core processors
      superscalar processors: instruction-level parallelism
      frequency
      electrical power

    What to do with the space available on chips?
      caches (bigger and faster)
      several sets of registers (hyperthreaded processors)
      several cores (multi-core processors)
      all of the above


    Towards more and more hierarchical computers

    [Figure: hierarchy of processors (P) and memories (M), with three nested levels labelled SMT (HyperThreading), Multi-core and NUMA]


    AMD Quad-Core

    [Figure: several quad-core chips; on each chip the four cores (P) have their own L2 cache and share one L3 cache, and each chip is attached to its own memory (M)]

    Shared L3 cache
    NUMA factor ~1.1-1.5


    Intel Quad-Core

    [Figure: several quad-core chips attached to one memory (M); on each chip, every pair of cores (P) shares an L2 cache]

    Hierarchical cache levels


    Intel dual quad-core

    [Figure: eight cores attached to one memory (M), drawn in two rows: P1 P5 P3 P7 and P0 P2 P4 P6]

    Hierarchical cache levels


    Clusters

    Composed of a few to hundreds of machines
      often homogeneous: same processor, memory, etc.
      often linked with a high-speed, low-latency network: Myrinet, InfiniBand, Quadrics, etc.

    The biggest clusters can be split into several parts
      computing nodes
      I/O nodes
      front (interactive) node


    Current Architectures in HPC

    Hierarchical Architectures
      HT technology
      multi-core processor
      multi-processor machine
      cluster of machines
      grid of clusters and individual machines

    Even more complexity
      computing on GPU: requires specialized code, but the hardware is far more powerful
      FPGA: hardware can be specialized on demand
      still a lot of work to do on programming interfaces here


    Part II

    Parallelism and Threads


    Programming on Shared Memory Parallel Machines

    Using processes

    [Figure: several processes, each with its own memory (Mem), running on the processors; the operating system does the resource management (files, memory, CPU, network, etc.)]


    Using threads

    [Figure: a multithreaded process whose threads share a single memory, next to an ordinary process, running on the processors; the operating system does the resource management (files, memory, CPU, network, etc.)]


    Introduction to Threads

    Why threads?
      To take advantage of shared-memory parallel architectures
        SMP, hyperthreaded, multi-core, NUMA, etc. processors
        future Intel processors: several hundred cores
      To describe the parallelism within applications
        independent tasks, I/O overlap, etc.

    What will use threads?
      User application codes
        directly (with thread libraries)
          POSIX API (IEEE POSIX 1003.1c standard) in C, C++, ...
        with high-level programming languages (Ada, OpenMP, ...)
      Middleware programming environments
        daemonized tasks (garbage collector, ...), ...


    User threads

    [Figure: the threads of a multithreaded process are scheduled at user level by a user scheduler; the kernel scheduler only sees the entities managed by the kernel and maps them onto the processors]

    Efficiency: +   Flexibility: +   SMP: -   Blocking syscalls: -


    Kernel threads

    [Figure: the threads of a multithreaded process are entities managed by the kernel, at kernel level; the kernel scheduler maps them directly onto the processors]

    Efficiency: -   Flexibility: -   SMP: +   Blocking syscalls: +


    Mixed models

    [Figure: a user scheduler maps user-level threads onto several entities managed by the kernel, which the kernel scheduler maps onto the processors]

    Efficiency: +   Flexibility: +   SMP: +   Blocking syscalls: limited


    User Threads and Blocking System Calls

    [Figure: two panels, "User level library" and "Mixed library", each showing the kernel scheduler and the user scheduler]


    Scheduler Activations

    Idea proposed by Anderson et al. (91)
    Dialogue (and not monologue) between the user and kernel schedulers
      the user scheduler uses system calls
      the kernel scheduler uses upcalls

    Upcalls
      Notify the application of kernel scheduling events

    Activations
      a new structure to support upcalls
      a kind of kernel thread or virtual processor
      creation and destruction managed by the kernel


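    Instead of:

    [Figure: timeline across user space and kernel, showing an external request and the later interruption]

    ...better use the following scheme:

    [Figure: the same timeline with activations. Act. A issues the external request; the kernel makes upcall(blocked, new) and Act. B runs; at the interruption the kernel makes upcall(unblocked, preempted, new) and Act. C runs; Act. A (next) continues afterwards]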


    Working principle

    [Figure, animated over several slides: the user-level threads of a multithreaded process run on activations managed by the kernel; the kernel scheduler maps the activations onto the processors, and a user scheduler maps the threads onto the activations. Successive frames are labelled:
      Upcall(New)
      Blocking system call
      Upcall(Unblocked, Preempted, New)]


    Standardisation of the thread interface

    Before the standard
      each Unix had its own (slightly) incompatible interface
      but the same kinds of features were present

    POSIX standardisation
      IEEE POSIX 1003.1c standard (also called the POSIX threads standard)
      Only the API is standardised (not the ABI)
        POSIX thread libraries can easily be switched at source level but not at runtime
      POSIX threads own
        processor registers, stack, etc.
        signal mask
      POSIX threads can be of any kind (user, kernel, etc.)


    Linux POSIX Threads Libraries

    LinuxThreads (1996): kernel level, Linux standard thread library for a long time, not fully POSIX compliant
    GNU-Pth (1999): user level, portable, POSIX
    NGPT (2002): mixed, based on GNU-Pth, POSIX, not developed anymore
    NPTL (2002): kernel level, POSIX, current Linux standard thread library
    PM2/Marcel (2001): mixed, POSIX compliant, lots of extensions for HPC (scheduling control, etc.)


    Basic POSIX Thread API

    Creation/destruction
      int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                         void *(*start_routine)(void *), void *arg);
      void pthread_exit(void *value_ptr);
      int pthread_join(pthread_t thread, void **value_ptr);

    Synchronisation (semaphores)
      int sem_init(sem_t *sem, int pshared, unsigned int value);
      int sem_wait(sem_t *sem);
      int sem_post(sem_t *sem);
      int sem_destroy(sem_t *sem);
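The creation/destruction and semaphore calls listed above can be combined as in the following minimal sketch. The names `worker`, `run_workers`, `slots`, `results` and `NWORKERS` are hypothetical and chosen only for illustration; error handling is reduced to returning -1.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>

#define NWORKERS 4                 /* hypothetical number of threads */

static sem_t slots;                /* counting semaphore: at most 2 workers inside */
static int results[NWORKERS];

/* Thread body: receives its index through arg and exits with index * 2. */
static void *worker(void *arg)
{
    intptr_t id = (intptr_t)arg;

    sem_wait(&slots);              /* take a slot, sleep if none is free */
    results[id] = (int)id * 10;    /* each thread writes only its own cell */
    sem_post(&slots);              /* give the slot back */

    pthread_exit((void *)(id * 2));   /* value collected by pthread_join() */
}

/* Starts NWORKERS threads, joins them and returns the sum of their exit
 * values, or -1 if a call fails. */
int run_workers(void)
{
    pthread_t tid[NWORKERS];
    int sum = 0;

    if (sem_init(&slots, 0, 2) != 0)  /* pshared = 0: threads of one process */
        return -1;

    for (intptr_t i = 0; i < NWORKERS; i++)
        if (pthread_create(&tid[i], NULL, worker, (void *)i) != 0)
            return -1;

    for (int i = 0; i < NWORKERS; i++) {
        void *ret;
        if (pthread_join(tid[i], &ret) != 0)
            return -1;
        sum += (int)(intptr_t)ret;
    }

    sem_destroy(&slots);
    return sum;                    /* 0 + 2 + 4 + 6 */
}
```

Passing the index through the `void *` argument avoids sharing a loop variable between threads; on Linux the code is typically built with `-pthread`.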


    Basic POSIX Thread API (2)

    Synchronisation (mutex)
      int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr);
      int pthread_mutex_lock(pthread_mutex_t *mutex);
      int pthread_mutex_unlock(pthread_mutex_t *mutex);
      int pthread_mutex_destroy(pthread_mutex_t *mutex);

    Synchronisation (conditions)
      int pthread_cond_init(pthread_cond_t *cond, const pthread_condattr_t *attr);
      int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
      int pthread_cond_signal(pthread_cond_t *cond);
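As an illustration of the mutex and condition calls above, here is a small sketch in which one thread publishes a value and another waits for it. The names `producer`, `wait_for_payload`, `ready` and `payload` are assumptions made for this example, not part of the original slides.

```c
#include <pthread.h>

static pthread_mutex_t lock;
static pthread_cond_t  ready_cond;
static int ready;      /* predicate, protected by lock */
static int payload;    /* data published before ready is set */

/* Publishes the payload, then signals the condition. */
static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    payload = 42;
    ready = 1;
    pthread_cond_signal(&ready_cond);   /* wake one waiter, if any */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Starts the producer, waits for the payload and returns it (-1 on error). */
int wait_for_payload(void)
{
    pthread_t tid;
    int value;

    pthread_mutex_init(&lock, NULL);
    pthread_cond_init(&ready_cond, NULL);
    ready = 0;

    if (pthread_create(&tid, NULL, producer, NULL) != 0)
        return -1;

    pthread_mutex_lock(&lock);
    while (!ready)                       /* re-test: wake-ups may be spurious */
        pthread_cond_wait(&ready_cond, &lock);
    value = payload;
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&ready_cond);
    pthread_mutex_destroy(&lock);
    return value;
}
```

The `while` loop around `pthread_cond_wait()` matters: the signal may be sent before the waiter starts waiting, in which case the predicate is already true and the wait is skipped.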


    Basic POSIX Thread API (3)

    Per thread data
      int pthread_key_create(pthread_key_t *key, void (*destr_function)(void *));
      int pthread_key_delete(pthread_key_t key);
      int pthread_setspecific(pthread_key_t key, const void *pointer);
      void *pthread_getspecific(pthread_key_t key);

    The new __thread C keyword
      used for a global per-thread variable
      needs support from the compiler and the linker, at compile time and at run time
      libraries can have efficient per-thread variables without disturbing the application
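The two ways of getting per-thread data can be compared in one short sketch. It assumes a compiler that supports `__thread` (GCC and Clang do); `tls_worker`, `run_tls_demo`, `key` and `tls_counter` are hypothetical names used only here.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

static pthread_key_t key;           /* one key, one value per thread */
static __thread int tls_counter;    /* compiler extension: one copy per thread */

/* Stores its argument in key-based storage, counts to 1000 in the
 * __thread variable and returns the sum of both. */
static void *tls_worker(void *arg)
{
    int *slot = malloc(sizeof *slot);
    if (slot == NULL)
        return (void *)(intptr_t)-1;
    *slot = (int)(intptr_t)arg;
    pthread_setspecific(key, slot);  /* freed by the destructor at thread exit */

    for (int i = 0; i < 1000; i++)
        tls_counter++;               /* no race: each thread has its own copy */

    int *mine = pthread_getspecific(key);
    return (void *)(intptr_t)(*mine + tls_counter);
}

/* Runs two workers and returns the sum of their results (-1 on error). */
int run_tls_demo(void)
{
    pthread_t t1, t2;
    void *r1, *r2;

    if (pthread_key_create(&key, free) != 0)   /* free() is the destructor */
        return -1;
    if (pthread_create(&t1, NULL, tls_worker, (void *)(intptr_t)1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, tls_worker, (void *)(intptr_t)2) != 0)
        return -1;
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);
    pthread_key_delete(key);

    return (int)(intptr_t)r1 + (int)(intptr_t)r2;   /* 1001 + 1002 */
}
```

The key-based version needs a function call for every access, whereas the `__thread` variable is accessed like an ordinary global, which is the efficiency argument made above.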


    Part III

    Synchronisation


    Hardware Support

    What happens with increments in parallel?

    thread 1: var++;    thread 2: var++;

    for (i=0; i
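The question can be explored with a small experiment. The sketch below is not the code from the slide (which is truncated in this transcript); it models `var++` as a separate load and store on one counter, and uses a single atomic read-modify-write on another. All names (`split_var`, `atomic_var`, `run_increments`, `last_split_total`, `NTHREADS`, `NITER`) are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4
#define NITER    100000

static atomic_long split_var;   /* updated as load + store: models var++ */
static atomic_long atomic_var;  /* updated with one atomic read-modify-write */

static void *incr(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; i++) {
        /* Two separate steps: another thread can store in between,
         * and its update is then overwritten (lost). */
        long v = atomic_load(&split_var);
        atomic_store(&split_var, v + 1);

        /* One indivisible step, relying on hardware support. */
        atomic_fetch_add(&atomic_var, 1);
    }
    return NULL;
}

/* Runs the experiment and returns the value of the atomic counter. */
long run_increments(void)
{
    pthread_t tid[NTHREADS];

    atomic_store(&split_var, 0);
    atomic_store(&atomic_var, 0);
    for (int i = 0; i < NTHREADS; i++)
        if (pthread_create(&tid[i], NULL, incr, NULL) != 0)
            return -1;
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&atomic_var);
}

/* Value reached by the load + store counter in the last run. */
long last_split_total(void)
{
    return atomic_load(&split_var);
}
```

The atomic counter always reaches NTHREADS * NITER. The load + store counter can never exceed that value and, on a multiprocessor, usually ends up lower because some updates are lost.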


    Critical section with busy waiting

    Example of code

    while (! TAS(&var))
        ;   /* spin until TAS(&var) returns true */
    /* in critical section */
    var = 0;


    Example of code (waiting with plain reads between two attempts)

    while (! TAS(&var))
        while (var)
            ;   /* wait until var looks free, then retry TAS */
    /* in critical section */
    var = 0;

    Busy waiting
      + very reactive
      + no OS or lib support required
      - uses a processor while not doing anything
      - does not scale if there are lots of waiters
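The busy-waiting loop above can be written with C11 atomics as in the following sketch. One assumption to note: `atomic_exchange` returns the previous value of the variable, so the loop condition is the opposite of the slide's `TAS`, which apparently returns true on success. The names `spin_lock`, `spin_unlock`, `lock_var`, `shared_counter` and `run_spinlock_demo` are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SPIN_THREADS 2
#define SPIN_ITER    100000

static atomic_bool lock_var;     /* false = free, true = taken */
static long shared_counter;      /* protected by lock_var */

static void spin_lock(void)
{
    /* The exchange writes true and returns the old value: the lock is
     * ours only if the old value was false. */
    while (atomic_exchange_explicit(&lock_var, true, memory_order_acquire))
        while (atomic_load_explicit(&lock_var, memory_order_relaxed))
            ;   /* wait with plain reads, which stay in the local cache */
}

static void spin_unlock(void)
{
    atomic_store_explicit(&lock_var, false, memory_order_release);
}

static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < SPIN_ITER; i++) {
        spin_lock();
        shared_counter++;        /* critical section */
        spin_unlock();
    }
    return NULL;
}

/* Runs the workers and returns the final counter (-1 on error). */
long run_spinlock_demo(void)
{
    pthread_t tid[SPIN_THREADS];

    atomic_store(&lock_var, false);
    shared_counter = 0;
    for (int i = 0; i < SPIN_THREADS; i++)
        if (pthread_create(&tid[i], NULL, spin_worker, NULL) != 0)
            return -1;
    for (int i = 0; i < SPIN_THREADS; i++)
        pthread_join(tid[i], NULL);
    return shared_counter;
}
```

The inner read-only loop is what distinguishes the second version of the slide from the first: waiters do not hammer the memory bus with atomic writes while the lock is held.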


    Monitors

    Mutex
      Two states: locked or not
      Two methods:
        lock(m): take the mutex
        unlock(m): release the mutex (must be done by the thread owning the mutex)

    Conditions
      waiting thread list (conditions are not related to tests)
      Three methods:
        wait(c, m): sleep on the condition. The mutex is released atomically during the wait.
        signal(c): one sleeping thread is woken up
        broadcast(c): all sleeping threads are woken up
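A monitor is typically one mutex plus one condition per reason to wait. The sketch below, written with the POSIX calls, is a bounded buffer; it is only an illustration, and `buffer_put`, `buffer_get`, `run_monitor`, `CAPACITY` and `NITEMS` are hypothetical names.

```c
#include <pthread.h>

#define CAPACITY 4
#define NITEMS   100

static pthread_mutex_t m         = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
static int items[CAPACITY];
static int count, head, tail;    /* all protected by m */

static void buffer_put(int v)
{
    pthread_mutex_lock(&m);
    while (count == CAPACITY)            /* the test is ours, not the condition's */
        pthread_cond_wait(&not_full, &m);
    items[tail] = v;
    tail = (tail + 1) % CAPACITY;
    count++;
    pthread_cond_signal(&not_empty);     /* one sleeping consumer is woken up */
    pthread_mutex_unlock(&m);
}

static int buffer_get(void)
{
    pthread_mutex_lock(&m);
    while (count == 0)
        pthread_cond_wait(&not_empty, &m);
    int v = items[head];
    head = (head + 1) % CAPACITY;
    count--;
    pthread_cond_signal(&not_full);      /* one sleeping producer is woken up */
    pthread_mutex_unlock(&m);
    return v;
}

static void *producer_thread(void *arg)
{
    (void)arg;
    for (int i = 1; i <= NITEMS; i++)
        buffer_put(i);
    return NULL;
}

/* Consumes NITEMS values and returns their sum (-1 on error). */
long run_monitor(void)
{
    pthread_t prod;
    long sum = 0;

    if (pthread_create(&prod, NULL, producer_thread, NULL) != 0)
        return -1;
    for (int i = 0; i < NITEMS; i++)
        sum += buffer_get();
    pthread_join(prod, NULL);
    return sum;                          /* 1 + 2 + ... + NITEMS */
}
```

Because a condition is only a list of waiting threads, each wait sits in a `while` loop that re-checks the state of the buffer after waking up.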


    Old Linux libpthread

    First Linux kernel thread library
      limited kernel support available
      provides POSIX primitives (mutexes, conditions, semaphores, etc.)

    All internal synchronisation built on signals
      heavy use of signal masks
      one special (manager) thread used internally to manage thread state and synchronisation
      race conditions not always handled (not enough kernel support)


    NPTL: Native POSIX Thread Library

    New Linux kernel thread library
      requires new kernel support (available from Linux 2.6)
      specific support in the libc
      a lot more efficient
      fully POSIX compliant

    Internal synchronisation based on futex
      new kernel object
      mutex/condition/semaphore can be fully handled in user space unless there is contention
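To make the "user space unless there is contention" point concrete, here is a Linux-only sketch of a lock built directly on the futex system call. It is not NPTL's implementation; it follows the three-state scheme described in Ulrich Drepper's "Futexes Are Tricky" and assumes that `atomic_int` has the size and layout of a 32-bit `int`. The names `futex`, `futex_lock`, `futex_unlock` and `run_futex_demo` are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = free, 1 = locked, 2 = locked and there may be waiters */
static atomic_int futex_word;
static long futex_counter;       /* protected by the lock */

/* Thin wrapper: glibc exposes futex only through syscall(). */
static long futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void futex_lock(void)
{
    int c = 0;

    /* Fast path: 0 -> 1 entirely in user space. */
    if (atomic_compare_exchange_strong(&futex_word, &c, 1))
        return;
    /* Slow path: mark the lock contended and sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&futex_word, 2);
    while (c != 0) {
        futex(&futex_word, FUTEX_WAIT, 2);   /* sleeps only if the word is still 2 */
        c = atomic_exchange(&futex_word, 2);
    }
}

static void futex_unlock(void)
{
    /* 1 -> 0: nobody waited, no system call. */
    if (atomic_fetch_sub(&futex_word, 1) != 1) {
        atomic_store(&futex_word, 0);
        futex(&futex_word, FUTEX_WAKE, 1);   /* wake one waiter */
    }
}

static void *futex_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 50000; i++) {
        futex_lock();
        futex_counter++;
        futex_unlock();
    }
    return NULL;
}

/* Runs 4 workers and returns the final counter (-1 on error). */
long run_futex_demo(void)
{
    pthread_t tid[4];

    atomic_store(&futex_word, 0);
    futex_counter = 0;
    for (int i = 0; i < 4; i++)
        if (pthread_create(&tid[i], NULL, futex_worker, NULL) != 0)
            return -1;
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);
    return futex_counter;
}
```

When a single thread takes and releases the lock, both operations are one atomic instruction each; the kernel is entered only when another thread finds the lock taken.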


    Part IV

    Multithreading and Networking


    A Brief Overview of MPI

    Standard in industry
      frequently used by engineers, physicists, etc.
      MPI-2 is beginning to be available

    See MPI presentation


    Example of problems with MPI

    [Figure: timeline for nodes 0, 1 and 2 showing their Recv, Isend, Calculus and Wait phases and the req, ack and data messages exchanged between them; annotations: "Ignored message" and "No reactivity"]

    Token circulation while computing on 4 nodes

    if (mynode != 0)
        MPI_Recv();
    req = MPI_Isend(next);
    Work();            /* about 1s */
    MPI_Wait(req);
    if (mynode == 0)
        MPI_Recv();

    expected time: ~ 1 s
    observed time: ~ 4 s


    Improving MPI reactivity

    Possible solutions
      add calls to MPI_Test() in the code
      use a multithreaded MPI version
        + parallelism, communication progression independent from computations
        - busy-waiting synchronisation is less efficient
        - polling must be managed


    Integrate a polling server into the scheduler

    Scheduler: required for optimal behaviour
      the system is known by the scheduler
      it can choose the best strategy to use

    Efficient and reactive polling
      fewer context switches
      guaranteed frequency
      independent of the number of threads in the application


    Running the polling server

    [Figure, animated over several slides: the application, the MPI library and the user-level thread scheduler. Labels on the successive frames:
      init_server() with callback functions (poll, group) and a frequency
      threads A, B and C, an MPI call, and ev_wait() calls
      requests Req1 and Req2
      aggregated requests (2, then 1)]


    Part V

    Conclusion


    Multi-threading
      cannot be avoided in current HPC
        directly or through languages/middleware
      difficult to get efficient scheduling
        no perfect universal scheduler
        threads must be scheduled with respect to memory (NUMA)
        threads and communications must be scheduled together