![Page 2: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/2.jpg)
2
Why parallel?
Speed up – Solve a problem faster→ more processing power
(a.k.a. strong scaling)
Scale up – Solve a larger problem→ more memory and network capacity
(a.k.a. weak scaling)
Scale out – Solve many problems→ more storage capacity
![Page 3: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/3.jpg)
3
Agenda
1. General concepts
2. Hardware
3. Programming models
4. User tools
![Page 4: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/4.jpg)
4
1.
General concepts
![Page 5: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/5.jpg)
5
Amdahl's Law
https://en.wikipedia.org/wiki/Amdahl%27s_law
![Page 6: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/6.jpg)
6
Gustafson's law
https://en.wikipedia.org/wiki/Gustafson%27s_law
“Scaling can be linear anyway providing sufficiently enough data are used.”
![Page 7: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/7.jpg)
7
Parallel overhead
https://computing.llnl.gov/tutorials/parallel_comp/images/helloWorldParallelCallgraph.gif
![Page 8: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/8.jpg)
8
How parallel?
● Parallelization means
– distributing work to processors
– distributing data (if memory is distributed)● and
– synchronization of the distributed work
– communication of remote data to local processor (if memory is distributed)
![Page 9: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/9.jpg)
9
Decomposition
● Work decomposition : task-level parallelism
● Data decomposition : data-level parallelism
● Domain decomposition : decomposition of work and data is done in a higher model, e.g. in the reality
![Page 10: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/10.jpg)
10
Collaboration
● Synchronous (SIMD) at the processor level
● Fine-grained parallelism if subtasks must communicate many times per second (instruction level); loosely synchronous
● Coarse-grained parallelism if they do not communicate many times per second (function-call level)
● Embarrassingly parallel if they rarely or never have to communicate (asynchronous)
![Page 11: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/11.jpg)
11
Parallel programming paradigms
● Task-farming (master/slave or work stealing)
● Pipelining (A->B->C, one process per task concurrently)
● SPMD (pre-defined number of processes created)
● Divide and Conquer (processes spawned at need and report their result to the parent)
● Speculative parallelism (processes spawned and result possibly discarded)
![Page 12: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/12.jpg)
12
2.
Hardware
![Page 13: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/13.jpg)
13
At the processor level
● Instruction-level parallelism (ILP)
– Instruction pipelining
– Superscalar execution
– Out-of-order execution
– Speculative execution● Single Instruction Multiple
Data (SIMD)
![Page 14: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/14.jpg)
14
At the computer level
● Multithreading
– SMP
– NUMA● Accelerators
![Page 15: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/15.jpg)
15
At the system level
Distributed computing – Grids – Clusters – Cloud
![Page 16: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/16.jpg)
16
Distributed computing
![Page 17: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/17.jpg)
17
Cluster computing
![Page 18: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/18.jpg)
18
Grid computing
![Page 19: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/19.jpg)
19
Cloud computing
![Page 20: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/20.jpg)
20
3.
Programming models
![Page 21: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/21.jpg)
21
Programming models
● CUDA, OpenCL
● PThreads, OpenMP, TBB
● MPI
● CoArray, UPC
● MapReduce
● Boinc
![Page 22: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/22.jpg)
22
CUDA, OpenCL (own session)
● Shared memory
● Single Machine / Multiple devices
● Work decomposition
● Single Program Multiple Data
● Synchronous (SIMD)
![Page 23: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/23.jpg)
23
CUDA (C/C++/Fortran)
http://computer-graphics.se/hello-world-for-cuda.html
![Page 24: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/24.jpg)
24
OpenCL (C/C++)
http://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/first-opencl-program/
![Page 25: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/25.jpg)
25
Pthreads, OpenMP (own session)
● Shared memory
● Single machine
● Work decomposition
● Single Program Multiple Data
● Fine-grained/Coarse-grained parallelism
![Page 26: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/26.jpg)
26
Pthreads
https://en.wikipedia.org/wiki/POSIX_Threads
![Page 27: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/27.jpg)
27
OpenMP (C/Fortran)
https://en.wikipedia.org/wiki/OpenMP
![Page 28: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/28.jpg)
28
Threading Building Blocks
● Shared memory
● Single machine
● Data decomposition, Work decomposition
● Single Program Multiple Data
● Fine-grained parallelism
![Page 29: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/29.jpg)
29
Threading Building Blocks (C++)
http://www.ibm.com/developerworks/aix/library/au-intelthreadbuilding/
![Page 30: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/30.jpg)
30
Message Passing Interface
● Distributed memory
● Multi-machine (Cluster computing)
● Data decomposition, Work decomposition
● Single Program Multiple Data
● Coarse-grained parallelism
(own session)
![Page 31: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/31.jpg)
31
MPI (C/Fortran)
https://en.wikipedia.org/wiki/Message_Passing_Interface
![Page 32: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/32.jpg)
32
Partitioned global address space
● Shared memory
● Multi-machine (Cluster computing)
● Data decomposition, Work decomposition
● Single Program Multiple Data
● Coarse-grained parallelism
![Page 33: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/33.jpg)
33
Co-Array Fortran
https://en.wikipedia.org/wiki/Coarray_Fortran
![Page 34: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/34.jpg)
34
UPC (C)
http://www.upc.mtu.edu/tutorials/helloworld.html
![Page 35: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/35.jpg)
35
MapReduce, Boinc
● Distributed memory
● Multi-machine (Distributed computing, Cloud computing)
● Data decomposition
● Divide and Conquer
● Embarrassingly parallel
![Page 36: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/36.jpg)
36
Map/Reduce (Java, Python, R, ...)
http://discoproject.org/
![Page 37: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/37.jpg)
37
Boinc (C)
http://www.spy-hill.com/help/boinc/hello.html
![Page 38: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/38.jpg)
38
4.
User tools when you cannot modify the program
![Page 39: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/39.jpg)
39
Parallel processes in Bash
![Page 40: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/40.jpg)
40
Parallel processes in Bash
![Page 41: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/41.jpg)
41
One program and many files
![Page 42: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/42.jpg)
42
Several programs and one file
![Page 43: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/43.jpg)
43
One program and one large file
Need recent version of Coreutils/8.22-goolf-1.4.10
![Page 44: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/44.jpg)
44
Several programs and many files
![Page 45: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/45.jpg)
45
Several programs and many files
![Page 46: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/46.jpg)
46
Summary
● You have either
– one very large file to process● with one program (split – task farming)● with several programs (pipes, fifo – pipeline)
– many files to process● with one program (xargs – task farming)● with many programs (make – divide and conquer)
● Always embarrassingly parallel, work decomposition
![Page 47: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/47.jpg)
47
GNU Parallel
● Syntax: parallel command ::: argument list
![Page 48: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/48.jpg)
48
GNU Parallel
● Syntax: {} as argument placeholder. Can be modified
![Page 49: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/49.jpg)
49
GNU Parallel
● Syntax: --xapply
![Page 50: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/50.jpg)
50
GNU Parallel
● Syntax: :::: arguments in file
![Page 51: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/51.jpg)
51
GNU Parallel
● Syntax: --pipe
![Page 52: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/52.jpg)
52
Other interesting options
-S Use remote servers through SSH
-j n Run n jobs in parallel
-k Keep same order
--delay n Ensure there are n seconds between each start
--timeout n Kill task after n seconds if still running
Author asks to be cited: O. Tange (2011): GNU Parallel - The Command-Line Power Tool, The USENIX Magazine, February 2011:42-47.
![Page 53: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/53.jpg)
53
Exercises
● Can you reproduce the examples with ./lower and ./upper.sh using GNU Parallel?
![Page 54: Introduction to Parallel Computing - UCLouvain · Introduction to Parallel Computing damien.francois@uclouvain.be October 2015. 2 Why parallel? Speed up – Solve a problem faster](https://reader035.vdocuments.net/reader035/viewer/2022062506/5fbaddb1452b8f71a0691af3/html5/thumbnails/54.jpg)
54
Solutions
● One program and many files
dfr@hmem00:~/parcomp $ time -k parallel ./lower.sh {} > res.txt ::: d?.txt
● one program and one large file
dfr@hmem00:~/parcomp $ time cat d.txt | parallel -k -N1 --pipe ./lower.sh {} > res.txt
● several programs and several files
dfr@hmem00:~/parcomp $ time { parallel ./lower.sh {} {.}.tmp ::: d?.txt ; parallel ./upper.sh {} {.}.res ::: d?.tmp ; }