Programming Multicore Processors
Aamir Shafi
High Performance Computing Lab
http://hpc.seecs.nust.edu.pk
Serial Computation

• Traditionally, software has been written for serial computation:
  • To be run on a single computer having a single Central Processing Unit (CPU)
  • A problem is broken into a discrete series of instructions
Parallel Computation

• Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
  • Also known as High Performance Computing (HPC)
• The prime focus of HPC is performance: the ability to solve the biggest possible problems in the least possible time
Traditional Usage of Parallel Computing: Scientific Computing

• Traditionally, parallel computing has been used to solve challenging scientific problems by running simulations:
  • For this reason, it is also called "Scientific Computing" or computational science
Emergence of Multi-core Processors

• In the last decade, processor performance has not been enhanced by increasing clock speed:
  • Increasing clock speed directly increases power consumption
  • Power is dissipated as heat, so it is not practical to cool such processors
  • Intel canceled a project to produce a 4 GHz processor!
• This led to the emergence of multi-core processors:
  • Performance is increased by adding processing cores that run at a lower clock speed:
    • Implies better power usage

Disruptive Technology!
Moore’s Law is Alive and Well
Power Wall
Why Multi-core Processors Consume Less Power

• Dynamic power is proportional to V²fC
  • Increasing frequency (f) also requires increasing the supply voltage (V): a more-than-linear effect on power
  • Adding cores increases capacitance (C), which has only a linear effect
Software in the Multi-core Era

• The challenge has been thrown to the software industry:
  • Parallelism is perhaps the answer
• "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software":
  • http://www.gotw.ca/publications/concurrency-ddj.htm
  • An excerpt: "The biggest sea change in software development since the OO revolution is knocking at the door, and its name is Concurrency"
• This essentially means every software programmer will be a parallel programmer:
  • The main motivation behind conducting this "Programming Multicore Processors" workshop
About the “Programming Multicore Processors” Workshop
Course Contents …
A little background on Parallel Computing Approaches
Parallel Hardware

• Three main classifications:
  • Shared Memory Multi-processors:
    • Symmetric Multi-Processors (SMP)
    • Multi-core Processors
  • Distributed Memory Multi-processors:
    • Massively Parallel Processors (MPP)
    • Clusters:
      • Commodity and custom clusters
  • Hybrid Multi-processors:
    • Mixture of shared and distributed memory technologies
First Type: Shared Memory Multi-processors

• All processors have access to shared memory:
  • Notion of a "Global Address Space"
Symmetric Multi-Processors (SMP)

• An SMP is a parallel processing system with a shared-everything approach:
  • The term signifies that each processor shares the main memory and possibly the cache
• Typically an SMP can have 2 to 256 processors
• Also called Uniform Memory Access (UMA)
• Examples include the AMD Athlon, the AMD Opteron 200 and 2000 series, the Intel Xeon, etc.
Multi-core Processors
Second Type: Distributed Memory

• Each processor has its own local memory
• Processors communicate with each other by message passing over an interconnect
Cluster Computers

• A group of PCs, workstations, or Macs (called nodes) connected to each other via a fast (and private) interconnect:
  • Each node is an independent computer
• Each cluster has one head-node and multiple compute-nodes:
  • Users log on to the head-node and start parallel jobs on the compute-nodes
• Two popular cluster classifications:
  • Beowulf Clusters (http://www.beowulf.org)
  • Rocks Clusters (http://www.rocksclusters.org)
Cluster Computer

[Diagram: eight processors, Proc 0 through Proc 7, connected via an interconnect]
Third Type: Hybrid

• Modern clusters have a hybrid architecture:
  • Distributed memory for inter-node (between nodes) communication
  • Shared memory for intra-node (within a node) communication
SMP and Multi-core Clusters

• Most modern commodity clusters have SMP and/or multi-core nodes:
  • Processors not only communicate via the interconnect; shared memory programming is also required
• This trend is likely to continue:
  • A new name, "constellations", has even been proposed
Classification of Parallel Computers

• Parallel Hardware:
  • Shared Memory Hardware:
    • SMPs
    • Multicore Processors
  • Distributed Memory Hardware:
    • MPPs
    • Clusters

In this workshop, we will learn how to program shared memory parallel hardware …
Writing Parallel Software

• There are mainly two approaches to writing parallel software
• The first approach is to use libraries (packages) written in already existing languages:
  • Economical
• The second and more radical approach is to provide new languages:
  • Parallel computing has a history of novel parallel languages
  • These languages provide high-level parallelism constructs
Shared Memory Languages and Libraries

• Designed to support parallel programming on shared memory platforms:
  • OpenMP:
    • Consists of a set of compiler directives, library routines, and environment variables
    • The runtime uses a fork-join model of parallel execution
  • Cilk++:
    • A design goal was to support asynchronous parallelism
    • A set of keywords: cilk_for, cilk_spawn, cilk_sync …
  • POSIX Threads (PThreads)
  • Threads Building Blocks (TBB)
Distributed Memory Languages and Libraries

• Libraries:
  • Message Passing Interface (MPI): the de facto standard
  • PVM
• Languages:
  • High Performance Fortran (HPF)
  • Fortran M
  • HPJava
Our Focus

• Shared memory and multi-core processor machines:
  • Using POSIX Threads
  • Using OpenMP
  • Using Cilk++ (covered briefly)
• Disruptive technology:
  • Using Graphics Processing Units (GPUs) by NVIDIA for general-purpose computing
Day One
| Timings | Topic | Presenter |
|---|---|---|
| 10:00 to 10:30 | Introduction to multicore computing | Aamir Shafi |
| 10:30 to 11:30 | Background discussion: review of processes, threads, and architecture; speedup analysis | Akbar Mehdi |
| 11:30 to 11:45 | Break | |
| 11:45 to 12:55P | Introduction to POSIX Threads | Akbar Mehdi |
| 12:55P to 1:25P | Prayers break | |
| 1:25P to 2:30P | Practical session: run a hello world PThreads program; introduce Linux, top, Solaris; also introduce the first coding assignment | Akbar Mehdi |
Day Two
| Timings | Topic | Presenter |
|---|---|---|
| 10:00 to 11:00 | POSIX Threads continued… | Akbar Mehdi |
| 11:00 to 12:55P | Introduction to OpenMP | Mohsan Jameel |
| 12:55P to 1:25P | Prayer break | |
| 1:25P to 2:30P | OpenMP continued… + lab session | Mohsan Jameel |
Day Three
| Timings | Topic | Presenter |
|---|---|---|
| 10:00 to 12:00 | Parallelizing the image processing application using PThreads and OpenMP: practical session | Akbar Mehdi and Mohsan Jameel |
| 12:00 to 12:55P | Introduction to Intel Cilk++ | Aamir Shafi |
| 12:55P to 1:25P | Prayer break | |
| 1:25P to 2:30P | Introduction to NVIDIA CUDA | Akbar Mehdi |
| 2:30P to 2:35P | Concluding remarks | Aamir Shafi |
Learning Objectives

• To become aware of the multicore revolution and its impact on the computer software industry
• To program multicore processors using POSIX Threads
• To program multicore processors using OpenMP and Cilk++
• To program Graphics Processing Units (GPUs) for general-purpose computation (using the NVIDIA CUDA API)

You may download the tentative agenda from http://hpc.seecs.nust.edu.pk/~aamir/res/mc_agenda.pdf
Next Session

• Review of important and relevant Operating Systems and Computer Architecture concepts by Akbar Mehdi …