
Introduction to High Performance Computing

Jon Johansson, Academic ICT
University of Alberta
Copyright 2008, University of Alberta

Agenda

• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??


High Performance Computing

• HPC is the field that concentrates on developing supercomputers and software to run on supercomputers
• a main area of this discipline is developing parallel processing algorithms and software
  • programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors


High Performance Computing

• HPC is about “big problems”, i.e. need:
  • lots of memory
  • many cpu cycles
  • big hard drives
• no matter what field you work in, perhaps your research would benefit by making problems “larger”
  • 2d → 3d
  • finer mesh
  • increase number of elements in the simulation


Grand Challenges

• weather forecasting
• economic modeling
• computer-aided design
• drug design
• exploring the origins of the universe
• searching for extra-terrestrial life
• computer vision
• nuclear power and weapons simulations

Grand Challenges – Protein

To simulate the folding of a 300 amino acid protein in water:
• # of atoms: ~32,000
• folding time: 1 millisecond
• # of FLOPs: 3 × 10^22
• machine speed: 1 PetaFLOP/s
• simulation time: 1 year
(Source: IBM Blue Gene Project)

IBM’s answer: the Blue Gene Project – US$100 M of funding to build a 1 PetaFLOP/s computer

Ken Dill and Kit Lau’s protein folding model.
Charles L. Brooks III, Scripps Research Institute
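As a quick sanity check on those numbers (an added calculation, not from the slides), the one-year estimate follows directly from the FLOP count and the machine speed:

```latex
t = \frac{3 \times 10^{22}\ \text{FLOPs}}{10^{15}\ \text{FLOP/s}}
  = 3 \times 10^{7}\ \text{s} \approx 347\ \text{days} \approx 1\ \text{year}
```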


Grand Challenges - Nuclear

• National Nuclear Security Administration
  • http://www.nnsa.doe.gov/
• use supercomputers to run three-dimensional codes to simulate instead of test
• address critical problems of materials aging
• simulate the environment of the weapon and try to gauge whether the device continues to be usable
• stockpile science, molecular dynamics and turbulence calculations

http://archive.greenpeace.org/comms/nukes/fig05.gif

Grand Challenges - Nuclear

• March 7, 2002: first full-system three-dimensional simulations of a nuclear weapon explosion
• the simulation used more than 480 million cells (grid: 780x780x780, if the grid is a cube)
• 1,920 processors on IBM ASCI White at the Lawrence Livermore National Laboratory
• 2,931 wall-clock hours, or 122.5 days
• 6.6 million CPU hours

(photos: ASCI White; test shot “Badger”, Nevada Test Site – Apr. 1953, yield: 23 kilotons)
http://nuclearweaponarchive.org/Usa/Tests/Upshotk.html


Grand Challenges - Nuclear

• Advanced Simulation and Computing Program (ASC)
  • http://www.llnl.gov/asc/asc_history/asci_mission.html

Agenda

• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??


What is a “Mainframe”?

• large and reasonably fast machines
  • the speed isn’t the most important characteristic
• high-quality internal engineering and resulting proven reliability
• expensive but high-quality technical support
• top-notch security
• strict backward compatibility for older software

What is a “Mainframe”?

• these machines can, and do, run successfully for years without interruption (long uptimes)
• repairs can take place while the mainframe continues to run
• the machines are robust and dependable
• IBM coined a term to advertise the robustness of their mainframe computers:
  • Reliability, Availability and Serviceability (RAS)


What is a “Mainframe”?

• Introducing IBM System z9 109
  • designed for the On Demand Business
• IBM is delivering a holistic approach to systems design
  • designed and optimized with a total systems approach
• helps keep your applications running with enhanced protection against planned and unplanned outages
• extended security capabilities for even greater protection
• increased capacity with more available engines per server

What is a Supercomputer??

• at any point in time the term “Supercomputer” refers to the fastest machines currently available
• a supercomputer this year might be a mainframe in a couple of years
• a supercomputer is typically used for scientific and engineering applications that must do a great amount of computation


What is a Supercomputer??

• the most significant difference between a supercomputer and a mainframe:
  • a supercomputer channels all its power into executing a few programs as fast as possible
    • if the system crashes, restart the job(s) – no great harm done
  • a mainframe uses its power to execute many programs simultaneously
    • e.g. a banking system
    • must run reliably for extended periods

What is a Supercomputer??

• to see the world’s “fastest” computers look at
  • http://www.top500.org/
• performance is measured with the Linpack benchmark
  • http://www.top500.org/lists/linpack.php
  • solve a dense system of linear equations
  • the performance numbers give a good indication of peak performance


Terminology

• combining a number of processors to run a program is called variously:
  • multiprocessing
  • parallel processing
  • coprocessing

Terminology

• parallel computing – harnessing a bunch of processors on the same machine to run your computer program
  • note that this is one machine
  • generally a homogeneous architecture
    • same processors, memory, operating system
  • all the machines in the Top 500 are in this category


Terminology

• cluster:
  • a set of generally homogeneous machines
  • originally built using low-cost commodity hardware
  • to increase density, clusters are now commonly built with 1U rack servers or blades
  • can use a standard network interconnect or a high performance interconnect such as Infiniband or Myrinet
  • cluster hardware is becoming quite specialized
  • thought of as a single machine with a name, e.g. “glacier” – glacier.westgrid.ca

Terminology

• distributed computing – harnessing a bunch of processors on different machines to run your computer program
  • heterogeneous architecture
    • different operating systems, cpus, memory
• the terms “parallel” and “distributed” computing are often used interchangeably
• the work is divided into sections so each processor does a unique piece


Terminology

• some distributed computing projects are built on BOINC (Berkeley Open Infrastructure for Network Computing):
  • SETI@home – Search for Extraterrestrial Intelligence
  • Proteins@home – deduces DNA sequence, given a protein
  • Hydrogen@home – enhance clean energy technology by improving hydrogen production and storage (this is beta now)

Terminology

• “Grid” computing
  • a Grid is a cluster of supercomputers
  • in the ideal case:
    • we submit our job with resource requirements
    • the job is run on a machine with available resources
    • we get results back
  • NOTE: we don’t care where the resources are, just that the job is run.


Terminology

• “Utility” computing
  • computation and storage facilities are provided as a commercial service
  • charges are for resources actually used – “Pay and Use computing”
• “Cloud” computing
  • aka “on-demand computing”
  • any IT-related capability can be provided as a “service”
  • repackages grid computing and utility computing
  • users can access computing resources in the “Cloud” – i.e. out in the Internet

How to Measure Speed?

• count the number of “floating point operations” required to solve the problem
  • + - × /
• results of the benchmark are so many Floating point Operations Per Second (FLOPS)
• a supercomputer is a machine that can provide a very large number of FLOPS


Floating Point Operations

• multiply 2 1000x1000 matrices ([N×N] × [N×N])
• for each resulting array element:
  • 1000 multiplies
  • 999 adds
• do this 1,000,000 times
• ~10^9 operations needed
• increasing the array size has the number of operations increasing as O(N^3)
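To make the count concrete, here is a minimal C sketch (not from the slides) of the naive triple-loop multiply; the comment tallies the operations:

```c
#include <stdio.h>

#define N 1000

/* For each of the N*N output elements we do N multiplies and N-1 adds,
   so the whole multiply costs about 2*N^3 ~ 2e9 floating point operations. */
static double a[N][N], b[N][N], c[N][N];

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0;
            b[i][j] = 2.0;
        }

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];   /* one multiply + one add */
            c[i][j] = sum;
        }

    printf("c[0][0] = %g, ~%.1e operations\n", c[0][0], 2.0 * N * N * N);
    return 0;
}
```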

Agenda

• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??


High Performance Computing

• supercomputers use many CPUs to do the work
• note that all supercomputing architectures have
  • processors and some combination of cache
  • some form of memory and IO
  • the processors are separated from the other processors by some distance
• there are major differences in the way that the parts are connected
• some problems fit into different architectures better than others

High Performance Computing

• increasing computing power available to researchers allows
  • increasing problem dimensions
  • adding more particles to a system
  • increasing the accuracy of the result
  • improving experiment turnaround time


Flynn’s Taxonomy

• Michael J. Flynn (1972)
• classified computer architectures based on the number of concurrent instructions and data streams available
  • single instruction, single data (SISD) – basic old PC
  • multiple instruction, single data (MISD) – redundant systems
  • single instruction, multiple data (SIMD) – vector (or array) processor
  • multiple instruction, multiple data (MIMD) – shared or distributed memory systems: symmetric multiprocessors and clusters
• common extension:
  • single program (or process), multiple data (SPMD)

Architectures

• we can also classify supercomputers according to how the processors and memory are connected
  • couple processors to a single large memory address space
  • couple computers, each with its own memory address space


Architectures

• Symmetric Multiprocessing (SMP)
• Uniform Memory Access (UMA)
  • multiple CPUs, residing in one cabinet, share the same memory
  • processors and memory are tightly coupled
  • the processors share memory and the I/O bus or data path

Architectures

• SMP
  • a single copy of the operating system is in charge of all the processors
  • SMP systems range from two to as many as 32 or more processors


Architectures

• SMP
  • “capability computing”
    • one CPU can use all the memory
    • all the CPUs can work on a little memory
    • whatever you need

Architectures

• UMA-SMP negatives
  • as the number of CPUs gets large the buses become saturated
  • long wires cause latency problems


Architectures

• Non-Uniform Memory Access (NUMA)
  • NUMA is similar to SMP – multiple CPUs share a single memory space
    • hardware support for shared memory
  • memory is separated into close and distant banks
    • basically a cluster of SMPs
  • memory on the same processor board as the CPU (local memory) is accessed faster than memory on other processor boards (shared memory)
    • hence “non-uniform”
  • NUMA architecture scales much better to higher numbers of CPUs than SMP



Architectures

(photos: University of Alberta SGI Origin; SGI NUMA cables)

Architectures

• Cache Coherent NUMA (ccNUMA)
  • each CPU has an associated cache
  • ccNUMA machines use special-purpose hardware to maintain cache coherence
  • typically done by using inter-processor communication between cache controllers to keep a consistent memory image when the same memory location is stored in more than one cache
  • ccNUMA performs poorly when multiple processors attempt to access the same memory area in rapid succession


Architectures

• Distributed Memory Multiprocessor (DMMP)
  • each computer has its own memory address space
  • looks like NUMA but there is no hardware support for remote memory access
  • the special purpose switched network is replaced by a general purpose network such as Ethernet or more specialized interconnects:
    • Infiniband
    • Myrinet

(photo: Lattice, Calgary’s HP ES40 and ES45 cluster – each node has 4 processors)

Architectures

• Massively Parallel Processing (MPP): cluster of commodity PCs
  • processors and memory are loosely coupled
  • “capacity computing”
  • each CPU contains its own memory and copy of the operating system and application
  • each subsystem communicates with the others via a high-speed interconnect
  • in order to use MPP effectively, a problem must be breakable into pieces that can all be solved simultaneously



Architectures

• lots of “how to build a cluster” tutorials on the web – just Google:
  • http://www.beowulf.org/
  • http://www.cacr.caltech.edu/beowulf/tutorial/building.html


Architectures

• Vector Processor or Array Processor
  • a CPU design that is able to run mathematical operations on multiple data elements simultaneously
  • a scalar processor operates on data elements one at a time
  • vector processors formed the basis of most supercomputers through the 1980s and into the 1990s
  • “pipeline” the data

Architectures

• Vector Processor or Array Processor
  • operate on many pieces of data simultaneously
  • consider the following add instruction:
    • C = A + B
  • on both scalar and vector machines this means:
    • add the contents of A to the contents of B and put the sum in C
  • on a scalar machine the operands are numbers
  • on a vector machine the operands are vectors and the instruction directs the machine to compute the pair-wise sum of each pair of vector elements
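In C the same idea is just an element-wise loop; this small sketch (added here for illustration, not from the slides) is exactly the kind of loop a vector unit can execute as pair-wise vector operations:

```c
#include <stdio.h>

#define N 8

int main(void)
{
    double a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    double b[N] = {10, 20, 30, 40, 50, 60, 70, 80};
    double c[N];

    /* a scalar processor issues one add per iteration; a vector/SIMD
       unit can compute the pair-wise sums of whole chunks of a[] and
       b[] with a single vector instruction */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; i++)
        printf("%g ", c[i]);
    printf("\n");
    return 0;
}
```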


Architectures

• the University of Victoria has 4 NEC SX-6/8A vector processors
  • in the School of Earth and Ocean Sciences
  • each has 32 GB of RAM
  • 8 vector processors in the box
  • peak performance is 72 GFLOPS

Agenda

• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??


BlueGene/L

• the fastest on the Nov. 2007 Top 500 list:
  • http://www.top500.org/
• installed at the Lawrence Livermore National Laboratory (LLNL) (US Department of Energy)
  • Livermore, California

http://www.llnl.gov/asc/platforms/bluegenel/photogallery.html


BlueGene/L

• processors: 212,992
• memory: 72 TB
• 104 racks – each has 2048 processors
  • the first 64 had 512 GB of RAM (256 MB/processor)
  • the 40 new racks have 1 TB of RAM (512 MB/processor)
• a Linpack performance of 478.2 TFlop/s
  • in Nov 2005 it was the only system ever to exceed the 100 TFlop/s mark
  • there are now 10 machines over 100 TFlop/s

The Fastest Five

1. Roadrunner – BladeCenter QS22/LS21 Cluster, Cell/Opteron (IBM); DOE/NNSA/LANL, United States; 122,400 cores; 2008; Rmax 1,026,000 Gflops; Rpeak 1,375,780 Gflops
2. BlueGene/L – eServer Blue Gene Solution (IBM); DOE/NNSA/LLNL, United States; 212,992 cores; 2007; Rmax 478,200 Gflops; Rpeak 596,378 Gflops
3. BlueGene/P Solution (IBM); Argonne National Laboratory, United States; 163,840 cores; 2007; Rmax 450,300 Gflops; Rpeak 557,060 Gflops
4. Ranger – SunBlade x6420, Opteron Quad 2 GHz (Sun); Texas Advanced Computing Center/Univ. of Texas, United States; 62,976 cores; 2008; Rmax 326,000 Gflops; Rpeak 503,810 Gflops
5. Jaguar – Cray XT4 QuadCore, Opteron 2.1 GHz (Cray); DOE/Oak Ridge National Laboratory, United States; 30,976 cores; 2008; Rmax 205,000 Gflops; Rpeak 260,000 Gflops


# of Processors with Time

The number of processors in the fastest machines has increased by about a factor of 200 in the last 15 years.

# of Gflops Increase with Time

One Petaflop!
Machine speed has increased by more than a factor of 15,000 since 1993.
“Roadrunner” tests at > 1 petaflop for June 2008.


Future BlueGene


Roadrunner

• cores: 122,400
  • 6,562 Opteron dual-core, 12,240 Cell
• memory: 98 TB
• 278 racks
• a Linpack performance of 1,026.00 TFlop/s
  • in June 2008 it was the only system ever to exceed the 1 PetaFlop/s mark
• cost: $100 million
• weight: 500,000 lbs
• power: 2.35 (or 3.9) megawatts



Agenda

• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??


Speedup

• how can we measure how much faster our program runs when using more than one processor?
• define Speedup S as the ratio of 2 program execution times (constant problem size):

  S = T_1 / T_P

• T_1 is the execution time for the problem on a single processor (use the “best” serial time)
• T_P is the execution time for the problem on P processors

Speedup

• Linear speedup
  • the time to execute the problem decreases by the number of processors
  • if a job requires 1 week with 1 processor it will take less than 10 minutes with 1024 processors
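A quick check of that claim (added arithmetic, assuming perfectly linear speedup):

```latex
T_{1024} = \frac{T_1}{P} = \frac{7 \times 24 \times 60\ \text{min}}{1024}
         = \frac{10080\ \text{min}}{1024} \approx 9.8\ \text{min}
```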


Speedup

• Sublinear speedup
  • the usual case
  • there are generally some limitations to the amount of speedup that you get
    • communication

Speedup

• Superlinear speedup
  • very rare
  • memory access patterns may allow this for some algorithms


Speedup

• why do a speedup test?
  • it’s hard to tell how a program will behave
  • e.g. “Strange” is actually fairly common behaviour for un-tuned code
  • in this case:
    • linear speedup to ~10 cpus
    • after 24 cpus speedup is starting to decrease

Speedup

• to use more processors efficiently change this behaviour
  • change loop structure
  • adjust algorithms
  • ??
• run jobs with 10-20 processors so the machines are used efficiently


Speedup

• one class of jobs that have linear speedup are called “embarrassingly parallel”
  • a better name might be “perfectly” parallel
• it doesn’t take much effort to turn the problem into a bunch of parts that can be run in parallel:
  • parameter searches
  • rendering the frames in a computer animation
  • brute force searches in cryptography

Speedup

• we have been discussing Strong Scaling
  • the problem size is fixed and we increase the number of processors
  • decrease computational time (Amdahl scaling)
• the amount of work available to each processor decreases as the number of processors increases
• eventually, the processors are doing more communication than number crunching and the speedup curve flattens
• difficult to have high efficiency for large numbers of processors


Speedup

• we are often interested in Weak Scaling
  • double the problem size when we double the number of processors
  • constant computational time (Gustafson scaling)
• the amount of work for each processor stays roughly constant
• parallel overhead is (hopefully) small compared to the real work the processor does
• e.g. weather prediction
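The slides give no formula for this, but the usual statement is Gustafson's law (added here for reference): with the per-processor work held fixed and a serial fraction f, the scaled speedup grows almost linearly with the processor count P:

```latex
S(P) = f + (1 - f)\,P
```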

Amdahl’s Law

• Gene Amdahl: 1967
• parallelize some of the serial program – some must remain serial
  (picture the run time as a bar split into a serial part and a parallel part)
• f is the fraction of the calculation that is serial
• 1 - f is the fraction of the calculation that is parallel
• the maximum speedup that can be obtained by using P processors is:

  S_max = 1 / (f + (1 - f)/P)


Amdahl’s Law

• if 25% of the calculation must remain serial the best speedup you can obtain is 4
• need to parallelize as much of the program as possible to get the best advantage from multiple processors
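That limit follows from the formula above (worked here for clarity): with f = 0.25, the parallel term vanishes as P grows and only the serial fraction is left.

```latex
S_{\max} = \frac{1}{f + (1 - f)/P} \xrightarrow{\;P \to \infty\;} \frac{1}{f} = \frac{1}{0.25} = 4
```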

Agenda

• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??


Parallel Programming

• need to do something to your program to use multiple processors
• need to incorporate commands into your program which allow multiple threads to run
  • one thread per processor
  • each thread gets a piece of the work
• several ways (APIs) to do this …

Parallel Programming

• OpenMP
  • introduce statements into your code
    • in C: #pragma
    • in FORTRAN: C$OMP or !$OMP
  • can compile serial and parallel executables from the same source code
  • restricted to shared memory machines
    • not clusters!
  • www.openmp.org


Parallel Programming

• OpenMP
  • demo: MatCrunch
    • mathematical operations on the elements of an array
  • introduce 2 OMP directives before a loop (see the sketch below)
    • # pragma omp parallel   // define a parallel section
    • # pragma omp for   // loop is to be parallel
  • serial section: 4.03 sec
  • parallel section – 1 cpu: 40.27 secs
  • parallel section – 2 cpu: 20.25 secs
    • speedup = 1.99   // not bad for adding 2 lines
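MatCrunch itself is not reproduced in the slides; the following is only a minimal sketch of the same pattern (the array size and the math are invented here), showing where the two directives go:

```c
#include <math.h>
#include <stdio.h>
#include <omp.h>

#define N 10000000

static double data[N];

int main(void)
{
    double t0 = omp_get_wtime();

    /* the two directives from the slide: "parallel" opens a parallel
       section, "for" splits the loop iterations across the threads */
    #pragma omp parallel
    {
        #pragma omp for
        for (long i = 0; i < N; i++)
            data[i] = sin((double)i) * cos((double)i);   /* stand-in math */
    }

    printf("loop took %.2f s using up to %d thread(s)\n",
           omp_get_wtime() - t0, omp_get_max_threads());
    return 0;
}
```

Compiled with an OpenMP flag (e.g. gcc -fopenmp matcrunch.c -lm) the loop runs multi-threaded; without the flag the pragmas are ignored and the same source builds a serial executable.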

Parallel Programming

• for a larger number of processors the speedup for MatCrunch is not linear
• need to do the speedup test to see how your program will behave


Parallel Programming

• MPI (Message Passing Interface)
  • a standard set of communication subroutine libraries
  • works for SMPs and clusters
  • programs written with MPI are highly portable
  • information and downloads
    • http://www.mpi-forum.org/
    • MPICH: http://www-unix.mcs.anl.gov/mpi/mpich/
    • LAM/MPI: http://www.lam-mpi.org/
    • Open MPI: http://www.open-mpi.org/

Parallel Programming

• MPI (Message Passing Interface)
  • supports the SPMD, single program multiple data model
    • all processors use the same program
    • each processor has its own data
  • think of a cluster – each node is getting a copy of the program but running a specific portion of it with its own data


Parallel Programming

• starting mpi jobs is not standard
  • for mpich2 use “mpiexec”
• start a job with 6 processes
• 6 copies of the program run in the default Communicator Group “MPI_COMM_WORLD”
• each process has an ID – its “rank”
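A minimal SPMD example of those ideas (the program and its name are illustrative, not the course demo), launched with something like mpiexec -n 6 ./rank_demo:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    /* every copy of the program runs this same code ... */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* ... but each gets its own ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* processes in the default group */

    printf("process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}
```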

Parallel Programming

• example: start N processes to calculate N-1 factorial (a sketch follows below)
  • 0! = 1
  • 1! = 1
  • 2! = 2 × 1 = 2
  • 3! = 3 × 2 × 1 = 6
  • …
  • n! = n × (n-1) × … × 2 × 1
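The slides do not show the code for this example; one way to sketch it (an assumption about the approach, not the original demo) is to have every rank contribute its own rank as a factor and combine them with a product reduction:

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* rank 0 contributes 1 and rank r contributes r, so the product over
       all N ranks is 1 * 1 * 2 * ... * (N-1) = (N-1)!                   */
    long factor = (rank == 0) ? 1L : (long)rank;
    long result = 0;

    MPI_Reduce(&factor, &result, 1, MPI_LONG, MPI_PROD, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d! = %ld\n", size - 1, result);   /* e.g. mpiexec -n 6 gives 5! = 120 */

    MPI_Finalize();
    return 0;
}
```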


Parallel Programming

• generally the master process will:
  • send work to other processes
  • receive results from processes that complete
  • send more work to those processes
  • do final calculations
  • output results
• designing an efficient algorithm for all this is up to you (a bare-bones sketch follows below)
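A bare-bones master/worker pattern along those lines (the task here, squaring integers, is only a stand-in, and this is just one way to organize the exchange):

```c
#include <stdio.h>
#include <mpi.h>

#define NTASKS 100

static int do_work(int task) { return task * task; }   /* stand-in for real work */

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                                    /* master */
        long total = 0;
        int next = 0, active = 0;
        MPI_Status st;

        /* hand one task to every worker to start */
        for (int w = 1; w < size && next < NTASKS; w++, next++, active++)
            MPI_Send(&next, 1, MPI_INT, w, 0, MPI_COMM_WORLD);

        /* collect a result, then send that worker more work or a stop signal */
        while (active > 0) {
            int result;
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
            total += result;
            if (next < NTASKS) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
                next++;
            } else {
                int stop = -1;
                MPI_Send(&stop, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
                active--;
            }
        }
        printf("sum of squares 0..%d = %ld\n", NTASKS - 1, total);  /* final calculation */
    } else {                                            /* worker */
        for (;;) {
            int task, result;
            MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (task < 0) break;                        /* stop signal */
            result = do_work(task);
            MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}
```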

Parallel Programming

• it’s possible to combine OpenMP and MPI for running on clusters of SMP machines
• the trick in parallel programming is to keep all the processors
  • working (“load balancing”)
  • working on data that no other processor needs to touch (so there aren’t any cache conflicts)
• parallel programming is generally harder than serial programming


Agenda

• What is High Performance Computing?
• What is a “supercomputer”?
  • is it a mainframe?
• Supercomputer architectures
• Who has the fastest computers?
• Speedup
• Programming for parallel computing
• The GRID??

Grid Computing

• A computational grid:
  • is a large-scale distributed computing infrastructure
  • is composed of geographically distributed, autonomous resource providers
    • lots of computers joined together
  • requires excellent networking that supports resource sharing and distribution
  • offers access to all the resources that are part of the grid
    • compute cycles
    • storage capacity
    • visualization/collaboration
  • is intended for integrated and collaborative use by multiple organizations


Grids

• Ian Foster (the “Father of the Grid”) says that to be a Grid three points must be met:
  • computing resources are not administered centrally
    • many sites connected
  • open standards are used
    • not a proprietary system
  • non-trivial quality of service is achieved
    • it is available most of the time
• CERN says a Grid is “a service for sharing computer power and data storage capacity over the Internet”

Canadian Academic Computing Sites in 2000



Canadian Grids

• Some sites in Canada have tied their resources together to form 7 Canadian Grid Consortia:
  • ACENET – Atlantic Computational Excellence Network
  • CLUMEQ – Consortium Laval UQAM McGill and Eastern Quebec for High Performance Computing
  • SCINET – University of Toronto
  • HPCVL – High Performance Computing Virtual Laboratory
  • RQCHP – Reseau Quebecois de calcul de haute performance
  • SHARCNET – Shared Hierarchical Academic Research Computing Network
  • WESTGRID – Alberta, British Columbia

WestGrid

(site labels: Edmonton, Calgary, UBC Campus, SFU Campus)


Grids

• the ultimate goal of the Grid idea is to have a system that you can submit a job to, so that:
  • your job uses resources that fit requirements that you specify, e.g.
    • 128 nodes on an SMP, 200 GB of RAM
    • or 256 nodes on a PC cluster, 1 GB/processor
  • when done the results come back to you
  • you don’t care where the job runs
    • Vancouver or St. John’s or in between

Sharing Resources

• HPC resources are not available quite as readily as your desktop computer
• the resources must be shared fairly
  • the idea is that each person gets as much of the resource as necessary to run their job for a “reasonable” time
• if the job can’t finish in the allotted time the job needs to “checkpoint”
  • save enough information to begin running again from where it left off


Sharing Resources

• Portable Batch System (Torque)
  • submit a job to PBS
  • the job is placed in a queue with other users’ jobs
  • jobs in the queue are prioritized by a scheduler
  • your job executes at some time in the future

(diagram: An HPC Site)

Sharing Resources

• when connecting to a Grid we need a layer of “middleware” tools to securely access the resources
• Globus is one example
  • http://www.globus.org/

(diagram: A Grid of HPC Sites)


Questions?

Many details in other sessions of this seminar series!