

CS 770G - Parallel Algorithms in Scientific Computing

May 7, 2001
Lecture 1: Introduction


Course Information

• Instructor: Justin Wan ([email protected]), DC 3129
• Class homepage: http://www.student.math.uwaterloo.ca/~cs770g
• Office hours
  – To be determined
• Prerequisites
  – Familiarity with basic numerical computations, such as the material covered in CS 370.
  – Experience with a programming language such as C, C++, or Fortran.


Why Do We Need Powerful Computers?

• Traditional scientific & engineering paradigm:
  – Do theory or paper design.
  – Perform experiments or build systems.
• Supplement both with numerical experiments:
  – Real phenomena are too complicated to model by hand.
  – Real experiments can be:
      Too hard: e.g. building large wind tunnels.
      Too expensive: e.g. building a throw-away passenger jet.
      Too slow: e.g. waiting for a tornado to come.
      Too dangerous: e.g. weapons, drug design.


High Performance Computing

• Units/Notation:
    1 Mflop   1 Megaflop   10^6 flop/sec
    1 Gflop   1 Gigaflop   10^9 flop/sec
    1 Tflop   1 Teraflop   10^12 flop/sec
    1 MB      1 Megabyte   10^6 bytes
    1 GB      1 Gigabyte   10^9 bytes
    1 TB      1 Terabyte   10^12 bytes
    1 PB      1 Petabyte   10^15 bytes
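
To make the flop-rate units above concrete, here is a minimal C sketch (not from the slides; the vector length and repeat count are arbitrary choices) that times a simple vector update and reports the achieved rate in Mflop/s:

    #include <stdio.h>
    #include <time.h>

    #define N    5000000      /* vector length (arbitrary) */
    #define REPS 20           /* repeat the sweep to get a measurable time */

    static double x[N], y[N];

    int main(void) {
        for (long i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

        clock_t start = clock();
        for (int r = 0; r < REPS; r++)
            for (long i = 0; i < N; i++)
                y[i] = y[i] + 3.0 * x[i];   /* 1 multiply + 1 add = 2 flops */
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

        /* total flops / elapsed time, expressed in 10^6 flop/sec */
        printf("%.1f Mflop/s\n", 2.0 * N * REPS / secs / 1e6);
        return 0;
    }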


High Performance Computers

• High end:
  – ASCI White (IBM SP3)
  – ASCI Red (Intel Pentium II Xeon)
  – ASCI Blue-Pacific (IBM SP)
  – ASCI Blue Mountain (SGI)
• Powerful:
  – Cray T3E, SGI Origin 2000, IBM SP, HP, Hitachi, Fujitsu, NEC, SUN
• History:
  – Thinking Machines, MasPar, nCube, Meiko…


TOP 500 List

• Statistics on high-performance computers
• The 500 most powerful computer systems
• Performance measure: the Linpack benchmark
• Updated twice a year
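
For reference, the Linpack benchmark mentioned above solves a dense n x n linear system, and the reported rate divides a nominal operation count of 2/3 n^3 + 2 n^2 flops by the solve time. A minimal C sketch of that bookkeeping (the LU solver itself is omitted; the problem size and timing in main are hypothetical, just to show the arithmetic):

    #include <stdio.h>

    /* Gflop/s as reported for Linpack: nominal flop count of the LU
       factorization plus triangular solves, divided by the measured time. */
    double linpack_gflops(double n, double seconds) {
        double flops = (2.0 / 3.0) * n * n * n + 2.0 * n * n;
        return flops / seconds / 1e9;
    }

    int main(void) {
        printf("n = 10000, t = 66 s -> %.2f Gflop/s\n",
               linpack_gflops(10000.0, 66.0));
        return 0;
    }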


Highlights from Top 10

• No. 1: ASCI White, 4.9 TF on the Linpack benchmark.
• DOE ASCI systems hold the first 4 positions.
• 7 systems have Linpack performance above 1 TF.
• 18 systems have peak performance above 1 TF, including 1 commercial system (#15, at Charles Schwab).
• 0.89 TF is the entry point for the Top 10.


Highlights from Top 500

• 231 systems dropped off the TOP500 since last June.
• Accumulated performance: 65.2 TF → 88.1 TF.
• Entry level: 43.8 GF → 55.1 GF.
• Pure SMP systems: 121 → 17.
• 112 systems are clusters of SMPs.
• Networks of workstations: 11 → 28.


More Statistics (TOP500 11/00)

[Pie chart: systems installed, by installation type: Industry 49%, Research 24%, Academic 17%, Classified 5%, Vendor 3%, Government 2%. Total: 500.]


More Statistics (TOP500 11/00)

[Chart: Performance Development, Jun 93 to Nov 00, on a log scale from 100 Mflop/s to 100 Tflop/s. Three curves: SUM (total installed performance) grew from 1.167 TF/s to 88.0 TF/s; N=1 (the fastest system) from 59.7 GF/s to 4.94 TF/s, ending with the IBM ASCI White at LLNL; N=500 (the entry level) from 0.4 GF/s to 55.1 GF/s. Other systems labeled on the chart: Intel XP/S140 (Sandia), Fujitsu 'NWT' (NAL), Hitachi/Tsukuba CP-PACS/2048, Intel ASCI Red (Sandia), SNI VP200EX (Uni Dresden), and an IBM SP PC604e with 130 processors (Alcatel).]


More Statistics (TOP500 11/00)

[Chart: Architectures, Jun 93 to Nov 00, number of systems (0 to 500) in each class: Single Processor, SMP, MPP, SIMD, Constellations, Cluster/NOW, and CluMPs (clusters of SMPs). Representative systems labeled on the chart: Y-MP C90, CM2, CM5, VP500, SX3, Paragon, T3D, T3E, SP2, Sun HPC, Cluster of Sun HPC, ASCI Red.]


More Statistics (TOP500 11/00)

[Chart: Chip Technology, Jun 93 to Nov 00, number of systems (0 to 500) by processor family: Alpha, Power, HP, Intel, MIPS, SUN, other COTS, and proprietary.]


More Statistics (TOP500 11/00)

[Chart: Manufacturer, Jun 93 to Nov 00, number of systems (0 to 500) by vendor: Cray, SGI, IBM, Sun, Convex/HP, TMC, Intel, Fujitsu, NEC, Hitachi, and others.]


More Statistics (TOP500 11/00)

[Bar chart: number of systems installed, as a share of the list (0% to 45%), by manufacturer: HP, Compaq, Cray Inc., Fujitsu, Hitachi, IBM, Intel, NEC, SGI, SUN, and Self Made. Total: 500.]


More Statistics (TOP500 11/00)

[Chart: high-performance computing vendors over the period 1980 to 2000: Cray, Cray Computer, SGI, CDC/ETA, Fujitsu, NEC, Hitachi, Convex/HP, TMC, Intel, nCUBE, Alliant, FPS, Meiko, Parsytec, MasPar, DEC/Compaq, KSR, IBM, Sun.]


ASCI

• Accelerated Strategic Computing Initiative (ASCI).
• An initiative of the defense programs at the US DOE.
  – Shift from nuclear test-based methods to computation-based methods.
• Design nuclear weapons, analyze their performance, and predict their safety and reliability, etc., without underground nuclear testing → virtual testing / computer simulation.
• Requires higher-resolution, 3D, full-physics, full-system capabilities → high performance computing.


C3

• Canadian high performance computing organization.
• 50 member institutions, 15 resource providers.


Other Grand Challenge Computations

• Global climate modeling.
• Dyna3D crash simulation.
• Astrophysical modeling.
• Earthquake modeling.
• Heart simulation.
• Web search.
• Transaction processing.
• Drug design.


Why Do We Need High Performance Computing?

• Example: 24 hr weather prediction for North America.
• Cover the region (2 x 10^7 km^2 x 20 km) with grid points, mesh size = 1 km → 4 x 10^15 grid pts.
• Suppose it takes 1 flop to calculate the weather at each grid pt every hour → 10^15 flops.
• For a PC with 1 Gflops → 12 days.
• For a high performance computer with 1 Tflops → 17 min.
• How about weather prediction for the entire earth?!
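
The two run times quoted above come from dividing the total work by the machine speed; as a quick check, using the flop count stated on the slide:

    \[
      \frac{10^{15}\ \text{flops}}{10^{9}\ \text{flop/s}} = 10^{6}\ \text{s} \approx 12\ \text{days},
      \qquad
      \frac{10^{15}\ \text{flops}}{10^{12}\ \text{flop/s}} = 10^{3}\ \text{s} \approx 17\ \text{min}.
    \]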


Why Parallel Computers?

• Suppose a CPU can perform 1 Tflops.
• Data must travel at least some distance r from memory to the CPU.
• 1 data item per cycle → 10^12 times per sec.
• Suppose data travels at the speed of light (3 x 10^8 m/s) ⇒ r < 0.3 mm.
• Suppose 1 TB of data is stored in a 0.3 mm x 0.3 mm square ⇒ each side contains 10^6 data items.
• Each byte occupies about 3 x 10^-7 mm ≈ the size of an atom!
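
The figures on this slide follow from two short calculations (using 1 TB = 10^12 bytes, as defined earlier):

    \[
      r < \frac{c}{f} = \frac{3 \times 10^{8}\ \text{m/s}}{10^{12}\ \text{cycles/s}}
        = 3 \times 10^{-4}\ \text{m} = 0.3\ \text{mm},
      \qquad
      \frac{0.3\ \text{mm}}{10^{6}\ \text{bytes}} = 3 \times 10^{-10}\ \text{m per byte}
      \approx \text{one atomic diameter}.
    \]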


Why Parallel Algorithms?

• Fast methods on sequential machines may not be easily parallelized.
• Relatively slow methods on sequential machines may be highly parallel (see the sketch below).
• Need to redesign existing algorithms and/or design entirely new approaches.
• Need new theory to provide a theoretical foundation for the new methods.
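
As an illustration of the first two bullets (my own example, not from the slides): consider relaxation methods for a 1D model problem u'' = f. A Gauss-Seidel sweep converges faster sequentially, but each update depends on the value just computed, so the sweep is inherently serial; a Jacobi sweep converges more slowly, but every point is updated from the old iterate only, so all points can be updated at once, e.g. one strip of the array per processor.

    #include <stddef.h>

    /* Gauss-Seidel sweep for u'' = f on a uniform grid of spacing h:
       u[i] uses the already-updated u[i-1], a serial dependence along i. */
    void gauss_seidel_sweep(double *u, const double *f, double h, size_t n) {
        for (size_t i = 1; i + 1 < n; i++)
            u[i] = 0.5 * (u[i - 1] + u[i + 1] - h * h * f[i]);
    }

    /* Jacobi sweep: each new value depends only on the old iterate, so the
       loop iterations are independent and trivially parallel. */
    void jacobi_sweep(const double *u_old, double *u_new,
                      const double *f, double h, size_t n) {
        for (size_t i = 1; i + 1 < n; i++)
            u_new[i] = 0.5 * (u_old[i - 1] + u_old[i + 1] - h * h * f[i]);
    }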