
Page 1: The Numerical Algorithms Group

Over 30 years of mathematical excellence

The Numerical Algorithms Group
Combining mathematics and technology for enhanced performance

Unlocking the Power of OpenMP
Dr Stef Salvini, NAG Ltd
[email protected]

Page 2: The Numerical Algorithms Group

Stef Salvini
Over 30 years of mathematical excellence
EWOMP 2003, 22-26 September 2003

Acknowledgements

Lawrence Mulholland
Edward Smyth
Themos Tsikas
Anne E. Trefethen
Jeremy Du Croz
Robert Tong
And too many others to mention here!

Page 3: The Numerical Algorithms Group


Contents

Opening Remarks
SMP Systems, Parallelism and OpenMP
Tuning OpenMP:
  General Points
  NAG-Specific Case: Needs, Solutions and Case Studies
OpenMP and Hybrid Parallelism
Some Other Considerations and Conclusions

Page 4: The Numerical Algorithms Group


Contents

Opening Remarks
SMP Systems, Parallelism and OpenMP
Tuning OpenMP:
  General Points
  NAG-Specific Case: Needs, Solutions and Case Studies
OpenMP in Hybrid Parallelism
Some Other Considerations and Conclusions

Page 5: The Numerical Algorithms Group


NAG and HPC

NAG Products
  NAG Parallel Library Release 3 (MPI)
  NAG SMP Library Release 2 (OpenMP)

Collaborations with external agencies
  Vendors (e.g. ACML Library for AMD Opteron)
  Research and academic institutions
  Industrial, commercial and financial concerns

Consultancy activities

Page 6: The Numerical Algorithms Group


Who Can Benefit from Parallelism?

Anybody with large computationally intensive problems

Academic institutions
Industry (aerospace, automotive, etc.)
Financial institutions (forecasts, etc.)

Increasingly, commercial systems
  Databases
  On-line transactions
  Data mining
  Web servers, etc.

They want solutions!

Page 7: The Numerical Algorithms Group


Some thoughts…

Short life cycle: hardware
Long life cycle: scientific software

Is software the real capital investment?

Page 8: The Numerical Algorithms Group


The Changing World of HPC

The "(ir)resistible" rise of PC clusters
  Price/performance (claimed or substantiated?)
  Originally built in-house, now part of the mainstream
  Increasing penetration of the server market

MPI as new legacy
  Is parallelism crystallised into MPI?
  Is anybody interested in other types of parallelism?

Hybrid systems
  The de facto standard for high-end systems
  Do we need multi-level parallelism?

Page 9: The Numerical Algorithms Group


Contents

Opening Remarks
SMP Systems, Parallelism and OpenMP
Tuning OpenMP:
  General Points
  NAG-Specific Case: Needs, Solutions and Case Studies
OpenMP in Hybrid Parallelism
Some Other Considerations and Conclusions

Page 10: The Numerical Algorithms Group


Why SMPs?

Hardware
  Re-usable technology
  Modular technology
  Increasingly partitionable hardware
  Reliable technology (minimum downtime)

Commercial applications
  Databases
  OLTP
  Web servers

Numerical and scientific applications
  Tremendous potential

Page 11: The Numerical Algorithms Group


SMP Model

[Diagram: four CPUs, each with its own cache, connected through an interconnect subsystem to memory.]

Single memory visible to all processors
Memory can be physically partitioned (NUMA systems)

Page 12: The Numerical Algorithms Group


SMP Parallelism in a Nutshell

Multi-threaded Parallelism (Parallelism-on-demand)

[Diagram: serial execution, then a parallel region (threads spawned at entry, destroyed at exit), then serial execution again.]

Multi-threading:
  Parallel execution
  Serial interfaces
  Details of parallelism are hidden outside the parallel region

Parallelism carried out in distinct parallel regions

Page 13: The Numerical Algorithms Group


The Data View

[Diagram: the memory hierarchy, from registers through primary cache, secondary cache and local memory to global memory; speed of data access falls, and size of available data space grows, moving away from the registers.]

Data must migrate through the different levels of memory in a very coordinated fashion.

Page 14: The Numerical Algorithms Group


Memory and Data Transfer

Memory structure
  Multiple levels (increasing in number)
  Caches: invaluable, essential and difficult

Some difficulties with data access
  Single-processor effects
    Cache misses and thrashing
    TLB misses
  Multi-processor effects
    False sharing
    Required synchronisations
  NUMA systems
    Data allocation and distribution
    Page misses and migration

Page 15: The Numerical Algorithms Group


SMP Parallelism: Dynamic View of Data

[Diagram: the same data flowing through computation stages 1, 2 and 3, each stage divided among processors 1-4.]

Page 16: The Numerical Algorithms Group


From SMP to OpenMP

OpenMP embodies SMP mechanisms
Concise notation
  Not all parallel structures represented
Simple
  To implement: hence wide acceptance by vendors
  To understand (?)
Compiler directives
  Compile cleanly on serial systems
However:
  "Local" references only
  Some system calls

Page 17: The Numerical Algorithms Group


Contents

Opening Remarks
SMP Systems, Parallelism and OpenMP
Tuning OpenMP:
  General Points
  NAG-Specific Case: Needs, Solutions and Case Studies
OpenMP in Hybrid Parallelism
Some Other Considerations and Conclusions

Page 18: The Numerical Algorithms Group


Ensuring Efficiency on Modern Systems

Algorithms must take into account
  Parallelism
  Data access (multi-level memory layout)

Algorithms must have
  Dynamic load balancing
  Some strategy for serial or quasi-serial bottlenecks (beating Amdahl's law?)
  Parametrisation for easy configuration and porting

Should algorithms also take into account
  Multi-level memory (e.g. NUMA, clusters, etc.)?
  Contingent (history-of-the-computation) data layout?

Page 19: The Numerical Algorithms Group


Levels of Parallelism

Coarse grain
  Application driven
  More potential for parallelism
  Closer to optimal performance
  Fewer overheads
  Design complexity
  Implementation costs
  Maintenance costs
  Removed from "serial" world
  "Non-local" data access problem
  Non-modular design
  Expandable to non-SMP systems

Fine grain
  Close to "elementary" algorithms
  Modular design
  Direct path from "serial" world
  Top-down refinement
  Higher overheads
  Less potential for parallelism
  Serial or quasi-serial bottlenecks

Page 20: The Numerical Algorithms Group


Level of Parallelism

Coarse/fine trade-off dictated by
  Nature of application
  Technological feasibility
  Availability of components
  Expertise and experience
  Time scale and resources for development
  Deadlines for results

Page 21: The Numerical Algorithms Group


The “reductionist approach”

Postulate: Parallelise the basic computational kernels

BLAS (Basic Linear Algebra Subroutines)
  Level-3 (matrix-matrix product)
  Level-2 (matrix-vector product)
  Level-1 (vector operations)
Basic FFTs

Theorem: Anything built on them will have adequate parallelism

Proof by oral tradition

Page 22: The Numerical Algorithms Group


Level-3 BLAS

N^3 operations
N^2 data references
Good data re-use (cache friendly)
Good parallelisability

Page 23: The Numerical Algorithms Group


Level-2 and Level-1 BLAS

N^2 and N operations
N^2 and N data references
Poor data re-use
Dubious parallelisability
Problems if sequences of BLAS are applied to the same data space

Page 24: The Numerical Algorithms Group


DSYMV (Symmetric Matrix-Vector Product)

Requires synchronisations, etc.
Many vendors do not parallelise it!

Page 25: The Numerical Algorithms Group


Contents

Opening Remarks
SMP Systems, Parallelism and OpenMP
Tuning OpenMP:
  General Points
  NAG-Specific Case: Needs, Solutions and Case Studies
OpenMP in Hybrid Parallelism
Some Other Considerations and Conclusions

Page 26: The Numerical Algorithms Group


NAG SMP Library

Our needs
  Fill the gap between serial and parallel codes
  Allow maximum reuse of our products (NAG Library)
  Best performance achievable (parallelism must make a difference!)

Our approach
  Build the library on our (serial) NAG Library
  Keep identical serial interfaces
  Keep identical functionality and numerics
  Hide all details of parallelism
  Use parametrisable algorithms for easy porting
  That dictates our level of granularity!

Page 27: The Numerical Algorithms Group


NAG SMP Library Release 2

All the NAG Library Mark 19
  Over 1200 user-callable components

Parallelised numerical routines in
  Dense linear algebra
  Sparse linear algebra
  FFTs
  Random-number generation
  All other routines dependent on the above

Page 28: The Numerical Algorithms Group


New at NAG SMP Library Release 3 (Soon)

Sparse technology
  Direct methods
  Preconditioners for iterative solvers
  Enhanced iterative solvers
  Band solvers
  Eigensolution of large sparse matrices

Extended linear algebra coverage
Multi-dimensional quadrature

Page 29: The Numerical Algorithms Group


LU Factorisation: A “Serial” Algorithm

Page 30: The Numerical Algorithms Group


LU Factorisation: LAPACK Style

[Diagram: the matrix partitioned into the already-factorised part (L and U factors) and the active submatrix. Each step:
  Factorise the pivot block (Level-2 BLAS)
  Permute the rows (Level-1 BLAS)
  Solve the triangular system (Level-3 BLAS)
  Update the trailing submatrix (Level-3 BLAS)]

Page 31: The Numerical Algorithms Group


About serial or quasi-serial bottlenecks

That is what we do:
  Identify serial bottlenecks
  Identify memory-access bottlenecks
  Remove the above, or ...
  "Hide" them using a "look-ahead" strategy
  "Locally asynchronous" algorithms?

Page 32: The Numerical Algorithms Group


About serial or quasi-serial bottlenecks

Live with them ...
  The reductionist approach
Further parallelisation of the bottlenecks
  Perhaps nested parallelism?
Use knowledge about the algorithms
  Predecessor/successor relationships
  Task queues
Use tools
  Instrument the code to generate a stack of tasks
  Profile and analyse previous runs
"Hide" them using a "look-ahead" strategy
  "Locally asynchronous" algorithms

Page 33: The Numerical Algorithms Group


Look-ahead (Beating Amdahl’s Law?)

[Diagram: a workload made of a serial phase between two parallelisable phases, shown three ways: serial execution; parallel execution on processors 1-3, where the serial phase leaves two processors idle; and parallel execution with look-ahead, where the serial phase is overlapped with parallelisable work.]

Page 34: The Numerical Algorithms Group


LU Factorisation: LAPACK Style

[Diagram: the matrix partitioned into the already-factorised part (L and U factors) and the active submatrix. Each step:
  Factorise the pivot block (Level-2 BLAS)
  Permute the rows (Level-1 BLAS)
  Solve the triangular system (Level-3 BLAS)
  Update the trailing submatrix (Level-3 BLAS)]

Page 35: The Numerical Algorithms Group


LU Factorisation: SMP Style (1)

[Diagram: the active submatrix split into column blocks 1, 2, 3. On ALL processors: permute the rows, solve the triangular system, update. The next pivot block is factorised on processor 1 as soon as the update of block 1 is done.]

Page 36: The Numerical Algorithms Group


LU Factorisation: SMP Style (2)

[Diagram: the column blocks assigned across the processors.]

Apply all the permutations from the right in parallel.

Page 37: The Numerical Algorithms Group


LU Factorisation

[Chart: performance (Mflops) vs problem size (N = 100 to 8000) on 1 to 48 processors, Sun F15K, 1050 MHz: NAG SMP Release 2 vs Sun Perflib.]

Page 38: The Numerical Algorithms Group


QR Factorisation

[Chart: performance (Mflops) vs problem size (N = 100 to 8000) on 1 to 20 processors, Sun E6800, 900 MHz: NAG SMP Release 2 vs Sun Perflib vs LAPACK.]

Page 39: The Numerical Algorithms Group


NEC SX-4, LU Factorisation

[Chart: performance (Mflops) vs problem size (n = 1000 to 4000) on 1 to 14 processors: NAG SMP Library vs LAPACK.]

Page 40: The Numerical Algorithms Group


LU Factorisation

[Chart: execution time (secs) vs number of processors (1 to 48) and problem size (N = 100 to 8000), Sun F15K, 1050 MHz, Sun Perflib.]

Page 41: The Numerical Algorithms Group


Other cases

Tridiagonalisation
  Half the operations through DSYMV
  Half the operations through rank-k symmetric update (Level-3 BLAS)
  The gateway to the symmetric eigenproblem

Full SVD (QR algorithm) of a bidiagonal matrix
  Computational kernel: plane rotations (Level-1 BLAS DROT)
  Statistics, LLS problems, rank-deficiency, etc.

Page 42: The Numerical Algorithms Group


Tridiagonalisation (Upper variant)

[Chart: performance (Mflops) vs problem size (N = 100 to 4000) on 1 to 48 processors, Sun F15K, 1050 MHz: NAG SMP Release 2 vs Sun Perflib.]

Page 43: The Numerical Algorithms Group


Full SVD of Bidiagonal Matrix (QR Algorithm)

[Chart: performance (Mflops) vs problem size (N = 100 to 4000) on 1 to 48 processors, Sun F15K, 1050 MHz: NAG SMP Release 2 vs Sun Perflib.]

Page 44: The Numerical Algorithms Group


Full SVD of Bidiagonal Matrix (QR Algor.)

[Chart: execution time (secs) vs number of processors (1 to 48) and problem size (N = 100 to 4000), Sun F15K, 1050 MHz: NAG SMP Release 2 vs Sun Perflib.]

Page 45: The Numerical Algorithms Group


Contents

Opening Remarks
SMP Systems, Parallelism and OpenMP
Tuning OpenMP:
  General Points
  NAG-Specific Case: Needs, Solutions and Case Studies
OpenMP and Hybrid Parallelism
Some Other Considerations and Conclusions

Page 46: The Numerical Algorithms Group


Clusters of SMPs

[Diagram: SMP nodes linked by an interconnection sub-system.]

Future
  High-end, medium-end, low-end
Technology
  Re-usable
  Upgradeable
  Linux boxes?
Hybrid (mixed) model?
  NAG currently actively involved

Page 47: The Numerical Algorithms Group


Some Considerations

Enormous increase in CPU performance
Less marked improvements to memory subsystems
Using modular components
  Relatively higher latency than at present

Page 48: The Numerical Algorithms Group


Hybrid Model Paradigm

Currently: all processors the same (MPI, etc.)

"Flattening mountains" ...

Page 49: The Numerical Algorithms Group


Hybrid Parallelism: Why?

High-latency systems
Increased levels of memory
Part-serialisation of message passing
Increased number of processors competing for communication

[Diagram: SMP nodes connected through an interconnection sub-system.]

Page 50: The Numerical Algorithms Group


Mixed Mode Parallelism: A Model’s Goals

Maximise code re-use
  E.g., retain the message-passing main code architecture
  Use existing SMP techniques and technology

Allow some form of top-down refinement
  Identify bottlenecks in isolation from the rest of the code and improve their efficiency

Exploit a problem's different levels of granularity
  Coarse granularity: mapped onto message passing
  Fine granularity: mapped onto SMP

"Hide" communication costs (look-ahead again)
Reduce load imbalance
Perhaps best with problems consisting of loosely coupled components

Page 51: The Numerical Algorithms Group


SMP on Clusters

OS-level software DSMs
Compiler-level
  "Architecture-aware" OpenMP
  Explicit page allocation, etc.
  Retracing HPF?
Also, very much the topic of tomorrow's panel

Page 52: The Numerical Algorithms Group


Contents

Opening Remarks
SMP Systems, Parallelism and OpenMP
Tuning OpenMP:
  General Points
  NAG-Specific Case: Needs, Solutions and Case Studies
OpenMP in Hybrid Parallelism
Some Other Considerations and Conclusions

Page 53: The Numerical Algorithms Group


Performance of Real Applications

Memory bound
  MPI "faster" than OpenMP
    Data "segregation"
    Better access to memory
    Limited cross-processor memory effects

We need
  Block algorithms
  Better data locality
  User-specified prefetching

Page 54: The Numerical Algorithms Group


Example: Very Sparse Large Problems

Random matrix entries, almost diagonally dominant

Diagonal preconditioning
  Virtually removes the effects of preconditioning from the performance analysis

4 case studies
  Neither matrix nor vectors fit in secondary cache
    N = 1000000, NNZ = 10000000, random pattern
    N = 1000000, NNZ = 10000000, random pattern within a narrow band (bandwidth = 200)
  Matrix does not fit in secondary cache
    N = 153600, NNZ = 15000000, random pattern
    N = 153600, NNZ = 15000000, random pattern within a narrow band (bandwidth = 2000)

Page 55: The Numerical Algorithms Group


Some Scalability Results

[Chart: speed-up vs number of processors (1 to 4) for:
  TFQMR, random, n = 1000000
  TFQMR, narrow band, n = 1000000
  Bi-CGSTAB(4), random, n = 153600
  Bi-CGSTAB(4), narrow band, n = 153600]

Page 56: The Numerical Algorithms Group


Performance of Matrix-Vector Product

[Chart: relative CPU times (arbitrary units) of the matrix-vector product for Cases 1 to 4.]

Page 57: The Numerical Algorithms Group


Some Food for Thought

Algorithm producers
  New algorithms required
    Block algorithms
    Latency-tolerant
    Parallel-adaptive
    Dynamically load-balanceable

Other aspects
  Expertise
    Dearth of SMP and mixed-mode parallel expertise
    Increasing need

Page 58: The Numerical Algorithms Group


What do we need in OpenMP?

More flexible synchronisations
  "Level crossings" (some wait, one releases)?
  Partial barriers, relationships of precedence?
More flexible work-sharing mechanisms
Data allocation/distribution
  Avoid HPF constructs
  Is good page migration sufficient on NUMA?
User-specified prefetching
  Essential for performance
  Portable API (or part of OpenMP)
  Compiler writers rather against it
Some message-passing mechanism?

Page 59: The Numerical Algorithms Group


Multi-Level Parallelism: Do We Need it?

Multi-level applications
  Loosely coupled components
  SMP nested parallelism?

Heterogeneous applications
  Very difficult to map onto OpenMP, currently
  SMP nested parallelism?

Hardware requirements (hybrid systems)
  Clusters of SMPs

Page 60: The Numerical Algorithms Group


Summary

Tuning of OpenMP
  Difficult but feasible
  Considerable gains
  Gateway between the serial and parallel worlds

Current algorithms may need revision
  Good prefetching essential in the future

Challenges ahead
  Developing "look-ahead" strategies for numerical algorithms
  Mapping existing numerical algorithms to future architectures (clusters of SMPs)
  Developing new "multi-level" algorithms

Page 61: The Numerical Algorithms Group


Thank you for your attention

Stef Salvini
[email protected]