
What Makes HPC Applications Challenging
MITRE, ISI, MIT Lincoln Laboratory

Slide 1: Benchmarking Working Group Session Agenda

1:00-1:15 David Koester What Makes HPC Applications Challenging?

1:15-1:30 Piotr Luszczek HPCchallenge Challenges

1:30-1:45 Fred Tracy Algorithm Comparisons of Application Benchmarks

1:45-2:00 Henry Newman I/O Challenges

2:00-2:15 Phil Colella The Seven Dwarfs

2:15-2:30 Glenn Luecke Run-Time Error Detection Benchmark

2:30-3:00 Break

3:00-3:15 Bill Mann SSCA #1 Draft Specification

3:15-3:30 Theresa Meuse SSCA #6 Draft Specification

3:30-?? Discussions
  – User Needs
  – HPCS Vendor Needs for the MS4 Review
  – HPCS Vendor Needs for the MS5 Review
  – HPCS Productivity Team Working Groups

Slide 2: What Makes HPC Applications Challenging?

David Koester, Ph.D.

11-13 January 2005
HPCS Productivity Team Meeting
Marina del Rey, CA

This work is sponsored by the Department of Defense under Army Contract W15P7T-05-C-D001. Opinions, interpretations, conclusions, and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Slide 3: Outline

• HPCS Benchmark Spectrum

• What Makes HPC Applications Challenging?
  – Memory access patterns/locality
  – Processor characteristics
  – Concurrency
  – I/O characteristics
  – What new challenges will arise from Petascale/s+ applications?

• Bottleneckology
  – Amdahl’s Law
  – Example: Random Stride Memory Access

• Summary

Slide 4: HPCS Benchmark Spectrum

[Figure: HPCS Benchmark Spectrum. A spanning set of kernels (discrete math, graph analysis, linear solvers, signal processing, simulation, I/O) runs from the HPCchallenge benchmarks (Local: DGEMM, STREAM, RandomAccess, 1D FFT; Global: Linpack, PTRANS, RandomAccess, 1D FFT), through micro and kernel benchmarks, the Scalable Synthetic Compact Applications (1. Optimal Pattern Matching, 2. Graph Analysis, 3. Simulation: NWCHEM, 4. Simulation: NAS PB AU, 5. Simulation: Multi-Physics, 6. Signal Processing: Knowledge Formation, each structured as a data generator feeding four kernels), to the Mission Partner application benchmarks (Current: UM2000, GAMESS, OVERFLOW, LBMHD, RF-CTH, HYCOM; Near-Future: NWChem, ALEGRA, CCSM), covering existing, emerging, and future simulation, intelligence, and reconnaissance applications. HPCchallenge provides execution performance bounds; the kernels provide execution performance indicators; the applications provide execution and development performance indicators and system bounds.]

Slide 5: HPCS Benchmark Spectrum (continued)

[Figure: HPCS Benchmark Spectrum, repeated from Slide 4.]

What Makes HPC Applications Challenging?

• Full applications may be challenging due to
  – Killer Kernels
  – Global data layouts
  – Input/Output

• Killer Kernels are challenging because of characteristics that link directly to the architecture

• Identify bottlenecks by mapping applications to architectures

Slide 6: What Makes HPC Applications Challenging?

• Memory access patterns/locality
  – Spatial and temporal
      Indirect addressing (see the sketch below)
      Data dependencies

• Processor characteristics
  – Processor throughput (instructions per cycle)
      Low arithmetic density
      Floating point versus integer
  – Special features
      GF(2) math
      Popcount
      Integer division

• Concurrency
  – Ubiquitous for Petascale/s
  – Load balance

• I/O characteristics
  – Bandwidth
  – Latency
  – File access patterns
  – File generation rates

[Figure callouts tie these bullets to the categories of Slide 5: memory access patterns and concurrency map to Killer Kernels and Global Data Layouts; processor characteristics map to Killer Kernels; I/O characteristics map to Input/Output.]
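To make the indirect-addressing point concrete, here is a minimal C sketch (an illustration, not the HPCS RandomAccess specification; the table size and index source are placeholder choices) contrasting a prefetch-friendly stride-1 sweep with a data-dependent indirect update:

    #include <stddef.h>
    #include <stdint.h>

    #define TABLE_WORDS (1u << 24)   /* 16 Mi entries: far larger than any cache */

    /* Stride-1 sweep: cache lines and hardware prefetchers are fully used. */
    void stride1_sweep(uint64_t *table)
    {
        for (size_t i = 0; i < TABLE_WORDS; i++)
            table[i] += 1;
    }

    /* Indirect update: each address depends on an index loaded from memory,
       so there is essentially no spatial or temporal locality to exploit. */
    void indirect_update(uint64_t *table, const uint32_t *idx, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            table[idx[i] & (TABLE_WORDS - 1)] ^= (uint64_t)idx[i];
    }

On most systems the second loop sustains a small fraction of the first loop’s bandwidth, which is exactly the spatial/temporal locality gap the bullet describes.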

Slide 7: Cray “Parallel Performance Killer” Kernels

Kernel | Performance Characteristic
RandomAccess | High demand on remote memory; no locality
3D FFT | Non-unit strides; high bandwidth demand
Sparse matrix-vector multiply | Irregular, unpredictable locality
Adaptive mesh refinement | Dynamic data distribution; dynamic parallelism
Multi-frontal method | Multiple levels of parallelism
Sparse incomplete factorization | Amdahl’s Law bottlenecks
Preconditioned domain decomposition | Frequent large messages
Triangular solver | Frequent small messages; poor ratio of computation to communication
Branch-and-bound algorithm | Frequent broadcast synchronization
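As one concrete instance from the table, a sketch of sparse matrix-vector multiply in compressed sparse row (CSR) form; the array names here are illustrative. The gather through col_idx is the “irregular, unpredictable locality” in the row above:

    /* y = A * x for an n-row sparse matrix A stored in CSR form:
       row_ptr[i]..row_ptr[i+1]-1 index the nonzeros of row i. */
    void spmv_csr(int n, const int *row_ptr, const int *col_idx,
                  const double *val, const double *x, double *y)
    {
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                sum += val[k] * x[col_idx[k]];   /* indirect gather from x */
            y[i] = sum;
        }
    }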

Slide 8: Killer Kernels: Phil Colella — The Seven Dwarfs (reproduced from an LBNL Computational Research Division slide)

Algorithms that consume the bulk of the cycles of current high-end systems in DOE

• Structured Grids (including locally structured grids, e.g., AMR)
• Unstructured Grids
• Fast Fourier Transform
• Dense Linear Algebra
• Sparse Linear Algebra
• Particles
• Monte Carlo

(Should also include optimization/solution of nonlinear systems, which at the high end is something one uses mainly in conjunction with the other seven.)
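As one concrete instance of the first dwarf, a minimal sketch of a structured-grid kernel (a three-point Jacobi relaxation step in 1D; the function name and stencil weights are illustrative): the accesses are regular and unit-stride, which is what separates this class from the sparse and particle dwarfs:

    /* One Jacobi relaxation sweep of a three-point stencil on a 1D grid.
       u_new and u_old hold n points; boundary points are left untouched. */
    void jacobi_step_1d(double *u_new, const double *u_old, int n)
    {
        for (int i = 1; i < n - 1; i++)
            u_new[i] = 0.5 * u_old[i] + 0.25 * (u_old[i - 1] + u_old[i + 1]);
    }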

Slide 9: Mission Partner Applications

• How do mission partner applications relate to the HPCS spatial/temporal view of memory?
  – Kernels?
  – Full applications?

[Figure: scatter plot of spatial locality (x-axis, roughly 0.75 to 1) versus temporal locality (y-axis, 0 to 1). The HPCchallenge benchmarks (RandomAccess, STREAM, PTRANS, FFT, HPL) mark the low- and high-locality corners as HPCS challenge points; applications such as AVUS and NAS CG class C, and the mission partner applications, fall in between.]

Memory Access Patterns/Locality

Slide 10: Processor Characteristics: Special Features

• Comparison of similar-speed MIPS processors with and without
  – GF(2) math
  – Popcount
• Similar or better performance reported using Alpha processors (Jack Collins, NCIFCRF)
• Codes
  – Cray-supplied library
  – The Portable Cray Bioinformatics Library by ARSC
• References
  – http://www.cray.com/downloads/biolib.pdf
  – http://cbl.sourceforge.net/

Algorithmic speedup of 120x
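A minimal sketch of why these special features matter, assuming a GCC/Clang toolchain for the __builtin_popcountll intrinsic (the function and data layout are illustrative): a GF(2) dot product of two bit-packed vectors reduces to AND, XOR-accumulate, and one population count, which a processor without a popcount instruction must emulate with many instructions per word:

    #include <stddef.h>
    #include <stdint.h>

    /* Dot product of bit vectors a and b over GF(2): AND multiplies bit-wise,
       XOR accumulates, and the parity of the popcount is the sum mod 2. */
    int gf2_dot(const uint64_t *a, const uint64_t *b, size_t nwords)
    {
        uint64_t acc = 0;
        for (size_t i = 0; i < nwords; i++)
            acc ^= a[i] & b[i];
        return __builtin_popcountll(acc) & 1;
    }

Kernels of this shape (parity, Hamming distance, bit-packed sequence comparison) are where hardware popcount and GF(2) support can plausibly yield speedups of the magnitude quoted above.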

Slide 11: Concurrency

Insert Cluttered VAMPIR Plot here

Slide 12: I/O Relative Data Latency‡

Note: 11 orders of magnitude relative differences!

[Figure: chart of relative latency differences across the storage hierarchy, spanning 1.0E+00 to 1.0E+11.]

‡Henry Newman (Instrumental)

Slide 13: I/O Relative Data Bandwidth per CPU‡

[Figure: chart of relative bandwidth-per-CPU differences (1.0E-02 to 1.0E+03 times difference) across CPU registers, L1 cache, L2 cache, memory, disk, NAS, and tape.]

Note: 5 orders of magnitude relative differences!
‡Henry Newman (Instrumental)

Slide 14: Strawman HPCS I/O Goals/Challenges

• 1 trillion files in a single file system
  – 32K file creates per second
• 10K metadata operations per second
  – Needed for checkpoint/restart files
• Streaming I/O at 30 GB/sec full duplex
  – Needed for data capture
• Support for 30K nodes
  – Future file systems need low-latency communication

An envelope on HPCS Mission Partner requirements
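A back-of-the-envelope check of the first goal, assuming “32K” means 32,768 creates per second: filling a trillion-file file system at that sustained rate takes

    10^12 files / 32,768 creates per second ≈ 3.05 x 10^7 seconds ≈ 353 days

so roughly a year of continuous file creation, which suggests the goal is about headroom for bursts rather than steady-state population of the file system.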

Slide 15: HPCS Benchmark Spectrum: Future and Emerging Applications

[Figure: HPCS Benchmark Spectrum, repeated from Slide 4.]

• Identifying HPCS Mission Partner efforts
  – 10-20K processor — 10-100 Teraflop/s scale applications
  – 20-120K processor — 100-300 Teraflop/s scale applications
  – Petascale/s applications
  – Applications beyond Petascale/s
• LACSI Workshop — The Path to Extreme Supercomputing
  – 12 October 2004
  – http://www.zettaflops.org
• What new challenges will arise from Petascale/s+ applications?

Slide 16: Outline

• HPCS Benchmark Spectrum

• What Makes HPC Applications Challenging?
  – Memory access patterns/locality
  – Processor characteristics
  – Parallelism
  – I/O characteristics
  – What new challenges will arise from Petascale/s+ applications?

• Bottleneckology
  – Amdahl’s Law
  – Example: Random Stride Memory Access

• Summary

Slide 17: Bottleneckology

• Bottleneckology
  – Where is performance lost when an application is run on an architecture?
  – When does it make sense to invest in architecture to improve application performance?
  – System analysis driven by an extended Amdahl’s Law

Amdahl’s Law is not just about parallel and sequential parts of applications! (A rate form is sketched after the references below.)

• References:
  – Jack Worlton, "Project Bottleneck: A Proposed Toolkit for Evaluating Newly-Announced High Performance Computers", Worlton and Associates, Los Alamos, NM, Technical Report No. 13, January 1988
  – Montek Singh, "Lecture Notes — Computer Architecture and Implementation: COMP 206", Dept. of Computer Science, Univ. of North Carolina at Chapel Hill, Aug 30, 2004, www.cs.unc.edu/~montek/teaching/fall-04/lectures/lecture-2.ppt
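A minimal statement of the extended (rate) form of Amdahl’s Law referred to above, assuming a fraction f_i of the total work is serviced at rate R_i (with the f_i summing to 1): the time per unit of work is the sum of the f_i / R_i, so the effective rate is the weighted harmonic mean

    R_eff = 1 / ( f_1/R_1 + f_2/R_2 + ... )

With two rates, a fast rate R and a slow rate R/k, and a fraction f of the work at the slow rate, this reduces to R_eff = R / (k*f + (1-f)), which is the form used in the bottleneck examples that follow.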

Slide 18: Lecture Notes — Computer Architecture and Implementation (5)‡

‡Montek Singh (UNC)

Slide 19: Lecture Notes — Computer Architecture and Implementation (6)‡

‡Montek Singh (UNC)

Slide 20: Lecture Notes — Computer Architecture and Implementation (7)‡

Also works for Rate = Bandwidth!

‡Montek Singh (UNC)

Slide 21: Lecture Notes — Computer Architecture and Implementation (8)‡

‡Montek Singh (UNC)

Slide 22: Bottleneck Example (1)

• Combine stride-1 and random-stride memory access
  – 25% random stride access
  – 33% random stride access
• Memory bandwidth performance is dominated by the random-stride memory access

SDSC MAPS on an IBM SP-3

Slide 23: Bottleneck Example (2)

• Combine stride-1 and random-stride memory access
  – 25% random stride access
  – 33% random stride access
• Memory bandwidth performance is dominated by the random-stride memory access

SDSC MAPS on a COMPAQ AlphaServer

Amdahl’s Law: 7000 / (7 * 0.25 + 0.75) = 2800 MB/s
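A worked check of the figure above, assuming (as the formula implies) a stride-1 bandwidth of 7000 MB/s and random-stride access 7 times slower, i.e., 1000 MB/s, with f = 0.25 of accesses random:

    R_eff = 1 / ( 0.25/1000 + 0.75/7000 ) = 7000 / (7 * 0.25 + 0.75) = 7000 / 2.5 = 2800 MB/s

Only a quarter of the accesses are random, yet effective bandwidth drops to 40% of the stride-1 peak.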

Slide 24: Bottleneck Example (2), continued

• Combine stride-1 and random-stride memory access
  – 25% random stride access
  – 33% random stride access
• Memory bandwidth performance is dominated by the random-stride memory access

SDSC MAPS on a COMPAQ AlphaServer

Amdahl’s Law: 7000 / (7 * 0.25 + 0.75) = 2800 MB/s

• HPCS Mission Partner applications vary:
  – Some have extensive random stride memory access
  – Some have only occasional random stride memory access
• However, even a small amount of random memory access can cause significant bottlenecks!

Slide 25: Outline

• HPCS Benchmark Spectrum

• What Makes HPC Applications Challenging?
  – Memory access patterns/locality
  – Processor characteristics
  – Parallelism
  – I/O characteristics
  – What new challenges will arise from Petascale/s+ applications?

• Bottleneckology
  – Amdahl’s Law
  – Example: Random Stride Memory Access

• Summary

Slide 26: Summary (1)

• Memory access patterns/locality
  – Spatial and temporal
      Indirect addressing
      Data dependencies

• Processor characteristics
  – Processor throughput (instructions per cycle)
      Low arithmetic density
      Floating point versus integer
  – Special features
      GF(2) math
      Popcount
      Integer division

• Parallelism
  – Ubiquitous for Petascale/s
  – Load balance

• I/O characteristics
  – Bandwidth
  – Latency
  – File access patterns
  – File generation rates

What makes Applications Challenging!

• Expand this list as required
• Work toward consensus with
  – HPCS Mission Partners
  – HPCS Vendors
• Understand bottlenecks
• Characterize applications
• Characterize architectures

Slide 27: HPCS Benchmark Spectrum

[Figure: HPCS Benchmark Spectrum, repeated from Slide 4.]

What Makes HPC Applications Challenging?

• Full applications may be challenging due to
  – Killer Kernels
  – Global data layouts
  – Input/Output

• Killer Kernels are challenging because of characteristics that link directly to the architecture

• Identify bottlenecks by mapping applications to architectures

Impress upon the HPCS community the need to identify what makes an application challenging when using an existing Mission Partner application for systems analysis in the MS4 review.