
Exascale Computing Research Center (ECR)

Overview

Bettina Krammer, UVSQ, [email protected]
William Jalby, UVSQ, [email protected]

Journée HPC, May 15, 2012, Orsay

Outline

• Exascale Computing Research Center
• Research activities
  – Co-Design
  – Tools
    • Application Characterization
    • Programming and Execution Models
    • Performance Evaluation Tools


Exascale – 2018?

• 1 ExaFlop/s = 10^18 Flop/s = 1 billion x 1 billion Flop/s

www.top500.org


Exascale Challenges

• Power Efficiency
• Parallelism
• Memory Access
• Resiliency

Objectives of the Exascale Computing Research Center (ECR):
• Tackle these issues from the software angle
• Work with real applications
• Take into account the whole software stack
• Contribute to Open Source


ECR Partners (1/2)

• CEA: designing large-scale applications, using and operating large teraflop-scale machines
  – CEA DAM: know-how of Tera/Petaflop applications and algorithms
  – CEA DAM: know-how of operating large machines (OS, middleware, etc.), hosting one of the two Tier-0 systems in Europe
  – Rich network of industry/academic collaborations (TER@TEC)

• GENCI: coordinating structure of large teraflop/petaflop-scale centers
  – Access to large academic and industrial user communities
  – Access to different large-scale computing centers
  – French representative in the PRACE Research Infrastructure


ECR Partners (2/2)

• UVSQ/ITACA (joint lab between CEA DAM and UVSQ)
  – Compiler/code optimization expertise
  – Performance evaluation (benchmarks, tools) expertise
  – Rich network of scientific collaborations (Europe and US)

• Intel Corp.
  – One of the world leaders in computing innovation
  – Intel designs and builds the essential technologies for today's and future computing devices

Complementary expertise between the four partners, covering the software stack from applications down to microarchitecture behavior, and a rich network of academic and industrial collaborations.


ECR Team

• Over 25 high-level researchers, supported by recognized international hardware and software experts

• Part of a dense network of research labs in Europe and in North America

• Member of the Intel EMEA HPC Exascale labs, together with the ExaScience lab, the ExaCluster lab, and the Exascale Lab Barcelona


ECR Organization

[Organization chart]
• Chief Technologist: Prof. William Jalby, UVSQ
• Communication: Marc Dollfus, Intel Corp.
• Software tools for application characterization and performance optimization (Bettina Krammer, UVSQ): application characterization information; runtime programming models; performance evaluation framework
• Enabling application co-design (Marie-Christine Sawley, Intel Corp.): numerical kernels scalability and stability; implicit parallelism and all-to-all communications; guidelines for designing Exa-apps


Courtesy of M. Masella, CEA/DSV

Present engagements on Co-design

• Life science: MD, radiotherapy and medical imaging
• Seismic and Earth Sciences
• Quantum Chemistry


Outline

• Exascale Computing Research Center
• Research activities
  – Co-Design
  – Tools: 3 full projects running
    • Application Characterization
    • Programming and Execution Models
    • Performance Evaluation Tools

Programming and Execution Models

Project leader: Bettina Krammer, UVSQ

Team: Marc Pérache, Patrick Carribault (CEA); Marc Tchiboukdjian, Sylvain Didelot, Aurèle Maheo, Souad Koliai (UVSQ)


Programming and Execution Models: Objectives

• Understand the limitations and requirements of large-scale applications and algorithms; propose or implement extensions to programming and execution models:
  - Work with real applications
  - Explore programming models (MPI, OpenMP, PGAS, Cilk, CnC, hybrid, …)
  - Propose language extensions (pragma-based)
  - Support easy adoption of proposed work for legacy code (C/C++/Fortran, MPI, OpenMP, …)
  - Support debugging and performance tools


Programming and Execution Models: Objectives

• Optimize application performance through efficient mapping of concepts to the underlying hardware
  - Reduce memory consumption of the runtime
  - Optimize inter-/intra-node communication
  - Take into account the impact of topology

• Research on programming-model extensions and runtime mechanisms relying on the MPC framework


MPC Framework

• Input framework: MPC (MultiProcessor Computing)
  – Originates from CEA
  – Unified parallel runtime for clusters of NUMA machines
  – Support of MPI 1.3, OpenMP 2.5 and Pthreads (all thread-based)
  – Integration with other HPC components (compiler, OS, …)
  – Freely available at http://mpc.sourceforge.net


Programming and Execution Models Some Achievements

• Scalability and memory consumption of MPI implementations
  – Re-design of the low-level communication part of MPC (InfiniBand module)
  – Trade-off between memory consumption and performance
  – Large-scale scalability evaluation of the EulerMHD benchmark on up to 75k cores on Tera100, comparing MPC with IntelMPI and Open MPI

• Adapting data visibility to save memory
  – New mechanism, Hierarchical Local Storage (HLS), to share data among MPI tasks
  – Cooperation between compiler and runtime system
  – Validated with applications: e.g. EulerMHD, with 128 MB of physics constants per MPI task, consumes 3.5x less memory on 736 cores than with Open MPI


MPI Scalability and Memory Consumption

• Evaluation of launch time and memory consumption with a microbenchmark (collective MPI operations) on Tera100 (4-socket, 8-core Nehalem EX)

[Charts: memory consumption per rank (MB) and launch time (s) for 32 to 4096 ranks, comparing MPC 2.2.0, IntelMPI 4.0.2 and OpenMPI 1.5.0]


MPI Performance Results

• Weak-scalability study on the Tera100 machine

[Charts: execution time (s) vs. number of cores, comparing MPC and OpenMPI. PN Bench: 64 to 4096 cores. EulerMHD: 2048 to 8192 cores]


MPI Scalability

[Chart: MPI strong scalability of EulerMHD on Tera100, execution time (s) for 32768 to 75008 cores, comparing IntelMPI 4.0.3 (OFA), Open MPI 1.5.0a1 and MPC 2.3.0]

Execution time is unstable; performance reported is the minimum of 3 runs.


Programming and Execution Models Some Achievements

• Open Source release: MPC 2.3 available since end of November
• MPC 2.4 available soon
• Port of MPC to the KNF processor


Outline

• Exascale Computing Research Center
• Research activities
  – Co-Design
  – Tools: 3 full projects running
    • Application Characterization
    • Programming and Execution Models
    • Performance Evaluation Tools

Performance Evaluation Tools

Project leader: Jean-Thomas Acquaviva, UVSQ

Team: Emmanuel Oseret, Souad Koliai, Andres Charif Rubial, Cédric Valensi, Thibault Figheira, Zakaria Bendifallah (UVSQ)


Performance Evaluation Tools

• Goals:
  - Identify key bottlenecks in Petaflop and Exaflop machines
  - Define a performance methodology for fast tuning of Exaflop applications


Performance Evaluation Tools

• Provide a set of static and dynamic performance tools to help users quickly
  – Identify performance problems
  – Evaluate the potential gains
  – Develop appropriate work-arounds

• ECR tools (MAQAO, DECAN, …) focus on memory behavior and single-node optimization

• Provide a unified environment for combined use of existing tools, e.g. VTune Amplifier XE, Vampir, Scalasca, MAQAO, TAU, Likwid, etc.

• Deal efficiently with the prodigious amount of measurement data


• Disassemble, instrument and reassemble SSE and AVX binaries
  - Module can be used by other performance tools (TAU, …)


• Static performance analysis
• Loop-centric
• Vectorization ratio
• Detailed pipeline model:
  • Dispatch, decoder, LSD, per-port pressure
• Memory access / register use
  • Aggregate memory instructions per group
  • Unrolling factor
• Static performance prediction
  • 'What if' the code is fully vectorized
  • 'What if' the data is stored in L1/L2/L3


Performance Evaluation Tools Some Achievements

• Support of AVX
  – Currently supported architectures: Core2, Nehalem, Sandy Bridge

• Validation on real applications, e.g.
  – RTM from Total
  – QMC chemistry code from M. Caffarel:
    • Practically perfect scaling from single- to multi-node
    • Single-node optimization: 4x improvement over the original version

• Open Source releases
  – MAQAO 2.0 available soon


Outline

• Exascale Computing Research Center
• Research activities
  – Co-Design
  – Tools: 3 full projects running
    • Application Characterization
    • Programming and Execution Models
    • Performance Evaluation Tools

Application Characterization

Project leader: Jean Christophe Beyler, Intel

Team: Franck Talbart, Pablo de Oliveira, Mathieu Tribalat, Yuriy Kashnikov, José Noudohouenou, Thibault Figheira, Nicolas Triquenaux, Mathieu Bordet, Nicolas Petit, Benoit Pradelle (UVSQ)


Application Characterization: Objectives

• Goal: Understand the “genes” of an application and the complex relationship between hardware, compilers, and applications
  - Help hardware designers improve and test new designs quickly
  - Help compiler designers rapidly tune their compiler optimization strategies
  - Help application and code designers by providing tips on proper code optimization

• Codelet = code fragment (loop-based) with input data and wrappers to build and run it in a stand-alone manner


Application Characterization

• 4-step methodology
  - Extraction: extract hot code fragments (“codelets”) from applications
  - Performance analysis: systematically analyze performance behavior with respect to different architectures, different compiler optimizations, Flop/W, ...
  - Build the repository: store the information in a special database (repository)
  - Harvest the repository: evaluate the impact of hardware features, derive optimization strategies for “codelet” categories, etc.

• Automate the methodology as much as possible


Application Characterization

[Workflow diagram]
1. Full application
2. Small representative codelets (Codelet Finder)
3. Coarse-grain tools (MAQAO, DECAN): codelet profiles, optimization opportunities
4. Underlying architecture: microbenchmarks (MDL)
5. Handling all the data: ASK, machine learning
6. Capacity and prediction models
7. Tying it all together: CTI


Application Characterization Some Achievements

• Characterization tools
  – Codelet Finder (CAPS entreprise): extract hot spots from an application
  – ASK: automatic domain-space explorer
  – Energy profiling tools: provide per-function Joule usage
  – MicroTools: automatic program generation and execution
  – REST: runtime energy saving technology

• Tools are being integrated into CTI
  – Common framework for experiments
  – Automated system with a web user interface


Codelet Tuning Infrastructure (CTI)

• A single place to store a huge amount of data
• File manager
  • File sharing, updating, processing, viewing
• Codelet manager
  • Automatic CSV file insertion
• Query the data
• Automate experiments
• Tools integrator: Codelet Finder, MicroTools, DECAN, MAQAO

[Diagram: CTI integrating the tools with files and experiment data; legend distinguishes done vs. to-be-done components]


Conclusions

• ECR alive and kicking, with a number of projects running
• ECR open to external collaborations
• Open Source releases and publications available (soon)
• See us at the Teratec Forum, 27-28 June 2012


ECR contacts

Address: UVSQ, 45 Av. des Etats-Unis, Buffon building, 5th floor, 78000 Versailles, France

Web site: www.exascale-computing.eu

Team:
William Jalby, CT, [email protected]
Marie-Christine Sawley, Co-design, [email protected]
Bettina Krammer, Tools, [email protected]

Collaboration partners


THANKS

QUESTIONS ??