Exascale Computing Research Center (ECR)
Overview
Bettina Krammer, UVSQ, [email protected]
William Jalby, UVSQ, [email protected]
Journée HPC, May 15, 2012, Orsay
Outline
• Exascale Computing Research Center
• Research activities
  – Co-Design
  – Tools
    • Application Characterization
    • Programming and Execution Models
    • Performance Evaluation Tools
Exascale – 2018?
• 1 ExaFlops = 10^18 Flops = 1 billion x 1 billion Flops
(Source: www.top500.org)
Exascale Challenges
• Power efficiency
• Parallelism
• Memory access
• Resiliency

Objectives of the Exascale Computing Research Center (ECR):
• Tackle these issues from the software angle
• Work with real applications
• Take into account the whole software stack
• Contribute to Open Source
ECR Partners (1/2)
• CEA: designs large-scale applications, uses and operates large teraflop-scale machines
  – CEA DAM: know-how in Tera/Petaflop applications and algorithms
  – CEA DAM: know-how in operating large machines (OS, middleware, etc.), hosting one of the two Tier-0 systems in Europe
  – Rich network of industry/academic collaborations (TER@TEC)
• GENCI: coordinating structure of large teraflop/petaflop-scale centers
  – Access to large academic and industrial user communities
  – Access to different large-scale computing centers
  – French representative in the PRACE Research Infrastructure
ECR Partners (2/2)
• UVSQ/ITACA (joint lab between CEA DAM and UVSQ)
  – Compiler/code optimization expertise
  – Performance evaluation (benchmarks, tools) expertise
  – Rich network of scientific collaborations (Europe and US)
• Intel Corp.
  – One of the world leaders in computing innovation
  – Intel designs and builds the essential technologies for today's and tomorrow's computing devices

Complementary expertise among the four actors covers the software stack from applications down to microarchitecture behavior, backed by a rich network of academic and industrial collaborations.
ECR Team
• Over 25 high-level researchers, supported by recognized international hardware and software experts
• Part of a dense network of research labs in Europe and North America
• Member of the Intel EMEA HPC Exascale labs, together with the ExaScience lab, the ExaCluster lab, and the Exascale Lab Barcelona
ECR Organization
• Chief Technologist: Prof. William Jalby, UVSQ
• Communication: Marc Dollfus, Intel Corp.
• Software tools for application characterization and performance optimization (Bettina Krammer, UVSQ)
  – Application characterization information
  – Runtime programming models
  – Performance evaluation framework
• Enabling application co-design (Marie-Christine Sawley, Intel Corp.)
  – Numerical kernels: scalability and stability
  – Implicit parallelism, all-to-all communications
  – Guidelines for designing Exa-apps
(Organization chart linking HPC applications with the software tools.)
Present engagements on Co-design
• Life science: MD, radiotherapy and medical imaging
• Seismic and Earth sciences
• Quantum chemistry
(Illustration courtesy of M. Masella, CEA/DSV.)
Outline
• Exascale Computing Research Center
• Research activities
  – Co-Design
  – Tools: 3 full projects running
    • Application Characterization
    • Programming and Execution Models
    • Performance Evaluation Tools
Programming and Execution Models
Project leader: Bettina Krammer, UVSQ
Team: Marc Pérache, Patrick Carribault (CEA); Marc Tchiboukdjian, Sylvain Didelot, Aurèle Maheo, Souad Koliai (UVSQ)
Programming and Execution Models: Objectives
• Understand the limitations and requirements of large-scale applications/algorithms; propose or implement extensions to programming and execution models:
  – Work with real applications
  – Explore programming models (MPI, OpenMP, PGAS, Cilk, CnC, hybrid, …)
  – Propose language extensions (pragma-based)
  – Support easy adoption of proposed work for legacy code (C/C++/Fortran, MPI, OpenMP, …)
  – Support debugging and performance tools
Programming and Execution Models: Objectives
• Optimize application performance through efficient mapping of concepts to the underlying hardware:
  – Reduce memory consumption of the runtime
  – Optimize inter-/intra-node communication
  – Take into account the impact of topology (see the sketch below)
• Research on programming-model extensions and runtime mechanisms relying on the MPC framework
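As one illustration of what topology awareness can mean in practice, the sketch below uses the hwloc library to discover the cores of a node and bind a thread; this is our illustrative example, not necessarily the mechanism used inside MPC.

```c
/* Illustrative only: discover the node topology with hwloc and pin the
 * calling thread to a core, so that its memory accesses stay local on a
 * NUMA machine. Not MPC's internal mechanism, just one common approach. */
#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;
    int ncores;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
    printf("%d cores on this node\n", ncores);

    if (ncores > 0) {
        /* Bind the current thread to the first core. */
        hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, 0);
        hwloc_set_cpubind(topo, core->cpuset, HWLOC_CPUBIND_THREAD);
    }

    hwloc_topology_destroy(topo);
    return 0;
}
```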
MPC Framework
• Input framework: MPC (MultiProcessor Computing)
  – Originates from CEA
  – Unified parallel runtime for clusters of NUMA machines
  – Supports MPI 1.3, OpenMP 2.5 and Pthreads (all thread-based)
  – Integration with other HPC components (compiler, OS, …)
  – Freely available at http://mpc.sourceforge.net
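Since MPC runs standard MPI and OpenMP codes, a plain hybrid program like the minimal sketch below (standard MPI 1.3 and OpenMP 2.5 calls only, nothing MPC-specific) can be executed unchanged on top of its thread-based runtime.

```c
/* Minimal hybrid MPI + OpenMP program: MPI tasks across nodes, OpenMP
 * threads within a node. MPC executes MPI tasks as threads inside one
 * process per node, so code like this runs unchanged. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Intra-node parallelism with OpenMP; inter-node with MPI. */
    #pragma omp parallel
    printf("MPI task %d/%d, OpenMP thread %d/%d\n",
           rank, size, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}
```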
Programming and Execution Models: Some Achievements
• Scalability and memory consumption of MPI implementations
  – Re-design of the low-level communication part of MPC (InfiniBand module)
  – Tradeoff between memory consumption and performance
  – Large-scale scalability evaluation of the EulerMHD benchmark on up to 75k cores on Tera100, comparing MPC with IntelMPI and Open MPI
• Adapting data visibility to save memory
  – New mechanism, Hierarchical Local Storage (HLS), to share data among MPI tasks
  – Cooperation between compiler and runtime system
  – Validated on applications: e.g., EulerMHD with 128 MB of physics constants per MPI task consumes 3.5x less memory on 736 cores than with Open MPI
MPI Scalability and Memory Consumption
• Evaluation of launch time and memory consumption with a microbenchmark (collective MPI operations) on Tera100 (4-socket, 8-core Nehalem-EX nodes); a sketch of such a microbenchmark follows the figure
[Figure: memory consumption per rank (MB) and launch time (s) for 32 to 4096 ranks, comparing MPC 2.2.0, IntelMPI 4.0.2 and OpenMPI 1.5.0]
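For illustration, a microbenchmark in this spirit can be as simple as the sketch below (our code, not the ECR benchmark): it times a collective operation and reports the resident memory of a rank.

```c
/* Illustrative collective microbenchmark (ours, not the ECR one):
 * time MPI_Allreduce and report the resident memory of rank 0. */
#include <mpi.h>
#include <stdio.h>

/* Read resident set size (kB) from /proc/self/status (Linux). */
static long rss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;
    if (!f) return -1;
    while (fgets(line, sizeof line, f))
        if (sscanf(line, "VmRSS: %ld", &kb) == 1) break;
    fclose(f);
    return kb;
}

int main(int argc, char **argv)
{
    int rank, size, i;
    double in = 1.0, out, t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < 1000; i++)
        MPI_Allreduce(&in, &out, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d ranks: %.3f us/allreduce, rank 0 RSS %ld kB\n",
               size, (t1 - t0) / 1000.0 * 1e6, rss_kb());

    MPI_Finalize();
    return 0;
}
```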
MPI Performance Results
• Weak-scalability study on the Tera100 machine
[Figure: execution time (s) vs. number of cores for the PN Bench (64 to 4096 cores) and EulerMHD (2048 to 8192 cores) benchmarks, comparing MPC and OpenMPI]
MPI Scalability
[Figure: MPI strong scalability of EulerMHD on Tera100, execution time (s) for 32,768 to 75,008 cores, comparing IntelMPI 4.0.3 (OFA), Open MPI 1.5.0a1 and MPC 2.3.0]
Execution time is unstable. Performance reported is the minimum of 3 runs.
Programming and Execution Models: Some Achievements
• Open Source release: MPC 2.3 available since end of November
• MPC 2.4 available soon
• Port of MPC to the KNF (Knights Ferry) processor
Outline
• Exascale Computing Research Center
• Research activities
  – Co-Design
  – Tools: 3 full projects running
    • Application Characterization
    • Programming and Execution Models
    • Performance Evaluation Tools
Performance Evaluation Tools
Project leader: Jean-Thomas Acquaviva, UVSQ
Team: Emmanuel Oseret, Souad Koliai, Andres Charif Rubial, Cédric Valensi, Thibault Figheira, Zakaria Bendifallah, UVSQ
Performance Evaluation Tools
• Goals:
  – Identify key bottlenecks in Petaflop and Exaflop machines
  – Define a performance methodology for fast tuning of Exaflop applications
Performance Evaluation Tools
• Provide a set of static and dynamic performance tools to help users quickly:
  – Identify performance problems
  – Evaluate the potential gains
  – Develop appropriate workarounds
• ECR tools (MAQAO, DECAN, …) focus on memory behavior and single-node optimization
• Provide a unified environment for combined use of existing tools, e.g. VTune Amplifier XE, Vampir, Scalasca, MAQAO, TAU, Likwid, etc.
• Deal efficiently with the prodigious amount of measurement data
• Disassemble, instrument and reassemble SSE and AVX binaries
  – Module can be used by other performance tools (TAU, …)
• Static performance analysis, loop-centric:
  – Vectorization ratio
  – Detailed pipeline model: dispatch, decoder, LSD, per-port pressure
  – Memory access / register use: aggregate memory instructions per group, unrolling factor
• Static performance prediction:
  – "What if" the code is fully vectorized
  – "What if" the data is stored in L1/L2/L3
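As an example of the kind of loop such static analysis targets, consider the sketch below (our illustration, not an ECR example): on the compiled binary, a tool like MAQAO can report whether packed SSE/AVX instructions were emitted (vectorization ratio), the unrolling factor the compiler chose, and "what if" projections such as the execution time with full vectorization or with all data in L1.

```c
/* Illustrative kernel: the kind of loop a binary-level static analyzer
 * inspects. Per iteration: 2 flops, 2 loads, 1 store, so the analysis
 * can model port pressure and project the fully-vectorized upper bound. */
void daxpy(long n, double a, const double *x, double *y)
{
    long i;
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```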
Performance Evaluation Tools: Some Achievements
• Support of AVX
  – Currently supported architectures: Core2, Nehalem, Sandy Bridge
• Validation on real applications, e.g.
  – RTM from Total
  – QMC chemistry code from M. Caffarel:
    • Practically perfect scaling from single node to multi-node
    • Single-node optimization: 4x improvement over the original version
• Open Source releases
  – MAQAO 2.0 available soon
Outline
• Exascale Computing Research Center
• Research activities
  – Co-Design
  – Tools: 3 full projects running
    • Application Characterization
    • Programming and Execution Models
    • Performance Evaluation Tools
Application Characterization
Project leader: Jean-Christophe Beyler, Intel
Team: Franck Talbart, Pablo de Oliveira, Mathieu Tribalat, Yuriy Kashnikov, José Noudohouenou, Thibault Figheira, Nicolas Triquenaux, Mathieu Bordet, Nicolas Petit, Benoit Pradelle, UVSQ
Application Characterization: Objectives
• Goal: understand the "genes" of an application and the complex relationships between hardware, compilers and applications
  – Help hardware designers improve and test new designs quickly
  – Help compiler designers rapidly tune their compiler optimization strategies
  – Help application and code designers by providing tips on proper code optimization
• Codelet = code fragment (loop-based) with input data and wrappers to build and run it in a stand-alone manner
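A minimal sketch of what a codelet looks like under this definition (our illustration, not an actual ECR codelet): a hot loop extracted from an application, packaged with its input data and a driver so it builds and runs stand-alone.

```c
/* Illustrative codelet: an extracted loop plus a wrapper that supplies
 * input data, so the fragment can be built and measured in isolation. */
#include <stdio.h>
#include <stdlib.h>

#define N 4096

/* The extracted hot loop. */
static void codelet(long n, const double *a, const double *b, double *c)
{
    long i;
    for (i = 0; i < n; i++)
        c[i] = a[i] * b[i];
}

/* Wrapper: allocate and initialize the input data, then run the codelet. */
int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    long i;

    for (i = 0; i < N; i++) { a[i] = (double)i; b[i] = 2.0 * i; }
    codelet(N, a, b, c);
    printf("c[N-1] = %g\n", c[N - 1]);

    free(a); free(b); free(c);
    return 0;
}
```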
Application Characterization
• 4-step methodology:
  – Extraction: extract hot code fragments ("codelets") from applications
  – Performance analysis: systematically analyze performance behavior with respect to different architectures, different compiler optimizations, Flop/W, …
  – Build the repository: store the information in a special database (see the sketch after this list)
  – Harvest the repository: evaluate the impact of hardware features, derive optimization strategies for codelet categories, etc.
• Automate the methodology as much as possible
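As a minimal sketch of the "build the repository" step (our illustration; the actual storage used by the project is richer than this), one measurement record per (codelet, architecture, compiler) experiment could be appended to a CSV file for later ingestion:

```c
/* Illustrative only: append one measurement record per experiment to a
 * CSV file. Field names and metrics are our choice for the sketch. */
#include <stdio.h>

int record(const char *csv_path, const char *codelet,
           const char *arch, const char *cflags,
           double cycles_per_iter, double flops_per_watt)
{
    FILE *f = fopen(csv_path, "a");
    if (!f) return -1;
    fprintf(f, "%s,%s,\"%s\",%.2f,%.3f\n",
            codelet, arch, cflags, cycles_per_iter, flops_per_watt);
    return fclose(f);
}
```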
Application Characterization: Workflow
1. Full application
2. Small representative codelets, extracted with Codelet Finder
3. Coarse-grain tools (MAQAO, DECAN): codelet profiles, optimization opportunities
4. Underlying architecture: microbenchmarks, MDL
5. Handling all the data: ASK, machine learning
6. Capacity and prediction models
7. Tying it all together: CTI
Application Characterization: Some Achievements
• Characterization tools
  – Codelet Finder (CAPS entreprise): extract hot spots from an application
  – ASK: automatic domain-space explorer
  – Energy profiling tools: provide per-function Joule usage
  – MicroTools: automatic program generation and execution
  – REST: runtime energy saving technology
• Tools are being integrated into CTI
  – Common framework for experiments
  – Automated system with a web user interface
Codelet Tuning Infrastructure (CTI)
• A single place to store a huge amount of data
• File manager: file sharing, updating, processing, viewing
• Codelet manager
• Automatic CSV file insertion
• Query the data
• Automate experiments
• Tools integrator: Codelet Finder, MicroTools, DECAN, MAQAO (files, experiment data)
Conclusions
• ECR alive and kicking, with a number of projects running
• ECR open to external collaborations
• Open Source releases and publications available (soon)
• See us at the Teratec Forum, 27-28 June 2012
ECR contacts
Address: UVSQ, 45 Av. des Etats-Unis, Buffon building, 5th floor, 78000 Versailles, France
Web site: www.exascale-computing.eu
Team:
William Jalby, CT, [email protected]
Marie-Christine Sawley, Co-design, [email protected]
Bettina Krammer, Tools, [email protected]
Collaboration partners