TRANSCRIPT
The Race Towards Exascale Co-Design Architecture For Next Generation HPC Systems
Gilad Shainer
HPC Advisory Council
China, November 2015
The Ever Growing Demand for Higher Performance

[Performance development timeline, 2000–2020: Terascale → Petascale ("Roadrunner", the 1st Petascale system) → Exascale]
[Technology development: Single-Core to Many-Core; SMP to Clusters; Co-Design of Hardware (HW), Software (SW), and Application (APP)]
Co-Design Architecture to Enable Exascale Performance

CPU-Centric:
• Limited to main CPU usage
• Results in performance limitation

Co-Design:
• Creating synergies enables higher performance and scale
• Spans Software, In-CPU Computing, In-Network Computing, and In-Storage Computing
Exascale will be Enabled via Co-Design Architecture
• Software – Hardware
• Hardware – Hardware (e.g. GPUDirect)
• Software – Software (e.g. OpenUCX)
• Industry – Users – Academia
• Standard, Open Source, Eco-System
• Programmable, Configurable, Innovative

[Co-design triangle: Hardware (HW), Software (SW), Application (APP)]
Breaking the Application Latency Wall
• Today: Network device latencies are on the order of 100 nanoseconds
• Challenge: Enabling the next order of magnitude improvement in application performance
• Solution: Creating synergies between software and hardware

Co-Design Paves the Road to Exascale Performance

[Latency breakdown, network vs. communication framework:]
• 10 years ago: Network ~10 microseconds, Communication framework ~100 microseconds
• Today: Network ~0.1 microseconds, Communication framework ~10 microseconds
• Future (Co-Design): Network ~0.05 microseconds, Communication framework ~1 microsecond
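The figures above imply that once the network itself drops to ~0.1 µs, the communication-framework software overhead dominates the application-visible latency — which is why co-design targets the software path. A minimal sketch of that arithmetic, using the slide's approximate numbers (in microseconds):

```python
# Rough end-to-end latency model from the slide's approximate numbers (microseconds).
# Application-visible latency ~= communication-framework overhead + network latency,
# so once the network drops to ~0.1 us, the software overhead dominates.
eras = {
    "10 years ago": {"network": 10.0, "framework": 100.0},
    "today":        {"network": 0.1,  "framework": 10.0},
    "future":       {"network": 0.05, "framework": 1.0},
}

def total_latency(era: str) -> float:
    """End-to-end latency estimate for one era, in microseconds."""
    t = eras[era]
    return t["network"] + t["framework"]

for era in eras:
    print(f"{era}: ~{total_latency(era):g} us end-to-end")
```

Each step is roughly an order of magnitude: ~110 µs ten years ago, ~10 µs today, ~1 µs in the co-designed future.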
The Pizza Example
[Serial pipeline: Processing → Processing → Processing → Transportation]
More CPUs – More Pizzas, Same Time per Pizza
Accelerating The Process Via Co-Design
[Pipeline: Processing overlapped with Transportation]
Hiding The Baking Time – Faster Pizza Delivery
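The pizza analogy is classic pipelining: overlap the "baking" (computation) with the "delivery" (communication) instead of running them back to back. A small sketch of the timing model behind the analogy (the stage times are made up for illustration):

```python
def serial_time(n_pizzas: int, bake: float, deliver: float) -> float:
    # Without overlap: each pizza is baked, then delivered, one after another.
    return n_pizzas * (bake + deliver)

def pipelined_time(n_pizzas: int, bake: float, deliver: float) -> float:
    # With overlap (co-design): while one pizza is out for delivery,
    # the next is already in the oven. After the first pizza fills the
    # pipeline, progress is paced by the slowest stage.
    if n_pizzas == 0:
        return 0.0
    return bake + deliver + (n_pizzas - 1) * max(bake, deliver)

# e.g. 3 pizzas, 10 min bake, 5 min delivery:
# serial:    3 * (10 + 5)      = 45 min
# pipelined: 10 + 5 + 2 * 10   = 35 min  (the baking hides the delivery)
```

The same arithmetic applies when the "oven" is a CPU and the "delivery" is the network: offloading communication to intelligent devices hides it behind computation.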
The Road to Exascale – Co-Design System Architecture
• The road to Exascale requires order-of-magnitude performance improvements
• Co-Design architecture enables all active devices to become co-processors

Mapping Communication Frameworks on All Active Devices
Co-Design Architecture Depends on Offloading Technologies
• Offloading technologies make components intelligent (interconnect, storage, etc.)
• Examples: RDMA, GPUDirect, Virtualization, Software-Defined Networking (SDN)
• Properties: Programmability, Direct Communications, Backward and Future Compatibility
• Beneficiaries: Applications (Innovations, Scalability, Performance)

Co-Design Requires Intelligence Everywhere
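The point of offloads like RDMA is that the device makes progress on communication while the CPU keeps computing. A loose, illustrative sketch of that division of labor — here a worker thread plays the role of the intelligent device (a real offload runs on the NIC or storage controller, not on a host thread):

```python
from concurrent.futures import ThreadPoolExecutor

def reduce_offload(chunks):
    """Illustrative only: a worker thread stands in for an in-network
    reduction engine, summing incoming chunks while the 'CPU' (the
    caller) would be free to keep computing."""
    with ThreadPoolExecutor(max_workers=1) as device:
        future = device.submit(lambda: sum(sum(c) for c in chunks))
        # ... CPU-side computation would proceed here, overlapped
        # with the device-side reduction ...
        return future.result()

# reduce_offload([[1, 2], [3, 4]]) returns the global sum, 10
```

The structural idea — submit work to an autonomous engine, overlap, then collect the result — is the same whether the engine is a thread, a GPU, or switch/NIC hardware.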
Paving The Road to Exascale
Exascale Co-Design Collaboration
• Collaborative effort: Industry, National Laboratories, and Academia
• The Next Generation HPC Software Framework
Collaboration
• Mellanox co-designs the network interface and contributes its MXM technology
  – Infrastructure, transport, shared memory, protocols, integration with Open MPI/SHMEM, MPICH
• ORNL co-designs the network interface and contributes the UCCS project
  – InfiniBand optimizations, Cray devices, shared memory
• NVIDIA co-designs high-quality support for GPU devices
  – GPUDirect RDMA, GDRCopy, etc.
• IBM co-designs the network interface and contributes ideas and concepts from PAMI
• UH/UTK focus on integration with their research platforms
What We Don’t Want to Achieve
Different Model – Co-Design Effort
• Co-design effort between national laboratories, academia, and industry

Co-designed stack:
• Applications: LAMMPS, NWChem, etc.
• Programming models: MPI, PGAS/GASNet, etc.
• Middleware: UCX
• Driver and Hardware
UCX Framework Mission
• Collaboration between industry, laboratories, and academia
• Create an open-source, production-grade communication framework for HPC applications
• Enable the highest performance through co-design of software-hardware interfaces
• Unify industry, national laboratory, and academia efforts

• Performance oriented: optimization for low software overheads in the communication path allows near native-level performance
• Community driven: collaboration between industry, laboratories, and academia
• Production quality: developed, maintained, tested, and used by industry and the research community
• API: exposes broad semantics that target data-centric and HPC programming models and applications
• Research: the framework concepts and ideas are driven by research in academia, laboratories, and industry
• Cross platform: support for InfiniBand, Cray, various shared memory (x86-64 and Power), and GPUs

Co-design of Exascale Network APIs
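A central design choice in UCX is layering: high-level, semantic operations (the UCP layer) sit above pluggable, transport-specific implementations (the UCT layer), so one API can run over InfiniBand, Cray networks, shared memory, or GPUs. The sketch below illustrates only that layering idea — the class and method names are hypothetical, not the real UCX C API:

```python
# Hypothetical sketch of the UCX layering idea: one high-level API over
# pluggable transports. Names are illustrative only; the real interfaces
# are the C-level UCP/UCT APIs.
class Transport:
    """Low-level transport (roughly the role of a UCT transport)."""
    def send_bytes(self, data: bytes) -> int:
        raise NotImplementedError

class SharedMemoryTransport(Transport):
    """One interchangeable backend; others could wrap RDMA, TCP, etc."""
    def __init__(self):
        self.buffer = bytearray()
    def send_bytes(self, data: bytes) -> int:
        self.buffer += data  # "deliver" into a shared buffer
        return len(data)

class Endpoint:
    """High-level API (roughly the role of UCP): exposes semantic
    operations such as tagged send, independent of the transport."""
    def __init__(self, transport: Transport):
        self.transport = transport
    def tag_send(self, tag: int, payload: bytes) -> int:
        message = tag.to_bytes(4, "big") + payload
        return self.transport.send_bytes(message)

# The application codes against Endpoint; swapping the transport
# requires no application changes.
ep = Endpoint(SharedMemoryTransport())
sent = ep.tag_send(7, b"hello")  # 4-byte tag header + 5-byte payload
```

Keeping the semantic layer thin over the transport layer is exactly the "low software overhead in the communication path" goal stated in the mission slide.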
UCX High-level Overview
UCX Information
GPUDirect RDMA and Sync (3.0 and 4.0)
• Hardware – Hardware co-design
HPC|Music Project
HPC|Music is an advanced research project on High Performance Computing and music production, dedicated to enabling HPC in music creation. Its goal is to develop HPC cluster and cloud solutions that further enable the future of music production and reproduction.
Thank You!
Web: www.hpcadvisorycouncil.com
Email: [email protected]
Facebook: http://www.facebook.com/HPCAdvisoryCouncil
Twitter: www.twitter.com/hpccouncil
YouTube: www.youtube.com/user/hpcadvisorycouncil