Tools and systems for HPC and AI at LRZ - ESO
![Page 1: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/1.jpg)
![Page 2: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/2.jpg)
Tools and systems for HPC and AI at LRZ | 22.7.2019 | Luigi Iapichino
![Page 3: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/3.jpg)
About the presenter
Dr. Luigi Iapichino
• Astrophysics and Quantum Computing Application Specialist
• High Performance Systems Division, LRZ
• Team leader of the Application Lab for Astrophysics (LRZ AstroLab)
• Lead of Quantum Computing @ LRZ
• Expert in computational astrophysics and simulations
• Member of the PRACE High-Level Support Team
Email: [email protected]
Thanks to Nicolay Hammer and David Brayford (LRZ) for providing many of the following slides
![Page 4: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/4.jpg)
Leibniz Supercomputing Centre
Garching, Germany
![Page 5: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/5.jpg)
Bavarian Academy of Sciences and Humanities
Leibniz Supercomputing Centre
We are the computing backbone for advanced research science in Bavaria
• 250 employees
• approx. 56 years of IT support
• Computer Centre for all Munich Universities
• Regional Computer Centre for all Bavarian Universities
• National Supercomputing Centre (GCS)
• European Supercomputing Centre (PRACE)
![Page 6: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/6.jpg)
Operating Cutting-Edge IT Infrastructure
LRZ as an IT Center of Excellence
Storage, Network, Cloud Computing, Cluster, HPC, Training, Consultancy
• High Speed Networking: Munich Scientific Network
• High Performance Computing: SuperMUC-NG, Linux cluster
• Big Data: Bavarian State Library Digital Archive
• Virtual Reality and Visualisation: V2C (CAVE, Powerwall)
![Page 7: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/7.jpg)
For German Research
LRZ as IT Service Provider
Gauss Centre for Supercomputing (GCS)
Alliance for Germany’s Tier-0 high performance computing centers
• LRZ | Munich | SuperMUC-NG
• HLRS | Stuttgart | Hazel Hen
• JSC | Jülich | JUWELS
Founded 13 April 2007
![Page 8: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/8.jpg)
For European Research
LRZ as IT Service Provider
Partnership for Advanced Computing in Europe (PRACE)
Federated, pan-European Tier-0 supercomputing infrastructure
25 Countries
Hosting Members:
• GCS (LRZ, HLRS, JSC)
• BSC (Spain)
• CSCS (Switzerland)
• CINECA (Italy)
• GENCI (France)
PRACE 2: 2017 – 2020
![Page 9: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/9.jpg)
LRZ Systems and Access
National and International
• PRACE
• GCS
• SuperMUC-NG
Local and Regional
• Munich TUM and LMU: ~30% of SuperMUC usage
• Students: training future experts
• Bavarian projects: <1 million CPU hours
Cluster
• CoolMUC-2
• CoolMUC-3
• Teramem
• DGX-1
• VMware: High Availability Cloud
• Compute Cloud: OpenStack, OpenNebula
![Page 10: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/10.jpg)
High Performance Data Computing at LRZ
Data Intensive Computing, Data Analytics & AI
Emerging Communities
HPC User Communities
Increasing computing demands
Increasing analytics demands
![Page 11: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/11.jpg)
A New World is Emerging: High Performance AI (HPAI)
Big Data and AI
Ability and Expertise to Target Large Scale Problems
New User Communities with New Workflows
HPC
![Page 12: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/12.jpg)
THE COMPUTING INNOVATION CYCLE
Advanced computing and huge volumes of data create new opportunities for information insight.
HPC, AI & Machine Learning, Big Data
Pattern Recognition
Data ▻ Algorithms ▻ Computing ▻ Pattern Recognition
Modeling & Simulation (M&S)
Natural World ▻ Hypothesis ▻ Equations ▻ Algorithms ▻ Computing ▻ Data ▻ Analysis
![Page 13: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/13.jpg)
![Page 14: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/14.jpg)
High Performance AI (HPAI) in a Container
Transition AI algorithms from the laptop to supercomputer with minimal effort
“It just works”
![Page 15: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/15.jpg)
HPAI =
M&S
• Equation based on model
• Computing driven
• Numerically intensive
• Creates simulations
• Monte Carlo
• Larger problems
• Iterative methods
• PDE
Analytics
• Finds patterns
• Correlations in data
• Logic driven
• Creates inferences
• Knowledge discovery
• Graphs
• Data-driven science
• Predictions
• CNN
• RNN
+
• Linear algebra
• Matrix operations
• Iterative methods
• Compute intensive
• Data transfer
• Predictive
• Probabilities
• Stencil codes
• Calculus
• Pattern recognition
• Graphs
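Several items in the lists above (linear algebra, matrix operations, iterative methods, stencil codes) appear in both M&S and analytics workloads. As a minimal sketch of that overlap, and not something taken from the slides, the NumPy code below writes one Jacobi sweep for the 2D Laplace equation twice: once as an explicit 5-point stencil, and once as a 3x3 convolution of the kind a CNN layer applies. The function names, grid size, and tolerance are illustrative assumptions.

```python
import numpy as np


def jacobi_step(u):
    """One Jacobi sweep for the 2D Laplace equation (5-point stencil).

    Each interior point becomes the average of its four neighbours;
    boundary values are left unchanged.
    """
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
    )
    return new


def conv2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + out_h, j:j + out_w]
    return out


# The same 5-point stencil expressed as a 3x3 convolution kernel.
STENCIL_KERNEL = np.array([
    [0.00, 0.25, 0.00],
    [0.25, 0.00, 0.25],
    [0.00, 0.25, 0.00],
])


def jacobi_step_as_conv(u):
    """Jacobi sweep implemented through the generic convolution routine."""
    new = u.copy()
    new[1:-1, 1:-1] = conv2d_valid(u, STENCIL_KERNEL)
    return new


def solve(u0, step, tol=1e-6, max_iter=10_000):
    """Iterate `step` until the largest pointwise update drops below `tol`."""
    u = u0
    for iteration in range(1, max_iter + 1):
        nxt = step(u)
        if np.max(np.abs(nxt - u)) < tol:
            return nxt, iteration
        u = nxt
    return u, max_iter


if __name__ == "__main__":
    grid = np.zeros((8, 8))
    grid[0, :] = 1.0  # hypothetical boundary condition: top edge held at 1
    stencil_solution, _ = solve(grid, jacobi_step)
    conv_solution, _ = solve(grid, jacobi_step_as_conv)
    print(np.allclose(stencil_solution, conv_solution, atol=1e-5))
```

Both variants perform the same arithmetic, which is why optimized linear algebra and convolution kernels can serve simulation and machine learning codes alike.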
![Page 16: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/16.jpg)
Differences Between HPC & AI
HPAI @ LRZ

| AI | HPC |
|---|---|
| Large number of small files | Small number of large files |
| Large memory nodes (+1 TB) | Memory per node (32/64 GB) |
| Single node | Multiple nodes |
| Single GPU/accelerator node | Distribute compute over many nodes |
| Local node storage | Typically diskless systems (no local storage) |
| Data transfer within a single node (PCI bus) | Data transfer between multiple nodes |
| Matrices are typically small | Medium to large matrices |
| Root privileges | User privileges |
![Page 17: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/17.jpg)
Requirements for AI on HPC
HPAI @ LRZ
• Compute intensive hardware
• Optimized AI frameworks (TensorFlow, Caffe)
• Optimized software (numerical libraries, Python)
• HPC specific software (distributed computing, workload manager)
• Method of deploying the AI software in a simple, straightforward and flexible way
Need to get to: “It just works”
![Page 18: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/18.jpg)
Compute System Diversity with fully integrated Central Data Silos
• SuperMUC-NG: HPC System (Tier-1)
• LRZ Linux Cluster: HPC System (Tier-2)
• LRZ Compute Cloud
• Special System: DGX-1, Teramem, …
• LRZ Data Science Storage (DSS) Systems
• Sharing Data with outside world
![Page 19: Tools and systems for HPC and AI at LRZ - ESOLeibniz Supercomputing Centre Computer Centre 250 for all Munich Universities employees approx. 56 ... SuperMUC-NG, Linux cluster Big Data](https://reader034.vdocuments.net/reader034/viewer/2022052103/603d3b931b595252bb54fb1f/html5/thumbnails/19.jpg)
Our training system: the LRZ Compute Cloud
A new service for LRZ users
It lets you upload and use your own virtual machines
Hardware overview:
• 82 nodes: Intel® Xeon® Gold 6148 (40 cores) @ 2.40 GHz, 192 GB memory
• 32 nodes: Intel® Xeon® Gold 6148 (40 cores) @ 2.40 GHz, 768 GB memory, each with 2x Nvidia Tesla V100 16 GB RAM
• 1 huge memory node: Intel® Xeon® Platinum 8160 (192 cores) @ 2.10 GHz, 6000 GB memory
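To get a feel for the aggregate size of this system, the hardware overview can be summed up in a few lines of Python. This is a rough sketch, not an official figure: it assumes the core counts in parentheses and the memory sizes are per node, and the class and group names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """One row of the hardware overview (all values are per node)."""
    name: str
    nodes: int
    cores_per_node: int
    mem_gb_per_node: int
    gpus_per_node: int = 0
    gpu_mem_gb: int = 0


# Figures copied from the slide; the per-node reading is an assumption.
COMPUTE_CLOUD = [
    NodeGroup("cpu", nodes=82, cores_per_node=40, mem_gb_per_node=192),
    NodeGroup("gpu", nodes=32, cores_per_node=40, mem_gb_per_node=768,
              gpus_per_node=2, gpu_mem_gb=16),
    NodeGroup("huge-mem", nodes=1, cores_per_node=192, mem_gb_per_node=6000),
]


def totals(groups):
    """Aggregate node, core, memory and GPU counts over all node groups."""
    return {
        "nodes": sum(g.nodes for g in groups),
        "cores": sum(g.nodes * g.cores_per_node for g in groups),
        "mem_gb": sum(g.nodes * g.mem_gb_per_node for g in groups),
        "gpus": sum(g.nodes * g.gpus_per_node for g in groups),
        "gpu_mem_gb": sum(g.nodes * g.gpus_per_node * g.gpu_mem_gb
                          for g in groups),
    }


if __name__ == "__main__":
    for key, value in totals(COMPUTE_CLOUD).items():
        print(f"{key}: {value}")
```

Under these assumptions the totals come to 115 nodes, 4752 cores, 46320 GB of main memory, and 64 V100 GPUs with 1024 GB of GPU memory in total.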