TRANSCRIPT
The Challenges of Exascale Computing
Karl Solchenbach, Director Intel European Exascale Labs
Dell Accelerating Understanding Summit 2015
Cambridge, September 1, 2015
Intel Data Center Group
High End HPC Roadmap

[Chart: energy efficiency in GFLOPS/W (log scale, 0.1 to 10) of leading systems, 2000-2025]
• IBM Roadrunner: 1 PF, 2.4 MW
• Tianhe-1: 2.6 PF, 4 MW
• Titan (Oak Ridge): 17.6 PF, 8.2 MW, Opteron + Tesla
• Tianhe-2: 34 PF, 24 MW, KNC based
• CORAL: 150 PF, ~20 MW?, KNH based
• Exascale 2022: 1 EF, ~100 MW; goal: 50 GF/W
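The efficiency figures on the chart are simply delivered performance divided by power. As a quick, purely illustrative sanity check (using only the numbers quoted on the slide), the sketch below reproduces the GF/W values and shows why the 50 GF/W goal corresponds to roughly a 20 MW exascale machine:

```python
# Back-of-the-envelope energy efficiency (GFLOPS/W) for the systems on the chart.
# Numbers are taken from the slide; this is only an illustrative calculation.
systems = {
    "IBM Roadrunner": (1.0e6, 2.4e6),     # (peak GFLOPS ~1 PF, power ~2.4 MW in W)
    "Tianhe-1":       (2.6e6, 4.0e6),
    "Titan":          (17.6e6, 8.2e6),
    "Tianhe-2":       (34.0e6, 24.0e6),
    "CORAL":          (150.0e6, 20.0e6),  # projected
}

for name, (gflops, watts) in systems.items():
    print(f"{name:15s} {gflops / watts:6.2f} GF/W")

# An exaflop (1e9 GFLOPS) machine at the 50 GF/W goal needs 1e9 / 50 W = 20 MW;
# at today's ~2 GF/W it would need on the order of 500 MW.
print("Power at 50 GF/W goal:", 1.0e9 / 50 / 1.0e6, "MW")
```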
Implications to HPC Roadmap

[Chart: relative transistor performance, 1986-2016 (log scale, 1 to 1000), against the Giga-, Tera-, Peta- and Exa-scale milestones]
• Giga to Tera: 32x from transistor, 32x from parallelism
• Tera to Peta: 8x from transistor, 128x from parallelism
• Peta to Exa: 1.5x from transistor, 670x from parallelism
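Each milestone jump is roughly a factor of 1000 in delivered performance, so whatever transistors no longer provide has to come from parallelism. A small illustrative calculation (the only assumption being that each generation is exactly 1000x) roughly reproduces the parallelism factors quoted on the slide:

```python
# Split each ~1000x generational jump into a transistor part and a parallelism part.
# Transistor gains are the ones quoted on the slide; the parallelism factor is
# simply whatever remains to reach 1000x (illustrative arithmetic only).
GENERATION_FACTOR = 1000.0

transistor_gain = {
    "Giga -> Tera": 32.0,
    "Tera -> Peta": 8.0,
    "Peta -> Exa":  1.5,
}

for step, gain in transistor_gain.items():
    parallelism = GENERATION_FACTOR / gain
    print(f"{step}: {gain:4.1f}x from transistors -> {parallelism:5.0f}x from parallelism")
```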
System Efficiency – Linpack and HPCG
Top Exascale Technology Challenges
• System Power & Energy
• New, efficient, memory subsystem
• Extreme parallelism
• New execution model comprehending self-awareness and introspection
• Resiliency to provide system reliability
• Cost and affordability by enhancing system-efficiency
Source: ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems (2008)
Voltage has been a big knob, but we need more than voltage and technology scaling.
Addressing the Power Challenge: Near-Threshold Voltage Operation
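The reason voltage has been such a big knob is the standard dynamic CMOS power relation P ≈ C·V²·f: lowering the supply voltage cuts switching power quadratically, at the cost of frequency. The sketch below uses made-up, purely illustrative operating points (none of these numbers come from the talk) to show why near-threshold operation improves energy per operation even though it reduces single-thread performance:

```python
# Why voltage has been "a big knob": dynamic CMOS power scales as P ~ C * V^2 * f.
# Illustrative numbers only (not from the slide): compare nominal operation with an
# assumed near-threshold (NTV) point where voltage and frequency are both reduced.
def dynamic_power(capacitance, voltage, frequency):
    return capacitance * voltage**2 * frequency

C = 1.0                                                        # normalized switched capacitance
nominal = dynamic_power(C, voltage=1.0, frequency=1.0)         # baseline
ntv     = dynamic_power(C, voltage=0.5, frequency=0.3)         # assumed NTV operating point

print("Power ratio (NTV / nominal):", ntv / nominal)           # ~0.075
print("Perf ratio  (NTV / nominal):", 0.3)                     # frequency drops too
print("Energy efficiency gain     :", 0.3 / (ntv / nominal))   # ~4x less energy per operation
```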
Applications Reveal a lot of Scalability

[Histogram: binned application count, Mira production thread scalability (bins from 64 threads down to 0)]
[Histogram: binned application count, Mira production MPI rank scalability (bins from 3M ranks down to <0.25M)]
Includes 7 applications running >1 MPI rank/core.
The DRAM Scaling Challenge

[Figure: silicon area comparison - 8 Gb DRAM die ≈ 100 mm²; FPU ≈ 0.03 mm² at 2 FLOPS/cycle; core ≈ 3 mm², i.e. ~100x larger than the FPU]
Memory is not cost balanced with compute.
Source: DARPA Exascale Computing Study (Exascale_Final_report_100208.pdf)
For cost balance:
1. Make physical size of memory capacity much smaller (not happening soon)
2. Improve balance by using less memory per compute via threading
3. Invest & innovate in new high density memory technologies
4. Add system flexibility through compute silicon configurability
5. Architect system around new memory capabilities.
When we think about cost, we need to start from the optimal usage of memory.
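To see why this is a cost-balance problem, compare bytes of DRAM per mm² of silicon with flop/s per mm² of compute, using the areas quoted on the slide. This is a rough back-of-the-envelope sketch; the 1 GHz clock is an assumed value, not one given in the talk:

```python
# Rough cost (silicon area) balance between DRAM capacity and compute throughput,
# using the areas quoted on the slide. The 1 GHz clock is an assumption for
# illustration only.
DRAM_DIE_BITS   = 8e9        # 8 Gb DRAM die
DRAM_DIE_AREA   = 100.0      # mm^2
CORE_AREA       = 3.0        # mm^2 (~100x the FPU)
FLOPS_PER_CYCLE = 2
CLOCK_HZ        = 1e9        # assumed 1 GHz

bytes_per_mm2 = (DRAM_DIE_BITS / 8) / DRAM_DIE_AREA       # ~1e7 bytes per mm^2
flops_per_mm2 = FLOPS_PER_CYCLE * CLOCK_HZ / CORE_AREA    # ~6.7e8 flop/s per mm^2

print("bytes per mm^2 of DRAM        :", bytes_per_mm2)
print("flop/s per mm^2 of core       :", flops_per_mm2)
print("bytes per flop/s at area parity:", bytes_per_mm2 / flops_per_mm2)  # ~0.015
```

At equal silicon area this gives only a few hundredths of a byte of DRAM per flop/s, far below what most applications expect, which is the imbalance the five options above try to address.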
Memory is changing how we architect systems

                 32 GB DRAM DIMMs    High BW Memory
Capacity         32 GB               4 to 8 GB
Bandwidth        16 GB/s             250 GB/s
BW/Capacity      1/2 to 1            30 to 60

There is little reason over time to utilize DRAM DIMMs.
High-BW memory is constrained to reside inside the package.
Need to:
• Get enough capacity to support applications
• Develop correct integrated fabric to support

                 DRAM DIMMs          High BW Memory
Cost/Capacity    1x                  1.2x to 1.5x
Cost/BW          1x                  1/10x to 1/40x
Power/bit        1x                  1/2x to 1/10x
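The BW/Capacity row is just the ratio of the two rows above it. A quick, purely illustrative check with the slide's own numbers reproduces the 1/2 and the 30-to-60 range:

```python
# BW/Capacity ratio (GB/s per GB of capacity) from the slide's own numbers.
dimm_bw, dimm_cap = 16.0, 32.0       # GB/s, GB
hbm_bw = 250.0                       # GB/s
hbm_cap_range = (4.0, 8.0)           # GB

print("DRAM DIMMs :", dimm_bw / dimm_cap)                 # 0.5
print("High BW mem:", hbm_bw / hbm_cap_range[1],
      "to", hbm_bw / hbm_cap_range[0])                    # ~31 to ~62
```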
Intel Exascale Labs in Europe
Juelich
Leuven
Paris
Geneva
Barcelona
Intel European Exascale Labs
• Technical collaboration with leading HPC organisations as partners
• Driven by projects, either internal or in H2020
• Mixed Intel/partner teams, embedded at partner locations
• Established "co-design" process
  o Partner: understands Intel roadmap, can anticipate future architectures
  o Intel: understands partner's future applications and usage roadmap
Juelich ExaCluster Lab
• Collaboration between FZ Juelich, ParTec and Intel
• Goal is to push out the scalability of cluster architectures
• Developing the innovative cluster-booster architecture
• Building the DEEP and DEEP-ER systems as a European exascale entry system
Other European Exascale Labs

Paris (with CEA and UVSQ)
– Scalability of applications (geoscience, combustion, ...)
– Proto (Mini)-Apps concept
Leuven (with IMEC)
– HPC for Life Science
– Machine learning for chemogenomics
– Fast execution driven simulator for future HW platforms
Barcelona (with BSC)
– Scalability tools, expanding from HPC towards Big Data
– Tasking programming model (OMPSs)
Geneva (with CERN)
– High Throughput Computing for LHC data
Conclusion
• System Power & Energy
  o 5-10x improvement (perf/Watt) through a combination of technology and architecture
  o Data center operation will be energy-aware
  o Will applications be optimized for energy?
• New, efficient, memory subsystem
  o High-BW memory plus NM will replace today's DRAM
  o Applications will adjust to take advantage
• Extreme parallelism
  o Some (but not all) applications will scale to >>1M threads
  o "Ensemble" simulations will become more important
  o Need to exploit more threading parallelism
  o MPI + X or new programming models?
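As an illustration of what "MPI + X" means in practice, here is a minimal sketch (not from the talk) using mpi4py for distribution across ranks plus Python threads as the "X" within a rank. In real HPC codes X is typically OpenMP in C/Fortran or other GIL-releasing kernels; with pure-Python work the GIL serializes these threads, so treat this only as a structural example of the hybrid model:

```python
# Minimal "MPI + X" sketch (X = threads): MPI ranks across nodes, threads within a rank.
# Illustrative only; requires mpi4py and an MPI launcher, e.g. `mpirun -n 4 python hybrid.py`.
from concurrent.futures import ThreadPoolExecutor

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

N = 1_000_000
# Each MPI rank owns a slice of the global index space ...
lo, hi = rank * N // size, (rank + 1) * N // size

def partial_sum(bounds):
    a, b = bounds
    return sum(i * i for i in range(a, b))

# ... and uses a thread pool ("X") to work on its slice. Note: in CPython the GIL
# limits pure-Python thread parallelism; real codes use OpenMP or native kernels here.
THREADS = 4
chunks = [(lo + t * (hi - lo) // THREADS, lo + (t + 1) * (hi - lo) // THREADS)
          for t in range(THREADS)]
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    local = sum(pool.map(partial_sum, chunks))

# An MPI reduction combines the per-rank results.
total = comm.reduce(local, op=MPI.SUM, root=0)
if rank == 0:
    print("sum of squares:", total)
```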
THANK YOU !