TRANSCRIPT
The Challenges of Exascale Computing
Karl Solchenbach, Director Intel European Exascale Labs
Dell Accelerating Understanding Summit 2015
Cambridge, September 1, 2015
Intel Data Center Group
High End HPC Roadmap

[Chart: energy efficiency in GFLOPS/W (log scale, 0.1 to 10) of leading systems, 2000-2025]
• IBM Roadrunner: 1 PF, 2.4 MW
• Tianhe-1: 2.6 PF, 4 MW
• Titan (Oak Ridge): 17.6 PF, 8.2 MW, Opteron + Tesla
• Tianhe-2: 34 PF, 24 MW, KNC based
• CORAL: 150 PF, ~20 MW?, KNH based
• Exascale 2022: 1 EF, ~100 MW; goal: 50 GF/W
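The efficiency figures on the chart are simply delivered performance divided by power. As a quick, purely illustrative sanity check (using only the numbers quoted on the slide), the sketch below reproduces the GF/W values and shows why the 50 GF/W goal corresponds to roughly a 20 MW exascale machine:

```python
# Back-of-the-envelope energy efficiency (GFLOPS/W) for the systems on the chart.
# Numbers are taken from the slide; this is only an illustrative calculation.
systems = {
    "IBM Roadrunner": (1.0e6, 2.4e6),     # (peak GFLOPS ~1 PF, power ~2.4 MW in W)
    "Tianhe-1":       (2.6e6, 4.0e6),
    "Titan":          (17.6e6, 8.2e6),
    "Tianhe-2":       (34.0e6, 24.0e6),
    "CORAL":          (150.0e6, 20.0e6),  # projected
}

for name, (gflops, watts) in systems.items():
    print(f"{name:15s} {gflops / watts:6.2f} GF/W")

# An exaflop (1e9 GFLOPS) machine at the 50 GF/W goal needs 1e9 / 50 W = 20 MW;
# at today's ~2 GF/W it would need on the order of 500 MW.
print("Power at 50 GF/W goal:", 1.0e9 / 50 / 1.0e6, "MW")
```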
Implications to HPC Roadmap

[Chart: relative transistor performance, 1986-2016 (log scale, 1 to 1000), against the Giga-, Tera-, Peta- and Exa-scale milestones]
• Giga to Tera: 32x from transistor, 32x from parallelism
• Tera to Peta: 8x from transistor, 128x from parallelism
• Peta to Exa: 1.5x from transistor, 670x from parallelism
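Each milestone jump is roughly a factor of 1000 in delivered performance, so whatever transistors no longer provide has to come from parallelism. A small illustrative calculation (the only assumption being that each generation is exactly 1000x) roughly reproduces the parallelism factors quoted on the slide:

```python
# Split each ~1000x generational jump into a transistor part and a parallelism part.
# Transistor gains are the ones quoted on the slide; the parallelism factor is
# simply whatever remains to reach 1000x (illustrative arithmetic only).
GENERATION_FACTOR = 1000.0

transistor_gain = {
    "Giga -> Tera": 32.0,
    "Tera -> Peta": 8.0,
    "Peta -> Exa":  1.5,
}

for step, gain in transistor_gain.items():
    parallelism = GENERATION_FACTOR / gain
    print(f"{step}: {gain:4.1f}x from transistors -> {parallelism:5.0f}x from parallelism")
```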
System Efficiency – Linpack and HPCG
Top Exascale Technology Challenges
• System Power & Energy
• New, efficient, memory subsystem
• Extreme parallelism
• New execution model comprehending self-awareness and introspection
• Resiliency to provide system reliability
• Cost and affordability by enhancing system-efficiency
Source: ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems (2008)
Voltage has been a big knob, but we need more than voltage and technology scaling.
Addressing the Power Challenge: Near-Threshold Voltage Operation
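The reason voltage has been such a big knob is the standard dynamic CMOS power relation P ≈ C·V²·f: lowering the supply voltage cuts switching power quadratically, at the cost of frequency. The sketch below uses made-up, purely illustrative operating points (none of these numbers come from the talk) to show why near-threshold operation improves energy per operation even though it reduces single-thread performance:

```python
# Why voltage has been "a big knob": dynamic CMOS power scales as P ~ C * V^2 * f.
# Illustrative numbers only (not from the slide): compare nominal operation with an
# assumed near-threshold (NTV) point where voltage and frequency are both reduced.
def dynamic_power(capacitance, voltage, frequency):
    return capacitance * voltage**2 * frequency

C = 1.0                                                        # normalized switched capacitance
nominal = dynamic_power(C, voltage=1.0, frequency=1.0)         # baseline
ntv     = dynamic_power(C, voltage=0.5, frequency=0.3)         # assumed NTV operating point

print("Power ratio (NTV / nominal):", ntv / nominal)           # ~0.075
print("Perf ratio  (NTV / nominal):", 0.3)                     # frequency drops too
print("Energy efficiency gain     :", 0.3 / (ntv / nominal))   # ~4x less energy per operation
```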
Applications Reveal a lot of Scalability

[Histogram: binned application count, Mira production thread scalability (bins from 64 threads down to 0)]
[Histogram: binned application count, Mira production MPI rank scalability (bins from 3M ranks down to <0.25M)]
Includes 7 applications running >1 MPI rank/core.
The DRAM Scaling Challenge

[Figure: silicon area comparison - 8 Gb DRAM die ≈ 100 mm²; FPU ≈ 0.03 mm² at 2 FLOPS/cycle; core ≈ 3 mm², i.e. ~100x larger than the FPU]
Memory is not cost balanced with compute.
Source: DARPA Exascale Computing Study (Exascale_Final_report_100208.pdf)
For cost balance:
1. Make physical size of memory capacity much smaller (not happening soon)
2. Improve balance by using less memory per compute via threading
3. Invest & innovate in new high density memory technologies
4. Add system flexibility through compute silicon configurability
5. Architect system around new memory capabilities.
When we think about cost, we need to start from the optimal usage of memory.
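To see why this is a cost-balance problem, compare bytes of DRAM per mm² of silicon with flop/s per mm² of compute, using the areas quoted on the slide. This is a rough back-of-the-envelope sketch; the 1 GHz clock is an assumed value, not one given in the talk:

```python
# Rough cost (silicon area) balance between DRAM capacity and compute throughput,
# using the areas quoted on the slide. The 1 GHz clock is an assumption for
# illustration only.
DRAM_DIE_BITS   = 8e9        # 8 Gb DRAM die
DRAM_DIE_AREA   = 100.0      # mm^2
CORE_AREA       = 3.0        # mm^2 (~100x the FPU)
FLOPS_PER_CYCLE = 2
CLOCK_HZ        = 1e9        # assumed 1 GHz

bytes_per_mm2 = (DRAM_DIE_BITS / 8) / DRAM_DIE_AREA       # ~1e7 bytes per mm^2
flops_per_mm2 = FLOPS_PER_CYCLE * CLOCK_HZ / CORE_AREA    # ~6.7e8 flop/s per mm^2

print("bytes per mm^2 of DRAM        :", bytes_per_mm2)
print("flop/s per mm^2 of core       :", flops_per_mm2)
print("bytes per flop/s at area parity:", bytes_per_mm2 / flops_per_mm2)  # ~0.015
```

At equal silicon area this gives only a few hundredths of a byte of DRAM per flop/s, far below what most applications expect, which is the imbalance the five options above try to address.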
Memory is changing how we architect systems

                 32 GB DRAM DIMMs    High BW Memory
Capacity         32 GB               4 to 8 GB
Bandwidth        16 GB/s             250 GB/s
BW/Capacity      1/2 to 1            30 to 60

There is little reason over time to utilize DRAM DIMMs.
High-BW memory is constrained to reside inside the package.
Need to:
• Get enough capacity to support applications
• Develop correct integrated fabric to support

                 DRAM DIMMs          High BW Memory
Cost/Capacity    1x                  1.2x to 1.5x
Cost/BW          1x                  1/10x to 1/40x
Power/bit        1x                  1/2x to 1/10x
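The BW/Capacity row is just the ratio of the two rows above it. A quick, purely illustrative check with the slide's own numbers reproduces the 1/2 and the 30-to-60 range:

```python
# BW/Capacity ratio (GB/s per GB of capacity) from the slide's own numbers.
dimm_bw, dimm_cap = 16.0, 32.0       # GB/s, GB
hbm_bw = 250.0                       # GB/s
hbm_cap_range = (4.0, 8.0)           # GB

print("DRAM DIMMs :", dimm_bw / dimm_cap)                 # 0.5
print("High BW mem:", hbm_bw / hbm_cap_range[1],
      "to", hbm_bw / hbm_cap_range[0])                    # ~31 to ~62
```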
Intel Exascale Labs in Europe
Juelich
Leuven
Paris
Geneva
Barcelona
Intel European Exascale Labs
• Technical collaboration with leading HPC organisations as partners
• Driven by projects, either internal or in H2020
• Mixed Intel/partner teams, embedded at partner locations
• Established "co-design" process
  o Partner: understands Intel roadmap, can anticipate future architectures
  o Intel: understands partner's future applications and usage roadmap
Juelich ExaCluster Lab
• Collaboration between FZ Juelich, ParTec and Intel
• Goal is to push out the scalability of cluster architectures
• Developing the innovative cluster-booster architecture
• Building the DEEP and DEEP-ER systems as a European exascale entry system
Other European Exascale Labs

Paris (with CEA and UVSQ)
– Scalability of applications (geoscience, combustion, ...)
– Proto (Mini)-Apps concept
Leuven (with IMEC)
– HPC for Life Science
– Machine learning for chemogenomics
– Fast execution driven simulator for future HW platforms
Barcelona (with BSC)
– Scalability tools, expanding from HPC towards Big Data
– Tasking programming model (OMPSs)
Geneva (with CERN)
– High Throughput Computing for LHC data
Conclusion
• System Power & Energy
  o 5-10x improvement (perf/Watt) through a combination of technology and architecture
  o Data center operation will be energy-aware
  o Will applications be optimized for energy?
• New, efficient, memory subsystem
  o High-BW memory plus NM will replace today's DRAM
  o Applications will adjust to take advantage
• Extreme parallelism
  o Some (but not all) applications will scale to >>1M threads
  o "Ensemble" simulations will become more important
  o Need to exploit more threading parallelism
  o MPI + X or new programming models?
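As an illustration of what "MPI + X" means in practice, here is a minimal sketch (not from the talk) using mpi4py for distribution across ranks plus Python threads as the "X" within a rank. In real HPC codes X is typically OpenMP in C/Fortran or other GIL-releasing kernels; with pure-Python work the GIL serializes these threads, so treat this only as a structural example of the hybrid model:

```python
# Minimal "MPI + X" sketch (X = threads): MPI ranks across nodes, threads within a rank.
# Illustrative only; requires mpi4py and an MPI launcher, e.g. `mpirun -n 4 python hybrid.py`.
from concurrent.futures import ThreadPoolExecutor

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

N = 1_000_000
# Each MPI rank owns a slice of the global index space ...
lo, hi = rank * N // size, (rank + 1) * N // size

def partial_sum(bounds):
    a, b = bounds
    return sum(i * i for i in range(a, b))

# ... and uses a thread pool ("X") to work on its slice. Note: in CPython the GIL
# limits pure-Python thread parallelism; real codes use OpenMP or native kernels here.
THREADS = 4
chunks = [(lo + t * (hi - lo) // THREADS, lo + (t + 1) * (hi - lo) // THREADS)
          for t in range(THREADS)]
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    local = sum(pool.map(partial_sum, chunks))

# An MPI reduction combines the per-rank results.
total = comm.reduce(local, op=MPI.SUM, root=0)
if rank == 0:
    print("sum of squares:", total)
```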
THANK YOU !