Celsius Lecture 2/14/13 - Uppsala University · 2019. 9. 9.
Celsius Lecture 2/14/13
Exascale Computing Will Enable Transformational Science
Climate
Comprehensive Earth System Model at 1 km scale, enabling modeling of cloud convection and ocean eddies.
Combustion
First-principles simulation of combustion for new high-efficiency, low-emission engines.
Biology
Coupled simulation of entire cells at molecular, genetic, chemical and biological levels.
Astrophysics
Predictive calculations for thermonuclear and core-collapse supernovae, allowing confirmation of theoretical models.
Exascale Computing Will Enable Transformational Science
High-Performance Computers are Scientific Instruments
Titan: World’s #1 Open Science Supercomputer
18,688 NVIDIA Tesla K20X GPUs
27 petaflops peak: 90% of performance from GPUs
17.59 petaflops sustained performance on Linpack
Titan & Kepler
18,688 NVIDIA Kepler GK110
27 PF peak (90% from GPUs)
17.6 PF HP Linpack
2.12 GF/W
GK110 is 7 GF/W
The Road to Exascale
2012 (you are here): 20 PF, 18,000 GPUs, 10 MW, 2 GFLOPS/W, ~$10^7$ threads
2020: 1000 PF (50x), 72,000 HCNs (4x), 20 MW (2x), 50 GFLOPS/W (25x), ~$10^{10}$ threads (1000x)
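The multipliers on this slide follow directly from the 2012 and 2020 figures. A minimal sketch of that arithmetic, in Python; the dictionary layout and function names are illustrative choices, and only the numbers come from the slide.

```python
# Figures from the slide; names and layout are illustrative assumptions.
PF = 1e15  # FLOPS in one petaflop
MW = 1e6   # watts in one megawatt

system_2012 = {"flops": 20 * PF, "nodes": 18_000, "watts": 10 * MW, "threads": 1e7}
system_2020 = {"flops": 1000 * PF, "nodes": 72_000, "watts": 20 * MW, "threads": 1e10}


def gflops_per_watt(system):
    """Energy efficiency: sustained FLOPS divided by power, in GFLOPS/W."""
    return system["flops"] / system["watts"] / 1e9


def growth(old, new):
    """Ratio of each 2020 figure to its 2012 counterpart."""
    return {key: new[key] / old[key] for key in old}


ratios = growth(system_2012, system_2020)
efficiency_gain = gflops_per_watt(system_2020) / gflops_per_watt(system_2012)
```

Performance grows 50x while the power budget only doubles, so the whole difference (25x) has to come from energy efficiency.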
Technical Challenges on The Road to Exascale
2012: 20 PF, 18,000 GPUs, 10 MW, 2 GFLOPS/W, ~$10^7$ threads
2020: 1000 PF (50x), 72,000 HCNs (4x), 20 MW (2x), 50 GFLOPS/W (25x), ~$10^{10}$ threads (1000x)
1. Energy Efficiency
2. Parallel Programmability
3. Resilience
50x performance in 8 years, Moore’s Law will take care of that, right?
Wrong!
Moore’s Law gives us transistors
Which we used to turn into scalar performance
Moore, Electronics 38(8) April 19, 1965
But ILP was ‘mined out’ in 2000
[Figure: performance (ps/instruction) versus year, 1980 to 2020, on a log scale from 1e-4 to 1e+7. Series: Perf (ps/Inst) and Linear (ps/Inst). Annotations: 52%/year, 74%/year, 19%/year; 30:1, 1,000:1, 30,000:1.]
Dally et al. “The Last Classical Computer”, ISAT Study, 2001
And $L^3$ energy scaling ended in 2005
Moore, ISSCC Keynote, 2003
Result: The End of Historic Scaling
C. Moore, Data Processing in ExaScale-Class Computer Systems, Salishan, April 2011
Historic scaling is at an end!
Continuing performance scaling for computer systems of all sizes requires addressing two challenges:
Power and Parallelism
Much of the economy depends on this
The Power Challenge
In the past we had constant-field scaling
$L' = L/2 \qquad V' = V/2$
$E' = C'V'^2 = E/8 \qquad f' = 2f$
$D' = 1/L'^2 = 4D \qquad P' = P$
Halve L and get 8x the capability for the same power
Now voltage is held nearly constant
$L' = L/2 \qquad V' = V$
$E' = C'V'^2 = E/2 \qquad f' = 2f^*$
$D' = 1/L'^2 = 4D \qquad P' = 4P$
Halve L and get 2x the capability for the same power in ¼ the area
*f is no longer scaling as 1/L, but it doesn’t matter: we couldn’t power it if it did
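The two scaling regimes can be compared with a short calculation. The sketch below assumes, as the relations on these slides imply, that capacitance scales with L, frequency with 1/L, and device density with 1/L²; the function and key names are illustrative and not from the lecture.

```python
from fractions import Fraction


def scale(length_factor, voltage_factor):
    """Scale factors for one process shrink, relative to the old process.

    Assumes C is proportional to L, f to 1/L, and density to 1/L^2.
    """
    s = Fraction(length_factor)
    v = Fraction(voltage_factor)
    energy = s * v**2                      # E' = C'V'^2 per switching event
    frequency = 1 / s                      # f' = f / s
    density = 1 / s**2                     # D' = D / s^2
    power = density * energy * frequency   # power per unit area
    capability = density * frequency       # operations per second per unit area
    return {"E": energy, "f": frequency, "D": density,
            "P": power, "capability": capability}


constant_field = scale(Fraction(1, 2), Fraction(1, 2))   # L and V both halve
constant_voltage = scale(Fraction(1, 2), 1)              # only L halves

# Capability obtainable at the old power budget under constant voltage
capability_at_same_power = constant_voltage["capability"] / constant_voltage["P"]
```

Under constant-field scaling the 8x capability comes at unchanged power. With voltage fixed, power density rises 4x, so only a quarter of the area can be powered and the gain at equal power drops to 2x.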
Performance = Efficiency
Efficiency = Locality
Locality
The High Cost of Data Movement
Fetching operands costs more than computing on them
[Figure: energy costs on a 20 mm chip in 28 nm. 64-bit DP operation: 20 pJ. 256-bit access to 8 kB SRAM: 50 pJ. 256-bit buses: 26 pJ, 256 pJ, and 1 nJ. Efficient off-chip link: 500 pJ. DRAM Rd/Wr: 16 nJ.]
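To put the figure's numbers side by side, the sketch below expresses each data-movement cost as a multiple of one 64-bit double-precision operation. The dictionary keys are illustrative names; the energies are the ones on the slide. Note that the accesses are 256 bits wide while the operation is 64 bits, so these are per-access ratios rather than per-operand ratios.

```python
# Energies from the slide, in picojoules; key names are illustrative.
energy_pj = {
    "dp_op_64bit": 20,
    "sram_8kb_256bit_access": 50,
    "offchip_link": 500,
    "dram_read_write": 16_000,
}

# Cost of each access relative to one double-precision operation
relative_cost = {
    name: pj / energy_pj["dp_op_64bit"] for name, pj in energy_pj.items()
}
```

On these numbers a DRAM access costs as much energy as 800 floating-point operations, which is the sense in which fetching operands costs more than computing on them.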
Scaling makes locality even more important
It's not about the FLOPS
It's about data movement
Algorithms should be designed to perform more work per unit data movement.
Programming systems should further optimize this data movement.
Architectures should facilitate this by providing an exposed hierarchy and efficient communication.
Move Bits More Efficiently
Move Fewer Bits
forall cells in set {
  compute_x_flux(cell) ;
}
forall cells in set {
  compute_y_flux(cell) ;
}
forall cells in set {
  compute_z_flux(cell) ;
}
forall cells in set {
  compute_p(cell) ;
}
Move Fewer Bits
forall cells in set {
  compute_x_flux(cell) ;
  compute_y_flux(cell) ;
  compute_z_flux(cell) ;
  compute_p(cell) ;
}
Move Fewer Bits
forall blocks in set { // hierarchically
  localize(block)
  forall cells in block {
    compute_x_flux(cell) ;
    compute_y_flux(cell) ;
    compute_z_flux(cell) ;
    compute_p(cell) ;
  }
}
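The three "Move Fewer Bits" variants can be mimicked in Python to see how much data each one touches. This is only a sketch: `CountingCells`, the stand-in kernels, and the read counter are hypothetical, and counting element reads is a crude proxy for data movement, not a model of a real memory hierarchy.

```python
class CountingCells:
    """A list of cells that counts element reads as a proxy for data movement."""

    def __init__(self, values):
        self.values = list(values)
        self.reads = 0

    def __iter__(self):
        for value in self.values:
            self.reads += 1
            yield value

    def blocks(self, block_size):
        """Yield contiguous blocks; each cell is read once when its block is localized."""
        for start in range(0, len(self.values), block_size):
            block = self.values[start:start + block_size]
            self.reads += len(block)
            yield block


# Hypothetical stand-ins for the per-cell kernels named on the slides.
def compute_x_flux(cell):
    return cell + 1


def compute_y_flux(cell):
    return cell + 2


def compute_z_flux(cell):
    return cell + 3


def compute_p(cell):
    return cell * cell


def all_kernels(cell):
    return (compute_x_flux(cell), compute_y_flux(cell),
            compute_z_flux(cell), compute_p(cell))


def unfused(cells):
    """Four separate passes: every cell is fetched four times."""
    xs = [compute_x_flux(c) for c in cells]
    ys = [compute_y_flux(c) for c in cells]
    zs = [compute_z_flux(c) for c in cells]
    ps = [compute_p(c) for c in cells]
    return list(zip(xs, ys, zs, ps))


def fused(cells):
    """One pass: every cell is fetched once and reused by all four kernels."""
    return [all_kernels(c) for c in cells]


def blocked(cells, block_size):
    """One pass over blocks, so each block can be staged into local storage first."""
    results = []
    for block in cells.blocks(block_size):   # plays the role of localize(block)
        results.extend(all_kernels(c) for c in block)
    return results
```

All three versions produce the same results; the fused and blocked forms read each cell once instead of four times. The benefit of blocking (keeping a block resident near the compute units) is not captured by the counter and would show up only on real hardware.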
System Sketch
Echelon Chip Floorplan
[Figure: chip floorplan. A grid of SMs in groups of four, each group attached to a NOC, with LOC blocks between rows; L2 banks and XBAR; DRAM I/O and NW I/O along the chip edges. An inset shows one SM made up of 8 lanes. 17 mm, 10 nm process, 290 mm².]
Overhead
4/11/11 Milad Mohammadi 36
An Out-of-Order Core
Spends 2 nJ to schedule a 25 pJ FMUL (or a 0.5 pJ integer add)
SM Lane Architecture
[Figure: SM lane block diagram. Data path labels: ORF (x3), FP/Int (x2), LS/BR, RF, L0 Addr, L1 Addr, Net, LM Bank 0 through LM Bank 3, To LD/ST. Control path labels: L0 I$, Thread PCs, Active PCs, Inst, Scheduler.]
64 threads
4 active threads
2 DFMAs (4 FLOPS/clock)
ORF bank: 16 entries (128 bytes)
L0 I$: 64 instructions (1 KByte)
LM bank: 8 KB (32 KB total)
Solving the Power Challenge – 1, 2, 3
Solving the ExaScale Power Problem
Parallelism
Parallel programming is not inherently any more difficult than serial programming
However, we can make it a lot more difficult
A simple parallel program
forall molecule in set { // launch a thread array
  forall neighbor in molecule.neighbors { // nested
    forall force in forces { // doubly nested
      molecule.force = reduce_sum(force(molecule, neighbor))
    }
  }
}
Why is this easy?
forall molecule in set { // launch a thread array
  forall neighbor in molecule.neighbors { // nested
    forall force in forces { // doubly nested
      molecule.force = reduce_sum(force(molecule, neighbor))
    }
  }
}
No machine details
All parallelism is expressed
Synchronization is semantic (in reduction)
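A rough Python analogue of the slide's pseudocode shows the same three properties. It is a sketch under several assumptions: molecules are reduced to 1-D positions, `pair_force` is a made-up inverse-square force, the slide's three nested `forall` levels are collapsed to two, and the thread pool merely stands in for "launch a thread array" (CPython threads will not speed up this arithmetic).

```python
from concurrent.futures import ThreadPoolExecutor


def pair_force(a, b):
    """Hypothetical 1-D inverse-square force on a particle at a from one at b."""
    d = b - a
    return d / abs(d) ** 3


def total_force(position, neighbors):
    # The reduction is the only synchronization point, and it is implied by sum().
    return sum(pair_force(position, n) for n in neighbors)


def compute_forces(positions, neighbor_lists, workers=4):
    """Apply total_force to every molecule; iterations are independent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(total_force, positions, neighbor_lists))


positions = [0.0, 1.0, 3.0]
neighbor_lists = [[1.0, 3.0], [0.0, 3.0], [0.0, 1.0]]
forces = compute_forces(positions, neighbor_lists)
```

Nothing in `compute_forces` names a core count, a memory, or a lock: the caller states what is independent and the runtime decides where it runs.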
We could make it hard
pid = fork() ; // explicitly managing threads
lock(struct.lock) ; // complicated, error-prone synchronization
// manipulate struct
unlock(struct.lock) ;
code = send(pid, tag, &msg) ; // partition across nodes
Programmers, tools, and architecture need to play their positions
Programmer: algorithm, all of the parallelism, abstract locality
Tools: combinatorial optimization, mapping, selection of mechanisms
Architecture: fast mechanisms, exposed costs
Programmers, tools, and architecture need to play their positions
Programmer:
forall molecule in set { // launch a thread array
  forall neighbor in molecule.neighbors { // nested
    forall force in forces { // doubly nested
      molecule.force = reduce_sum(force(molecule, neighbor))
    }
  }
}
Tools:
Map foralls in time and space
Map molecules across memories
Stage data up/down hierarchy
Select mechanisms
Architecture:
Exposed storage hierarchy
Fast comm/sync/thread mechanisms
Abstract description of Locality – not mapping
compute_forces::inner(molecules, forces) {
  tunable N ;
  set part_molecules[N] ;
  part_molecules = subdivide(molecules, N) ;
  forall(i in 0:N-1) {
    compute_forces(part_molecules[i]) ;
  }
}
Abstract description of Locality – not mapping
compute_forces::inner(molecules, forces) {
  tunable N ;
  set part_molecules[N] ;
  part_molecules = subdivide(molecules, N) ;
  forall(i in 0:N-1) {
    compute_forces(part_molecules[i]) ;
  }
}
Autotuner picks number and size of partitions, recursively
No need to worry about “ghost molecules”: with a global address space, it just works
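The idea of a `tunable N` chosen by an autotuner can be sketched in a few lines of Python. Everything here is an assumption for illustration: `subdivide` splits a list into contiguous parts, `compute_forces_leaf` is a placeholder kernel, and `autotune` simply times each candidate once. The slide's autotuner works recursively over a memory hierarchy; this sketch covers a single level only.

```python
import time


def subdivide(items, n):
    """Split items into n nearly equal contiguous parts."""
    size, extra = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def compute_forces_leaf(part):
    """Hypothetical stand-in for the real per-partition kernel."""
    return [x * x for x in part]


def compute_forces(items, n):
    """The program states only that the n parts are independent, not where they run."""
    results = []
    for part in subdivide(items, n):   # each iteration could run in parallel
        results.extend(compute_forces_leaf(part))
    return results


def autotune(items, candidates):
    """Time each candidate partition count once and return the fastest."""
    timings = {}
    for n in candidates:
        start = time.perf_counter()
        compute_forces(items, n)
        timings[n] = time.perf_counter() - start
    return min(timings, key=timings.get)


molecules = list(range(1000))
best_n = autotune(molecules, [1, 2, 4, 8, 16])
```

The result of `compute_forces` does not depend on `n`, which is what lets a tool pick the value freely for each target machine.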
Autotuning Search Spaces
[Figure: Execution Time of Matrix Multiplication for Unrolling and Tiling]
T. Kisuki, P. M. W. Knijnenburg, and M. F. P. O'Boyle. Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation. In IEEE PACT, pages 237-248, 2000.
Architecture enables simple and effective autotuning
Performance of Auto-tuner
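| Platform | Version | Conv2D | SGEMM | FFT3D | SUmb |
|---|---|---|---|---|---|
| Cell | Auto | 96.4 | 129 | 57 | 10.5 |
| Cell | Hand | 85 | 119 | 54 | |
| Cluster | Auto | 26.7 | 91.3 | 5.5 | 1.65 |
| Cluster | Hand | 24 | 90 | 5.5 | |
| Cluster of PS3s | Auto | 19.5 | 32.4 | 0.55 | 0.49 |
| Cluster of PS3s | Hand | 19 | 30 | 0.23 | |

Measured raw performance of benchmarks: auto-tuner vs. hand-tuned version, in GFLOPS.
For FFT3D, performance is with fusion of leaf tasks.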
SUmb is too complicated to be hand-tuned.
Fundamental and Incidental Obstacles to Programmability
Fundamental:
Expressing $10^9$-way parallelism
Expressing locality to deal with >100:1 global:local energy
Balancing load across $10^9$ cores
Incidental:
Dealing with multiple address spaces
Partitioning data across nodes
Aggregating data to amortize message overhead
The fundamental problems are hard enough. We must eliminate the incidental ones.
Execution Model
[Figure labels: Thread, Object, A, B, Global Address Space, Abstract Memory Hierarchy, Load/Store, Active Message, Bulk Xfer.]
Thread array creation, messages, block transfers, collective operations – at the “speed of light”
Kepler
Hardware thread-array creation
Fast __syncthreads() ;
Shared memory
Scalar ISAs don’t matter
Parallel ISAs – the mechanisms for threads, communication, and synchronization make a huge difference.
A Prescription
Research
Need a research vehicle (experimental system)
  Co-design architecture, programming system, applications
Productive parallel programming
  Express all the parallelism and locality
  Compiler and run-time map to the target machine
  Leverage an existing eco-system
Mechanisms for threads, comm, sync
  Eliminate ‘incidental’ programming issues
  Enable fine-grain execution
Power
  Locality – exposed memory hierarchy and software to use it
  Overhead – move scheduling to compiler
Others are investing; if we don’t invest, we will be left behind.
Education
We need parallel programmers
  But we are training serial programmers and serial thinkers
Parallelism throughout the CS curriculum
  Programming
  Algorithms
    Parallel algorithms
    Analysis focused on communications, not counting ops
  Systems
    Models need to include locality
A Bright Future from Supercomputers to Cellphones
Eliminate overhead and exploit locality to get 100x power efficiency
Easy parallelism with a coordinated team
Programmer, Tools, Architecture
[Figure: chip block diagram with HD Video Decoder, HD Video Encoder, Audio, ISP, GPU, MEM I/O, HDMI, Security Engine, Display, and Core 0 through Core 4.]
More Fundamentally
Both
are power limited
get performance from parallelism
need 100x performance increase in 10 years
![Page 65: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/65.jpg)
Granularity
![Page 66: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/66.jpg)
Number of threads increasing faster than problem size
![Page 67: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/67.jpg)
![Page 68: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/68.jpg)
Number of Threads increasing faster than problem size
Weak Scaling
![Page 69: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/69.jpg)
Number of Threads increasing faster than problem size
Weak Scaling
Strong Scaling
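The distinction between the two regimes can be sketched with per-thread work. This is a minimal illustration with made-up sizes (1,000,000 elements on 1,000 threads as the baseline); the function names are hypothetical and not from the lecture.

```python
def work_per_thread(problem_size: int, threads: int) -> float:
    """Elements each thread handles when work is divided evenly."""
    return problem_size / threads

def weak_scaling(base_size: int, base_threads: int, scale: int) -> float:
    # Weak scaling: the problem grows with the thread count,
    # so per-thread work stays constant.
    return work_per_thread(base_size * scale, base_threads * scale)

def strong_scaling(base_size: int, base_threads: int, scale: int) -> float:
    # Strong scaling: the problem size is fixed,
    # so per-thread work shrinks as threads are added.
    return work_per_thread(base_size, base_threads * scale)

# Hypothetical baseline, chosen only for illustration.
BASE_SIZE, BASE_THREADS = 1_000_000, 1_000
for scale in (1, 10, 100):
    weak = weak_scaling(BASE_SIZE, BASE_THREADS, scale)
    strong = strong_scaling(BASE_SIZE, BASE_THREADS, scale)
    print(f"{scale:>3}x threads: weak={weak:.0f} strong={strong:.0f} elements/thread")
```

When threads grow faster than the problem, the machine is pushed toward the strong-scaling case, where each thread's share keeps shrinking.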
![Page 70: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/70.jpg)
Smaller sub-problem per thread
![Page 71: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/71.jpg)
![Page 72: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/72.jpg)
Smaller sub-problem per thread
More frequent communication, synchronization, and thread operations
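One way to see why smaller sub-problems raise communication frequency is a surface-to-volume estimate. The sketch below assumes a 3D nearest-neighbor stencil on a cubic sub-domain, which is an illustrative choice and not something the slides specify; the function name is hypothetical.

```python
def comm_to_compute_ratio(side: int) -> float:
    """Halo cells exchanged per cell computed for a cubic sub-domain.

    Assumes a 3D nearest-neighbor stencil: each step updates side**3
    interior cells and exchanges one layer on each of the 6 faces.
    """
    compute = side ** 3
    comm = 6 * side ** 2
    return comm / compute

# Shrinking the per-thread sub-domain raises the ratio as 6 / side.
for side in (64, 16, 4):
    print(f"side={side:>2}: {comm_to_compute_ratio(side):.5f} cells sent per cell computed")
```

Under these assumptions the ratio grows from about 0.09 at side 64 to 1.5 at side 4, so communication cost per unit of work rises as sub-problems shrink.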
![Page 73: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/73.jpg)
![Page 74: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/74.jpg)
This fine-grain parallelism is multi-level and irregular
![Page 75: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/75.jpg)
Supporting this requires fast mechanisms for
Thread arrays – create, terminate, suspend, resume
  Hardware allocation of resources to a thread array: threads, registers, shared memory
  With locality
Communication
  Data movement up and down the hierarchy
  Fast active messages (message-driven computing)
Synchronization
  Collective operations (e.g., barrier, reduce)
  Pairwise (producer-consumer)
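The two synchronization styles named above can be illustrated in software. This is only a sketch of the semantics using Python's standard `threading` and `queue` modules; the slides concern fast hardware mechanisms, and the function names here are hypothetical.

```python
import queue
import threading

def collective_sum(values, num_threads=4):
    """Collective sync: barrier followed by a reduce visible to every thread."""
    partial = [0] * num_threads
    result = [0] * num_threads
    barrier = threading.Barrier(num_threads)

    def worker(tid):
        partial[tid] = sum(values[tid::num_threads])  # local work on a strided slice
        barrier.wait()                                # all partial sums are now written
        result[tid] = sum(partial)                    # each thread reduces the full set

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

def producer_consumer(items):
    """Pairwise sync: a one-slot channel makes the consumer wait for the producer."""
    channel = queue.Queue(maxsize=1)
    received = []

    def producer():
        for item in items:
            channel.put(item)   # blocks while the slot is full
        channel.put(None)       # sentinel marks the end of the stream

    def consumer():
        while True:
            item = channel.get()  # blocks until the producer supplies a value
            if item is None:
                break
            received.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received

print(collective_sum(list(range(100))))
print(producer_consumer([1, 2, 3]))
```

In software these operations cost far more than a few arithmetic instructions, which is the overhead that fine-grain parallelism makes hard to amortize.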
![Page 76: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/76.jpg)
Execution Model
[Figure: threads and objects (A, B) in a global address space over an abstract memory hierarchy, linked by active messages, load/store, and bulk transfer]
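A toy, single-threaded simulation may help make the active-message idea concrete: the message names a handler that runs at the destination, next to the data it touches. The `Node` class and `add_to` handler are hypothetical and are not the lecture's or the J-Machine's actual interface.

```python
from collections import deque

class Node:
    """Hypothetical node with local memory and an inbox of active messages."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.memory = {}
        self.inbox = deque()

    def send(self, dest, handler, *args):
        # An active message carries the handler to run plus its arguments.
        dest.inbox.append((handler, args))

    def run(self):
        # Message-driven execution: each arriving message triggers its handler.
        while self.inbox:
            handler, args = self.inbox.popleft()
            handler(self, *args)

def add_to(node, key, amount):
    """Handler that updates data where it lives instead of moving it to the sender."""
    node.memory[key] = node.memory.get(key, 0) + amount

a = Node("A")
b = Node("B")
b.memory["x"] = 10
a.send(b, add_to, "x", 5)  # A asks B to perform the update locally
b.run()
print(b.memory["x"])
```

The alternative with load/store would move the value to A, update it, and write it back, which costs two traversals of the hierarchy instead of one message.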
![Page 77: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/77.jpg)
J-Machine Speedup with Strong Scaling
Noakes et al., “The J-Machine Multicomputer: An Architectural Evaluation,” ISCA, 1993, pp. 224-235
![Page 78: Celsius Lecture 2/14/13 1 - Uppsala University · 2019. 9. 9. · Celsius Lecture 2/14/13 2 Exascale Computing Will Enable Transformational Science. Supercomputers are scientific](https://reader033.vdocuments.net/reader033/viewer/2022060806/608b4001c5ea8726584f62b6/html5/thumbnails/78.jpg)
2 characters per node