Addressing Future HPC Demand with Multi-core Processors
Stephen S. Pawlowski, Intel Senior Fellow
GM, Architecture and Planning; CTO, Digital Enterprise Group
September 5, 2007
2
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT.
Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.
Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights.
Intel may make changes to specifications, product descriptions, and plans at any time, without notice.
The processors, the chipsets, or the ICH components may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available upon request.
All dates provided are subject to change without notice. All dates specified are target dates, are provided for planning purposes only and are subject to change.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.
Copyright © 2007, Intel Corporation. All rights reserved.
3
Intel Is Listening…
With multi-core processors, however, we finally get to a scheme where the HEP execution profile shines. Thanks to the fact that our jobs are embarrassingly parallel (each physics event is independent of the others) we can launch as many processes (or jobs) as there are cores on a die. However, this requires that the memory size is increased to accommodate the extra processes (making the computers more expensive). As long as the memory traffic does not become a bottleneck, we see a practically linear increase in the throughput on such a system. A ROOT/PROOF analysis demo elegantly demonstrated this during the launch of the quad-core chip at CERN.
Source: “Processors size up for physics at the LHC”, Sverre Jarp, CERN Courier, March 28, 2007
HEP: High Energy Physics
LHC: Large Hadron Collider
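The scaling argument in the quote can be made concrete with a rough model. The sketch below is an illustration only, not CERN's or Intel's method: the function names and every numeric input are hypothetical. It assumes one independent process per core, throughput that grows linearly with core count until memory traffic saturates, and a fixed memory footprint per process.

```python
def hep_throughput(cores, events_per_sec_per_core, bytes_per_event, mem_bw_bytes_per_sec):
    """Events per second with one job per core, capped by memory bandwidth (assumed model)."""
    compute_bound = cores * events_per_sec_per_core            # linear in core count
    bandwidth_bound = mem_bw_bytes_per_sec / bytes_per_event   # memory traffic ceiling
    return min(compute_bound, bandwidth_bound)


def memory_needed_gb(cores, gb_per_process):
    """Total memory when every core runs its own process."""
    return cores * gb_per_process


if __name__ == "__main__":
    # Hypothetical inputs: 100 events/s per core, 1 MB of traffic per event,
    # 10 GB/s of memory bandwidth, 2 GB of memory per process.
    for cores in (1, 2, 4, 8):
        rate = hep_throughput(cores, 100.0, 1e6, 1e10)
        print(cores, "cores:", rate, "events/s,", memory_needed_gb(cores, 2.0), "GB")
```

With these made-up inputs the throughput stays linear because the bandwidth ceiling is far away; lowering `mem_bw_bytes_per_sec` shows where the curve would flatten.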
4
Accelerating Multi- and Many-core
Performance Through Parallelism
Power delivery and management
High bandwidth memory
Reconfigurable cache
Scalable fabric
Fixed-function units
[Diagram: die with big cores and an array of small cores]
•Big cores for Single Thread Performance
•Small cores for Multi-Thread Performance
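One common way to reason about mixing big and small cores is Amdahl's law extended to asymmetric chips (the Hill and Marty model). This is not taken from the slide; it is a textbook model swapped in for illustration. The assumption that a big core built from `r` small-core units delivers `sqrt(r)` single-thread performance is a rule of thumb, not an Intel figure, and all names below are hypothetical.

```python
import math


def big_core_perf(r):
    """Assumed single-thread performance of a core that uses r small-core units."""
    return math.sqrt(r)


def asymmetric_speedup(parallel_fraction, n_units, r_big):
    """Speedup over one small core for one big core plus (n_units - r_big) small cores."""
    # Serial part runs on the big core alone.
    serial = (1.0 - parallel_fraction) / big_core_perf(r_big)
    # Parallel part runs on the big core and all small cores together.
    parallel = parallel_fraction / (big_core_perf(r_big) + n_units - r_big)
    return 1.0 / (serial + parallel)
```

Under these assumptions, a 64-unit budget with 90% parallel code does better with one 4-unit big core plus 60 small cores than with 64 small cores, which is the intuition behind the two bullets.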
5
Memory Bandwidth Demand
Computational Fluid Dynamic (CFD) - Splash
[Chart: memory bandwidth (GB/sec) and computation (GFlops) for four CFD problem sizes: 75x50x50 at 10 fps, 150x100x100 at 10 fps, 75x50x50 at 30 fps, 150x100x100 at 30 fps]
Memory bandwidth (GB/sec): 92, 277, 738, 2215
Computation (GFlops): 547, 1642, 4380, 13140
1 TFLOP or 1 TB/s
0.17 Byte/Flop
Source: Intel Labs
[Chart: DDR3 bandwidth (GB/sec), assuming frequency: 2133 MHz]
DDR3 channels: 2, 4, 8, 16, 32, 64
DDR3 bandwidth (GB/sec): 34, 68, 136, 272, 544, 1088
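The slide's 0.17 Byte/Flop figure and the DDR3 channel counts can be checked with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not Intel's methodology. It assumes a standard 64-bit (8-byte) DDR3 data channel at the slide's 2133 MHz transfer rate; the function names are hypothetical.

```python
import math

DDR3_TRANSFERS_PER_SEC = 2133e6   # slide's assumed frequency, 2133 MHz
DDR3_BYTES_PER_TRANSFER = 8       # assumption: 64-bit data channel


def bytes_per_flop(bandwidth_gb_s, gflops):
    """Memory bytes moved per floating-point operation."""
    return bandwidth_gb_s / gflops


def ddr3_bandwidth_gb_s(channels):
    """Peak bandwidth of the given number of DDR3 channels in GB/s."""
    return channels * DDR3_TRANSFERS_PER_SEC * DDR3_BYTES_PER_TRANSFER / 1e9


def channels_needed(bandwidth_gb_s):
    """Smallest channel count whose peak bandwidth covers the demand."""
    return math.ceil(bandwidth_gb_s / ddr3_bandwidth_gb_s(1))
```

For example, `bytes_per_flop(92, 547)` is about 0.17, and `channels_needed(1000)` gives 59 channels for 1 TB/s. The slide rounds each channel to 17 GB/sec, which is why its 64-channel figure is 1088 rather than the unrounded value.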
6
Increasing Signaling Rate
More Bandwidth & Less Power
Differential Copper Interconnect & Silicon Photonics
[Chart: Single-ended Interconnect: power (mW/Gbps, ticks 10 to 12) versus signaling rate (MHz: 1066, 1333, 1600, 1867, 2133, 2400, 2600, 2800), from state-of-the-art DDR to future rates]
State of the Art with FBD (@ 25 mW/Gbps & 5 Gb/s):
100 GB/sec ~ 1 Tb/sec = 1,000 Gb/sec
1,000 Gb/sec × 25 mW/Gb/sec = 25 Watts
Bus width = 1,000 Gb/sec / 5 Gb/sec = 200, about 400 pins (differential)
[Chart: power (mW/Gbps, 0 to 30) versus signaling rate (0 to 20 GBit/sec, with points labeled 40); series: State of the art with FBD, Future Copper Interconnect, Optical today, Silicon Photonics Vision]
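The FBD estimate on this slide is plain arithmetic and can be reproduced as a short sketch. The function name and parameters are hypothetical; the inputs (1,000 Gb/sec, 25 mW/Gbps, 5 Gb/s per lane, two pins per differential lane) come from the slide, which rounds 100 GB/sec up to roughly 1 Tb/sec.

```python
import math


def link_budget(bandwidth_gbps, mw_per_gbps, lane_rate_gbps, differential=True):
    """Return (power in watts, lane count, pin count) for a target link bandwidth."""
    power_watts = bandwidth_gbps * mw_per_gbps / 1000.0     # mW -> W
    lanes = math.ceil(bandwidth_gbps / lane_rate_gbps)      # lanes needed at this rate
    pins = lanes * (2 if differential else 1)               # differential pairs use 2 pins
    return power_watts, lanes, pins


if __name__ == "__main__":
    print(link_budget(1000, 25, 5))
```

Raising the per-lane rate or lowering the mW/Gbps figure in the call shows how a faster, more efficient interconnect would shrink both the pin count and the power.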
7
Addressing Memory Bandwidth
Bringing Memory Closer to the Cores
[Diagrams: Memory on Package and 3D Memory Stacking; labels: Package, DRAM, CPU, Heat-sink, Last Level Cache, Fast DRAM, Si Chip]
*Future Vision, does not represent real Intel product
8
Photonics For Memory BW and Capacity
High Performance with Remote Memory
[Diagram labels: Rx, Tx]
The Indium Phosphide emits the light into the silicon waveguide
The silicon acts as laser cavity: light bounces back and forth and gets amplified by the InP-based material
Integrated silicon photonic chip
Remote Memory Blade integrated into a system
Integrated Tb/s Optical Link on a Single Chip
9
Increasing Ethernet Bandwidth
[Chart: Ethernet bandwidth (10 Gb/s to 80 Gb/s) for TPC-H, TPC-C, SPECweb05 and SPECjApps, 2008 to 2014; ~2x Perf Gain Every 2 Years]
Source: Intel, 2006. TPC & SPEC are standard server application benchmarks
Today server I/O is fragmented
•1GbE performance doesn’t meet current server I/O requirements
•10GbE is still too expensive
•2/4G Fibre Channel & 10G InfiniBand are being deployed to meet the growing I/O demands for storage & HPC clusters
In 2-4 Years, convergence on 10GbE looks promising
•10GBASE-T availability will drive costs down
•10GbE will offer a compelling value-proposition for LAN, SAN, & HPC cluster connections
7-10 Years
•Look beyond to 40GbE and 100GbE
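The "~2x every 2 years" trend can be written as a one-line projection. This is a minimal sketch: the function name is hypothetical, and the starting point of 10 Gb/s in 2008 is an assumption read off the chart's axis labels, not a stated Intel figure.

```python
def projected_bandwidth_gbps(year, base_year=2008, base_gbps=10.0, doubling_years=2.0):
    """Bandwidth demand assuming a steady doubling every `doubling_years`."""
    return base_gbps * 2.0 ** ((year - base_year) / doubling_years)


if __name__ == "__main__":
    for year in (2008, 2010, 2012, 2014):
        print(year, projected_bandwidth_gbps(year), "Gb/s")
```

Under that assumption the projection reaches 80 Gb/s in 2014, which lines up with the top of the chart's axis.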
10
Outside the Box:
HPC Networking with Optical Cables
Benefit Over Copper:
•Scalable, BW, throughput
•Longer distance: today up to 100m
•Higher reliability: 10⁻¹⁵ Bit Error Rate (BER) or lower
•Smaller & lighter form factor
[Diagram: Electrical Socket, Optical Transceiver in Plug, Optical Cable (up to 100m), Optical Transceiver in Plug, Electrical Socket]
[Chart: Doubling the Data Rate Every 2 Years; 2007, 2009, 2011, 2013; 20 Gb/s, 40 Gb/s, 80 Gb/s, 100 Gb/s, 120 Gb/s]
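To get a feel for what a 10⁻¹⁵ bit error rate means at these data rates, the mean time between bit errors can be estimated. This is a rough sketch with a hypothetical function name; it assumes independent bit errors and a link that is fully utilized the whole time.

```python
def seconds_between_errors(bit_error_rate, data_rate_gbps):
    """Average seconds between bit errors on a fully utilized link."""
    errors_per_second = bit_error_rate * data_rate_gbps * 1e9
    return 1.0 / errors_per_second


if __name__ == "__main__":
    for rate in (20, 40, 80):
        hours = seconds_between_errors(1e-15, rate) / 3600.0
        print(rate, "Gb/s:", round(hours, 1), "hours between errors")
```

At 20 Gb/s this works out to about 50,000 seconds between errors, and the interval halves each time the data rate doubles.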
11
Power Aware Everywhere
[Diagram labels: Silicon, Packages, Heat Sinks, Systems, Facilities]
Silicon:
Moore’s law, Strained silicon, Transistor leakage control techniques, Hi-K Dielectric, Clock gating, etc.
Processor and System Power:
Multi-core, Integrated Voltage Regulators (IVR), Fine grain power management, etc.
Facilities:
Efficient Power Conversion and Cooling
12
Reliability Challenges & Vision
[Diagram: ion path through the depletion region of a device at +V, with drift and diffusion of the generated charge. Source: Intel]
An exponential growth in FIT/chip
•FIT/bit of memory cell: expected to be roughly constant, but,
•Moore’s law: increasing the bit count exponentially: 2x every 2 years
Source: Intel
Soft Error: one of many reliability challenges
Process Techniques
Device Param Tuning, Rad-hard Cell Creation
Circuit Techniques
Architectural Techniques
Micro Solutions: Parity, SECDED ECC, π bit
Macro Solutions: Lockstepping, Redundant multithreading (RMT), Redundant multi-core CPU
State-of-Art Processes
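The FIT/chip argument on this slide is a product of two terms: a roughly constant FIT per bit and a bit count that doubles every two years. A minimal sketch of that reasoning follows; the function name and the example numbers are hypothetical, and FIT is used in its usual sense of failures per 10⁹ device-hours.

```python
def fit_per_chip(fit_per_bit, bits_now, years_ahead, doubling_years=2.0):
    """Chip-level FIT when FIT/bit is constant and the bit count doubles periodically."""
    bits = bits_now * 2.0 ** (years_ahead / doubling_years)
    return fit_per_bit * bits


if __name__ == "__main__":
    # Hypothetical chip: 0.001 FIT per bit, one million bits today.
    for years in (0, 2, 4, 6):
        print(years, "years:", fit_per_chip(0.001, 1e6, years), "FIT")
```

With unprotected bits, the chip-level rate doubles every two years under these assumptions, which is the motivation for the process, circuit and architectural techniques listed on the slide.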
13
What Can We Expect?!
[Chart: HPC performance projection on a log scale from 100 MFlops to 1 ZFlops, years 1993, 1999, 2005, 2011, 2017, 2023, 2029; series: SUM of Top500, #1; annotation: Beyond Petascale]
CERN: 115th of Top500 in June 2007 (source: CERN Courier, Aug. 20, 2007)
Background picture from CERN OpenLab, Intel HPC Roundtable ‘06
14
What can Intel do for you?