Numascale STREAM Benchmark Podcast
TRANSCRIPT
2
Distributed Memory (Clusters)
Network
Partial views of the Data Set
Requires Explicit Message Passing to exchange data between processes
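To illustrate the distributed-memory model, here is a minimal sketch that simulates a cluster inside one Python process. Each "node" is a thread that receives only its own slice of the data (its partial view). A `queue.Queue` stands in for the network, so the only way node 0 learns the other partial sums is through explicit messages. This is not MPI, and `run_distributed_sum` is a hypothetical helper written for illustration.

```python
import queue
import threading


def run_distributed_sum(data, num_nodes=4):
    """Sum `data` as if it were spread over `num_nodes` cluster nodes."""
    chunk = (len(data) + num_nodes - 1) // num_nodes
    # Each node gets only its partial view of the data set.
    partitions = [data[i * chunk:(i + 1) * chunk] for i in range(num_nodes)]
    inbox = queue.Queue()  # stands in for the network: messages to node 0
    result = {}

    def node(rank, local_data):
        partial = sum(local_data)
        if rank == 0:
            total = partial
            # Node 0 cannot read other partitions; it must receive them.
            for _ in range(num_nodes - 1):
                _sender, value = inbox.get()
                total += value
            result["total"] = total
        else:
            inbox.put((rank, partial))  # explicit "send" of the partial result

    threads = [threading.Thread(target=node, args=(rank, part))
               for rank, part in enumerate(partitions)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


print(run_distributed_sum(list(range(1000))))
```

Even this trivial reduction needs a communication step; every data exchange has to be programmed explicitly.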
3
Shared Memory
Shared view of the Entire Data Set
All processes can access all data directly with standard load/store instructions
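For contrast, here is a shared-memory sketch. Every thread works on the same Python lists with ordinary reads and writes, and no messages are exchanged. The kernel shown is a STREAM-style triad (`a = b + scalar * c`). `shared_triad` is a hypothetical name, and because of CPython's global interpreter lock this shows the access model, not a speedup.

```python
import threading


def shared_triad(b, c, scalar, num_threads=4):
    """Compute a[i] = b[i] + scalar * c[i] with threads sharing one array."""
    n = len(b)
    a = [0.0] * n  # a single array visible to every thread
    chunk = (n + num_threads - 1) // num_threads

    def worker(start, stop):
        for i in range(start, stop):
            a[i] = b[i] + scalar * c[i]  # direct load/store on shared data

    threads = [threading.Thread(target=worker,
                                args=(t * chunk, min((t + 1) * chunk, n)))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The main thread can now read any element without a data exchange.
    return a


print(shared_triad([1.0] * 8, [2.0] * 8, 3.0))
```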
4
Standard Cluster Architecture
[Diagram: four standard servers, each with its own CPUs, caches, memory and I/O, connected only through a network switch]
7
Numascale System Architecture
Shared Resources: One Single Operating System Image
[Diagram: four nodes, each with CPUs, caches, memory and I/O plus a NumaChip with its own Numa Cache, joined by the NumaConnect Fabric (on-chip distributed switching)]
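One way to picture "one single operating system image" is a single physical address space that spans all nodes, where every address has a home node. The sketch below assumes a contiguous layout in which each node contributes an equal slice of memory. `NODE_MEMORY_BYTES`, the layout and the helper names are hypothetical; the actual NumaConnect address map is not described in this deck.

```python
# Hypothetical global address map: node k owns the k-th contiguous slice.
NODE_MEMORY_BYTES = 64 * 2**30  # assumed per-node memory, illustration only


def home_node(address, local_node):
    """Return (owning node, is_local) for a global physical address."""
    node = address // NODE_MEMORY_BYTES
    return node, node == local_node


def describe_load(address, local_node):
    node, local = home_node(address, local_node)
    if local:
        return f"node {local_node}: local memory access"
    # The program issues the same load instruction either way; the
    # interconnect, not the application, forwards it to the home node.
    return f"node {local_node}: remote access to node {node}"


print(describe_load(5, 2))
```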
9
NumaConnect™ Node Configuration
[Diagram: a NumaChip with its own memory holding the Numa Cache and tags, attached over Coherent HyperTransport to two multi-core CPUs (four memory modules each) and an I/O bridge; the NumaChip provides 6 x4 SERDES links to other nodes]
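The "Numa Cache + Tags" memory can be thought of as a cache for lines fetched from other nodes, so that repeated remote loads are served locally. The sketch below models it as a direct-mapped cache with a tag array. `LINE_BYTES`, `NUM_SETS`, the direct-mapped organisation and `RemoteCache` are all assumptions for illustration, and the model ignores coherence (invalidations), which a real coherent HyperTransport system must handle.

```python
# Hypothetical direct-mapped remote-data cache with a separate tag array.
LINE_BYTES = 64   # assumed cache-line size
NUM_SETS = 1024   # assumed number of entries


class RemoteCache:
    def __init__(self):
        self.tags = [None] * NUM_SETS  # which line each entry currently holds
        self.data = [None] * NUM_SETS  # cached line contents
        self.hits = 0
        self.misses = 0

    def load(self, address, fetch_remote):
        """Return the line containing `address`, fetching it on a miss."""
        line = address // LINE_BYTES
        index = line % NUM_SETS
        tag = line // NUM_SETS
        if self.tags[index] == tag:
            self.hits += 1
        else:
            # Miss: fetch the whole line from its home node, keep a copy.
            self.misses += 1
            self.tags[index] = tag
            self.data[index] = fetch_remote(line * LINE_BYTES)
        return self.data[index]
```

A second load that falls in the same 64-byte line is a hit and never leaves the node; an address that maps to the same entry with a different tag evicts the old line.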
10
NumaConnect™ System Architecture
6 external links - flexible system configurations in multi-dimensional topologies
[Diagram: one multi-CPU node (NumaChip with Numa Cache memory, two multi-core CPUs with four memory modules each, I/O bridge) used as the building block of the topology]
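Six external links fit a 3-D torus exactly: one link in each direction (+x, -x, +y, -y, +z, -z), with wrap-around at the edges. The sketch below uses the 6 x 6 x 3 torus of the STREAM system described in this deck; `torus_neighbors` is a hypothetical helper. With a dimension of size 2 the +1 and -1 neighbours would coincide, so six distinct neighbours assume every dimension is at least 3.

```python
from itertools import product


def torus_neighbors(node, dims):
    """Neighbours of `node` in a torus, one per direction per dimension."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (+1, -1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size  # wrap-around link
            neighbors.append(tuple(coord))
    return neighbors


DIMS = (6, 6, 3)
nodes = list(product(*(range(d) for d in DIMS)))
print(len(nodes))                        # 108 nodes
print(torus_neighbors((0, 0, 0), DIMS))  # six neighbours of the corner node
```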
13
2-D Dataflow
[Diagram: request and response traffic between three nodes, each with CPUs, caches, four memory modules and a NumaChip]
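A remote load can be modelled as a request that travels to the data's home node and a response that carries the data back. The sketch counts hops in a 2-D torus. Dimension-order routing (all X hops first, then Y) and always taking the shorter way round each ring are assumptions for illustration, not NumaConnect's documented routing; `route_2d` and `remote_load` are hypothetical names.

```python
def ring_steps(src, dst, size):
    """Unit steps along one ring, taking the shorter direction."""
    forward = (dst - src) % size
    backward = (src - dst) % size
    return [+1] * forward if forward <= backward else [-1] * backward


def route_2d(src, dst, dims):
    """Path from src to dst: X dimension first, then Y (assumed)."""
    path = [src]
    x, y = src
    for step in ring_steps(src[0], dst[0], dims[0]):
        x = (x + step) % dims[0]
        path.append((x, y))
    for step in ring_steps(src[1], dst[1], dims[1]):
        y = (y + step) % dims[1]
        path.append((x, y))
    return path


def remote_load(requester, home, dims):
    request = route_2d(requester, home, dims)   # request to the home node
    response = route_2d(home, requester, dims)  # data back to the requester
    return len(request) - 1, len(response) - 1  # hop counts


print(route_2d((0, 0), (2, 1), (6, 6)))
print(remote_load((0, 0), (5, 0), (6, 6)))
```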
14
STREAM Benchmark
• 108 nodes
• 6 x 6 x 3 torus
• 5 184 CPU cores
• 58 TFlops
• 20.7 TBytes shared memory
• Single-image OS
• 10 TB/s memory bandwidth
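The system totals above can be broken down per node by simple division. The sketch below does that arithmetic; the results are averages derived here (assuming decimal units, 1 TB = 1000 GB), not published per-node specifications.

```python
# System totals from the slide.
totals = {
    "cores": 5184,
    "tflops": 58,
    "memory_tb": 20.7,
    "bandwidth_tb_s": 10,
}
NODES = 6 * 6 * 3  # the torus dimensions give the node count

per_node = {key: value / NODES for key, value in totals.items()}

print(f"per node: {per_node['cores']:.0f} cores, "
      f"{per_node['tflops'] * 1000:.0f} GFlops, "
      f"{per_node['memory_tb'] * 1000:.0f} GB memory, "
      f"{per_node['bandwidth_tb_s'] * 1000:.0f} GB/s memory bandwidth")
```

The torus dimensions multiply out to the stated 108 nodes, giving on average 48 cores and roughly 192 GB of memory per node.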
15
McCalpin STREAM Benchmark
--------------------------------------------------------------------------------------
Sub. Date   Machine ID           ncpus  COPY        SCALE        ADD         TRIAD
--------------------------------------------------------------------------------------
2015.04.08  NumaConnect_648node  1296   9139226.2   10062237.6   8985643.0   8871850.0
COPY: 9.13 TBytes/s   SCALE: 10.06 TBytes/s   ADD: 8.98 TBytes/s   TRIAD: 8.87 TBytes/s
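For reference, the four STREAM kernels are simple vector loops. The sketch writes them in plain Python for readability (the real benchmark is C or Fortran, typically parallelised with OpenMP) and records the traffic STREAM counts for each kernel: two arrays for COPY and SCALE, three for ADD and TRIAD. `stream_kernels` and `bytes_per_pass` are hypothetical names, and the 8-byte element size assumes double precision.

```python
def stream_kernels(a, b, c, scalar):
    """One pass of the four STREAM kernels, in the benchmark's order."""
    n = len(a)
    for j in range(n):
        c[j] = a[j]                   # COPY
    for j in range(n):
        b[j] = scalar * c[j]          # SCALE
    for j in range(n):
        c[j] = a[j] + b[j]            # ADD
    for j in range(n):
        a[j] = b[j] + scalar * c[j]   # TRIAD


# Arrays read or written per element, as counted by STREAM.
ARRAYS_MOVED = {"COPY": 2, "SCALE": 2, "ADD": 3, "TRIAD": 3}


def bytes_per_pass(kernel, n, elem_bytes=8):
    return ARRAYS_MOVED[kernel] * n * elem_bytes


a, b, c = [1.0] * 4, [2.0] * 4, [0.0] * 4
stream_kernels(a, b, c, 3.0)
print(a, b, c)
```

Because the loops do almost no arithmetic per byte, the reported rate is set by memory bandwidth rather than by the CPU cores.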
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:         9139226.2     0.240336     0.236344     0.246761
Scale:       10062237.6     0.217982     0.214664     0.220526
Add:          8985643.0     0.361473     0.360575     0.363363
Triad:        8871850.0     0.366032     0.365200     0.366646
-------------------------------------------------------------
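The raw STREAM output can be cross-checked: STREAM reports the best rate as bytes counted divided by the minimum time, with MB meaning 10^6 bytes. Dividing back out gives the implied array length N. The sketch assumes 8-byte (double) elements, since the element type is not shown in the output; `implied_length` is a hypothetical helper. Every kernel gives roughly 1.35 x 10^11 elements per array (about 1.08 TB per array at 8 bytes), so the four rows are mutually consistent.

```python
MB = 10**6       # STREAM rates are in units of 10**6 bytes per second
ELEM_BYTES = 8   # assumed: STREAM_TYPE is double

# kernel: (best rate in MB/s, min time in s, arrays counted per pass)
results = {
    "Copy": (9139226.2, 0.236344, 2),
    "Scale": (10062237.6, 0.214664, 2),
    "Add": (8985643.0, 0.360575, 3),
    "Triad": (8871850.0, 0.365200, 3),
}


def implied_length(rate_mb_s, min_time_s, arrays):
    """Array length N such that arrays * N * ELEM_BYTES / min_time = rate."""
    bytes_moved = rate_mb_s * MB * min_time_s
    return bytes_moved / (arrays * ELEM_BYTES)


for kernel, row in results.items():
    print(f"{kernel:6s} N ~ {implied_length(*row):.3e} elements")
```

The small differences between rows come from the times being printed with only six decimal places.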