numascale stream benchmark podcast

NumaConnect Technology Einar Rustad, CTO & Co-Founder January 2015

Upload: insidehpc

Post on 05-Aug-2015



2

Distributed Memory (Clusters)

[Figure label: Network]

Partial views of the Data Set

Requires Explicit Message Passing to exchange data between processes

3

Shared Memory

Shared view of the Entire Data Set

All processes can access all data directly with standard load/store instructions
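The contrast between the two programming models on these slides can be sketched in a few lines. This is only an illustration, not NumaConnect code: threads stand in for processes, and `DATA`, `WORKERS`, `message_passing_sum` and `shared_memory_sum` are hypothetical names chosen for the example.

```python
import queue
import threading

DATA = list(range(1, 101))   # hypothetical data set: the integers 1..100
WORKERS = 4
CHUNK = len(DATA) // WORKERS


def message_passing_sum():
    """Cluster style: each worker owns a private copy of one slice (a partial
    view) and sends its partial result back as an explicit message."""
    inbox = queue.Queue()

    def worker(private_slice):
        inbox.put(sum(private_slice))            # explicit send

    threads = []
    for w in range(WORKERS):
        private_slice = DATA[w * CHUNK:(w + 1) * CHUNK]   # copied partial view
        t = threading.Thread(target=worker, args=(private_slice,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return sum(inbox.get() for _ in range(WORKERS))       # explicit receives


def shared_memory_sum():
    """Shared-memory style: every worker reads the one shared DATA directly
    and stores its result into a shared list; no messages are exchanged."""
    partial = [0] * WORKERS

    def worker(w):
        total = 0
        for i in range(w * CHUNK, (w + 1) * CHUNK):
            total += DATA[i]                     # plain load from shared data
        partial[w] = total                       # plain store to shared memory

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


if __name__ == "__main__":
    print(message_passing_sum(), shared_memory_sum())
```

Both functions compute the same total; the difference is that the first has to copy slices and pass results through a queue, while the second only indexes the shared data.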

4

Standard Cluster Architecture

[Figure: four Standard Servers, each with CPUs, Caches, Memory and I/O, connected through a Network Switch]

5

NumaConnect-1 Card

6

NumaConnect in Supermicro 1042

7

Numascale System Architecture

Shared Resources - One Single Operating System Image

[Figure: four nodes, each with CPUs, Caches, Memory and I/O plus a NumaChip with Numa Cache, joined by the NumaConnect Fabric - On-Chip Distributed Switching]

8

Cabling Example

9

NumaConnect™ Node Configuration

[Figure labels: NumaChip; Memory (Numa Cache + Tags); two Multi-Core CPUs, each with four Memory blocks; I/O Bridge; Coherent HyperTransport; 6 x4 SERDES links]

10

NumaConnect™ System Architecture

6 external links - flexible system configurations in multi-dimensional topologies

[Figure labels: Multi-CPU Node; NumaChip; Memory; Numa Cache; two Multi-Core CPUs, each with four Memory blocks; I/O Bridge]

11

Scalable Torus Topologies
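As a rough illustration of how six links per node map onto a 3-D torus, the sketch below computes each node's neighbours with wrap-around. It assumes one link in each direction along each of three dimensions, which the slides do not spell out; the 6 x 6 x 3 size is taken from the Stream Benchmark slide, and `torus_neighbors` and `torus_nodes` are hypothetical helper names.

```python
from itertools import product


def torus_neighbors(coord, dims):
    """Return the neighbours of `coord` in a torus of shape `dims`.

    Assumption: each dimension contributes two links (-1 and +1), and the
    edges wrap around, so three dimensions use six links per node.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size    # wrap-around closes the ring
            neighbors.append(tuple(n))
    return neighbors


def torus_nodes(dims):
    """Enumerate every node coordinate in the torus."""
    return list(product(*(range(size) for size in dims)))


if __name__ == "__main__":
    dims = (6, 6, 3)
    print(len(torus_nodes(dims)), "nodes")
    print(torus_neighbors((0, 0, 0), dims))
```

Because every ring has at least three nodes in this configuration, the six neighbours of any node are distinct.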

12

System Partitioning

• Independent OS Instances
• Shared Fabric
• Re-Partitioning at Boot-Time

13

2-D Dataflow

[Figure: Request and Response paths between three nodes, each with CPUs, Caches, Memory and a NumaChip]

14

Stream Benchmark

• 108 nodes
• 6 x 6 x 3 Torus
• 5 184 CPU cores
• 58 TFlops
• 20.7 TBytes Shared Memory
• Single Image OS
• 10 TB/s Memory BW
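The system totals above can be broken down per node with simple arithmetic. The sketch below is only a consistency check on the slide's numbers; it assumes decimal units (1 TByte = 10^12 bytes), and the per-node and per-core figures are derived here, not quoted from the presentation.

```python
# Totals as listed on the slide.
NODES = 108
TORUS = (6, 6, 3)
CORES = 5184
PEAK_FLOPS = 58e12               # 58 TFlops
SHARED_MEMORY_BYTES = 20.7e12    # 20.7 TBytes, assuming decimal units
MEMORY_BW_BYTES_S = 10e12        # 10 TB/s, assuming decimal units

# Derived values (not stated on the slide).
nodes_from_torus = TORUS[0] * TORUS[1] * TORUS[2]
cores_per_node = CORES // NODES
memory_per_node_gb = SHARED_MEMORY_BYTES / NODES / 1e9
gflops_per_core = PEAK_FLOPS / CORES / 1e9
bandwidth_per_node_gb_s = MEMORY_BW_BYTES_S / NODES / 1e9

if __name__ == "__main__":
    print("nodes from torus shape:", nodes_from_torus)
    print("cores per node:", cores_per_node)
    print("memory per node (GB): %.1f" % memory_per_node_gb)
    print("peak GFlops per core: %.2f" % gflops_per_core)
    print("memory bandwidth per node (GB/s): %.1f" % bandwidth_per_node_gb_s)
```

The torus shape reproduces the node count, and the core count divides evenly across the nodes.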

15

McCalpin Stream Benchmark

------------------------------------------------------------------------------------
Sub. Date    Machine ID            ncpus  COPY        SCALE       ADD         TRIAD
------------------------------------------------------------------------------------
2015.04.08   NumaConnect_648node   1296   9139226.2   10062237.6  8985643.0   8871850.0

COPY:   9.13 TBytes/s
SCALE: 10.06 TBytes/s
ADD:    8.98 TBytes/s
TRIAD:  8.87 TBytes/s

-------------------------------------------------------------
Function    Best Rate MB/s   Avg time   Min time   Max time
Copy:            9139226.2   0.240336   0.236344   0.246761
Scale:          10062237.6   0.217982   0.214664   0.220526
Add:             8985643.0   0.361473   0.360575   0.363363
Triad:           8871850.0   0.366032   0.365200   0.366646
-------------------------------------------------------------
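For readers unfamiliar with STREAM, the sketch below shows how the "Best Rate MB/s" column follows from the minimum time. STREAM counts bytes moved per kernel (two arrays for Copy and Scale, three for Add and Triad), uses 8-byte doubles by default, and reports 10^6 bytes per MB. The array length is not given on the slide; `implied_array_length` back-computes it from the reported numbers, so treat that value as an inference, not a quoted figure.

```python
BYTES_PER_ELEMENT = 8            # STREAM's default element type is a double
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# (best rate in MB/s, min time in s) as reported in the table above.
REPORTED = {
    "Copy": (9139226.2, 0.236344),
    "Scale": (10062237.6, 0.214664),
    "Add": (8985643.0, 0.360575),
    "Triad": (8871850.0, 0.365200),
}


def stream_rate_mb_s(kernel, n_elements, min_time_s):
    """Best rate as STREAM reports it: bytes moved / min time, in 1e6 B/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * n_elements
    return 1.0e-6 * bytes_moved / min_time_s


def implied_array_length(kernel):
    """Array length implied by a reported (rate, min time) pair."""
    rate_mb_s, min_time_s = REPORTED[kernel]
    bytes_moved = rate_mb_s * 1.0e6 * min_time_s
    return bytes_moved / (ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT)


def triad(a, b, c, scalar):
    """The Triad kernel itself: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


if __name__ == "__main__":
    for kernel in REPORTED:
        print(kernel, "implies about %.4g elements" % implied_array_length(kernel))
```

All four rows imply the same array length to within rounding, which is a useful sanity check that the table was transcribed correctly.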

16

Standard Cluster Architecture

[Figure: four Standard Servers, each with CPUs, Caches, Memory and I/O, connected through a Network Switch]

17

Scale-Up and Scale-Out Capacity

• Single System Image or Multiple Partitions in one Fabric

• Max Numbers
  - 256 TeraBytes Physical Address Space
  - 4096 Nodes
  - 196 608 cores
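The maximum figures can be related to one another with a little arithmetic. The sketch below assumes binary units (1 TeraByte = 2^40 bytes), which the slide does not state; the address-bit count, node-ID bits and per-node values are derived here and are not claims made by the presentation.

```python
# Maximum figures as listed on the slide.
MAX_ADDRESS_SPACE_BYTES = 256 * 2**40    # assumes binary TeraBytes
MAX_NODES = 4096
MAX_CORES = 196_608

# Derived values (not stated on the slide).
address_bits = MAX_ADDRESS_SPACE_BYTES.bit_length() - 1   # exact log2 of a power of two
node_id_bits = MAX_NODES.bit_length() - 1
cores_per_node = MAX_CORES // MAX_NODES
address_space_per_node_gib = MAX_ADDRESS_SPACE_BYTES // MAX_NODES // 2**30

if __name__ == "__main__":
    print("physical address bits:", address_bits)
    print("bits needed to number the nodes:", node_id_bits)
    print("cores per node at maximum scale:", cores_per_node)
    print("address space per node (GiB):", address_space_per_node_gib)
```

Under that assumption, 256 TeraBytes corresponds to a 48-bit physical address, and the maximum core count matches the 48 cores per node of the benchmarked system.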