Page 1: Reconfigurable Computing: A First Look at the Cray-XD1 Mitch Sukalski, David Thompson, Rob Armstrong, Curtis Janssen, and Matt Leininger Orgs: 8961 & 8963

Reconfigurable Computing: A First Look at the Cray-XD1

Mitch Sukalski, David Thompson, Rob Armstrong, Curtis Janssen, and Matt Leininger

Orgs: 8961 & 8963

September 1, 2004

Craig Ulmer

Page 2

Outline

• Reconfigurable computing refresher
  – Progress update

• Cray XD1
  – Architecture
  – General message passing
  – Reconfigurable Computing and the XD1

Page 3

Reconfigurable Computing Update

Page 4

Reconfigurable Computing

• Use reconfigurable hardware devices to implement key computations in hardware

    double doX(double *a, int n) {
        int i;
        double x;

        x = 0;
        for (i = 0; i < n; i += 3) {
            x += a[i] * a[i+1] + a[i+2];
            …
        }
        …
        return x;
    }

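[Figure: dataflow circuit for the loop body. Inputs a[i] and a[i+1] feed a multiplier (*), an adder (+) adds a[i+2], and a second adder (+) with a Z^-1 delay element accumulates the result.]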
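
One way to read the circuit on this slide is as a small datapath: a multiplier, an adder, and an accumulating adder whose feedback register is the Z^-1 element. The C model below is a minimal software sketch of that reading, not the hardware design itself. The name `mac_datapath` and the variable `acc_reg` are hypothetical, and the sketch assumes the elided parts of the loop ("…") do nothing that affects `x`.

```c
#include <stddef.h>

/* Software model of the slide's dataflow graph (illustrative only).
 * Each loop iteration stands for one triple entering the circuit:
 *   multiplier:          product = a[i] * a[i+1]
 *   first adder:         sum     = product + a[i+2]
 *   accumulating adder:  acc_reg = acc_reg + sum  (acc_reg models Z^-1)
 * Trailing elements that do not form a complete triple are ignored. */
double mac_datapath(const double *a, size_t n)
{
    double acc_reg = 0.0;                   /* feedback register, reset to 0 */

    for (size_t i = 0; i + 2 < n; i += 3) {
        double product = a[i] * a[i + 1];   /* multiplier */
        double sum = product + a[i + 2];    /* first adder */
        acc_reg = acc_reg + sum;            /* accumulating adder */
    }
    return acc_reg;
}
```

For the input {1, 2, 3, 4, 5, 6} the model returns (1*2 + 3) + (4*5 + 6) = 31. In hardware, the three operators would work on different triples at the same time; the sequential loop here only mirrors the arithmetic.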

Page 5

First Year Progress

• Computation (Underwood SNL/NM)
  – Double-precision Floating Point Cores

• Communication
  – Multi-gigabit Transceiver (MGT) interface
  – Gigabit Ethernet work

• Early application experiments
  – Simplified isosurfacing
  – Networked pattern matching

Page 6

Peak Floating-Point Performance

Core            Single Precision                                Double Precision
                Speed    Cores per V2P100-6  Peak Performance   Speed    Cores per V2P100-6  Peak Performance
Addition        195 MHz  89                  17 GFLOPS          143 MHz  40                  5.7 GFLOPS
Multiplication  176 MHz  74                  13 GFLOPS          142 MHz  27                  3.8 GFLOPS
Division        120 MHz  22                  2.6 GFLOPS         98 MHz   6                   0.58 GFLOPS

From Underwood, “FPGAs vs. CPUs: Trends in Peak Floating-Point Performance,” in FPGA’04
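
The peak figures in the table are consistent with multiplying the clock rate by the number of cores that fit on the device. The sketch below shows that arithmetic; it assumes each core is fully pipelined and retires one result per clock cycle, which the slide does not state explicitly. The function name `peak_gflops` is hypothetical.

```c
/* Peak throughput in GFLOPS, assuming every core completes one
 * floating-point operation per clock cycle (an assumption). */
double peak_gflops(double clock_mhz, int cores)
{
    return clock_mhz * 1e6 * cores / 1e9;
}
```

For example, single-precision addition gives 195 MHz x 89 cores = 17.355 GFLOPS, and double-precision addition gives 143 MHz x 40 cores = 5.72 GFLOPS, matching the table's 17 and 5.7 GFLOPS to two significant figures.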

Page 7

Connecting FPGAs to the Network Fabric

• Modern FPGAs feature multi-gigabit transceivers
  – Experimented with GigE, Myrinet 2000, and IB
  – Implemented TCP Offload Engine (TOE) in hardware
  – Working on OpenTOE and OpenGigE cores

[Figure: block diagram of the SNL_OpenGigE and SNL_OpenTOE cores. Labeled blocks: Rocket I/O MGT, GT_Ethernet_2, MGT Control, Align, Tx, Rx, MAC Framer, Pad, CRC, CRC Gen, Decode, IP Header, ARP Cache, ARP Ping, ARP Reply, Ping Reply, Incoming Data Queue, Outgoing Data Queue, Timeout Monitor, SEQ Gen, ACK Monitor, TCP I/F, Socket I/F.]
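
The diagram includes CRC blocks on both the transmit and receive paths. As a reference point for what such a block computes, the sketch below is a bitwise software version of the standard Ethernet CRC-32 (frame check sequence). It is an illustration only: the slides do not show how the OpenGigE core implements its CRC, and a hardware version would typically process many bits per clock rather than one. The function name `crc32_ethernet` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 as used for the Ethernet frame check sequence:
 * reflected polynomial 0xEDB88320, initial value 0xFFFFFFFF,
 * final XOR with 0xFFFFFFFF. Written for clarity, not speed. */
uint32_t crc32_ethernet(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift out one bit; apply the polynomial if it was set. */
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for this CRC is 0xCBF43926 for the nine ASCII bytes "123456789", which is a convenient test vector when comparing a hardware core against software.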

Page 8

Cray XD1 Overview

Page 9

NDA Notice

We do have an NDA with Cray Canada

The XD1 we have on loan is an early Beta system

Page 10

Cray XD1 Overview

• Dense MP system
  – 12 AMD Opterons on 6 blades
  – 6 Xilinx Virtex-II/Pro FPGAs
  – InfiniBand-like interconnect
  – 6 SATA hard drives
  – 4 PCI-X slots
  – 3U Rack

Page 11

Individual Blade

[Figure: block diagram of one blade. Labeled blocks: DDR Memory (x2), Opteron (x2), RAPNI (x2), RapidArray Fabric (24 4x IB Ports) (x2), “Einstein” Chip. Labeled links: HT: 3.2 GB/s, HT: 6.4 GB/s, “HT”: 3.2 GB/s, 4x IB: 2 GB/s.]

* All data rates are aggregates (i.e., 3.2 GB/s = 1.6 GB/s + 1.6 GB/s)

Page 12

Message Passing

• MPICH 1.2.5
  – Latency: 2.25 μs
  – Bandwidth: 1.3 GB/s (82% of HT-IB link)

• RapidArray message layer
  – Open source
  – MP, RDMA
  – Global address space

[Figure: MPI Bandwidth. Bandwidth (Million Bytes/s, axis from 0 to 1,800) versus Message Size (Bytes, axis from 1 to 10,000,000), with labels for PCI-X 133 and 1.6 GB/s HT.]
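
The two MPI numbers on this slide (2.25 μs latency, 1.3 GB/s bandwidth) can be combined into a rough estimate of how bandwidth grows with message size. The sketch below uses a simple linear cost model, time = latency + bytes / peak bandwidth. This model is an assumption introduced here for illustration, not something taken from the slides, and the measured curve may deviate from it (for example where the MPI library changes protocols). All function names are hypothetical.

```c
/* Linear cost model (an assumption): time(n) = latency + n / peak. */
double transfer_time_s(double bytes, double latency_s, double peak_bytes_per_s)
{
    return latency_s + bytes / peak_bytes_per_s;
}

/* Bandwidth actually achieved for a message of the given size. */
double effective_bandwidth(double bytes, double latency_s, double peak_bytes_per_s)
{
    return bytes / transfer_time_s(bytes, latency_s, peak_bytes_per_s);
}

/* Message size at which the model reaches half of the peak bandwidth. */
double half_bandwidth_bytes(double latency_s, double peak_bytes_per_s)
{
    return latency_s * peak_bytes_per_s;
}
```

With the slide's figures, `half_bandwidth_bytes(2.25e-6, 1.3e9)` is about 2,925 bytes, so under this model messages need to be a few kilobytes before they reach half of the 1.3 GB/s peak.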

Page 13

System Administration

• Active manager
  – Synchronize each node’s OS
  – Partition blade functionality
  – Control access rights

• Embedded processor
  – Monitors health (heartbeats)
  – Can restart nodes

• Issues?

Page 14

Reconfigurable Computing and the Cray XD1

Page 15

Connecting to the “Einstein” Accelerator

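[Figure: block diagram of the FPGA accelerator connection. Labeled blocks: RAPNI (with Host HT, Net IB, and HT ports; FPGA Port and Fabric Port), FPGA (with HT I/F, User-defined Circuits, and four QDR2 I/F blocks), four 2 MB SRAM banks. Labeled links: 1.6+1.6 GB/s (x2).]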

Page 16

Example: Random Number Generator

• Monte Carlo app in need of good random numbers
  – Mersenne twister

• Implemented in FPGA
  – FPGA pushes to host memory
  – 301 vs 101 Million Integers/s
  – ~1.2 GB/s

[Figure: block diagram with labels NI, CPU, Host Memory, RNG, FPGA.]
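
For readers unfamiliar with the generator named on this slide, the sketch below is a compact software version of the 32-bit Mersenne Twister (MT19937). It is a reference-style illustration only: the slides do not say which Mersenne Twister variant the FPGA implements or how the circuit is organized. The type and function names (`mt19937_t`, `mt_seed`, `mt_next`) are hypothetical.

```c
#include <stdint.h>

#define MT_N 624   /* words of state */
#define MT_M 397   /* offset used when regenerating the state */

typedef struct {
    uint32_t state[MT_N];
    int index;                 /* next word of state to output */
} mt19937_t;

/* Fill the state from a 32-bit seed using the standard initializer. */
void mt_seed(mt19937_t *mt, uint32_t seed)
{
    mt->state[0] = seed;
    for (int i = 1; i < MT_N; i++) {
        uint32_t prev = mt->state[i - 1];
        mt->state[i] = 1812433253u * (prev ^ (prev >> 30)) + (uint32_t)i;
    }
    mt->index = MT_N;          /* force a regeneration on first use */
}

/* Regenerate all 624 state words in place (the "twist"). */
static void mt_twist(mt19937_t *mt)
{
    for (int i = 0; i < MT_N; i++) {
        uint32_t y = (mt->state[i] & 0x80000000u) |
                     (mt->state[(i + 1) % MT_N] & 0x7FFFFFFFu);
        uint32_t next = mt->state[(i + MT_M) % MT_N] ^ (y >> 1);
        if (y & 1u)
            next ^= 0x9908B0DFu;
        mt->state[i] = next;
    }
    mt->index = 0;
}

/* Return the next 32-bit output after tempering. */
uint32_t mt_next(mt19937_t *mt)
{
    if (mt->index >= MT_N)
        mt_twist(mt);

    uint32_t y = mt->state[mt->index++];
    y ^= y >> 11;
    y ^= (y << 7) & 0x9D2C5680u;
    y ^= (y << 15) & 0xEFC60000u;
    y ^= y >> 18;
    return y;
}
```

Seeded with 5489, the first output of MT19937 is 3499211612, which is useful for checking an implementation. If the slide's integers are 32 bits wide (an assumption), 301 million integers/s x 4 bytes is about 1.2 GB/s, consistent with the ~1.2 GB/s figure above.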

Page 17

General XD1 Comments

• Reconfigurable computing
  – FPGA in memory
  – Fast local memory

• Other accelerators
  – ClearSpeed

• Global address space
  – Opteron limits (40b PA)

• Vendor lock-in
  – Incompatible network
  – All-in-one box?

• Current NI is a bottleneck

• Density vs. Reliability

• Value-added features

[Slide column headings: Good, Not-so-Good]
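
The "40b PA" note refers to the Opteron's 40-bit physical address, which caps how much memory a single flat global address space can cover. The sketch below works through that arithmetic. The per-node memory size used in the usage note is a hypothetical input, not a figure from the slides, and the calculation ignores any address ranges reserved for I/O. The function names are hypothetical.

```c
#include <stdint.h>

/* Bytes addressable with a physical address of the given width
 * (valid for widths below 64 bits). */
uint64_t addressable_bytes(unsigned pa_bits)
{
    return (uint64_t)1 << pa_bits;
}

/* Number of nodes of a given memory size that fit in one flat
 * physical address space. bytes_per_node is a hypothetical input. */
uint64_t max_nodes(unsigned pa_bits, uint64_t bytes_per_node)
{
    return addressable_bytes(pa_bits) / bytes_per_node;
}
```

A 40-bit address covers 2^40 bytes, or 1 TiB. With a hypothetical 8 GiB per node, `max_nodes(40, 8ULL << 30)` gives 128 nodes before the address space is exhausted.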

Page 18

Friendly Users?

• We have a month left on evaluation
  – Could use feedback from other users

http://cdulmer.ran.sandia.gov/xd1 [email protected]