Reconfigurable Computing: HPC Network Aspects
Mitch Sukalski (8961)
David Thompson (8963)
Craig Ulmer (8963)
[email protected]
Pete Dean R&D Seminar
December 11, 2003
FPGAs are promising…
But what’s the catch?
There are three main challenges that must be addressed before FPGAs can be applied to practical scientific computing.
RC Challenge #1: Floating Point
• Most FPGAs are fine-grained
• Floating point units are large
  – 32b FP occupies ~1,000 CLBs
  – Commercial capacity improving
    • 2000: 6,000 CLBs
    • 2003: 40,000 CLBs (Max: 220,000)
• Keith Underwood at Sandia/NM
  – LDRD: Working on high-speed 64b floating-point cores
[Figure: 32b FP in Xilinx V2P7]
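As a back-of-the-envelope reading of these figures, the sketch below divides each quoted CLB capacity by the ~1,000 CLBs of one 32b FP unit. It assumes, unrealistically, that the entire device is given over to floating point with no routing, control, or I/O overhead, so the results are upper bounds only. The function and variable names are illustrative and not from the slides.

```python
# Rough upper bound on 32b FP units per device, using the slide's CLB figures.
CLBS_PER_FP32_UNIT = 1_000  # "~1,000 CLBs" per 32b FP unit (approximate)

CAPACITY_CLBS = {
    "2000": 6_000,
    "2003": 40_000,
    "2003 (max)": 220_000,
}


def max_fp32_units(device_clbs: int, unit_clbs: int = CLBS_PER_FP32_UNIT) -> int:
    """Upper bound: assumes every CLB can be spent on FP units (an idealization)."""
    return device_clbs // unit_clbs


if __name__ == "__main__":
    for label, clbs in CAPACITY_CLBS.items():
        print(f"{label}: at most {max_fp32_units(clbs)} 32b FP units")
```

Under this idealization, the step from 6,000 to 40,000 CLBs moves a device from a handful of FP units to a few dozen, which is why capacity growth matters for this challenge.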
RC Challenge #2: Design Tools
• Hardware design is non-trivial
  – Micromanage computations, clock-by-clock
  – Not appropriate for most scientists
  – Need languages, APIs that are easy to use
• Maya Gokhale at LANL
  – Streams-C: C-like language for HW design
  – Pipeline/unroll loops
  – Schedules access to external memory
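To give a feel for why pipelining and unrolling loops matters in hardware, here is a small cycle-count model. It is not Streams-C code and does not describe what the Streams-C compiler actually generates. It assumes an idealized datapath in which one operation takes `depth` clock cycles, a pipelined datapath accepts one new item per cycle, and unrolling is modeled as `lanes` identical pipelines working in parallel. All names are hypothetical.

```python
import math


def cycles_sequential(n_items: int, depth: int) -> int:
    """Each item occupies the datapath for all `depth` cycles before the next starts."""
    return n_items * depth


def cycles_pipelined(n_items: int, depth: int, lanes: int = 1) -> int:
    """Idealized pipelined loop; `lanes` models loop unrolling.

    The first item in a lane takes `depth` cycles to emerge, and every
    further item in that lane completes one cycle later.
    """
    if n_items == 0:
        return 0
    items_per_lane = math.ceil(n_items / lanes)
    return depth + items_per_lane - 1


if __name__ == "__main__":
    n, depth = 1000, 8
    print("sequential:", cycles_sequential(n, depth))
    print("pipelined: ", cycles_pipelined(n, depth))
    print("pipelined, unrolled x4:", cycles_pipelined(n, depth, lanes=4))
```

The model ignores memory bandwidth, which is exactly the resource a tool must schedule carefully once the datapath is this fast.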
RC Challenge #3: High-speed I/O
• FPGAs have large internal computational power
  – How do we get data into/out of FPGA?
  – How do we connect to our existing HPC machines?
• Mitch Sukalski, David Thompson, Craig Ulmer
  – LDRD: Connect FPGAs to high-performance SANs
Outline
• Where we have been
  – Networking FPGAs using external NI cards
• Where we are going
  – Networking FPGAs using internal transceivers
• Project status
  – Early details
Previous Work
Where we’ve been…
Networking Earlier FPGAs
• Previous-generation FPGAs were like blank ASICs
  – Configurable logic and pins
• Attach a network card to an FPGA card
  – Communication over PCI
• Examples:
  – Virginia Tech: Myrinet
  – Washington U. in St. Louis: ATM (inline)
  – Clemson University: Gigabit Ethernet
  – Georgia Tech: Myrinet
[Figure: CPU, FPGA, NIC, PCI Bus]
GRIM Project at Georgia Tech
• Add multimedia devices to cluster
  – Message layer connects CPUs, memory, and peripherals
  – Myrinet between hosts, PCI within hosts
• Celoxica RC-1000 FPGA
  – Virtex FPGA (1M logic gates)
  – Four SRAM banks
  – PCI w/ PMC
[Figure: RC-1000 card (SRAM0 to SRAM3, PCI, FPGA, Control & Switching) and the GRIM cluster (CPUs, FPGAs, RAID, Ethernet)]
FPGA Organization
[Figure: Frame; Incoming Message Queues; Outgoing Message Queues; Communication Library API; Application Data; Memory API; FPGA Card Memory; FPGA Circuit Canvas; User Circuit API; User Circuits 1 to n]
Lessons Learned
• Frame provides simple OS
  – Isolates users from board
  – Portability
• Dynamically manage resources
  – Card memory
  – Computational circuits
• PCI bottleneck
  – Distance between NI and FPGA
  – PCI difficult to work with
[Figure: Host CPU, NIC, and FPGA holding Circuits X and Y, with Page A in SRAM 1 and Page B in SRAM 2. The message "Use Circuit F on $C0000000" raises a Function Fault (Circuits E, F, G) and a Page Fault (Page C).]
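The dynamic resource management described on this slide can be illustrated with a toy model of how a frame might react to a message such as "Use Circuit F on $C0000000". This is only a sketch under stated assumptions, not the GRIM implementation: the class and method names, the page size, the oldest-first eviction policy, and the order in which the two faults are handled are all made up for illustration.

```python
from collections import OrderedDict


class Frame:
    """Toy model of a frame that loads circuits and memory pages on demand."""

    def __init__(self, circuit_slots: int, sram_banks: int, page_size: int = 0x10000):
        self.circuits = OrderedDict()  # loaded circuit names, oldest first
        self.pages = OrderedDict()     # resident page labels, oldest first
        self.circuit_slots = circuit_slots
        self.sram_banks = sram_banks
        self.page_size = page_size     # assumed value, not from the slides
        self.log = []

    def _ensure(self, table: OrderedDict, key: str, capacity: int, fault: str) -> None:
        """Make `key` resident in `table`, evicting the oldest entry if full."""
        if key in table:
            return
        self.log.append(f"{fault} fault: {key}")
        if len(table) >= capacity:
            evicted, _ = table.popitem(last=False)
            self.log.append(f"evict {evicted}")
        table[key] = True

    def handle(self, circuit: str, address: int) -> None:
        """Process a message of the form 'use <circuit> on <address>'."""
        page = f"{address // self.page_size:#x}"
        self._ensure(self.circuits, circuit, self.circuit_slots, "function")
        self._ensure(self.pages, page, self.sram_banks, "page")
        self.log.append(f"run {circuit} on page {page}")


if __name__ == "__main__":
    frame = Frame(circuit_slots=2, sram_banks=2)
    frame.handle("X", 0xA0000000)
    frame.handle("Y", 0xB0000000)
    frame.handle("F", 0xC0000000)  # neither the circuit nor the page is resident
    print("\n".join(frame.log))
```

The point of the sketch is that the sender only names a circuit and an address; the frame decides what must be loaded or evicted, which is what isolates users from the board.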
Network Features of Recent FPGAs
Where we’re going…
FPGA Network Improvements
• Recent FPGAs have special, built-in cores
  – High-speed transceivers, dedicated processors
• Idea: Build our NI inside the FPGA
  – FPGA becomes a networked compute resource
  – Removes the PCI bottleneck
[Figure: FPGA containing two NI Tx/Rx blocks and user-defined computational circuits, attached to a System Area Network alongside CPU/NIC hosts]
Xilinx Virtex-II Pro FPGA
• Up to 4 PowerPC405 cores
  – Embedded version of PPC
  – 300-400MHz
• Multiple gigabit transceivers
  – Run at 600Mbps to 3.125Gbps
  – Up to twenty-four transceivers
• Additional cores
  – Distributed internal memory
  – Arrays of 18b multipliers
  – Digital clock multipliers, PLLs
[Figure: Xilinx V2P20]
Multi-Gigabit Transceivers: Rocket I/O
• Flexible, high-speed transceivers
  – Can be configured to connect with different physical layers
  – InfiniBand, GigE, FC, 10GigE, Aurora
  – Note: low-level interface (commas, disparity, clock mismatches)
[Figure: Rocket I/O block diagram. Labels: FPGA Fabric, Tx FIFO, CRC, 8B/10B Encoder, Serializer, Deserializer, Clock Recover, 8B/10B Decoder, Rx Elastic Buffer, CRC check, differential pin pairs. A second panel shows several Rocket I/O blocks between the FPGA fabric and the pins.]
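Two of the low-level issues named in the note, commas and disparity, can be shown on plain bit strings. The sketch below is not the Rocket I/O interface (the transceiver handles this in hardware); it only illustrates the ideas using the K28.5 control character from 8B/10B coding, whose two encodings contain the 7-bit comma patterns `0011111` and `1100000`. The function names and the example stream are hypothetical, and bits are written in transmission order.

```python
# The two 10-bit encodings of K28.5, chosen by the current running disparity.
K28_5_RD_NEG = "0011111010"  # sent when running disparity is negative
K28_5_RD_POS = "1100000101"  # sent when running disparity is positive

# 7-bit comma patterns used to find code-group boundaries in a serial stream.
COMMA_PATTERNS = ("0011111", "1100000")


def disparity(code_group: str) -> int:
    """Number of ones minus number of zeros in a code group."""
    ones = code_group.count("1")
    return ones - (len(code_group) - ones)


def find_comma(bits: str) -> int:
    """Index of the first comma pattern in `bits`, or -1 if none is present."""
    hits = [i for i in (bits.find(p) for p in COMMA_PATTERNS) if i >= 0]
    return min(hits) if hits else -1


def align(bits: str) -> list:
    """Split `bits` into 10-bit code groups, starting at the first comma."""
    start = find_comma(bits)
    if start < 0:
        return []
    return [bits[i:i + 10] for i in range(start, len(bits) - 9, 10)]


if __name__ == "__main__":
    # Three stray bits, then K28.5, then one balanced 10-bit group.
    stream = "101" + K28_5_RD_NEG + "1010101010"
    print("comma at bit", find_comma(stream))
    print("aligned groups:", align(stream))
    print("disparities:", [disparity(g) for g in align(stream)])
```

A receiver that has not yet seen a comma has no way to know where 10-bit groups begin, which is why a design that uses the transceivers directly has to deal with alignment before any higher-level protocol work.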
Why MGTs are Important
• Direct connection to networks
  – Same chip, different network
  – Remove PCI from equation
• Fast connections between FPGAs
  – Reduces analog design issues
  – Chain FPGAs together
  – Reduce pin count
• Update: Virtex-II Pro X
  – Now 2.488 Gbps to 10.3125 Gbps
  – Chips have either 8 or 20 transceivers
[Figure: 3.125 Gbps over 44” FR4 *]
* From Xilinx, http://www.xilinx.com/products/virtex2pro/mgtcharacter.htm
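To put "remove PCI from equation" into rough numbers, the sketch below compares peak rates. The PCI figures are an assumption (a 32-bit bus at 33 MHz; the slides do not state the bus width or clock of the cards used), and the transceiver payload rate assumes 8B/10B coding, so 8 of every 10 line bits carry data. Protocol overhead, bus arbitration, and sharing are ignored, and the function names are illustrative.

```python
def pci_peak_bps(width_bits: int = 32, clock_hz: float = 33e6) -> float:
    """Theoretical peak of a parallel bus: one word per clock (assumed PCI flavor)."""
    return width_bits * clock_hz


def mgt_payload_bps(line_rate_bps: float, coding_efficiency: float = 8 / 10) -> float:
    """Payload rate of one serial link after line-coding overhead."""
    return line_rate_bps * coding_efficiency


def transfer_time_s(n_bytes: int, rate_bps: float) -> float:
    """Time to move `n_bytes` at a sustained `rate_bps`."""
    return n_bytes * 8 / rate_bps


if __name__ == "__main__":
    pci = pci_peak_bps()
    mgt = mgt_payload_bps(3.125e9)
    size = 1 << 20  # 1 MiB message
    print(f"PCI peak:        {pci / 1e9:.3f} Gb/s, 1 MiB in {transfer_time_s(size, pci) * 1e3:.2f} ms")
    print(f"One MGT payload: {mgt / 1e9:.3f} Gb/s, 1 MiB in {transfer_time_s(size, mgt) * 1e3:.2f} ms")
```

Under these assumptions a single 3.125 Gbps transceiver carries about 2.5 Gb/s of payload against roughly 1.06 Gb/s for the whole PCI bus, and the bus is shared by every device on it while each transceiver is a dedicated link.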
Hard PowerPC Core
• PowerPC 405
  – 16KB Instruction / 16KB Data caches
  – Real and Virtual memory modes
  – GCC is available
• Multiple memory ports for core
  – On-chip memory (OCM)
  – Processor Local Bus (PLB)
• User-defined memory map
  – Connect memory blocks or cores
  – External memory cores available
[Figure: PowerPC core with I-Cache and D-Cache, Processor Local Bus (PLB), and On-Chip Memory (OCM) Interface]
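A user-defined memory map amounts to a table of address ranges, each routed to a memory block or core. The following sketch shows the idea as a simple address decoder. The region names, base addresses, and sizes are invented for illustration and do not come from the slides or from any Xilinx reference design.

```python
from typing import List, NamedTuple, Optional, Tuple


class Region(NamedTuple):
    name: str
    base: int
    size: int


# Hypothetical map: every name, base, and size here is made up.
MEMORY_MAP = [
    Region("ocm_block_ram", 0x0000_0000, 0x0000_4000),
    Region("plb_external_ram", 0x1000_0000, 0x0400_0000),
    Region("user_core_regs", 0x8000_0000, 0x0000_1000),
]


def check_no_overlap(regions: List[Region]) -> None:
    """Raise ValueError if any two regions share an address."""
    ordered = sorted(regions, key=lambda r: r.base)
    for lower, upper in zip(ordered, ordered[1:]):
        if lower.base + lower.size > upper.base:
            raise ValueError(f"{lower.name} overlaps {upper.name}")


def decode(address: int, regions: List[Region] = MEMORY_MAP) -> Optional[Tuple[str, int]]:
    """Return (region name, offset within region), or None if unmapped."""
    for region in regions:
        if region.base <= address < region.base + region.size:
            return region.name, address - region.base
    return None


if __name__ == "__main__":
    check_no_overlap(MEMORY_MAP)
    for addr in (0x0000_0100, 0x8000_0010, 0x2000_0000):
        print(f"{addr:#010x} ->", decode(addr))
```

Software on the processor core sees a custom circuit the same way it sees memory: as a range of addresses, so adding a core to the system is largely a matter of adding a row to the map.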
System on a Chip (SoC)
• Commercial SoC
  – Designing with cores
  – Customize system
• New tools
  – Rapidly connect cores
  – Library of cores & buses
  – Saves on wiring legwork
[Figure: Xilinx Platform Studio]
Current Status
• Exploring V2P
  – New architecture, new tools
• Two reference boards
  – ML300 (V2P7-6)
  – Avnet (V2P20-6)
• Transceiver work
  – Raw transmission over fiber
  – Working towards IB
http://cdulmer.ran.sandia.gov