reconfigurable hpc research at ornl using field-programmable gate arrays (fpgas)
Post on 30-Dec-2015
36 Views
Preview:
DESCRIPTION
TRANSCRIPT
Presented by
Reconfigurable HPC Research at ORNL using Field-Programmable Gate Arrays (FPGAs)
Olaf O. StoraasliFuture Technologies Group
Computer Science and Mathematics
2 Storaasli_ReconfigHPC_0611
Contents Background: Why FPGAs?
ORNL experience: FPGA systems/compilers and applications
ORNL partners: Research Lab, , SRC,
,
THE SUPERCOMPUTER COMPANY
Explore FPGAs for future ORNL HPC
Virtex4 FPGA blades to “accelerate mission-critical applications > 100x”
Steve Scott, CTO HPCWire 24/3/0606
“After exhaustive analysis, Cray concluded that, although multi-core commodity processors will deliver some improvement, exploiting parallelism through a variety of processor technologies using scalar, vector, multithreading and hardware accelerators (e.g., FPGAs or ClearSpeed co-processors) creates the greatest opportunity for application acceleration.”
ORNL potential benefit: Exceed petaflop goal (reduce power)
High-performance computing vendors adopting FPGAs
3 Storaasli_ReconfigHPC_0611
FPGA Logic slice
What’s an FPGA? “User-tailored chip”
Xilinx Virtex4 FPGA: 25K slices (miniCPUs)
Logic array: user-tailored to application
On-chip RAM, multipliers, and PowerPC CPUs
Mega-gigabit transceivers and digital signaling processing blocks
Supports 100–1000 operations/clock cycle
4 Storaasli_ReconfigHPC_0611
0
100
200
300
Computation(GOPS)
Memory Bandwidth(GB/sec)
IO Bandwidth (Gbps)
Pentium
Virtex-4FPGAVirtex4
Pentium
Why FPGAs?
Performance—optimal silicon use (maximize parallel ops/cycle)
Rapid growth—gates, speed, I/O
Low power—1/10th CPUs
Flexible—tailor to application
0
100
200
300
400
500
600
700
2002 2004 2006 2008
Thousands
Logic Cell Capacity1000
800
600
400
200
0
5 Storaasli_ReconfigHPC_0611
Cray XD1Bee2 (via Xilinx)
ORNL FPGA hardware/compilers SRC-6 (Carte), Digilent (Viva,
VHDL), Nallatech (Viva)
Cray XD1 (MitrionC, DSPlogic/Matlab): 6 FPGAs + 144 Opterons
SGI RASC-Altix/Virtex4s (MitrionC)
Bee2: 5 FPGAs and CHiMPS (via Xilinx), DRC (future)
6 Storaasli_ReconfigHPC_0611
Find parallelism: 80% FFTs
More GF/$ GF/Watt
Goal
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
Profile
Model faster
Porting HPC code spectral transform shallow water model to FPGAs
FTRNDE
FTRNPE
FTTdd
UV FFT
SHTRNS FFT
COMP1
STEP
FTRNEX FTRNVX
8 calls in parallel
3 functions in parallel
2 calls in parallel
HLL compilerCHiMPS, Mitrion(FPGA Tools Inside) FPGA
speedup
HLL developerprofiles
7 Storaasli_ReconfigHPC_0611
Viva: Graphical Icons—3-dimensional MitrionC: Text/flow—1-dimensional
Exploring programming options
Gauss matrix solverCompiler, simulator,
and debugger
+ Carte/SRC, CHiMPS-VHDL/Xilinx ,
8 Storaasli_ReconfigHPC_0611
Simulate molecular dynamics: Compute Ewald particle mesh
(FORTRAN)
Speedup over Intel 2.8 GHz Xeon host
23,558 atoms 61,641 atoms
1st Version
+ memory tailored
1st Version
+ memory tailored
Compute only 3.30 3.30 3.97 3.97
Setup + compute 3.29 3.29 3.96 3.96
Compute + data transfer 0.64 3.20 0.69 3.83
Overall 0.60 3.19 0.60 3.82
Speedup increase with problem size
Amber speedup on SRC-6
9 Storaasli_ReconfigHPC_0611
Speedup projections
0
200
400
600
800
1000
1200
psec/day
2.40E+04 6.20E+04 1.50E+05 2.50E+05 3.50E+05 4.50E+05
Number of atoms
SRC-6E200 MHz200 MHz + 5.6 GB/s500 MHz500 MHz + 5.6 GB/s
10 Storaasli_ReconfigHPC_0611
Summary
ORNL FPGA research:
- Increasingly relevent to HPC
- FPGA Systems: Cray, SRC, Nallatech, Digilent, SGI, Bee2
- Compilers: Mitrion-C, Carte, Viva, CHiMPS, DSPlogic
- Application speedup: Amber, STSWM, BLAST, SW,…
- Partners: Xilinx HPC team, UT, Mitrion, Cray
Next: Evaluate application performance, test DRC
Acknowledgement:
This research is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725
11 Storaasli_ReconfigHPC_0611
Contact
Olaf StoraasliFuture Technologies GroupOlaf@ornl.gov
11 Storaasli_ReconfigHPC_0611
top related