The race with MPRACE On GRAPE, FPGA, Petaflop/s
Application Driven Reconfigurable Computing for Astrophysics and other Fields
Rainer Spurzem, Astronomisches Rechen-Institut, Zentrum für Astronomie, Univ. Heidelberg, Germany
[email protected]
http://www.ari.uni-heidelberg.de/mitarbeiter/spurzem/
Dec. 06 COEHT 2007
Collaboration: Sverre Aarseth (IoA Cambridge, UK), David Merritt (RIT, USA), Naohito Nakasato, Tsuyoshi Hamada (RIKEN, Japan), Simon Portegies Zwart, Alessia Gualandris (U Amsterdam),
(ARI)
Foundation Document of ARI, May 10, 1700
Calendar Patent of Duke of Brandenburg
Astrophysical Computer Simulations using Programmable Hardware
R. Spurzem, R. Männer, A. Burkert with G. Lienhart, G. Marcus, A. Kugel, P. Berczik, I. Berentzen, M. Wetzstein, T. Naab …
Interdisciplinary: Computer Science and Astrophysics
Univ. Heidelberg (ARI-ZAH), Munich (USM), Univ. Mannheim (Techn. Informatik)
The GRACE Project = GRAPE + MPRACE
MWK Baden-Württemberg
Globular Star Cluster ω Centauri (central region with Hubble Space Telescope)
Ground-based view
Computer Simulation of Dense Star Clusters
Example 1: Galactic Globular Clusters
Gravitational star-star interaction, complexity N^2 (N: number of stars)
Astrophysics
Example 2: Motion of Supermassive Black Holes (SMBH) in central galactic star clusters (here not shown), gravitational wave emission, relativistic dynamics
Left: Orbits of a triple SMBH in a central star cluster (not shown here), simulation with NBODY6++
Right: SMBH coalescence, gravitational wave detection with the space antenna LISA (2015). Source: ESA
Astrophysics
Astrophysical Sources
Space detectors: LISA
Terrestrial detectors: GEO600 Hannover, VIRGO, LIGO, TAMA, AIGO
LISA: binary black holes in the Universe
Terrestrial detectors (VIRGO, GEO600, LIGO): Galactic compact objects (black holes, neutron stars, …), higher frequencies
Hardware - GRAPE
• GRAPE6a, -BL - PCI ASIC board for PC clusters
• PROGRAPE-4, FPGA based board from RIKEN (Hamada)
• GRAPE7 – new FPGA based board from Tokyo Univ. (Fukushige)
• GRAPE-DR – new board from Makino et al., NAOJ
• MPRACE1,2 – FPGA boards from Univ. Mannheim/GRACE (Kugel et al.)
GRAPE6a PCI board
~128 Gflops for a price of ~5K USD; memory for up to 128K particles
Basic idea of any GRAPE N-body code:
$$\vec{f}_{ij} = -\,\frac{G\, m_j\, \vec{r}_{ij}}{\left(r_{ij}^2 + \varepsilon^2\right)^{3/2}}, \qquad \vec{a}_i = \sum_{j=1;\; j \neq i}^{N} \vec{f}_{ij}$$
~N   ~N^2
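The pairwise sum behind any GRAPE N-body code, f_ij = -G m_j r_ij / (r_ij^2 + eps^2)^(3/2) and a_i = sum over j != i of f_ij, can be sketched in plain Python. This is a minimal host-side illustration, not the GRAPE interface: the function name `accelerations`, the convention r_ij = r_i - r_j, and the default G = 1 are assumptions made here.

```python
def accelerations(masses, positions, eps, G=1.0):
    """Softened direct-summation accelerations a_i = sum_{j != i} f_ij.

    Assumption: r_ij = r_i - r_j, so f_ij points from particle i toward j.
    """
    n = len(masses)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):              # one result vector per i-particle
        for j in range(n):          # every pair (i, j): N^2 interactions in total
            if i == j:
                continue
            rij = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in rij)
            w = G * masses[j] / (r2 + eps * eps) ** 1.5
            for k in range(3):
                acc[i][k] -= w * rij[k]   # f_ij = -G m_j r_ij / (r_ij^2 + eps^2)^(3/2)
    return acc
```

Two unit masses one length unit apart with eps = 0 attract each other with unit acceleration, which is a quick sanity check on the sign convention.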
GRAPE = GRAvity PipE – more detail…
j-particles: $m_j;\ \vec{r}_j;\ \vec{v}_j;\ t_j$
i-particles: $m_i;\ \vec{r}_i;\ \vec{v}_i;\ t_i$
$\phi_i;\ \vec{a}_i;\ \dot{\vec{a}}_i$
ARI-ZAH + RIT 32 node GRAPE6a clusters
Performance Analysis (3.2 Tflop/s): Harfst et al. 2006, New Astron., in press, astro-ph/0608125
[Figure: Speed (GFlops, 10^-1 to 10^4) versus particle number N (10^3 to 10^6) for GRAPE6a, GRAPE6 and 32xGRAPE6a; curve labels 01, 02, 04, 08, 16, 32]
ARI-ZAH GRAPE Cluster: ~3.2 Tflop/s sustained
Up to 4 million stars! World record in this class! (Direct N-Body)
Harfst, Gualandris, Merritt, Spurzem, Portegies Zwart, Berczik 2006, New Astron., in press, astro-ph/0608125
Hardware - GRAPE
O(N p) + O(N^2/p) [ + O(N N_n/p) ]
Term 1: communication. Term 2: long range (regular force). Term 3: short range (irregular force).
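The scaling O(N p) + O(N^2/p) [+ O(N N_n/p)] implies an optimal degree of parallelism: the communication term grows with p while both force terms shrink. A small sketch follows, reading p as the number of processors and N_n as the neighbour number (the slide does not define them), and using hypothetical unit coefficients `c_comm`, `c_reg`, `c_irr` that are not taken from the talk.

```python
import math

def step_cost(n, p, n_nb=0.0, c_comm=1.0, c_reg=1.0, c_irr=1.0):
    """Model cost: communication + regular (long range) + irregular (short range)."""
    return c_comm * n * p + c_reg * n * n / p + c_irr * n * n_nb / p

def best_p(n, n_nb=0.0, c_comm=1.0, c_reg=1.0, c_irr=1.0):
    """Minimiser of step_cost in p.

    d/dp [a N p + (b N^2 + c N N_n) / p] = 0  gives  p = sqrt((b N + c N_n) / a).
    """
    return math.sqrt((c_reg * n + c_irr * n_nb) / c_comm)
```

With unit coefficients and no neighbour term, the optimum grows like sqrt(N), so larger particle numbers can use more processors efficiently.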
Software, NBODY6++
Original code by S.J. Aarseth, S. Mikkola (ca. 20,000 lines):
• Hierarchical block time steps, 4th order predictor/corrector scheme
• Ahmad-Cohen neighbour scheme
• Kustaanheimo-Stiefel and chain regularization for close encounters (quaternions!)
• 4th order Hermite scheme (pred/corr)
• Parallelization (Spurzem 1999)
• Implementation on GRAPE cluster (Harfst et al. 2006)
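Hierarchical block time steps restrict every particle's step to a power-of-two fraction of a maximum step, so that groups of particles become due at the same time and can be advanced together. The sketch below shows that quantisation only; it is not taken from NBODY6++, and the name `block_step` and the parameter `dt_max` are hypothetical.

```python
def block_step(dt_wanted, t, dt_max=1.0):
    """Largest step dt_max / 2^k that is <= dt_wanted and commensurate with t.

    Assumes t already lies on the block grid (a multiple of some dt_max / 2^k).
    """
    if dt_wanted <= 0.0:
        raise ValueError("dt_wanted must be positive")
    dt = dt_max
    while dt > dt_wanted:      # shrink to the next lower power-of-two level
        dt *= 0.5
    while t % dt != 0.0:       # keep the particle synchronised with its block
        dt *= 0.5
    return dt
```

A particle that wants a step of 0.3 gets 0.25 at t = 0, but only 0.125 at t = 0.125, because a step of 0.25 from there would leave the 0.25 grid.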
Hardware
Pipeline Generation on FPGA I (see talk by Gerhard Lienhart)
Pipeline Generation on FPGA II (see talk by Gerhard Lienhart)
• Use FPGA platform for accelerating the neighbour algorithm
• GRAPE moves the bottleneck to short range (neighbour) forces
MPRACE
GRAPE
Hardware FPGA
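The division of labour between GRAPE and the FPGA follows the split of the force on a particle into a short-range (neighbour, irregular) part and a long-range (regular) part. A minimal sketch of such a split is given here, assuming a fixed neighbour radius `r_nb` (a simplification; neighbour criteria in production codes are more elaborate) and the hypothetical function name `split_acceleration`.

```python
def split_acceleration(i, masses, positions, r_nb, eps=0.0, G=1.0):
    """Return (a_irr, a_reg): acceleration on particle i from neighbours and from the rest."""
    a_irr = [0.0, 0.0, 0.0]    # short range: particles closer than r_nb
    a_reg = [0.0, 0.0, 0.0]    # long range: all remaining particles
    for j in range(len(masses)):
        if j == i:
            continue
        rij = [positions[i][k] - positions[j][k] for k in range(3)]
        r2 = sum(c * c for c in rij)
        target = a_irr if r2 < r_nb * r_nb else a_reg
        w = G * masses[j] / (r2 + eps * eps) ** 1.5
        for k in range(3):
            target[k] -= w * rij[k]
    return a_irr, a_reg
```

The two parts add up to the full direct sum; the few nearby particles typically dominate and change fastest, which is why the neighbour part becomes the bottleneck once the long-range sum is offloaded.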
Hardware - GRACE
The GRACE architecture (GRAPE+MPRACE)
4 Tflops, 128 CPUs, 128 GB memory (64 P4 Xeon, 32 GRAPE, 32 Xilinx FPGA-MPRACE)
Univ. Heidelberg (ARI), Univ. Mannheim (LIV), Univ. Munich (USM), RIKEN Institute Tokyo
Infiniband, Dual PCIe, 20 Gb/s; 32 hosts
Xeon 3.6 GHz
FPGA: 1 pipeline
GRAPE: 12 pipelines
Preliminary, ongoing work
Prototype Testing
Production: Summer 2007
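Smoothed Particle Hydrodynamics (SPH)
Hydrodynamic equation of motion, gravity:
$$\frac{d\vec{v}_i}{dt} = -\frac{1}{\rho_i}\nabla P_i + \vec{a}_i^{\,\mathrm{visc}}$$
SPH formulation:
$$\rho_i = \sum_{j=1}^{N} m_j\, W(\vec{r}_{ij}, h_{ij}), \qquad p_i = P(\rho_i)$$
$$\frac{d\vec{v}_i}{dt} = -\sum_{j} m_j \left( \frac{p_i}{\rho_i^2} + \frac{p_j}{\rho_j^2} + \Pi_{ij} \right) \nabla_i W(\vec{r}_{ij}, h_{ij})$$
$$\Pi_{ij} = \begin{cases} \dfrac{-\alpha\, c_{ij}\, \mu_{ij} + \beta\, \mu_{ij}^2}{\rho_{ij}} & \text{for } \vec{v}_{ij}\cdot\vec{r}_{ij} \le 0 \\[1ex] 0 & \text{for } \vec{v}_{ij}\cdot\vec{r}_{ij} > 0 \end{cases}$$
$$\mu_{ij} = \frac{h_{ij}\, \vec{v}_{ij}\cdot\vec{r}_{ij}}{r_{ij}^2 + \eta^2 h_{ij}^2}\, f_{ij}$$
$$\vec{v}_{ij} = \vec{v}_i - \vec{v}_j, \quad \vec{r}_{ij} = \vec{r}_i - \vec{r}_j, \quad h_{ij} = \frac{h_i + h_j}{2}, \quad c_{ij} = \frac{c_i + c_j}{2}, \quad \rho_{ij} = \frac{\rho_i + \rho_j}{2}, \quad f_{ij} = \frac{f_i + f_j}{2}$$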
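The SPH density estimate, rho_i = sum over j of m_j W(r_ij, h), has the same pairwise structure as the gravity sum and can be sketched in a few lines. The kernel is not specified on the slide; the standard 3D cubic spline with support 2h is assumed here, together with a single global smoothing length `h` instead of a pairwise h_ij. The names `w_cubic_spline` and `density` are hypothetical.

```python
import math

def w_cubic_spline(r, h):
    """3D cubic-spline SPH kernel with compact support 2h (assumed kernel choice)."""
    q = r / h
    norm = 1.0 / (math.pi * h ** 3)
    if q <= 1.0:
        return norm * (1.0 - 1.5 * q * q + 0.75 * q ** 3)
    if q <= 2.0:
        return norm * 0.25 * (2.0 - q) ** 3
    return 0.0

def density(i, masses, positions, h):
    """rho_i = sum_j m_j W(|r_i - r_j|, h); the sum includes the self term j = i."""
    rho = 0.0
    for m_j, r_j in zip(masses, positions):
        rho += m_j * w_cubic_spline(math.dist(positions[i], r_j), h)
    return rho
```

Because the kernel vanishes beyond 2h, only nearby particles contribute, which makes this an intermediate-range interaction of the kind assigned to the FPGA elsewhere in the talk.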
Other Applications
Molecular Dynamics
Protein interactions, with nanotubes, ligands, water
Cellular signaling
Long range force: fast TREE or direct GRAPE
Intermediate range: FPGA
Prospective partners:
* G. Sutmann, A. Schiller, NIC, FZ Jülich (using Pro-GRAPE FPGA board, RIKEN Inst. Japan)
* EML Research Institute Heidelberg, S. Richter, R. Wade
Other Applications
How to build a super-GRACE… a 50 Tflop/s machine for < 5% of general-purpose cost?
• 200 standard nodes, AMD Opteron or Pentium Xeon
• 200 super-GRAPEs (250 Gflop/s): MPRACE-2, GRAPE-DR, PROGRAPE
• Super-network (e.g. AMD HyperTransport, Xtoll-Connection custom network; AMD excellence centre with Univ. of Mannheim, U. Brüning)
Such a computer competes with general-purpose supercomputers on the Petaflop/s scale.
Used: performance model of Harfst et al. 06
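The 50 Tflop/s figure follows from the board count and per-board speed quoted for the super-GRAPEs. A one-line check, assuming the aggregate is simply the sum of board peak speeds and ignoring host CPUs and network losses (the function name is hypothetical):

```python
def aggregate_peak_tflops(n_boards, gflops_per_board):
    """Sum of board peak speeds, converted from Gflop/s to Tflop/s."""
    return n_boards * gflops_per_board / 1000.0

print(aggregate_peak_tflops(200, 250.0))  # 200 boards x 250 Gflop/s -> 50.0
```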
Other Applications
Astrophysical Excellence Cluster, Univ. of Heidelberg – admitted for 2nd round – projected cooperation with informatics:
Co-ordination: Prof. J. Wambsganss (Director ZAH)
… Co-I's: Prof. R. Klessen, Prof. R. Spurzem …
Information Science: Prof. Brüning, Prof. Männer
Further partners?