Slide 1: SCEC Capability Simulations on TeraGrid
Yifeng Cui, San Diego Supercomputer Center
Slide 2: SCEC Computational Pathways
Slide 3: SCEC Capability Simulations on Kraken and Ranger
• ShakeOut-D: 600 x 300 x 80 km domain, 100 m resolution, 14.4 billion grid points, upper frequency limit of 1 Hz, 3 minutes of simulated time, 50k time steps, minimum surface velocity 500 m/s, dynamic source (SGSN), velocity properties from SCEC CVM4.0, 1 terabyte of inputs, 5 terabytes of output
• ShakeOut-K: 600 x 300 x 80 km domain, 100 m resolution, 14.4 billion grid points, upper frequency limit of 1 Hz, 3 minutes of simulated time, 50k time steps, minimum surface velocity 500 m/s, kinematic source, velocity properties from SCEC CVM4.0, 1 terabyte of inputs, 5 terabytes of output
• Chino Hills: 180 x 125 x 60 km domain, 50 m resolution, 10.8 billion grid points, 80k time steps, upper frequency limit of 2 Hz, using both the SCEC CVM4 and CVM-H velocity models
• The latest ShakeOut-D simulation completed within 1.8 hours on 64k Kraken XT5 cores. The ShakeOut-D 2-Hz benchmark achieved a sustained 49 teraflop/s.
Source: Yifeng Cui, UCSD
Slide 4: Validation of Chino Hills Simulations
• Goodness-of-fit at 0.1-0.5 Hz for synthetics relative to data from the M5.4 Chino Hills earthquake.
• Seismogram comparisons of recorded data (black traces), CVM-S synthetics (red traces), and CVM-H synthetics (blue traces).
Slide 5: SCEC Capability Simulations Workflow
[Workflow diagram, three stages:
INPUT DATA PREPARATION: the original source and media files are partitioned (source partitioning, media partitioning), driven by the IN3D configuration; the partitioned source and media files are staged between file systems with GridFTP and SRB copy, and the source and media files are also placed on the archival system.
SIMULATION AND VALIDATION: simulation preparation checks whether the source and media partitions are ready, looping until both are; the ShakeOut simulation then runs against the partitioned source and media files (staged via GridFTP and SRB link, with the IN3D configuration), and the simulation output files feed simulation validation and simulation visualization via GridFTP and SRB copy.
DATA ARCHIVAL: the output files are copied to the archival system.]
• Inputs are terabytes in size, with spatial and temporal locality
• Input partitions are transferred between TeraGrid sites
• Simulation outputs are backed up on TACC Ranch and NICS HPSS
• Visualization is done on Ranger
Slide 6: Adapting SCEC Applications to Different TeraGrid Architectures
[Flowchart of the AWP-Olsen-Day solver's run-time configuration: the settings select the source fault input (source mode 0-4), media input (media mode 0-3), and initial stress input; source and media partitions can either be read in or generated and saved, exploiting temporal and spatial locality; the solver supports 0-max checkpoints, MD5 mode (0-1), output mode (0-1), accumulation (0-1), and performance measurement (0-1); surface or surface-plus-volume output, checkpoints, and MD5 checksums are written through the SAN switch infrastructure to SAM-QFS/HPSS; restart is supported from checkpoints.]
Source: Cui et al. Toward Petascale Earthquake Simulations, Acta Geotechnica, June 2008
• Serial or parallel source partitioning and split options
• Serial or parallel mesh partitioning and options
Slide 7: Mesh Partitioning
[Diagram: the mesh inputs are distributed to sub-meshes Mesh 0, Mesh 1, Mesh 2, ..., Mesh N through one of four approaches:]
• Serial (part-serial)
• Serial (part-parallel)
• MPI-IO scattered read
• MPI-IO contiguous read
Slide 8: Mesh Serial Read
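A minimal sketch of the serial-read idea, under illustrative assumptions (a hypothetical global file mesh.bin, example dimensions, a 1-D decomposition along z, and a rank count that divides nz; the production code uses a full 3-D decomposition): rank 0 reads the whole mesh sequentially and ships each rank its sub-block.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Serial mesh read (sketch): rank 0 reads the whole mesh file sequentially and
 * sends each rank its sub-block; all other ranks just receive. File name,
 * dimensions, and the 1-D z decomposition are illustrative only. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long nx = 600, ny = 300, nz = 80;     /* global mesh dims (example) */
    const long block = nx * ny * (nz / nprocs); /* elements per rank          */
    float *sub = malloc((size_t)block * sizeof(float));

    if (rank == 0) {
        FILE *fp = fopen("mesh.bin", "rb");
        float *buf = malloc((size_t)block * sizeof(float));
        for (int dest = 0; dest < nprocs; dest++) {
            fread(buf, sizeof(float), (size_t)block, fp);  /* next rank's block */
            if (dest == 0)
                memcpy(sub, buf, (size_t)block * sizeof(float));
            else
                MPI_Send(buf, (int)block, MPI_FLOAT, dest, 0, MPI_COMM_WORLD);
        }
        free(buf);
        fclose(fp);
    } else {
        MPI_Recv(sub, (int)block, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* ... the solver would work on sub[] from here ... */
    free(sub);
    MPI_Finalize();
    return 0;
}
```

The single reader is what produces the "high communication overhead, poor scalability" column for serial I/O in the comparison table on slide 12.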
Slide 9: Mesh Partitioned in Advance
• Data locality
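A minimal sketch of the pre-partitioned case, assuming hypothetical per-rank file names of the form mesh.&lt;rank&gt;: every rank reads only its own local file, which gives good data locality and no communication, at the cost of one file per rank (the npx*npy*npz files in the comparison table on slide 12).

```c
#include <stdio.h>

/* Pre-partitioned mesh read (sketch): each rank opens its own file, so no
 * communication is needed. The file naming "mesh.<rank>" and the sub-mesh
 * size nxt*nyt*nzt are illustrative only. */
void read_own_partition(int rank, long nxt, long nyt, long nzt, float *sub)
{
    char fname[64];
    snprintf(fname, sizeof(fname), "mesh.%d", rank);   /* one file per rank */
    FILE *fp = fopen(fname, "rb");
    fread(sub, sizeof(float), (size_t)(nxt * nyt * nzt), fp);
    fclose(fp);
}
```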
Slide 10: Mesh MPI-IO Scattered Read
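One way to realize the scattered read, sketched under illustrative assumptions (a single global file mesh.bin, example dimensions, an x-only decomposition rather than the production 3-D decomposition, and a rank count that divides nx): each rank describes its sub-volume with an MPI subarray view, so its data is scattered across the file in many small segments, and the read is issued collectively.

```c
#include <mpi.h>
#include <stdlib.h>

/* MPI-IO scattered read (sketch): every rank reads its own sub-volume straight
 * out of one global file through a subarray view. All names, dimensions, and
 * the x-only decomposition are illustrative. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int gsizes[3] = {80, 300, 600};                 /* global nz, ny, nx (example) */
    int lsizes[3] = {80, 300, 600 / nprocs};        /* local sub-volume            */
    int starts[3] = {0, 0, rank * lsizes[2]};       /* this rank's x offset        */

    MPI_Datatype subarray;
    MPI_Type_create_subarray(3, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_FLOAT, &subarray);
    MPI_Type_commit(&subarray);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "mesh.bin", MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_FLOAT, subarray, "native", MPI_INFO_NULL);

    long n = (long)lsizes[0] * lsizes[1] * lsizes[2];
    float *sub = malloc((size_t)n * sizeof(float));
    MPI_File_read_all(fh, sub, (int)n, MPI_FLOAT, MPI_STATUS_IGNORE); /* collective */

    MPI_File_close(&fh);
    MPI_Type_free(&subarray);
    free(sub);
    MPI_Finalize();
    return 0;
}
```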
Slide 11: Mesh MPI-IO Contiguous Read
• Data continuity
• Read XY planes, then redistribute the data
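A sketch of the contiguous variant under the same illustrative assumptions (file name, dimensions, and a rank count that divides nz): every rank reads a contiguous run of whole XY planes with one collective call; the redistribution onto the real 3-D decomposition, which is the source of this approach's extra communication in the comparison table, is only indicated by a comment.

```c
#include <mpi.h>
#include <stdlib.h>

/* MPI-IO contiguous read + redistribution (sketch). Whole XY planes are
 * contiguous in the file, so the collective read is large and sequential;
 * the exchange that maps planes onto the 3-D decomposition is indicated
 * only by a comment. File name and dimensions are illustrative. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long nx = 600, ny = 300, nz = 80;     /* global dims (example) */
    const long plane = nx * ny;                 /* one XY plane          */
    const long nplanes = nz / nprocs;           /* planes read per rank  */
    float *slab = malloc((size_t)(plane * nplanes) * sizeof(float));

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "mesh.bin", MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    MPI_Offset off = (MPI_Offset)rank * nplanes * plane * sizeof(float);
    MPI_File_read_at_all(fh, off, slab, (int)(plane * nplanes), MPI_FLOAT,
                         MPI_STATUS_IGNORE);    /* collective, contiguous read */
    MPI_File_close(&fh);

    /* Redistribution step (not shown): carve each XY plane into nxt x nyt
     * tiles and exchange them, e.g. with MPI_Alltoallv, so that every rank
     * ends up with its own nxt x nyt x nzt sub-volume. */

    free(slab);
    MPI_Finalize();
    return 0;
}
```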
Slide 12: Comparisons of Mesh Approaches

| | Serial I/O | Serial I/O (partitioned local files) | MPI-IO (scattered) | MPI-IO (contiguous) with data redistribution |
|---|---|---|---|---|
| Performance | Low | High | Medium | High |
| System dependence | Low | Low | High | Low |
| Scalability | Poor | Poor | Depends | Good |
| Number of files | 1 | npx*npy*npz | 1 | 1 |
| Memory requirement (elements) | nxt*nyt*nzt per core | nxt*nyt*nzt per core | nxt*nyt*nzt per core | nx*ny per sender core (nz cores); nxt*nyt*nzt per receiver core (all cores) |
| Communication overhead | High | None | None | High |
| Collective I/O | No | No | Yes | Yes |
| Stripe count (recommended) | Small | Small | Large | Large |
| Stripe size (recommended) | Small | Small | Large | Larger (nx*ny) |
Slide 13: Source Partitioning
[Diagram: the source inputs (Source 1, Source 2, Source 3, ..., Source 483161) are each split into time-step blocks (time steps 1-600, 601-1199, ..., 23401-24000) and partitioned through one of three approaches:]
• Serial (part-serial)
• Serial (part-parallel)
• MPI-IO scattered read
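A small sketch of how the temporal locality in the source input can be exploited, assuming a hypothetical file layout in which each source stores its full moment-rate time series contiguously (the names, layout, and constants are illustrative, not the production format): only the 600-step window needed for the current phase of the run is read at a time.

```c
#include <stdio.h>

/* Temporal-locality source read (sketch). Hypothetical layout: "sources.bin"
 * stores a contiguous NST-step time series per source; a rank reads only the
 * current BLOCK-step window for each source it owns. */
enum { NST = 24000, BLOCK = 600 };

/* Read time steps [t0, t0+BLOCK) of source number isrc. */
static void read_source_block(FILE *fp, long isrc, long t0, float *buf)
{
    long offset = (isrc * (long)NST + t0) * (long)sizeof(float);
    fseek(fp, offset, SEEK_SET);
    fread(buf, sizeof(float), BLOCK, fp);
}

int main(void)
{
    FILE *fp = fopen("sources.bin", "rb");
    float window[BLOCK];
    for (long t0 = 0; t0 < NST; t0 += BLOCK) {   /* 40 windows of 600 steps */
        read_source_block(fp, /*isrc=*/0, t0, window);
        /* ... advance the solver BLOCK time steps using this window ... */
    }
    fclose(fp);
    return 0;
}
```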
Slide 14: AWP-Olsen-Day Code Scaling on Kraken, Ranger and Intrepid
[Scaling plot: 1-Hz ShakeOut, 100 m resolution, 14.4 billion mesh points (6000 x 3000 x 800). X axis: number of cores (1,000 to 1,000,000); Y axis: mesh points updated per step per second per core (1E+05 to 1E+07). Series: NICS Kraken-XT4 with synchronous communication; TGW BG/L Intrepid with synchronous communication.]
Slide 15: AWP-Olsen-Day Code Scaling on Kraken, Ranger and Intrepid
[Same scaling plot with additional series: Sun Constellation TACC Ranger, NICS Kraken-XT5, ALCF BG/P Intrepid, NICS Kraken-XT4, and TGW BG/L Intrepid, all with synchronous communication.]
Slides 16-17: Synchronous Communication
[Figure slides illustrating the synchronous communication scheme.]
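For reference, a minimal sketch of the synchronous pattern, under illustrative assumptions (a hypothetical 1-D decomposition with one ghost plane on each side, example dimensions, and a rank count that divides the z extent): the blocking MPI_Sendrecv calls must complete before the stencil update starts, so communication time cannot be hidden behind computation.

```c
#include <mpi.h>
#include <stdlib.h>

/* Synchronous halo exchange (sketch): blocking MPI_Sendrecv with the up/down
 * neighbours, then compute. 1-D layout and dimensions are illustrative. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int plane = 600 * 300;                 /* ghost-plane size (example)   */
    const int nzt = 80 / nprocs;                 /* local z extent (example)     */
    float *u = calloc((size_t)plane * (nzt + 2), sizeof(float)); /* +2 ghost planes */
    int up   = (rank + 1 < nprocs) ? rank + 1 : MPI_PROC_NULL;
    int down = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;

    for (int step = 0; step < 100; step++) {
        /* Blocking exchange: nothing else happens until both calls finish. */
        MPI_Sendrecv(u + (size_t)plane * nzt, plane, MPI_FLOAT, up,   0,
                     u,                       plane, MPI_FLOAT, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(u + (size_t)plane,             plane, MPI_FLOAT, down, 1,
                     u + (size_t)plane * (nzt + 1), plane, MPI_FLOAT, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* ... update all nzt local planes only after the halos have arrived ... */
    }

    free(u);
    MPI_Finalize();
    return 0;
}
```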
Slides 18-21: Asynchronous Communication
[Figure slides illustrating the asynchronous communication scheme.]
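A matching sketch of the asynchronous pattern, under the same illustrative 1-D layout: non-blocking MPI_Isend/MPI_Irecv calls are posted first, the interior points that need no ghost data are updated while the messages are in flight, and only the boundary planes wait on MPI_Waitall. This overlap is what separates the asynchronous curves from the synchronous ones in the scaling plots.

```c
#include <mpi.h>
#include <stdlib.h>

/* Asynchronous halo exchange (sketch): post MPI_Isend/MPI_Irecv, update the
 * interior while messages are in flight, then wait and update the boundary
 * planes. Same illustrative 1-D layout as the synchronous sketch. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int plane = 600 * 300;
    const int nzt = 80 / nprocs;
    float *u = calloc((size_t)plane * (nzt + 2), sizeof(float));
    int up   = (rank + 1 < nprocs) ? rank + 1 : MPI_PROC_NULL;
    int down = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;

    for (int step = 0; step < 100; step++) {
        MPI_Request req[4];
        /* Post receives into the ghost planes and sends from the edge planes. */
        MPI_Irecv(u,                             plane, MPI_FLOAT, down, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(u + (size_t)plane * (nzt + 1), plane, MPI_FLOAT, up,   1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(u + (size_t)plane * nzt,       plane, MPI_FLOAT, up,   0, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(u + (size_t)plane,             plane, MPI_FLOAT, down, 1, MPI_COMM_WORLD, &req[3]);

        /* ... update interior planes 2..nzt-1, which need no ghost data,
         *     while the messages are in flight (communication hidden) ... */

        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

        /* ... now update planes 1 and nzt, which depend on the ghost planes ... */
    }

    free(u);
    MPI_Finalize();
    return 0;
}
```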
Slide 22: AWP-Olsen-Day Code Scaling on Kraken, Ranger and Intrepid
[Scaling plot repeated from slide 15: the five synchronous-communication series on Sun Constellation TACC Ranger, NICS Kraken-XT5, ALCF BG/P Intrepid, NICS Kraken-XT4, and TGW BG/L Intrepid.]
Slide 23: AWP-Olsen-Day Code Scaling on Kraken, Ranger and Intrepid
[Scaling plot adding asynchronous-communication series: NICS Kraken-XT5, ALCF BG/P Intrepid, and Sun Constellation TACC Ranger with asynchronous communication, plotted alongside the five synchronous-communication series.]
Slide 24: SCEC Capability Simulations Performance on TeraGrid
[Performance summary table; * denotes benchmark runs.]
Slide 25: Other efforts in progress supporting SCEC larger-scale simulations
• Single-CPU optimization: division, for example, is very expensive; by reducing division work we have observed performance improvements of 25-45% on up to 8k cores (see the sketch after this list)
• Workflow: an end-to-end approach to automate the procedures of capability simulations
• Restructuring the code as a SCEC community code, emphasizing modularity, re-usability, and ease of integration
• Developing a hybrid code with two-level MPI/OpenMP parallelism
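A hedged illustration of the division-reduction idea from the first bullet (function and variable names are made up, not the production AWP-Olsen-Day kernels): a per-point divide by a quantity that never changes during the run is replaced by a multiply with a precomputed reciprocal; the second loop also carries an OpenMP pragma as a nod to the two-level MPI/OpenMP bullet.

```c
/* Division-reduction sketch (illustrative names, not the production kernel).
 * Before: one floating-point divide per grid point per time step.          */
void update_slow(float *v, const float *rho, float dt, int n)
{
    for (int i = 0; i < n; i++)
        v[i] += dt / rho[i];              /* divide in the inner loop */
}

/* After: 1/rho is precomputed once (the density never changes over the run),
 * so the time loop multiplies by the stored reciprocal instead of dividing. */
void update_fast(float *v, const float *rho_inv, float dt, int n)
{
    #pragma omp parallel for              /* thread-level parallelism inside an MPI rank */
    for (int i = 0; i < n; i++)
        v[i] += dt * rho_inv[i];          /* multiply replaces the divide */
}
```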
Slide 26: Acknowledgements
• This work has received technical support from various TeraGrid sites, in particular:
– Tommy Minyard and Karl Schultz of TACC
– Kwai Lam Wong and Bruce Loftis of NICS
– Amit Chourasia of SDSC
– SCEC Collaborations