
Page 1:

Climate Simulation on ApGrid/TeraGrid at SC2003

[Slide diagram: a Ninf-G client at AIST drives remote servers on the AIST Cluster (50 CPU), Titech Cluster (200 CPU), KISTI Cluster (25 CPU), and NCSA Cluster (225 CPU).]

Page 2:

National Institute of Advanced Industrial Science and Technology

Example: Hybrid QM/MD Simulation

Page 3:

QM/MD Simulation over the Pacific at SC2004

[Slide diagram: QM servers and an MD client connected across the Pacific via Ninf-G, using TCS (512 CPU) at PSC, two P32 clusters (512 CPU each), and F32 (256 CPU); total number of CPUs: 1792. A close-up view shows the corrosion of silicon under stress.]

Page 4:

[Slide chart: timeline of simulation steps 1 through 10.]

•Total number of CPUs: 1793
•Total simulation time: 10 hours 20 min
•Number of steps: 10 (= 7 fs)
•Average time per step: 1 hour
•Size of generated files per step: 4.5 GB

Page 5:

(some of) Lessons Learned

It is practically impossible to occupy a single large-scale system for a few weeks.
How can we keep the simulation running for a long period?

Faults (e.g., HDD crashes, network outages) cannot be avoided.
We do not want to restart runs by hand; the simulation should be capable of automatic recovery from faults.
How can the simulation recover from faults?
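The later pages answer these questions with GridRPC. Purely as a hedged illustration (not taken from the slides; the entry point qm/calc_qm_force and the backup-server list are hypothetical), the kind of automatic recovery asked for above could be wrapped around a remote call like this, assuming the standard GridRPC C API:

    /* Hypothetical failover sketch: if a remote QM call fails (e.g. the
     * server cluster or the network goes down), rebind the function
     * handle to a backup server and retry, instead of restarting the
     * whole simulation by hand. */
    #include "grpc.h"

    int call_qm_with_retry(grpc_function_handle_t *handle,
                           char **backup_servers, int n_backups,
                           int n_atoms, double *coords, double *forces)
    {
        for (int attempt = 0; ; attempt++) {
            /* Synchronous remote QM force calculation via GridRPC. */
            if (grpc_call(handle, n_atoms, coords, forces) == GRPC_NO_ERROR)
                return 0;                          /* success */

            /* The call failed: release the broken handle. */
            grpc_function_handle_destruct(handle);
            if (attempt >= n_backups)
                return -1;                         /* every server failed */

            /* Fail over to the next backup server and retry. */
            grpc_function_handle_init(handle, backup_servers[attempt],
                                      "qm/calc_qm_force");
        }
    }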

Page 6:

Objectives

Develop a flexible, robust, and efficient Grid-enabled simulation:
•Flexible -- allow dynamic resource allocation/migration
•Robust -- detect errors and recover from faults automatically for long runs
•Efficient -- manage thousands of CPUs

Verify our strategy through large-scale experiments:
•Implemented a Grid-enabled SIMOX (Separation by Implanted Oxygen) simulation
•Ran the simulation on the Japan-US Grid testbed for a few weeks

Page 7:

Hybrid QM/CL Simulation (1)

Enabling large-scale simulation with quantum accuracy by combining classical MD simulation with quantum simulation.

CL simulation:
•simulates the behavior of atoms in the entire region
•based on classical MD using an empirical inter-atomic potential

QM simulation:
•modifies the energy calculated by the MD simulation, only in the regions of interest
•based on density functional theory (DFT)

[Slide diagram: MD simulation of the entire region, with an embedded QM simulation based on DFT in the region of interest.]
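The slides do not spell out the coupling formula; a commonly used additive scheme for this kind of hybrid QM/CL method (an assumption, written in LaTeX for reference, not taken from the slides) corrects the classical energy only inside the QM regions:

    % Assumed additive hybrid coupling: classical energy everywhere, with the
    % classical contribution of each QM region replaced by its DFT energy.
    E_{\mathrm{total}} = E_{\mathrm{CL}}(\text{entire system})
      + \sum_{r \,\in\, \text{QM regions}} \bigl[ E_{\mathrm{QM}}(r) - E_{\mathrm{CL}}(r) \bigr]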

Page 8:

Hybrid QM/CL Simulation (2)

Simulation algorithm

Each QM computation is:
•independent of the others
•compute-intensive
•usually implemented as an MPI program

[Slide flowchart -- MD part and QM part. After the initial set-up, each step:
•MD part: calculate MD forces of the QM+MD regions
•MD part sends the data of the QM atoms to the QM part; the QM part calculates the QM forces of the QM region and returns the QM forces
•MD part: calculate MD forces of the QM region
•MD part: update atomic positions and velocities, then repeat]
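As a hedged sketch only (the routine names are hypothetical placeholders, not the application's actual symbols), this per-step flow maps onto a main loop on the MD client roughly as follows; the QM force evaluation is the compute-intensive part farmed out to remote MPI servers:

    /* Sketch of the hybrid QM/MD time-stepping loop on the MD client,
     * following the flow on this slide.  All function names are
     * hypothetical placeholders for the application's own routines. */
    void md_initial_setup(void);
    void calc_md_forces_qm_md_regions(void);   /* classical forces, whole system    */
    void calc_qm_forces_qm_region(void);       /* DFT forces, computed remotely     */
    void calc_md_forces_qm_region(void);       /* classical forces in the QM region */
    void update_positions_and_velocities(void);

    void run_simulation(int n_steps)
    {
        md_initial_setup();
        for (int step = 0; step < n_steps; step++) {
            /* MD part: classical forces over the QM+MD regions. */
            calc_md_forces_qm_md_regions();
            /* QM part: send the QM-atom data out, get QM forces back.
             * Each QM region is independent, so the remote calls can
             * run in parallel on different clusters. */
            calc_qm_forces_qm_region();
            /* MD part: classical forces of the QM region, so that the
             * classical contribution there can be replaced by the QM one. */
            calc_md_forces_qm_region();
            /* MD part: integrate one step with the combined forces. */
            update_positions_and_velocities();
        }
    }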

Page 9:

National Institute of Advanced Industrial Science and Technology

Implementation of Grid-enabled Simulation

- multi-scale QM/MD simulation using GridRPC and MPI -

Page 10:

Approach to “gridify” applications

GridRPC enhances flexibility and robustness through:
•dynamic allocation of server programs, and
•detection of network/cluster trouble.

MPI enhances efficiency through:
•highly parallel computing on a cluster, for both client and server programs.

This programming approach, combining GridRPC with MPI, takes advantage of both programming models complementarily to run large-scale applications on the Grid for a long time.

[Slide diagram: the MD client runs as an MPI program on its own cluster and uses GridRPC to invoke QM servers, each of which runs as an MPI program on its own cluster.]
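A minimal sketch of the GridRPC side of such a client, assuming the standard GridRPC C API as shipped with Ninf-G; the server names, the entry-point name qm/calc_qm_force, and the argument layout are hypothetical placeholders, not taken from the slides:

    #include <stdio.h>
    #include "grpc.h"                    /* GridRPC C API header (Ninf-G) */

    #define N_QM_SERVERS 2
    #define N_QM_ATOMS   64              /* dummy per-region atom count */

    int main(int argc, char *argv[])
    {
        grpc_function_handle_t handles[N_QM_SERVERS];
        grpc_sessionid_t       ids[N_QM_SERVERS];
        char  *servers[N_QM_SERVERS] = { "qm-server-a.example.org",
                                         "qm-server-b.example.org" };
        double coords[N_QM_SERVERS][3 * N_QM_ATOMS];   /* filled from the MD step */
        double forces[N_QM_SERVERS][3 * N_QM_ATOMS];

        if (argc < 2) {
            fprintf(stderr, "usage: %s <client config file>\n", argv[0]);
            return 1;
        }
        if (grpc_initialize(argv[1]) != GRPC_NO_ERROR) {
            fprintf(stderr, "grpc_initialize failed\n");
            return 1;
        }

        /* One handle per remote QM server; each remote call launches the
         * server-side MPI program on that cluster. */
        for (int i = 0; i < N_QM_SERVERS; i++)
            grpc_function_handle_init(&handles[i], servers[i], "qm/calc_qm_force");

        /* Dispatch all QM force calculations at once: the QM regions are
         * independent, so the calls proceed in parallel across the Grid. */
        for (int i = 0; i < N_QM_SERVERS; i++)
            grpc_call_async(&handles[i], &ids[i],
                            3 * N_QM_ATOMS, coords[i], forces[i]);

        /* Block until every remote QM calculation has returned its forces. */
        grpc_wait_all();

        for (int i = 0; i < N_QM_SERVERS; i++)
            grpc_function_handle_destruct(&handles[i]);
        grpc_finalize();
        return 0;
    }

The client process itself would in turn be an MPI program for the MD part, which is how a single client can coordinate thousands of CPUs in total.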