

Meeting Challenges in Extreme-Scale Electromagnetic Modeling of Next Generation Accelerators using ACE3P*

Cho Ng, Arno Candel, Lixin Ge, Kwok Ko, Lie-Quan Lee, Zenghai Li, Vineet Rawat, Greg Schussman, Liling Xiao, SLAC, Esmond Ng, Ichitaro Yamazaki, LBNL, Quikai Lu, Mark Shephard, RPI

ABSTRACT: The past decade has seen tremendous advances in electromagnetic modeling for accelerator applications through the use of high-performance computing on state-of-the-art supercomputers. Under the support of the DOE SciDAC computing initiative, ACE3P, a comprehensive set of parallel electromagnetic codes based on the finite-element method, has been developed to tackle the most computationally challenging problems in accelerator R&D. Complemented by collaborative efforts in computational science, these powerful tools have enabled large-scale simulations of complex systems with unprecedented detail and accuracy. This paper summarizes the efforts in scalable eigensolvers and linear solvers, in parallel adaptive meshing algorithms, and in visualization of large datasets to meet the challenges of extreme-scale electromagnetic modeling for advancing the design of next generation accelerators.

* Work supported by the U.S. Department of Energy under contract DE-AC02-76SF00515.

Challenges in Electromagnetic Modeling

Moving window for wakefield computation of short bunches

PEP-X undulator taper

Pseudo Green's function computed with a σ = 0.5 mm driving bunch (actual bunch length σ = 3 mm)
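In the pseudo-Green's-function approach, the wakefield is computed once for a very short driving bunch (here σ = 0.5 mm) and the wake potential of the actual, longer bunch (σ = 3 mm) is then obtained by convolution with its line density. The NumPy sketch below illustrates only that post-processing step; the wake `W_pseudo`, the grid spacing, and the decay constants are synthetic stand-ins, not ACE3P output.

```python
import numpy as np

# Longitudinal grid behind the bunch (spacing and range are illustrative).
ds = 0.05e-3                                   # 0.05 mm step [m]
s = np.arange(0.0, 0.2, ds)                    # wake coordinate [m]

# Synthetic stand-in for the wake computed with the short (sigma_d = 0.5 mm)
# driving bunch; in practice this comes from the moving-window simulation.
W_pseudo = np.exp(-s / 0.02) * np.cos(2.0 * np.pi * s / 0.01)

# Normalized Gaussian line density of the actual sigma = 3 mm bunch.
sigma = 3.0e-3
z = np.arange(-5.0 * sigma, 5.0 * sigma, ds)
lam = np.exp(-z**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

# Wake potential of the 3 mm bunch: convolution of the pseudo-Green's
# function with the bunch line density, aligned so index 0 is s = 0.
full = np.convolve(W_pseudo, lam, mode="full") * ds
offset = np.searchsorted(z, 0.0)
W_bunch = full[offset: offset + len(s)]
```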

Supported by DOE’s HPC initiatives Grand Challenge, SciDAC-1 (2001-2007), and SciDAC-2 (2007-2012), SLAC has developed ACE3P, a comprehensive set of parallel electromagnetic codes based on the high-order finite-element method

Advances in computational science are essential to tackle computationally challenging problems of large, complex systems in accelerator applications

Collaborations with SciDAC CETs and Institutes on:

• Linear solvers and eigensolvers (TOPS)

• Parallel adaptive mesh refinement and parallel meshing (ITAPS)

• Partitioning and load balancing (CSCAPES & ITAPS)

• Visualization (IUSV)

Goal is virtual prototyping of accelerator structures

Visualization using ParaView

Parallel Meshing

Field distribution in moving window

Cornell ERL vacuum chamber transition

• 5 hours w/ 18000 cores on Jaguar

• 16 TByte data

Field distribution in complex structure

CLIC two-beam accelerator structure

• 45 hours w/ 4096 cores on Jaguar

• 15 TByte data
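Multi-terabyte field datasets like these are typically rendered in batch with ParaView's pvpython. The snippet below is only a generic sketch of that workflow; the file name `window_fields.vtu` and the array name `E_field` are hypothetical placeholders rather than names from the ACE3P pipeline.

```python
# Generic pvpython batch-rendering sketch; file and array names are hypothetical.
from paraview.simple import (OpenDataFile, GetActiveViewOrCreate, Show,
                             ColorBy, Render, SaveScreenshot)

reader = OpenDataFile("window_fields.vtu")        # hypothetical field dataset
view = GetActiveViewOrCreate("RenderView")

display = Show(reader, view)
ColorBy(display, ("POINTS", "E_field"))           # color by a (hypothetical) point array
display.RescaleTransferFunctionToDataRange(True)  # fit the color map to the data range

Render(view)
SaveScreenshot("e_field.png", view)
```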

Scalable Solvers

Development of a hybrid linear solver. Goal: balance between computational and memory requirements

- Exploits techniques from sparse direct methods in computing incomplete factorizations, which are then used as preconditioners for an iterative method

Numerically stable hybrid solver based on domain decomposition: apply direct methods to the interior domains and a preconditioned iterative method to the interfaces

Any number of CPUs can be assigned to each interior domain; larger domains provide more aggregate memory and smaller interfaces; smaller interfaces lead to faster convergence
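The following is a small serial SciPy sketch of the domain-decomposition idea described above, not the production solver: the block-diagonal interior problem is factored with a sparse direct method, and the interface Schur complement is solved iteratively with a simple preconditioner. The 1D Laplacian test matrix and the choice of four interior domains are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Small 1D Laplacian as a stand-in for the finite-element system matrix.
n = 400
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

# Hypothetical partition: four interior domains separated by three interface dofs.
interface = np.array([n // 4, n // 2, 3 * n // 4])
interior = np.setdiff1d(np.arange(n), interface)

A_II = A[interior][:, interior].tocsc()   # block-diagonal over the interior domains
A_IG = A[interior][:, interface].tocsc()
A_GI = A[interface][:, interior].tocsc()
A_GG = A[interface][:, interface].tocsc()

# Direct factorization of the interior problem.
lu_II = spla.splu(A_II)

# Interface Schur complement S = A_GG - A_GI A_II^{-1} A_IG, applied matrix-free.
def schur_matvec(x):
    return A_GG @ x - A_GI @ lu_II.solve(A_IG @ x)

S = spla.LinearOperator((len(interface), len(interface)), matvec=schur_matvec)

# Preconditioned iterative solve on the (small) interface system.
b_I, b_G = b[interior], b[interface]
g = b_G - A_GI @ lu_II.solve(b_I)
ilu = spla.spilu(A_GG)                              # simple preconditioner choice for the sketch
M = spla.LinearOperator(S.shape, matvec=ilu.solve)
x_G, info = spla.gmres(S, g, M=M)

# Back-substitute for the interior unknowns and assemble the full solution.
x_I = lu_II.solve(b_I - A_IG @ x_G)
x = np.empty(n)
x[interior], x[interface] = x_I, x_G
print("residual norm:", np.linalg.norm(A @ x - b))
```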

Figure: schematic of the domain-decomposed matrix; strong-scalability plot of speedup vs. number of cores

A dipole mode in an ILC cryomodule consisting of 8 superconducting RF cavities
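Cavity modes such as this dipole mode are obtained from the generalized eigenvalue problem K x = k² M x produced by the finite-element discretization. The snippet below is only a toy shift-invert sketch in SciPy (the production runs rely on the parallel eigensolvers developed with TOPS); the stand-in stiffness and mass matrices and the shift value are illustrative assumptions.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2000
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")  # stand-in stiffness matrix
M = sp.identity(n, format="csc")                                         # stand-in mass matrix

# Shift-invert around a target eigenvalue picks out interior modes
# (e.g. a dipole band) instead of the extremal ends of the spectrum.
target = 0.5
vals, vecs = spla.eigsh(K, k=6, M=M, sigma=target, which="LM")
print("eigenvalues near target:", vals)
```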

Refinements & Parallel Meshing

Diagram: adaptive refinement by raising the polynomial order p or using a finer mesh

ILC SRF cavity coupler

(Talk by L.-Q. Lee)

160M elements; 10 minutes using 64 processors

A multi-file NetCDF format is designed to remove the synchronized parallel writing bottleneck

Preliminary testing has demonstrated the efficacy of the format
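As a rough illustration of the multi-file idea (not the actual ACE3P/NetCDF layout), each MPI rank can simply write its own NetCDF file, so no collective, synchronized write to a single shared file is required; the file, dimension, and variable names below are hypothetical.

```python
from mpi4py import MPI
import numpy as np
from netCDF4 import Dataset

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank owns a slab of the (hypothetical) field data.
local_field = np.random.rand(100_000, 3)          # stand-in for field values on local elements

# One file per rank: no cross-rank synchronization on write.
with Dataset(f"field_part{rank:05d}.nc", "w") as nc:
    nc.createDimension("elem", local_field.shape[0])
    nc.createDimension("comp", 3)
    var = nc.createVariable("E_field", "f8", ("elem", "comp"))
    var[:] = local_field
    nc.setncattr("rank", rank)                    # record the partition metadata
    nc.setncattr("nparts", size)
```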

• 8 hours w/ 12000 cores on Jaguar

• 15 hours w/ 6000 cores on Jaguar/Franklin

(Presentation at Vis Nite)