a framework for international collaboration on iter using ... fusion … · physics, math, cs, ai...

33
A Framework for International Collaboration on ITER Using Large-Scale Data Transfer to Enable Near Real-Time Analysis RM Churchill 1 C.S. Chang 1 , R. Hager 1 , S. Ku 1 , (Ralph Kube 1 ), JS Park 2 , MJ Choi 2 , S. Klasky 3 , J. Choi 3 , M. Wolf 3 , J. Wong 3 , R. Gerber 4 , K. Antypas 4 , Prabhat 4 , D. Bard 4 , L. Stephey 4 , R.S. Canon 4 , R. Thomas 4 , W. Bhimji 4 , H. Park 5 , E. Dart 6 , BS Cho 7 1 PPPL & HBPS, KSTAR 2 , ORNL 3 , NERSC 4 , UNIST 5 , ESnet 6 , KISTI 7 3 rd IAEA Fusion Data workshop May 28-June 1, 2019, Vienna, Austria 1

Upload: others

Post on 26-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

A Framework for International Collaboration on ITER Using Large-Scale Data Transfer to

Enable Near Real-Time Analysis

RM Churchill1

C.S. Chang1, R. Hager1, S. Ku1, (Ralph Kube1), JS Park2, MJ Choi2, S. Klasky3,

J. Choi3, M. Wolf3, J. Wong3, R. Gerber4, K. Antypas4, Prabhat 4, D. Bard 4,

L. Stephey 4, R.S. Canon 4, R. Thomas 4, W. Bhimji 4, H. Park5, E. Dart 6, BS Cho7

1PPPL & HBPS, KSTAR2, ORNL3, NERSC4, UNIST5, ESnet6, KISTI7

3rd IAEA Fusion Data workshop

May 28-June 1, 2019, Vienna, Austria

1

Page 2: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

2

• ITER represents a unique challenge and

opportunity to bring together global

scientist and compute resources to

Page 3: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Fusion data from experiment (and simulation) possess all the big-V properties

radius

time

den

sity

5 Vs of big ITER data

VelocityVolumeV

arie

ty

Veracity

Valu

e

• ~1/2EB/year

• Subsequent

simulations can

greatly enhance

data volume

• >50GB/s for up to 1000s, each shot

• Hierarchy of analysis for near-real-time feedback

• Wide variety

data from

thousands of

sensors

• Many-scale

many-physics• All the sensor have

different quality:

e.g., validity range,

error, and noise

• All the

expensive

sensors are

there because

they produce

valuable data

AI/DL/NN in high demand

3

Page 4: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

High Energy Physics has had tremendous success

federating analysis

• PanDA (and BigPanDA) used for LHC analysis, >1 million jobs per day, spread across >100 sites

• Use various data mirrors, tiered execution of jobs

• A key difference fusion has is the need for near real-time analysis to enable scientists to steer experimental plans

4

• Every shot on ITER will be simulated prior, verifying in near real-time accuracy will guide experimental day

Page 5: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Why an end-to-end federated framework?

• Faster, better analysis = better decision making = more science productivity• Data flow from diagnatics instruments will be too big and too fast: need

streaming out of instrument memory asynchronously and reduced in situ− Each instrument has its own quality, error, and uncertainty (veracity),

− Error from each instrument propagates to error in other physics estimates

• Instruments are managed by different scientists

→ human integration → not efficient → automate

• Future fusion devices may be in different physics regime

• Experts and computing capabilities are all over the world

− Steering of experiments takes too long for fast scientific progress

− Combining-in the more complete physics (from 1st principles-based

simulations) is difficult ~months to years

5

Utilize computer science, applied math and ML/AI tools.Simplify analysis setup: end-to-end abstraction for parallelization and libraries

radius

time

den

sity

Page 6: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

System structure in the federated fusion research

Fusion Reactor with thousands of sensors

HPC

FEDERATED RESEARCHORCHESTRATION SYSTEM

Physics, Math, CS, AI

Reduced Big Data

many international

users on different level

resources

Users are to be placed at

different tiers in the data

pyramid

A preliminary workflow was created in the DOE ICEE project

by ANL-ORNL-PPPL-KSTAR-ESnet.

Smart reduction,multistep

Effieient monitoring

Discrimination

6

Page 7: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Async. streaming, analysis & 10x

reduction, at and near sensors

Near real-time federated

feedback for steering

1st phase Scientific discovery

Smart-reduced data can be stored

(start of slow lane)

1,000s post-

analyses, worldwide

Collaborative, remote

automated reduced

analyses on “my”

computers

2nd phase Scientific discovery

Slow and Fast Data lanes in federated fusion research

HPC,Cloud...

Fast Lane

Slow

Lane

Local or distributed resources

Some data segregated

reservation

7

Page 8: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Machine learning and Bayesian methods can form a

key technology in federated data analysis

• ML techniques can quickly analyze large data streams. Multi-scale, multi-physicspresent in data stream, require ML toolsthat can handle

• PPPL developing use of deep CNN with dilated convolution architectures for tackling multi-scale/multi-physics problem (see talk this workshop Churchill Fri. morning)

• Bayesian inference can be used for comparing experimental output with simplified models, larger simulations

• Scalable solutions (Hamiltonian MC) for sophisticated, non-linear models

8

https://deepmind.com/blog/wavenet-generative-model-raw-audio/

Page 9: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Example KSTAR workflow: Machine learning for online ECEi frame

TRAIN on HPC resourcesMachine learning model to

predict NTM

Large KSTAR multi-shot dataset

(ECEi + other, ~TB)

Model to predict NTM

Model to predict NTM

Offline machine learning model training

Online NTM prediction and ECEiframe classification with trained machine learning model

Allows scientists to:1. quickly find physics of

interest in large datasets2. Remote steering feedback

Detectors

Page 10: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

10

Anchor under-determined AI for prediction, using data from tiered simulations (reduced, first-principles, etc.)

Present data is usually an under-determined set: solution is not unique

Data from today’s

experiments

Interpolative AI/ML for today’s

experiment

Interpolation-based AI/ML prediction for future experiment

Anchor-based extrapolative AI/ML

for future experiment

Anchor-data from

predictive simulations

+

Validity?

Must be validated on today’s experiments D(xi) → yk

D′(xi+xi′) → Yk(⊃ yk)

This technique will get better as the accuracy and sampling of xi’ improves on more powerful computers with better algorithms (AI/ML algorithms included).

Limited to available anchors.

Page 11: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

11

XGC (X-point Gyrokinetic code)

Page 12: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Example: future fusion reactors could be in different edge physics regime

12

1st ITER H-mode

10x a/ρi

C-Mod 1.4MA

High-IP C-Mod

High IP JET

• ITER, being in very different

dimensionless parameter regime,

could obey different physics

• Predictions on the present

tokamaks have been validated.

• Divertor heat-load width in ITER is

predicted to be >6X wider, by XGC

gyrokinetic simulation, than

regression from present tokamaks.

− Different turbulence physics is

found in ITER that spread the

heat-load more efficiently

We are trying to establish more accurate and more # of ITER-like simulations (anchors).

Anchor data

x’ x

Page 13: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Beyond ML methods, simple & advanced reduction methods

can be applied to experiment for reducing before sending

Select only areas of interest and send (e.g., blobs)

Reduce payload on average by about 5X

Filtered out

Areas of interest

512 pixels

640

pixe

ls

Sub-chunk

13[1] Choi IEEE Tran. NYSDS 2016

Page 14: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Mathematical Tools for Lossy & Faithful Data ReductionMark Ainsworth Brown U. & ORNL

Magnetic island detection

S: Smoothness parameter

14

Page 15: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

The same data reduction technique can be applied to ELM and turblent blobs

Mark AinsworthBrown U. &ORNL

S: Smoothness parameter

15

Page 16: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Network infrastructure requires care for international

data transfer

● A small amount of packet loss in a network path makes a HUGE difference in throughput when transmitting data internationally.

● KSTAR – PPPL RTT = 170ms● NERSC – PPPL RTT = 60ms

16http://fasterdata.es.net/network-tuning/tcp-issues-explained/packet-loss/

Page 17: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Throughput tests show higher degradation for

intercontinental transfer with packet loss

● Throughput tests to PPPL from domestic (NERSC) and international(KSTAR) with iperf show marked difference.

● Because of higher RTT, it takes a long time for international to recover from a single packet loss

● Steady, 5-10x higher transfer from KSTAR->PPPL by removing packet loss

17

Continental (NERSC -> PPPL) Intercontinental (KSTAR -> PPPL)

Intercontinental (KSTAR -> PPPL)(Packet loss removed)

Page 18: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Where are we in the process?

● Remote streaming, with data reduction, implemented in

data I/O framework ADIOS [1]

● KSTAR - PPPL streaming demo on 8/2017 of near real-

time processing of ECEi data, with comparison to XGC

simulation

● Demo used special one-time PPPL network configuration (a la science DMZ),

which gave ~7Gbps > ITER-Japan test speed

○ Normal PPPL firewall setup gives ~0.2Gbps

○ Dedicated data-center with large-size cluster

is being planned at PPPL

● Initial KSTAR-NERSC test performed in 10/2018[1] Choi IEEE Tran. NYSDS 2016

18

Page 19: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Dat

a G

ener

atio

n

Reduction

Space+Time

Index

ADIOSData Stream

AD

IOS

Adios Data Streams Over Wide-Area-Network (WAN)

Research on stream-based WAN data process– In-transit processing (supporting data-in-memory)– Data indexing & reduction to reduce network payload– Plug-in methods: SST (EVPath-based), DataMan (ZMQ-based

control)

Data Hub

Analysis

Analysis

Analysis

Memory-to-memory data delivery

Transparent workflow execution

WAN Transportation• SST• DataMan

19

Page 20: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Adios between NERSC-KSTARKSTAR

NERSC, USA

• Measured in September 2018• Between KSTAR and NERSC• Run a simulation code with writers and readers• 8 writers at KSTAR side• 2-16 readers on a NERSC DTN node

ADIOS Data Writer

Analysis ADIOS

Analysis ADIOS

ADIOS Data Writer

Parallel Data Streams

20

Page 21: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Software Infrastructure to Enable the Workflow

Site A Site B

Data Generator

ADIOS 2.0Write API

DataManEngine

DataManagerEngine Library

ZfpMan MdtmMan

ZmqMan

EvPathMan

DataManagerLibrary

MdtmMan

ZfpMan

Callback

Data Analyzer / Visualizer

ADIOS 2.0Read API

DataManEngine

ZmqMan

EvPathMan

Real-time feedback controlling data streaming strategies, compression strategies, etc.

21

Software infrastructure for orchestrating data analysis (launching jobs, orchestrating the data movement) needed, OMFIT or Kepler likely candidates

Page 22: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Conclusion and Discussion▪ Federated data science with AI, data reduction, etc. could

accelerate the realization of commercially viable demo reactors

▪ Data from full-scale ITER operation is too big and too fast, with

significant variety and veracity, for delivery and storage.

• All of them are valuable for (unexpected) scientific discovery

▪ Quick steering of ITER experiments could accelerate

achievement of its Q=10 goal• Real-time analysis and simulation could steer the current shot

• First-priniples-based simulations could steer next day shots

▪ AI/ML must be utilized to solve these problems

▪ First-principles-based simulations could provide anchor to

predictive AI/ML

▪ Network and software infrastructure needed to achieve goals

▪ Our US SciDAC HBPS team has started these research activities 22

Page 23: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

END PRESENTATION

23

Page 24: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

What are we trying to accomplish?Framework for end-to-end, large-scale, remote analysis

near real-time feedback ability

• Why would you want such a framework?– Several science experiments need (or benefit) from human steering.

Faster, better analysis = better decision making = more science productivity

• What are the use cases?– Insufficient compute resources on-site: Connecting large-scale science

experiments to worldwide compute resources expands quantity and quality of analysis able to perform

– Remote scientists: Science collaborations are global, this framework would enable remote scientists to contribute in near real-time

– Simplify analysis setup: Abstraction for parallelization, easy to use libraries

Page 25: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

• Introduction to three axes of fusion-plasma issues to be solved before

Q=10 production of alpha particles in ITER

• Fusion experiment/simulation data and machine learnig

• New paradigm: Federated Data-Science and Machine Learning for

Accelerated Fusion Research

• Using first-principles simulations to provide anchor to predictive ML

• Present status of the federated workflow research in the KSTAR-

NERSC-PPPL (HBPS-Group)-ORNL-ESNet-KREONET Consortium

• Conclusion and Discussion

Outline

25

Page 26: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

▪ Large scale instabilities:

T~10-1-102 ms, Lr ~ 10-1-100m

• Plasma disruption could damage the material

wall and the reactor infrastructure

• Edge localized modes (ELMs) could damage the

divertor material

▪ Divertor heat-load width; multiscale issue• Regression from present experiments predicts

steady deposition of ~50MW on ~1cm x 50m belt;

ITER’s design is based on the ~5x wider strip:

10-3-102m

▪ Central plasma pressure (fusion power ∝ n2T2)

• Central plasma pressure ∝ edge pedestal height

▪ (Alpha particle physics)

Three axes of plasma issues in achieving economically viable magnetic fusion reaction

26

Page 27: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

Fusion research can be greatly accelerated by federating thousands of sensors and distribted scientists/computers

KSTAR27

Page 28: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

28

Smart Reduction

▪ Asynchronous

▪ Importance recognition

▪ Reduction

▪ provenance

▪ Indexing (provenance, tier identification, importance, …)

▪ Compression

▪ …

Supercomputers and Demo/commercial reactors

In the future DEMO reactors in “nuclear” environment, sensors can be highly limited, and HPCs could become a more important part of the federated research and ML.

Page 29: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

05

10

5

20Z

00

30

Y X

40

-5-5-10

-10 -15

05

10

5

20Z

30

Y

0 0

X

40

-5

-5 -10

Toward Machine Learning for Implicit Particle PushR. Archibald (ORNL), Ben Sturdevant, and C-S Chang (PPPL)

Goal: Use machine learning to act as effective preconditioner to accelerate implicit particle push in electromagnetic XGC1

For more information: Rick Archibald ([email protected]),

Data Analytics lead

10 20 30 40 50 60

Time (s)

0

1

2

3

4

5

6

7

Itera

tio

ns

Implicit

ML

Offline Learning of

random trajectories in

time-varying magnetic

field

Training data: 10K different

Trajectories, with

Machine adapted online

as implicit method

operates on

preconditioned

predictions

▪ Success: Within range of

training data, initial guess from

ML accelerates convergence by

halving iterations.

Random trajectory model

29

Page 30: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

30

ML SolutionsProblem

Machine Learning for Extreme-Scale Simulation

1. Physics discovery from big data• Preload experimental data on HPC

memory for training (DataSpaces tech.)

• ML can “analyze, select, reduce and

compress” the data in situ, in memory

• Lossy reduction: Preserve physics

features to predetermined accuracy

2. Simulation acceleration• ML/AI for particle pushing, load balancing,

field solving, atomic physics, PMI, etc

• AI/ML telescoping of 2D profile evolution

without many training data

▪ Distributed and scalable ML needed

▪ Close-collaboration with math and

computer science necessary for

Service-Oriented ”plug-in” ML tools

Peta/Exascale computers can enable

realistic global multiscale simulations of

whole fusion device

1. Physics discovery from Big data

− Data size too big for post-processing

− XGC on Summit generates >100PB/1-

days for one ITER simulation, but the file-

system size is 250PB.

− Presently, we rerun simulation to return

to the unwritten data.

2. Simulation acceleration− Presently, brute-force math

− ML could cut down simulation time

− Telecoping of profile evolution via to the

2D fluxes is in high demand

Page 31: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

31

ML SolutionsProblem

Machine Learning for Reduced Simulation

Reduced simulation can “reinforce”

experimental ML and steer present exp.

1. Accuracy and speed in the closure• Get the data from first-principles-based

simulations under various conditions

• Different spatial locations and time

• Different geometry

• Inverse problem from experimental data

2. Profile evolution• Utilize 1D transport-flux data from many

simulations

• Problem becomes challenging in 2D, 3D

▪ Extremely reduced model?

− Can we have an ”equation-free”

plasma model using AI/NN?

▪ Fast parameter-scan on “My

Computer” is key

1. Accuracy and speed in closure

evaluation is an issue

− Function of many variables

− General analytic expressions do not

exist except for the idealized case

(CGL-Braginkii)

− Kinetic closure simulation for each

case is expensive

2. Faithfully accelerated profile-

evolution needed

− a subject of active research presently

Page 32: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

K-Means Clustering reveals that the warm trapped electrons

exhibit correlated response to SOL turbulence in full-Ip ITER:

another evidence that the λq spread in full-Ip ITER is from TEMs.

(R. Churchill, PPPL)

▪ K=6

▪ At a warm energy band relative to

local Te, trapped particle’s response

to edge turbulence is mutually

correlated, and decorrelated from

passing particles

• A sign of TEM driven turbulence

Apache Spark is used here.

32

Example of unsupervised ML for physics discovery from simulation data

Use ADIOS framework

Page 33: A Framework for International Collaboration on ITER Using ... Fusion … · Physics, Math, CS, AI Reduced Big Data many international users on different level resources Users are

ML can be executed on GPUs while idle: task parallelization

33

XGC1

XGC is using ADIOS framework for on-memory, in-situ ML.