
Page 1

by C. Rea, R.S. Granetz, MIT Plasma Science and Fusion Center, Cambridge, MA, USA

Presented at the 2nd IAEA Technical Meeting on Fusion Data Processing, Validation and Analysis

MIT Samberg Center, Cambridge, MA, USA

May 30th – June 2nd, 2017

Exploratory Machine Learning studies for disruption prediction on DIII-D

C. Rea / IAEA-TM FDPVA / May 2017

Page 2

• the problem's complexity has motivated recent efforts to develop data-driven predictors and Machine Learning studies to predict disruption events with sufficient warning time

• an SQL database was created, gathering time series of relevant plasma parameters; similar tables are available for DIII-D, EAST and C-Mod, enabling cross-device analysis

• exploratory studies to gain further insight into the DIII-D dataset before addressing the problem of a disruption-warning algorithm:
 – binary and multi-label classification analysis through Machine Learning algorithms
 – variable importance ranking
 – accuracy metrics and comparison between different models

the successful avoidance of disruption events is extremely relevant for fusion devices, and in particular for ITER

Page 3

• 10 features out of ~40 available parameters (cf. Granetz's talk for a full database description)

• mainly dimensionless or machine-independent parameters

• focus on flattop disruptions: 195 flattop disruptions, complemented by an analogous number of discharges that did not disrupt, extracted where possible from the same experiments (similar operational space) ⟹ 394 discharges

• ~70,000 samples for each of the 10 chosen input variables

• reliable knowledge base capable of highlighting the underlying physics:
 – validated signals and EFIT reconstructions
 – avoided intentional disruptions
 – avoided hardware-related disruptions by specifically checking for feedback control on plasma current or UFO events

we chose a subset of features and samples for ML applications on the DIII-D database for disruption prediction

features: li, Ip_error_fraction, Vloop, q95, radiated_fraction, n1amp, beta_p, dWmhd_dt, n/nG, Te_HWHM

Page 4

• supervised and unsupervised ML algorithms, mainly developed at JET and also studied in real-time environments:
 – Artificial Neural Networks [1], multi-tiered Support Vector Machines [2], Manifolds and Generative Topographic Maps [3]

[1] B. Cannas et al., Nuclear Fusion 44 (2004) 68-76
[2] J. Vega et al., Fusion Engineering and Design 88 (2013)
[3] B. Cannas et al., Nuclear Fusion 57 (2013) 093023


a plethora of ML algorithms is available in the literature, already tested on other devices for disruption prediction

Page 5


• to better understand the dataset and gain further insight: a “white box” approach
 – inner components and logic are available for inspection
 – the importance of individual features can be determined

• Random Forests [4]: a large collection of randomized, de-correlated decision trees that can be used to solve both classification and regression problems

[4] L. Breiman, “Random Forests”, Machine Learning, 45(1), 5-32, 2001

Page 6

• CART (Classification and Regression Trees) algorithms repeatedly partition the input space, implementing a test function at each split (node), to build a tree whose nodes are as pure as possible

• 2 features (x1, x2) and 2 classes (red, blue)

• the algorithm selects a splitting value to partition the dataset by minimizing an impurity measure (e.g., the Gini index 1 − Σ p_i², where p_i is the fraction of node samples in class i)

E. Alpaydin, “Introduction to Machine Learning”, 2nd edition, MIT Press


decision trees are hierarchical data structures implementing the divide-and-conquer strategy: 2D classification example
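The impurity-minimizing split selection described above can be sketched in a few lines of plain Python; the four-point (x1, x2) dataset below is a toy illustration (not from the slides), and the Gini index stands in for the generic impurity measure:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(points, labels):
    """Scan every feature and every midpoint threshold; return the
    (feature, threshold) pair minimizing the weighted child impurity."""
    n = len(points)
    best = (None, None, float("inf"))
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for p, y in zip(points, labels) if p[f] <= thr]
            right = [y for p, y in zip(points, labels) if p[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, thr, score)
    return best

# 2 features (x1, x2) and 2 classes (red, blue); x1 separates them perfectly
pts = [(-1.0, 0.2), (-0.8, 0.9), (0.7, 0.1), (0.9, 0.8)]
ys = ["red", "red", "blue", "blue"]
feat, thr, score = best_split(pts, ys)
```

Here the chosen split is on the first feature, with zero remaining impurity: both child nodes are pure, as in the red/blue example on the slide.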

Page 7


• all the subdivided regions are pure, containing only red or only blue data


• a set of rules is defined and can be visualized as a tree structure


Page 8

• a set of rules is defined and can be visualized as a tree structure

(figure: the tree corresponding to the partitioned regions R1, R2, R3, with split values -0.55 and 0.3)

Page 9


• this set of rules is used to classify a new, unseen (test) sample


Page 10

recursive binary trees have a key strength, interpretability, but they tend to overfit – pruning is needed


• decision trees, like other ML algorithms, have both advantages and limitations

Page 11


(table: for classification purposes, Random Forests, single Trees, Neural Nets and SVMs are rated good/fair/poor on: no overfitting, intrinsic feature selection and robustness to outliers, parameter tuning, non-parametric modeling (no a-priori assumptions), interpretability, natural handling of mixed-type data, prediction accuracy, time handling)

• these considerations hold for classification purposes only
• Random Forests looks like a promising algorithm

Page 12

Random Forests is an ensemble method, leveraging bootstrapping and averaging techniques


main steps of the algorithm:

• grow many i.i.d. trees (e.g., 500) on bootstrap samples of the original training set
 – random sampling with replacement from the training set

• minimize bias by fully growing trees (no pruning needed)

• reduce variance of noisy but unbiased (fully grown) trees by averaging (regression) or majority voting (classification) for the final decisions on test samples

• correlation between trees is kept low, maximizing the variance reduction achieved by averaging:
 – bootstrap data for each tree
 – random sub-selection of input variables at each split
 – best split chosen by minimizing an impurity measure (as in CART models)
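The steps above can be sketched with standard-library Python only; depth-1 "stumps" stand in for fully grown trees and the four-point dataset is a toy stand-in, so this illustrates bootstrapping, random feature sub-selection and majority voting rather than a production forest:

```python
import random
from collections import Counter

def fit_stump(points, labels, n_sub_features=1):
    """One shallow 'tree': consider a random subset of features, pick the
    (feature, threshold) split with the fewest misclassified samples."""
    feats = random.sample(range(len(points[0])), n_sub_features)
    best = None
    for f in feats:
        for thr in sorted({p[f] for p in points}):
            left = [y for p, y in zip(points, labels) if p[f] <= thr]
            right = [y for p, y in zip(points, labels) if p[f] > thr]
            if not left or not right:
                continue
            l_cls = Counter(left).most_common(1)[0][0]
            r_cls = Counter(right).most_common(1)[0][0]
            err = sum(y != l_cls for y in left) + sum(y != r_cls for y in right)
            if best is None or err < best[0]:
                best = (err, f, thr, l_cls, r_cls)
    if best is None:  # degenerate bootstrap sample: fall back to majority class
        maj = Counter(labels).most_common(1)[0][0]
        return lambda p: maj
    _, f, thr, l_cls, r_cls = best
    return lambda p: l_cls if p[f] <= thr else r_cls

def fit_forest(points, labels, n_trees=51):
    """Grow each tree on a bootstrap sample (random draws with replacement)."""
    n = len(points)
    forest = []
    for _ in range(n_trees):
        idx = [random.randrange(n) for _ in range(n)]
        forest.append(fit_stump([points[i] for i in idx],
                                [labels[i] for i in idx]))
    return forest

def predict(forest, p):
    """Final decision by majority vote across the trees (classification)."""
    return Counter(tree(p) for tree in forest).most_common(1)[0][0]

random.seed(0)
pts = [(0.1, 5.0), (0.2, 4.0), (0.8, 2.0), (0.9, 1.0)]
ys = ["non-disrupted", "non-disrupted", "disrupted", "disrupted"]
forest = fit_forest(pts, ys)
```

Stumps trained on bootstrap samples disagree on where exactly to split, but the majority vote over the ensemble is stable on points far from the class boundary.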

Page 13

binary classification problem: disrupted/non-disrupted

from a 2D space with few points, (x1, x2), to a 10D space with many thousands of points:
li, Ip_error_fraction, Vloop, q95, radiated_fraction, n1amp, beta_p, dWmhd_dt, n/nG, Te_HWHM

Page 14

binary classification problem: disrupted/non-disrupted – graphical depiction of a single tree in a Random Forest

• 10 variables: li, Ip_error_fraction, Vloop, q95, radiated_fraction, n1amp, beta_p, dWmhd_dt, n/nG, Te_HWHM
• ~70,000 time slices
• 394 discharges, of which 195 are flattop disruptions
• 500 decision trees

Page 15

graphical depiction of a single tree in a Random Forest, in a classification scheme – disrupted/non-disrupted

(figure: first three layers of a tree, and the same tree fully grown)

• features are not scaled a priori
• at each node, random sub-selection of features
• impurity minimization (Gini index) by choosing the best split

Page 16

binary classification problem: disrupted/non-disrupted – no time dependency

• leaf nodes: complete separation of the training samples into the two classes
• the original dataset is split into training and test subsets
• 500 trees are built on the training data to define a set of rules
• the set of rules is used to decide whether new, unseen samples in the test set belong to one of the two classes
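A sketch of the split step in plain Python; holding out whole discharges (so that no shot contributes samples to both subsets) is an assumption added here, since per-sample splitting would let time slices from the same discharge leak into both sets:

```python
import random

def split_by_discharge(samples, test_fraction=0.2, seed=42):
    """samples: list of (shot_id, features, label) time slices.
    Entire discharges are held out, so a shot never appears in both sets."""
    shots = sorted({shot for shot, _, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(shots)
    n_test = max(1, round(test_fraction * len(shots)))
    test_shots = set(shots[:n_test])
    train = [s for s in samples if s[0] not in test_shots]
    test = [s for s in samples if s[0] in test_shots]
    return train, test

# toy dataset: 10 shots, 3 time slices each (hypothetical shot ids and features)
data = [(shot, (0.0,), "non-disrupted") for shot in range(10) for _ in range(3)]
train_set, test_set = split_by_discharge(data)
```

With a 20% test fraction, 2 of the 10 toy shots (all 6 of their time slices) end up in the test set, and no shot id appears on both sides.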

Page 17

how to classify a new sample belonging to the test subset with a Random Forest, and how to assess the classifier's accuracy

for all the trees in the forest…

final class will be chosen through majority vote or by averaging the probabilities

source: O. Dürr, ZHAW School of Engineering


Page 18

the confusion matrix is used as an accuracy metric to assess the model's capability to discriminate between class labels

binary classification – no time dependency – 15 ms black window before the disruption event


Page 19

Confusion matrix (percentages normalized with respect to each true class label):

true \ predicted      non-disrupted   disrupted
non-disrupted              94.2%          5.8%
disrupted                   5.0%         95.0%

• 94.2%: successfully predicted non-disrupted samples
• 5.8%: false positive alarms
• 5.0%: missed detections (false negatives)
• 95.0%: successfully predicted disrupted samples
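Normalizing each row by its true-class total can be reproduced in a few lines; the 12-sample toy dataset below is illustrative, not the DIII-D test set:

```python
def confusion_matrix(true_labels, pred_labels, classes):
    """counts[i][j] = number of samples with true class classes[i] that were
    predicted as classes[j]; each row is then normalized by its true-class
    total (assumes every class in `classes` occurs at least once)."""
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        counts[classes.index(t)][classes.index(p)] += 1
    return [[c / sum(row) for c in row] for row in counts]

# toy example: 8 true non-disrupted (1 misclassified), 4 true disrupted (1 missed)
true_y = ["non-disrupted"] * 8 + ["disrupted"] * 4
pred_y = ["non-disrupted"] * 7 + ["disrupted"] + ["non-disrupted"] + ["disrupted"] * 3
cm = confusion_matrix(true_y, pred_y, ["non-disrupted", "disrupted"])
```

The off-diagonal entries of the first and second rows are the false-positive and missed-detection rates, respectively, as in the table above.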


Page 20

relative importance ranking extracted from the Random Forests – binary classification case – no time dependency

(bar chart: relative variable importance, ranked from most to least important: q95, betap, n/nG, li, Te_HWHM, ip_error_frac, Vloop, dWmhd_dt, prad_frac, n1amp)

• relative variable importance with respect to label predictability is defined as the mean decrease in impurity – the permutation importance metric could also be implemented

• q95 is the relatively most important variable
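Mean decrease impurity can be sketched as an aggregation over split records; the records below (feature index, node size, parent impurity, weighted child impurity) are hypothetical numbers for two features, not values from the DIII-D forest:

```python
def mean_decrease_impurity(split_records, n_features, n_total):
    """Sum, per feature, the sample-weighted impurity decrease of every
    split that used it; normalize so the importances sum to one."""
    imp = [0.0] * n_features
    for feature, n_node, parent_imp, child_imp in split_records:
        imp[feature] += (n_node / n_total) * (parent_imp - child_imp)
    total = sum(imp)
    return [v / total for v in imp]

# hypothetical split records collected from a couple of small trees
records = [(0, 100, 0.50, 0.20),  # feature 0 split at a large, impure node
           (1, 40, 0.30, 0.25),
           (0, 20, 0.40, 0.10)]
importance = mean_decrease_impurity(records, n_features=2, n_total=100)
```

Feature 0 dominates because it splits large nodes with big impurity drops; in the real forest the same aggregation, averaged over all 500 trees, produces the ranking shown in the bar chart.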

Page 21

blue: non-disrupted discharges (flattop only) – red: disruptions during flattop

Page 22

in the multi-label classification, the time dependency is introduced through the definition of different class labels:

• non-disrupted: time slices from non-disruptive shots
• far from disr: time_until_disrupt > 1 s
• close to disr: time_until_disrupt < 0.1 s

Confusion matrix (percentages normalized with respect to each true class label):

true \ predicted    non-disrupted   far from disr   close to disr
non-disrupted           91.6%           6.5%            1.9%
far from disr           18.3%          81.4%            0.3%
close to disr            2.6%           0.1%           97.3%
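The labeling rule can be written directly; representing non-disruptive shots with a missing time_until_disrupt, and leaving samples between 0.1 s and 1 s unlabeled, are assumptions of this sketch:

```python
def assign_class(time_until_disrupt):
    """Map a time slice to one of the three class labels.
    time_until_disrupt is None for samples from non-disruptive shots
    (an assumption of this sketch), otherwise seconds until disruption."""
    if time_until_disrupt is None:
        return "non-disrupted"
    if time_until_disrupt > 1.0:
        return "far from disr"
    if time_until_disrupt < 0.1:
        return "close to disr"
    return None  # 0.1 s <= t <= 1 s: outside the three classes defined above

labels = [assign_class(t) for t in (None, 2.5, 0.05, 0.5)]
```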


Page 23

• class labels have an induced time dependency
• similar results with respect to the binary classification case

(bar chart: relative variable importance, ranked from most to least important: q95, n/nG, betap, li, Te_HWHM, dWmhd_dt, ip_error_frac, Vloop, prad_frac, n1amp)

relative importance ranking extracted from the Random Forests – multi-label classification case

Page 24

ROC curves illustrate the relative tradeoff between benefits (true positives) and costs (false positives) of binary classifiers

• RF (blue) maximizes the AUC (Area Under the Curve) compared to a Multi-Layer Perceptron (MLP) or a Support Vector Machine (SVM)
• RF catches a higher number of correct classifications at a lower number of false positives

(figure: ROC curves, full range and zoomed in at the top left, with the random-guess diagonal for reference)

RF: AUC 0.98, model score 0.94
MLP: AUC 0.95, model score 0.89
SVM: AUC 0.93, model score 0.85
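AUC can be computed without plotting the curve at all, via the equivalent rank statistic: the probability that a randomly chosen positive (disrupted) sample scores above a randomly chosen negative one. The scores below are made up for illustration:

```python
def roc_auc(labels, scores):
    """Mann-Whitney form of the AUC: fraction of (positive, negative)
    pairs ranked correctly, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # one positive ranked below one negative
auc = roc_auc(y_true, y_score)
```

With one of nine positive/negative pairs ranked the wrong way, the AUC is 8/9; a perfect ranking gives 1.0 and the random-guess diagonal corresponds to 0.5.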

Page 25

• ML classification gives promising results, with optimal performance (Random Forests) and the possibility of gaining further insight into the dataset

• better features and a better definition of the target variable are needed to address the disruption-proximity problem: "how much time until the discharge disrupts?"

• evaluate different algorithms that could perform best on regression problems
• real-time integration with the PCS
• dimensionless and machine-independent features enable cross-device analysis: comparison with EAST and C-Mod data

summary and conclusions
