high-performance outlier detection algorithm for finding ...lfwu/slides/bdac14.pdf ·...

20
BDAC-SC14 1 / 17 High-Performance Outlier Detection Algorithm for Finding Blob-Filaments in Plasma Lingfei Wu 1 , Kesheng Wu 2 , Alex Sim 2 , Michael Churchill 3 , Jong Y. Choi 4 , Andreas Stathopoulos 1 , CS Chang 3 , and Scott Klasky 4 1 College of William and Mary 2 Lawrence Berkeley National laboratory 3 Princeton Plasma Physics Laboratory 4 Oak Ridge National Laboratory

Upload: halien

Post on 15-Feb-2019

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

BDAC-SC14 1 / 17

High-Performance Outlier Detection Algorithm for Finding

Blob-Filaments in Plasma

Lingfei Wu1, Kesheng Wu2, Alex Sim2, Michael Churchill3, Jong Y. Choi4,

Andreas Stathopoulos1, CS Chang3, and Scott Klasky4

1College of William and Mary2Lawrence Berkeley National laboratory3Princeton Plasma Physics Laboratory

4Oak Ridge National Laboratory

Page 2: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Outline

• Outline

Introduction

Related work

Blob detection

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 2 / 17

Introduction

Related work

Blob detection

Hybrid parallel

Evaluations

Conclusion

Page 3: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

What is an outlier ?

• Outline

Introduction

• Outlier Detection

• Our goal

• Blobs in fusion

• Motivation

Related work

Blob detection

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 3 / 17

An outlier is a data object that deviates significantly from the

rest of the objects, as if it were generated by a different

mechanism.1

• Outliers could be errors or noise to be eliminated

• Outliers can lead to the discovery of important information in data

Outlier detection is employed in a variety of applications:

• fraud detection

• time-series monitoring

• medical care

• public safety and security

1Jiawei Han and Micheline Kamber, Data Mining, Southeast Asia Edition:

Concepts and Techniques, Morgan kaufmann, 2006.

Page 4: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Our goal

• Outline

Introduction

• Outlier Detection

• Our goal

• Blobs in fusion

• Motivation

Related work

Blob detection

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 4 / 17

Outlier detection is an important task in many safety critical

environments.

• An outlier demands to be detected in real-time

• A suitable feedback is provided to alarm the control system

• The size of data sets need fast and scalable outlier detection

methods

Our goal: apply the outlier detection techniques to effectively

tackle the fusion blob detection problem on extremely large

parallel machines

• Massive amounts of data are generated from fusion experiments

/ simulations

• Near real-time understanding of data is needed to predict

performance

Page 5: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Blobs in fusion

• Outline

Introduction

• Outlier Detection

• Our goal

• Blobs in fusion

• Motivation

Related work

Blob detection

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 5 / 17

What is fusion & Why fusion?

• Fusion is viable energy

source for the future

• Fossil fuels will run out

soon; Solar and wind have

limited potential

• Advantages of fusion:

inexhaustible, clear and

safe

Page 6: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Blobs in fusion

• Outline

Introduction

• Outlier Detection

• Our goal

• Blobs in fusion

• Motivation

Related work

Blob detection

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 5 / 17

Blobs are intermittent bursts of particles near the edge of

the confined plasma

⇒ Driven by turbulence

Blobs are bad for fusion performance

because they:

• Transport heat and particles away from

the confined plasma

• May damage the main chamber wall

• Lead to increased levels of neutrals and

impurities, bypassing control

mechanisms

Blob detection is a very important task!

Page 7: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Big data challenges in fusion energy

• Outline

Introduction

• Outlier Detection

• Our goal

• Blobs in fusion

• Motivation

Related work

Blob detection

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 6 / 17

Fusion experiments generate massive amounts of data:

• Diagnostics measuring lasts

from a few to several hundred

seconds generating large

amounts of data, ∼ Gigabytes

to Terabytes!

• Large-scale fusion simulation

generates ∼ a few tens of

Terabytes per second!

Page 8: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Big data challenges in fusion energy

• Outline

Introduction

• Outlier Detection

• Our goal

• Blobs in fusion

• Motivation

Related work

Blob detection

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 6 / 17

Difficulties in large-scale data analysis:

• Existing data analysis is often

a single-threaded, slow, and

only for post-run analysis

• Fusion experiments demand

real-time data analysis

• E.g. ICEE aims to apply blob

detection for monitoring health

of fusion experiments in

KSTAR

Real-time blob detection is a very challenging

task!

Page 9: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Three approaches for blob detection

• Outline

Introduction

Related work

• Related work

Blob detection

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 7 / 17

Single

threshold &

conditional

averaging

• The exact criterion varies

• Averaging may destroy important

information

Image

analysis

techniques

• Very sensitive to the setting of

parameters

• Hard to use generic method for all

images

Contouring

method &

thresholding

• Can not be a real-time blob detection

• May miss detecting blobs at the edge

• Is still post-run-analysis

Page 10: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

An efficient blob detection approach

• Outline

Introduction

Related work

Blob detection

• Our approach

• The sketch

• Refine mesh

• Two-step detection

• Fast CCL

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 8 / 17

• Our approach: an outlier detection algorithm for efficiently

finding blobs in fusion simulations / experiments

◦ Two-step outlier detection with various criteria after

normalizing the local intensity

◦ Leverage a fast connected component labeling method to

find blob components based on a refined triangular mesh

• Contributions:

◦ A new method not missing detection of blobs in the edge of

the region of interests compared to contouring method

◦ Targeting for more challenging in-shot-analysis and

between-shot-analysis

◦ The first research work to achieve blob detection in a few

milliseconds

Page 11: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Outlier detection algorithm for finding blobs

• Outline

Introduction

Related work

Blob detection

• Our approach

• The sketch

• Refine mesh

• Two-step detection

• Fast CCL

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 9 / 17

Sketch the proposed outlier detection algorithm:

Page 12: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Refine mesh in the region of interests

• Outline

Introduction

Related work

Blob detection

• Our approach

• The sketch

• Refine mesh

• Two-step detection

• Fast CCL

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 10 / 17

1.2 1.4 1.6 1.8 2 2.2 2.4

-1

-0.5

0

0.5

1

R (m)

Magnetic Fields in Poloidal Plane

Z (

m)

Poloidal PlaneRegion of Interests

2.25 2.26 2.27 2.28 2.29 2.3 2.31 2.32-0.25

-0.2

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.2

0.25

R (m)

Z (

m)

Reinfed Original

• Compute 4 times more triangles by creating new vertexes with

the three middle points of original edges

• Apply recursively until reaching the desired resolution

• Depend on specified data set and demanded resolution

Page 13: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Two-step outlier detection to identify blobs

• Outline

Introduction

Related work

Blob detection

• Our approach

• The sketch

• Refine mesh

• Two-step detection

• Fast CCL

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 11 / 17

Motivation for two-step outlier detection for finding blobs:

A contour plot in the region of interests

Page 14: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Two-step outlier detection to identify blobs

• Outline

Introduction

Related work

Blob detection

• Our approach

• The sketch

• Refine mesh

• Two-step detection

• Fast CCL

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 11 / 17

Apply exploratory data analysis to analyze the underlying

distribution of the local normalized density:

0 2 4 6 8 10 120

1

2

3

4

5

6

7x 10

4 Density distribution fitting using 50 bins

Normalized electron density (n_e/n_e0)

Num

ber

of p

oint

s in

eac

h bi

n

(a) Extreme Value Distribution

0 2 4 6 8 10 120

1

2

3

4

5

6

7x 10

4 Density distribution fitting using 50 bins

Normalized electron density (n_e/n_e0)

Num

ber

of p

oint

s in

eac

h bi

n

(b) Log Normal Distribution

[

N(ri, zi, t)− µ > α ∗ σ,∀(ri, zi) ∈ Γ,N(ri, zi, t)− µ2 > β ∗ σ2,∀(ri, zi) ∈ Γ2.

]

Page 15: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

A fast connected component labeling algorithm

• Outline

Introduction

Related work

Blob detection

• Our approach

• The sketch

• Refine mesh

• Two-step detection

• Fast CCL

Hybrid parallel

Evaluations

Conclusion

BDAC-SC14 12 / 17

We apply an efficient connected component labeling algorithm

on a refined triangular mesh to find blob components:

• This is a two-pass approach and each triangle is scanned firstly

• Reduce unnecessary memory access if any vertex in a triangle

is found to be connected with others

• After the label array is filled full, we need flatten the union and

find tree

• Second pass is performed to correct labels and all blob

candidate components are found

Page 16: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Parallelization of blob detection approach

• Outline

Introduction

Related work

Blob detection

Hybrid parallel

• MPI/OpenMP

Evaluations

Conclusion

BDAC-SC14 13 / 17

A hybrid MPI/OpenMP parallelization on many-core processor

architecture:

• High-level: use MPI to allocate n processes to process each

time frame

• Low-level: use OpenMP to accelerate the computations with m

threads

Page 17: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Results: same time frame + four planes

• Outline

Introduction

Related work

Blob detection

Hybrid parallel

Evaluations

• Results I

• Results II

• Results III

Conclusion

BDAC-SC14 14 / 17

Page 18: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Results: same plane + four time frames

• Outline

Introduction

Related work

Blob detection

Hybrid parallel

Evaluations

• Results I

• Results II

• Results III

Conclusion

BDAC-SC14 15 / 17

Page 19: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Results: real-time blob detection

• Outline

Introduction

Related work

Blob detection

Hybrid parallel

Evaluations

• Results I

• Results II

• Results III

Conclusion

BDAC-SC14 16 / 17

100

101

102

103

104

10-3

10-2

10-1

100

101

102

103

Real Time Blob Detection

Number of processes

Tim

e (S

econ

d)

I/O Time - MPII/O Time - MPI/OpenMPDetection Time - MPIDetection Time - MPI/OpenMP

100

101

102

103

104

100

101

102

103

104

Real Time Blob Detection

Number of processes

Spe

edup

ove

r se

quat

ial

MPI SpeedupMPI/OpenMP Speedup

• Complete blob detection in around 2 ms with MPI/OpenMP using

4096 cores and in 3 ms with MPI using 1024 cores

• MPI/OpenMP is two times faster than MPI

• Linear time speedup in blob detection time and slightly more in

I/O time

Page 20: High-Performance Outlier Detection Algorithm for Finding ...lfwu/Slides/BDAC14.pdf · High-Performance Outlier Detection Algorithm for Finding ... Morgan kaufmann, ... images Contouring

Conclusion and future work

• Outline

Introduction

Related work

Blob detection

Hybrid parallel

Evaluations

Conclusion

• Conclusion

BDAC-SC14 17 / 17

We present for the first time a real time blob detection method

for finding blob-filaments in real fusion experiments or

numerical simulations.

• Key components:

◦ Two-step outlier detection with various criteria

◦ A fast connected component labeling method

◦ Hybrid MPI/OpenMP parallelization

• Future work:

◦ Test the detection algorithm to experimental measurement

data from operating fusion devices

◦ Develop a blob tracking algorithm