the mass volume curve, a performance metric for ......anomaly detection mass volume curve extreme...

32
Anomaly detection Mass Volume curve Extreme regions The Mass Volume curve, a performance metric for unsupervised anomaly detection Rare Events, Extremes and Machine Learning Workshop May 24th, 2018 Albert Thomas Huawei Technologies - T´ el´ ecom ParisTech - Airbus Group Innovations Joint work with Stephan Cl´ emen¸ con, Alexandre Gramfort, Vincent Feuillard and Anne Sabourin. 1 / 32

Upload: others

Post on 22-Jul-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

The Mass Volume curve, a performance metric forunsupervised anomaly detection

Rare Events, Extremes and Machine Learning Workshop

May 24th, 2018

Albert Thomas

Huawei Technologies - Telecom ParisTech - Airbus Group Innovations

Joint work with Stephan Clemencon, Alexandre Gramfort, Vincent

Feuillard and Anne Sabourin.

1 / 32

Page 2: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Outline

1 Unsupervised anomaly detection

2 The Mass Volume curve

3 Anomaly detection in extreme regions

2 / 32

Page 3: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Unsupervised anomaly detection

1. Find anomalies in an unlabeled data set X1, . . . ,Xn

2. Anomalies are assumed to be rare events

3 / 32

Page 4: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Unsupervised anomaly detection

X1, . . . ,Xn P Rd unlabeled data setiid P

density f w.r.t. Lebesgue measure λ

anomalies rare events

Estimate a density level set

4 / 32

Page 5: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Minimum Volume set (Polonik, 1997)

A density level set is a Minimum Volume set, i.e. a solution of

minΩPBpRd q

λpΩq such that PpΩq ¥ α

6 4 2 0 2 4 6

6

4

2

0

2

4

6

5 / 32

Page 6: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Minimum volume set density level set

1. A density level set is always a minimum volume set

2. If f has no flat parts, a minimum volume set is a density level set.

(Einmahl and Mason, 1992), (Polonik, 1997), (Nunez-Garcia et al., 2003)

6 / 32

Page 7: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Unsupervised anomaly detection algorithms

Common approach for most algorithms

1 Learn a scoring function

s : x P Rd ÞÑ R

such that the smaller spxq the more abnormal is x .

2 Threshold s at an offset q such that pΩα tx , spxq ¥ qu is anestimation of the Minimum Volume set with mass α.

Ñ density estimation (Cadre et al., 2013), One-Class SVM/SupportVector Data Description (Scholkopf et al., 2001), (Vert and Vert, 2006),(Tax et al., 2004), k-NN (Sricharan & Hero, 2011)

Ñ Isolation Forest (Liu et al., 2008) and Local Outlier Factor (Breunig et

al., 2000)

7 / 32

Page 8: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Ideal scoring functions

Ideal scoring functions s preserve the order induced by density f

spx1q spx2q ðñ f px1q f px2qi.e. strictly increasing transform of f .

0.00 0.25 0.50 0.75 1.000

1

2

3

4

5density ff (x− 0.05)f (x) + 2

s does not need to be close to f in the sense of a Lp norm

8 / 32

Page 9: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Scoring function of the One-Class SVM

0.0

0.2Gaussian mixture density

f

0 5 100

20

One-Class SVM scoring functions

Asymptotically constant near the modes and proportional tothe density in the low density regions (Vert and Vert, 2006)

9 / 32

Page 10: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Problem

One-Class SVM with Gaussian kernel kσ

The user needs to choose the bandwidth σ of the kernel

σ 0.5

10 5 0 5

5

0

5

Estimated Tru

e

True

Overfitting

σ 10

10 5 0 5

5

0

5

Estimated

Tru

e

True

Underfitting

How to automatically choose σ?

10 / 32

Page 11: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Problem

Given a data set Sn pX1, . . . ,Xnq, a hyperparameter space Θ andan unsupervised anomaly detection algorithm

A : Sn Θ Ñ RRd

pSn, θq ÞÑ sθ

How to assess the performance of sθwithout a labeled data set?

(Anomaly Detection Workshop, Thomas, Clemencon, Feuillard, Gramfort,

ICML 2016)

11 / 32

Page 12: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Mass Volume curve

X P, scoring function s : Rd ÝÑ R,

t-level set of s: tx , spxq ¥ tu

αsptq PpspX q ¥ tq mass of the t-level set

λsptq λptx , spxq ¥ tuq volume of t-the level set.

12 / 32

Page 13: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Mass Volume curve

Mass Volume curve MVs of a scoring function s(Clemencon and Jakubowicz, 2013), (Clemencon and Thomas, 2017)

t P R ÞÑ pαsptq, λsptqq

4 3 2 1 0 1 2 3 4

x

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

Sco

re

fs

Scoring functions

0.0 0.2 0.4 0.6 0.8 1.0Mass

0

1

2

3

4

5

6

7

8

Volu

me

MVfMVs

Mass Volume curves

13 / 32

Page 14: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Mass Volume curve

Easier to work with the following definition: MVs is defined as theplot of the function

MVs : α P p0, 1q ÞÑ λspα1s pαqq λptx , spxq ¥ α1

s pαquq

where α1s generalized inverse of αs .

Property

(Clemencon and Jakubowicz, 2013), (Clemencon and Thomas, 2017)

Assume that the underlying density f has no flat parts, then for allscoring functions s,

@α P p0, 1q, MVpαq def MVf pαq ¤ MVspαq

The lower is MVs the better is s

14 / 32

Page 15: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Learning hyperparameters

Consider sθ ApSn, θq

Choose θ minimizing area under MVsθ

As MVsθ depends on P, use empirical MV curve estimated on atest set

yMVs : α P r0, 1q ÞÑ λsppα1s pαqq

where pαsptq 1n

°ni1 1tx ,spxq¥tupXi q.

15 / 32

Page 16: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Algorithm

Choose θ: minimizing area under Mass Volume curve AMVsθ .

16 / 32

Page 17: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Aggregation

For each random split we may obtain a different θ:

Ñ to reduce the variance of the estimator consider B randomsplits of the data set

for each random split b we get θ:b and sbθ:b

final scoring function

pS 1

B

B

b1

sbθ:b

17 / 32

Page 18: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Toy example

Minimum Volume set estimation with One-Class SVM and B 50(α 0.95)

10 5 0 5

5

0

5

Est

imate

d

Estimated Tru

e

True

18 / 32

Page 19: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Experiments

Given an anomaly detection algorithm A, compare

our approach ÝÑ stuned

a priori fixed hyperparameters ÝÑ sfixed

Performance criterion: relative gain

GApstuned, sfixedq AMVI psfixedq AMVI pstunedqAMVI psfixedq

where AMVI is area under MV curve over interval I r0.9, 0.99s,computed on left out data.

If GApstuned, sfixedq ¡ 0 then stuned better than sfixed

19 / 32

Page 20: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Results

For stuned we consider 50 random splits (80/20)

GM, d=2 GM, d=4 Banana HTTP

20

0

20

40

60

80

100

Rela

tive g

ain

(%

)

aKLPEKLPEOCSVMKSiForest

n 100

GM, d=2 GM, d=4 Banana HTTP

20

0

20

40

60

80

100

Rela

tive g

ain

(%

)

aKLPEKLPEOCSVMKSiForest

n 500

KLPE (Sricharan & Hero, 2011) — aKLPE: Average KLPE (Qian & Saligrama,

2012) — OCSVM: One-Class SVM (Scholkopf et al., 2001) — iForest:

Isolation Forest (Liu et al., 2008) — KS: Kernel Smoothing

20 / 32

Page 21: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Consistency of yMVs

We used yMVs as an estimate of MVs where

@α P r0, 1q, yMVspαq λtx , spxq ¥ pα1s pαqu

with pα1s the generalized inverse of pαs .

2 questions

Consistency of yMVs as n Ñ8?

How to build confidence regions?

21 / 32

Page 22: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Consistency

Let s be a scoring function and ε P p0, 1s .

Assumptions on the random variable spX q and its distribution Fs

λs is C2

Theorem (Clemencon and Thomas, 2017)

(i) Consistency. With probability one,

supαPr0,1εs

|yMVspαq MVspαq| ÝÑnÑ8

0

(ii) Functional CLT. There exists a sequence of Brownian bridgestBnpαquαPr0,1s such that, almost-surely, uniformly over r0, 1 εs, asn Ñ8,

?nyMVspαq MVspαq

λ1spα1

s pαqqfspα1

s pαqq Bnpαq Opn12 log nq

22 / 32

Page 23: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Confidence bands using smoothed bootstrap

0.05 0.15 0.25 0.35 0.45 0.55 0.65 0.75 0.85 0.95Mass

0

10

20

30

40

Volu

me

MVf

90% confidence band

Up to log n factors (Clemencon and Thomas, 2017),

Functional CLT: rate in Opn12q but requires knowledge of fs

Naive (non-smoothed) bootstrap: rate in Opn14qSmoothed bootstrap: rate in Opn25q

23 / 32

Page 24: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Anomaly detection in extreme regions

Goal of multivariate Extreme Value Theory: model the tail of thedistribution F of a multivariate random variable

X pX p1q, . . . ,X pdqqMotivation for unsupervised anomaly detection

Anomalies are likely to be located in extreme regions, i.e. regions farfrom the mean ErX sLack of data in these regions makes it difficult to distinguishbetween large normal instances and anomalies

Relying on multivariate Extreme Value Theory we suggest analgorithm to detect anomalies in extreme regions.

24 / 32

Page 25: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Multivariate Extreme Value Theory

Common approach

Standardization to unit Pareto: T pX q V P Rd where

V pjq 1

1 FjpX pjqq , @j P t1, . . . , du,

Fj , 1 ¤ j ¤ d , being the margins. Note that X and V share thesame dependence structure/copula.

25 / 32

Page 26: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Multivariate Extreme Value Theory

prpvq, ϕpvqq pv8, vv8q polar coordinates

Sd1 positive orthant of the unit hypercube

Theorem (Resnick, 1987)

With mild assumptions on the distribution F , there exists a finite(angular) measure Φ on Sd1 such that for all Ω Sd1,

ΦtpΩq def t PprpV q ¡ t, ϕpV q P Ωq ÝÑtÑ8

ΦpΩq

26 / 32

Page 27: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Main idea

Anomalies in extreme regions: observations that deviate fromthe dependence structure of the tail

Dependence Independence

To find the most likely directions of the extreme observations weestimate a Minimum Volume set of the angular measure Φ

27 / 32

Page 28: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Minimum volume set estimation on the sphere

Solve empirical optimization problem (Scott and Nowak, 2006)

minΩPG

λdpΩq subject to pΦn,kpΩq ¥ α ψkpδq

where pΦn,k is estimated from t PprpV q ¡ t, ϕpV q P Ωq witht nk :

pΦn,kpΩq 1

k

n

i1

1trpV i q¥nk, ϕpV i qPΩu

n/k

n/k

28 / 32

Page 29: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

TheoremΩα being the true Minimum Volume set,

RpΩq pλdpΩq λdpΩαqq

α ΦnkpΩq

Theorem (Thomas, Clemencon, Gramfort, Sabourin, 2017)

Assume that

the margins Fj are known

the class G is of finite VC dimension

common assumptions for existence and uniqueness of MV sets arefulfilled by Φnk

Then there exists a constant C ¡ 0 such that

ErRppΩαqs ¤

infΩPGα

λdpΩq λdpΩαq C

clog k

k

where Gα tΩ P G,ΦnkpΩq ¥ αu.29 / 32

Page 30: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Numerical experiments

Global scoring function

sprpvq, ϕpvqq 1rpvq2 sϕpϕpvqq.

Data set OCSVM Isolation Forest Score s

shuttle 0.981 0.963 0.987SF 0.478 0.251 0.660

http 0.997 0.662 0.964ann 0.372 0.610 0.518

forestcover 0.540 0.516 0.646

ROC-AUC are computed on test sets made of normal andabnormal instances and restricted to the extreme region (k ?

n).

30 / 32

Page 31: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

References

Mass Volume curves and anomaly ranking. S. Clemencon, A.Thomas. Submitted. 2017

Learning hyperparameters for unsupervised anomaly detection. A.Thomas, S. Clemencon, V. Feuillard, A. Gramfort. Anomalydetection Workshop, ICML 2016.

Anomaly detection in extreme regions via empirical MV sets on thesphere. A. Thomas, S. Clemencon, A. Gramfort, A. Sabourin.AISTATS 2017.

Code for hyperparameter tuning available on githubhttps://github.com/albertcthomas/anomaly_tuning

31 / 32

Page 32: The Mass Volume curve, a performance metric for ......Anomaly detection Mass Volume curve Extreme regions Problem One-Class SVM with Gaussian kernel k ˙ The user needs to choose the

Anomaly detection Mass Volume curve Extreme regions

Thank you

32 / 32