multi-view anomaly detection via robust probabilistic

23
Copyright©2016 NTT corp. All Rights Reserved. Multi-view Anomaly Detection via Robust Probabilistic Latent Variable Models Tomoharu Iwata NTT Communication Science Labs Makoto Yamada RIKEN, AIP NIPS2016

Upload: others

Post on 06-Jun-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multi-view Anomaly Detection via Robust Probabilistic

Copyright©2016 NTT corp. All Rights Reserved.

Multi-view Anomaly Detection via Robust Probabilistic Latent Variable Models

Tomoharu Iwata NTT Communication Science Labs Makoto Yamada RIKEN, AIP

NIPS2016

Page 2: Multi-view Anomaly Detection via Robust Probabilistic

2

Multi-view anomaly • Instances that have inconsistent views • Application

– information disparity management • find documents that contain different information

across multilingual Wikipedia documents – malicious insider detection – purchase behavior analysis

• find movies inconsistently purchased by users based on the genre (animation by purchased by grown-ups)

image annotation

text URLs

audio video

Japanese text English text

Multi-view data

Page 3: Multi-view Anomaly Detection via Robust Probabilistic

3

Single-view/multi-view anomaly

B M C A

F D G

S

I J

E

B M C D

F A

G

S

I J

E

H

H

View1 View2

• single-view anomaly is an instance that does not conform to expected behavior

• S is single-view anomaly, but not multi-view anomaly • M is not single-view anomaly, but multi-view

anomaly

Page 4: Multi-view Anomaly Detection via Robust Probabilistic

4

Existing method: HOAD (1) • HOrizontal Anomaly Detection (HOAD)

– A Spectral Framework for Detecting Inconsistency across Multi-source Object Relationships, Gao et. al. ICDM 2011

• Step1: soft clustering two views together with the constraint that an instance should be assigned to the same cluster

two-view combined similarity

graph

view1

view2

view1 view2 diagonal

diagonal

spectral clustering

Page 5: Multi-view Anomaly Detection via Robust Probabilistic

5

Existing method: HOAD (2)

• Step2: quantify the difference between the two clustering solutions

• weak points – anomalous instances also have the constraint to

be assigned to the same cluster – require hyper-parameter tuning (e.g. weight for

the constraint)

Page 6: Multi-view Anomaly Detection via Robust Probabilistic

6

Proposed model • Normal (non-anomalous) instance

– all views are generated from a single latent vector • Anomaly

– different views are generated from different latent vectors

B M C A

F D G

S

I J

E

B M C D

F A

G

S

I J

E

H

H

B M C D

F A G

S

I J E H M

Observed view 1 Observed view 2

Latent space

𝑾1 𝑾2

Page 7: Multi-view Anomaly Detection via Robust Probabilistic

7

Proposed model • Each instance has potentially a countably infinite number

of latent vectors • Each view of an instance is generated depending on a

view-specific projection matrix and a latent vector

A

Observed view 1 Observed view 2

Latent space A

A

A A

Observed view D

𝑾1 𝑾2 𝑾𝐷

Page 8: Multi-view Anomaly Detection via Robust Probabilistic

8

Proposed model • Each instance has potentially a countably infinite number

of latent vectors • Each view of an instance is generated depending on a

view-specific projection matrix and a latent vector

A

Observed view 1 Observed view 2

Latent space A

A

A A

Observed view D

𝑾1 𝑾2 𝑾𝐷

Page 9: Multi-view Anomaly Detection via Robust Probabilistic

9

Proposed model • Each instance has potentially a countably infinite number

of latent vectors • Each view of an instance is generated depending on a

view-specific projection matrix and a latent vector

A

Observed view 1 Observed view 2

Latent space A

A

A A

Observed view D

𝑾1 𝑾2 𝑾𝐷

A

Page 10: Multi-view Anomaly Detection via Robust Probabilistic

10

Proposed model • Each instance has potentially a countably infinite number

of latent vectors • Each view of an instance is generated depending on a

view-specific projection matrix and a latent vector

A

Observed view 1 Observed view 2

Latent space A

A

A A

Observed view D

𝑾1 𝑾2 𝑾𝐷

A

Page 11: Multi-view Anomaly Detection via Robust Probabilistic

11

Proposed model • Each instance has potentially a countably infinite number

of latent vectors • Each view of an instance is generated depending on a

view-specific projection matrix and a latent vector

Observed view 1 Observed view 2

Latent space A

A

A

Observed view D

𝑾1 𝑾2 𝑾𝐷

A A

A A

Page 12: Multi-view Anomaly Detection via Robust Probabilistic

12

Proposed model • Each instance has potentially a countably infinite number

of latent vectors • Each view of an instance is generated depending on a

view-specific projection matrix and a latent vector

Observed view 1 Observed view 2

Latent space A

A

A

Observed view D

𝑾1 𝑾2 𝑾𝐷

A A

A A

Page 13: Multi-view Anomaly Detection via Robust Probabilistic

13

Proposed model • Each instance has potentially a countably infinite number

of latent vectors • Each view of an instance is generated depending on a

view-specific projection matrix and a latent vector

Observed view 1 Observed view 2

Latent space A

A

A

Observed view D

𝑾1 𝑾2 𝑾𝐷

A A

A

A A

Page 14: Multi-view Anomaly Detection via Robust Probabilistic

14

Proposed probabilistic model

• By using Dirichlet processes prior for each instance, the number of latent vectors is automatically determined from the given data – if views are consistent, they are clustered; otherwise, they

need different latent vectors • By using view-dependent projection matrices, we can handle

different properties across different views

𝑝 𝒙𝑛𝑛 𝒛𝑛,𝑾𝑛 ,𝜽𝑛,𝛼 = �𝜃𝑛𝑛𝑁(𝒙𝑛𝑛|𝑾𝑛𝒛𝑛𝑛 ,𝛼−1𝑰)∞

𝑛=1

view d of instance n

j-th latent vectors of instance n

projection matrix for view d

mixture weight for j-th latent vector

precision

𝑣𝑛 =1𝐻� 𝐼(𝐽𝑛 ℎ > 1)𝐻

ℎ=1

#latent vectors of instance n at the h-th iteration after burn-in

indicator function

Anomaly score: probability that the instance uses more than one latent vector

Page 15: Multi-view Anomaly Detection via Robust Probabilistic

15

Proposed model • clustering views for each instance in the latent space

Observed view 1 Observed view 2

Latent space A1 A2

A4

Observed view 5

𝑾1 𝑾2 𝑾5

A

A

A

A5 A3

Page 16: Multi-view Anomaly Detection via Robust Probabilistic

16

• Proposed model – 𝑝 𝒙𝑛𝑛 𝒛𝑛,𝑾𝑛 ,𝜽𝑛,𝛼 = ∑ 𝜃𝑛𝑁(𝒙𝑛𝑛|𝑾𝑛𝒛𝑛𝑛 ,𝛼−1𝑰)∞

𝑛=1

• Probabilistic PCA – 𝑝 𝒙𝑛𝑛 𝒁,𝑾 = 𝑁(𝒙𝑛𝑛|𝑾𝒛𝑛𝑛 ,𝛼−1𝑰)

• Probabilistic CCA – 𝑝 𝒙𝑛𝑛 𝒁,𝑾 = 𝑁(𝒙𝑛𝑛|𝑾𝑛𝒛𝑛𝑛 ,𝚺)

• Infinite Gaussian mixture – 𝑝 𝒙𝑛𝑛 𝝁,𝜽 = ∑ 𝜃𝑛𝑁(𝒙𝑛𝑛|𝝁𝑛 ,𝛼−1𝑰)∞

𝑛=1

Relation with other latent variable models

Page 17: Multi-view Anomaly Detection via Robust Probabilistic

17

Generative process

Page 18: Multi-view Anomaly Detection via Robust Probabilistic

18

Inference based on stochastic EM • Analytically integrate out the latent vectors Z, mixture

weights Θ, precision 𝛼 • E-step: collapsed Gibbs sampling of latent vector

assignment s for each view of each instance ℓ = (𝑛,𝑑)

• M-step: maximum joint likelihood estimation of

projection matrices W

Page 19: Multi-view Anomaly Detection via Robust Probabilistic

19

Experiments • Data

– 11 data sets from LIBSVM data – generated multiple views by randomly splitting

the features – anomalies were added by swapping views of two

randomly selected instances • Comparing methods

– PCCA: probabilistic canonical correlation analysis – HOAD: horizontal anomaly detection – CC: consensus clustering based anomaly detection – OCSVM: one-class support vector machine

Page 20: Multi-view Anomaly Detection via Robust Probabilistic

20

Multi-view anomaly detection with different anomaly rate

The proposed model achieved the best with 8 of the 11 data sets

Page 21: Multi-view Anomaly Detection via Robust Probabilistic

21

Multi-view anomaly detection with different latent dimensionality

Page 22: Multi-view Anomaly Detection via Robust Probabilistic

22

MovieLens data analysis instance: movie, view1: user list who rated, view2: genre

‘The Full Monty’ and ‘Liar Liar’ were ‘Comedy’ genre. They are rated by not only users who likes ‘Comedy’, but also who likes ‘Romance’ and ‘Action-Thriller’. ‘The Professional’ was anomaly because it was rated by two different user groups, where a group prefers ‘Romance’ and the other prefers ‘Action’. Since ‘Star Trek’ series are typical Sci-Fi and liked by specific users, its anomaly score was low.

high anomaly score movies low anomaly score movies

Page 23: Multi-view Anomaly Detection via Robust Probabilistic

23

Conclusion • We proposed a generative model approach for multi-

view anomaly detection, which finds instances that have inconsistent views.

• In the experiments, we confirmed that the proposed model could perform much better than existing methods for detecting multi-view anomalies

• Future work – nonlinear projection using Gaussian processes or

deep neural nets