introduction to anomaly detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data...
Introduction to Anomaly Detection
Chao Lan
Presented at the summer camp of RAMPE II: Cybersecurity and Internet of Things, University of Wyoming, 2018.
Outline
Background
Learning-based Detection Approaches
Evaluation Metrics
Challenges
Outline
Background
- What is anomaly detection, and what are its applications?
- Why do we need computers to help with anomaly detection?
- Why do we want machine learning to help design detection rules?
Learning-based Detection Approaches
Evaluation Metrics
Challenges
What is anomaly detection?
“Anomaly detection refers to the problem of finding patterns
in data that do not conform to expected behavior.”
Chandola et al. Anomaly detection: A survey. ACM Computing Surveys, 2009.
http://www.svcl.ucsd.edu/projects/anomaly/
Network Anomaly Detection – Do We Know What to Detect? 2013
Fraud Prevention with Neo4j: A 5-Minute Overview, 2017
Fujitsu Develops Traffic-Video-Analysis Technology Based on Image Recognition and Machine Learning, 2016.
Early detection of at-risk students using machine learning based on LMS log data. 2017.
Exercise: how can we teach a computer to detect spam?
An Example Spam Email on Google Lottery
Let me design and program some “rules” into the computer!
Rule 1: An email containing “lottery” is spam.
What about this warning email?
Rule 2: An email containing “million” is spam.
What about this UW email?
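The two handcrafted rules can be sketched in a few lines of Python. The function name and the three sample messages are made up for illustration (they are not the emails shown on the slides); the sketch only shows that keyword rules also flag legitimate emails.

```python
def is_spam_by_rules(text: str) -> bool:
    """Handcrafted detector: Rule 1 ("lottery") or Rule 2 ("million")."""
    lowered = text.lower()
    return "lottery" in lowered or "million" in lowered

# Hypothetical messages, not the emails shown on the slides.
spam_email = "Congratulations, you won the Google lottery!"
warning_email = "Warning: beware of lottery scams sent to campus accounts."
uw_email = "The university received a grant of two million dollars."

for email in (spam_email, warning_email, uw_email):
    # All three are flagged, although only the first one is actually spam.
    print(is_spam_by_rules(email), "-", email)
```

Adding more keywords or exceptions patches individual cases but never ends, which is the motivation for learning rules from examples.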
ML Solution: learn detection rules from example emails.
spam
normal
Quiz
Q1: what are the applications of anomaly detection?
Q2: why do we need computers to help detect anomalies?
Q3: what’s wrong with handcrafted detection rules?
Quiz
Q1: what are the applications of anomaly detection?
A1: surveillance, cybersecurity, fraud detection, health care, education, etc.
Q2: why do we need computers to help detect anomalies?
A2: the massive amount of data makes manual detection inefficient (or even impossible)
Q3: what’s wrong with handcrafted detection rules?
A3: they are hard to design (they require domain knowledge) and hard to generalize
Outline
Background
Learning-based Detection Approaches
- preliminary: data representation and visualization
- six common anomaly detection approaches
Evaluation Metrics
Challenges
Preliminary 1: Data Representation
An example email is often represented by a vector (feature vector).
x = [google, lottery, cat, email, transport, panda, million, ..]T = [1, 1, 0, 1, 0, 0, 1, ..]T
The example vector above is called a “bag-of-words” feature representation of a document.
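A minimal sketch of how such a binary bag-of-words vector could be built. The vocabulary is limited to the seven words visible on the slide, and the email text is a made-up string chosen to reproduce the slide's vector.

```python
VOCABULARY = ["google", "lottery", "cat", "email", "transport", "panda", "million"]

def bag_of_words(text, vocabulary=VOCABULARY):
    """Return a binary vector: 1 if the vocabulary word occurs in the text, else 0."""
    tokens = set(text.lower().split())
    return [1 if word in tokens else 0 for word in vocabulary]

# Hypothetical email text, not taken from the slides.
email = "google lottery email you won one million dollars"
x = bag_of_words(email)
print(x)  # [1, 1, 0, 1, 0, 0, 1]
```

Real systems typically use a much larger vocabulary and often word counts instead of 0/1 indicators.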
Concepts: Feature, Label, Instance
Each element in the vector is a feature (attribute).
x = [google, lottery, cat, email, transport, panda, million, ..]T = [1, 1, 0, 1, 0, 0, 1, ..]T
Concepts: Feature, Label, Instance
The target variable we want to detect is the label (different tasks have different labels).
For emails, the label is either spam or normal.
Concepts: Feature, Label, Instance
In summary, an example email (or an instance) is a pair of a feature vector and a label.
x1 = [1, 1, 0, 1, 0, 0, 1, ..]T, spam
x2 = [1, 0, 1, 1, 0, 1, 0, ..]T, ham
This is the most common representation of an example. There are, of course, more complicated representations.
Other Examples of Feature Vector Representation
Image data represented as a vector.
Other Examples of Feature Vector Representation
Student data represented as a vector.
[# Steal, # Lie/Cheat, # Behavior Pro, # Peer Rej, ..]T = [0, 1, 2, 1, ..]T
We will repeatedly see example & label notations.
Preliminary 2: Data Visualization
An example is a vector in a high-dimensional space (feature space).
For easier interpretation, we often visualize examples in a 2D space.
x = [google, lottery, cat, email, transport, panda, million, ..]T = [1, 1, 0, 1, 0, 0, 1, ..]T
[Figure: examples plotted in a 2D space with axes feature 1 and feature 2]
Two Common Strategies to Get 2D Space
1. Select two features from the pool (feature selection)
2. Project all features onto two new features (feature transformation)
x = [google, lottery, cat, email, transport, panda, million, ..]T = [1, 1, 0, 1, 0, 0, 1, ..]T is reduced to [feature 1, feature 2]T
Feature Projection
We can project all features onto a new feature using a projective vector w.
The projection onto the new feature is obtained by the inner product between w and x.
wT * x = [0.3, -1.2, 0.8, 0.23] * [1, 1, 0, 1]T = 0.3 - 1.2 + 0 + 0.23 = -0.67 (new feature)
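The inner product from the slide can be checked with a short sketch; the helper name `project` is an illustrative choice.

```python
def project(w, x):
    """Inner product w^T x: the value of the new feature."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x))

w = [0.3, -1.2, 0.8, 0.23]   # projective vector from the slide
x = [1, 1, 0, 1]             # feature vector from the slide
new_feature = project(w, x)
print(round(new_feature, 2))  # -0.67
```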
Feature Projection
To get two new features, we need two projective vectors w1 and w2.
feature 1 = w1T * x
feature 2 = w2T * x
Get Projective Vectors using PCA
Principal Component Analysis (PCA) is commonly used to get projective vectors.
[Figure: 2D data with principal directions w1 and w2; source: https://qiita.com/bmj0114/items/db9145a707cb6ed13201]
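As a rough sketch of how PCA yields w1 and w2, the code below takes the top two right singular vectors of the centered data matrix. The data is synthetic and the function name is illustrative; note that PCA centers the data before projecting, a step the slides do not spell out.

```python
import numpy as np

def pca_projection_vectors(X, n_components=2):
    """Return the top principal directions (as rows) of X (n examples x d features)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return Vt[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 7))       # synthetic stand-in for 50 emails with 7 features
W = pca_projection_vectors(X)      # W[0] plays the role of w1, W[1] of w2
X_2d = (X - X.mean(axis=0)) @ W.T  # feature 1 and feature 2 for every example
print(X_2d.shape)  # (50, 2)
```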
We will repeatedly see data distribution in 2D feature space (by PCA).
Quiz
Recap: to design a spam email detection model, we can define the label as
- y = 1 for spam, y = 0 for ham
Q1: to design a fraud transaction detection model, how should we define the label?
Q2: to design an at-risk student detection model, how should we define the label?
Quiz
Recap: to design a spam email detection model, we can define the label as
- y = 1 for spam, y = 0 for ham
Q1: to design a fraud transaction detection model, how should we define the label?
A1: y = 1 for fraud, y = 0 for normal transaction
Q2: to design an at-risk student detection model, how should we define the label?
A2: y = 1 for at-risk student, y = 0 for normal student
Outline
Background
Learning-based Detection Approaches
- preliminary: data representation and visualization
- six common anomaly detection approaches
Evaluation Metrics
Open Challenges
Learning-based Anomaly Detection Approaches
Classification-based
Clustering-based
Support Vector Data Descriptor (SVDD)
Statistics-based
Neighborhood-based
Spectral-based
1. Classification-based Approach
Learn a detection model to classify emails into spam and ham (i.e. normal email).
[Figure: model f takes an email and outputs spam or ham]
How to learn model f?
[Figure: model f takes an email and outputs spam or ham]
Step 1. Construct a model f with some unknown parameters.
Step 2. Estimate the parameters from data.
Example: learn a linear regression model
Step 1. Construct a linear regression model
f(x) = w0 + w1 * x·1 + w2 * x·2
- x·1 and x·2 are two features of example x (e.g. words “google” and “cat”)
- w0, w1, w2 are unknown parameters (w0 is called the bias)
Example: learn a linear regression model
Step 2. Estimate w0, w1, w2 from examples x1, x2, x3, …, xn by solving
- xi is the ith example (e.g. the ith email)
- yi is the label of xi, and yi = 0 (ham) or 1 (spam)
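Assuming the usual least-squares criterion for linear regression (an assumption; the slide's own formula may differ, for example by adding a regularization term), the estimation problem can be written as follows, where x_{i,1} and x_{i,2} denote the two features of the ith example:

```latex
\min_{w_0,\, w_1,\, w_2} \; \sum_{i=1}^{n} \bigl( w_0 + w_1 x_{i,1} + w_2 x_{i,2} - y_i \bigr)^2
```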
Example: learn a linear regression model
The solution is where
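A minimal sketch of the estimation step, assuming ordinary least squares. Under that assumption the closed form is the normal-equation solution w = (AᵀA)⁻¹Aᵀy, where A stacks the examples row by row with a leading 1 for the bias. The toy data and variable names below are made up for illustration.

```python
import numpy as np

# Toy data (made up): two binary features per email, label 1 = spam, 0 = ham.
X = np.array([[1, 0], [0, 1], [1, 1], [0, 0], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 1, 1, 0, 1, 0], dtype=float)

# Prepend a column of ones so that w0 (the bias) is estimated with w1 and w2.
A = np.hstack([np.ones((len(X), 1)), X])

# Least-squares estimate of [w0, w1, w2].
w, *_ = np.linalg.lstsq(A, y, rcond=None)
w0, w1, w2 = w
print(np.round(w, 3))
```

In this toy data the label equals the second feature, so the fit recovers w0 ≈ 0, w1 ≈ 0, w2 ≈ 1.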
Example: apply model to classify email
A new email x = [x·1, x·2]T is first input to the model f(x) = w0 + w1 * x·1 + w2 * x·2.
The result is then thresholded (by a proper value such as 0.5): y = 1 if f(x) > 0.5, and y = 0 otherwise.
Many models can directly output 0 or 1, so we do not need to threshold their outputs.
Quiz
Recall: the detection rule is y = 1 if f(x) > 0.5 and y = 0 otherwise, with
y=1 for spam, y=0 for ham.
If the model has
- w0=0.5, w1=−0.1, w2=0.1
Are the following emails spam or ham?
- x1 = [x·1,x·2]T = [1, 0]T
- x2 = [x·1,x·2]T = [0, 1]T
- x3 = [x·1,x·2]T = [1, 1]T
- x4 = [x·1,x·2]T = [0, 0]T
Quiz
If the model has
- w0=0.5, w1=−0.1, w2=0.1
Are the following emails spam or ham?
- x1 = [x·1,x·2]T = [1, 0]T is ham, because f(x) = 0.5 - 0.1*1 + 0.1*0 = 0.4 < 0.5
- x2 = [x·1,x·2]T = [0, 1]T is spam, because f(x) = 0.5 - 0.1*0 + 0.1*1 = 0.6 > 0.5
- x3 = [x·1,x·2]T = [1, 1]T is ham, because f(x) = 0.5 - 0.1*1 + 0.1*1 = 0.5 ≤ 0.5
- x4 = [x·1,x·2]T = [0, 0]T is ham, because f(x) = 0.5 - 0.1*0 + 0.1*0 = 0.5 ≤ 0.5
Recall: the detection rule is y = 1 if f(x) > 0.5 and y = 0 otherwise, with
y=1 for spam, y=0 for ham.
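The quiz can be reproduced with a short sketch. It assumes the linear form f(x) = w0 + w1*x·1 + w2*x·2 implied by the worked answers and the 0.5 threshold; the function and variable names are illustrative.

```python
W0, W1, W2 = 0.5, -0.1, 0.1
THRESHOLD = 0.5

def f(x):
    """Linear model f(x) = w0 + w1*x.1 + w2*x.2."""
    return W0 + (W1 * x[0] + W2 * x[1])

def classify(x):
    """Return 'spam' if f(x) exceeds the threshold, otherwise 'ham'."""
    return "spam" if f(x) > THRESHOLD else "ham"

emails = {"x1": [1, 0], "x2": [0, 1], "x3": [1, 1], "x4": [0, 0]}
results = {name: classify(x) for name, x in emails.items()}
print(results)
```

Only x2 scores above 0.5; x3 and x4 land exactly on the threshold and are therefore labeled ham.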
Learning-based Anomaly Detection Approaches
Classification-based
Clustering-based
Support Vector Data Descriptor (SVDD)
Statistics-based
Neighborhood-based
Spectral-based
2. Clustering-based Approach
Group examples into clusters. Assume those far from their cluster centers are more likely to be anomalies.
Detection based on Anomalous Score
The algorithm outputs an anomalous score (a.s.) for each example, which indicates how likely the example is to be an anomaly. We can then threshold the scores to get the final detection.
[Figure: three examples with a.s. = 0.8, a.s. = 0.4, and a.s. = 0.1]
How to cluster examples?
K-means is one of the most common clustering algorithms.
- choose a number of clusters k (e.g. k=3)
- initialize k cluster centers (randomly)
- repeat until convergence
  - assign every example to its nearest cluster (nearest to cluster center)
  - update each cluster center to the mean of its member examples
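The steps above can be sketched with NumPy as follows. This is a minimal illustration, not a tuned implementation: the toy data, the fixed random seed, the use of distance to the assigned center as the anomalous score, and the hand-picked threshold of 3.0 are all assumptions made for the example.

```python
import numpy as np

def nearest_center(X, centers):
    """Index of the nearest center for every example."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: returns (centers, labels) for data X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct examples chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        # Update each center to the mean of its members (keep it if the cluster is empty).
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):
                new_centers[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # converged: the centers stopped moving
        centers = new_centers
    return centers, nearest_center(X, centers)

def anomalous_scores(X, centers, labels):
    """Distance of each example to its own cluster center."""
    return np.linalg.norm(X - centers[labels], axis=1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [4, 4]], dtype=float)   # the last row is a made-up outlier
centers, labels = kmeans(X, k=2)
scores = anomalous_scores(X, centers, labels)
flagged = scores > 3.0                # threshold picked by hand for this toy data
print(scores.round(2), flagged)
```

The outlier joins the nearer cluster but stays far from its center, so it receives the highest score and is the only example flagged.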
A Demo of K-means Clustering Algorithm
Quiz (apply K-means Clustering with k=2)
Which example has the highest anomalous score? Which has the lowest?
[Figure: scatter plot with marked examples x1, x2, x3]
Quiz (apply K-means Clustering with k=2)
A: first, the k-means clustering result is roughly as follows
[Figure: the two clusters found by k-means, with marked examples x1, x2, x3]
Quiz (apply K-means Clustering with k=2)
A: based on the clustering result, x1 is the farthest from its center, so it has the highest anomalous score, and x3 is the closest to its center, so it has the lowest score.
[Figure: the two clusters found by k-means, with marked examples x1, x2, x3]
Learning-based Anomaly Detection Approaches
Classification-based
Clustering-based
Support Vector Data Descriptor (SVDD)
Statistics-based
Neighborhood-based
Spectral-based
3. Support Vector Data Descriptor (SVDD)
Learn a (smallest) normal region that encompasses all normal examples. Assume whatever falls outside the region is an anomaly.
Mathematical Model of One-Class SVM
First, assume a sphere with radius R encompasses all normal examples.
- constraint (s.t.): the distance from each normal example to the center of the normal region is no more than R
Mathematical Model of One-Class SVM
Then, find the center and the smallest radius of such a sphere.
- objective (min): find the sphere center and minimize the sphere radius, subject to (s.t.) every normal example lying inside the sphere
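Written out, and assuming the hard-margin form that the two steps describe in words, the model is the following, where x_1, ..., x_n are the normal examples and the symbol c for the sphere center is notation introduced here:

```latex
\min_{c,\, R} \; R^2
\quad \text{s.t.} \quad
\lVert x_i - c \rVert^2 \le R^2, \qquad i = 1, \dots, n
```

Practical SVDD formulations usually add slack variables so that a few normal examples may fall outside the sphere, which makes the region less sensitive to noisy training data.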
Quiz
If the marked points are normal examples, which of the examples A, B, and C will be detected as anomalies by SVDD?
[Figure: normal examples and the three query examples A, B, C]
![Page 72: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/72.jpg)
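Quiz
Answer: the normal region roughly looks like the one below. B and C are outside it, so they are anomalies.
[Figure: normal region drawn around the normal examples, with labeled points A, B and C]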
![Page 73: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/73.jpg)
Learning-based Anomaly Detection Approaches
Classification-based
Clustering-based
Support Vector Data Descriptor (SVDD)
Statistics-based
Neighborhood-based
Spectral-based
![Page 74: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/74.jpg)
4. Statistics-based Approach
Estimate a probability distribution over examples. Assume examples with lower probability under that distribution are more likely to be anomalies.
![Page 75: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/75.jpg)
Exercise
Student # Attendance
John 3
Nancy 2
Sam 2
Richard 1
Lily 3
p(x=3) =
p(x=2) =
p(x=1) =
What are the probabilities that a student attends class 1, 2, or 3 times?
![Page 78: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/78.jpg)
Exercise
Student # Attendance
John 3
Nancy 2
Sam 2
Richard 1
Lily 3
p(x=3) = 2 / 5 = 0.4
p(x=2) = 2 / 5 = 0.4
p(x=1) = 1 / 5 = 0.2
We can estimate probabilities by counting frequencies.
![Page 79: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/79.jpg)
Exercise
Student # Attendance
John 3
Nancy 2
Sam 2
Richard 1
Lily 3
p(x=3) = 2 / 5 = 0.4
p(x=2) = 2 / 5 = 0.4
p(x=1) = 1 / 5 = 0.2
Richard is more likely to be an abnormal (at-risk) student because he attends class only 1 time, and p(x=1) = 0.2 is half of each of the other probabilities.
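The counting in this exercise can be reproduced with a short Python sketch. The attendance numbers come from the table above; the function name `estimate_probabilities` and the rule "flag whoever has the lowest-probability value" are illustrative choices, not part of the slides.

```python
from collections import Counter

# Attendance counts from the exercise table.
attendance = {"John": 3, "Nancy": 2, "Sam": 2, "Richard": 1, "Lily": 3}

def estimate_probabilities(values):
    """Estimate p(x) for each observed value by counting frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

probabilities = estimate_probabilities(attendance.values())

# Students whose attendance value has the lowest estimated probability.
lowest = min(probabilities.values())
at_risk = [name for name, x in attendance.items() if probabilities[x] == lowest]
```

The same two steps (count frequencies, then look for the rarest value) apply unchanged to any other table of discrete observations.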
![Page 80: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/80.jpg)
Quiz
Which student is most likely at risk according to the statistics-based approach?
- let x be the number of peer rejections
Student John Lily Sam Nancy Green Susan Peter Rose Jack Lucy
x 0 1 0 2 1 0 3 1 2 0
![Page 81: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/81.jpg)
Quiz
Which student is most likely at risk according to the statistics-based approach?
- let x be the number of peer rejections
Student John Lily Sam Nancy Green Susan Peter Rose Jack Lucy
x 0 1 0 2 1 0 3 1 2 0
p(x=0) = 4/10 = 0.4
p(x=1) = 3/10 = 0.3
p(x=2) = 2/10 = 0.2
p(x=3) = 1/10 = 0.1, the lowest probability; Peter has x=3, so he is most likely at risk
![Page 82: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/82.jpg)
Learning-based Anomaly Detection Approaches
Classification-based
Clustering-based
Support Vector Data Descriptor (SVDD)
Statistics-based
Neighborhood-based
Spectral-based
![Page 83: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/83.jpg)
5. Neighborhood-based Approach
Assume examples far from their neighbors are more likely to be anomalies.
![Page 84: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/84.jpg)
Example: 2-nearest neighbor based approach
Only consider the two nearest neighbors of each example.
[Figure: points A, B, C, D with distances AB = AC = BC = 1 and AD = CD = 2]
![Page 85: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/85.jpg)
Example: 2-nearest neighbor based approach
The total distance from A to its two nearest neighbors (B, C) is 1 + 1 = 2.
[Figure: points A, B, C, D and their pairwise distances]
Example Distance
A 2
B
C
D
![Page 86: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/86.jpg)
Example: 2-nearest neighbor based approach
The total distance from B to its two nearest neighbors (A, C) is 1 + 1 = 2.
[Figure: points A, B, C, D and their pairwise distances]
Example Distance
A 2
B 2
C
D
![Page 87: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/87.jpg)
Example: 2-nearest neighbor based approach
The total distance from C to its two nearest neighbors (A, B) is 1 + 1 = 2.
[Figure: points A, B, C, D and their pairwise distances]
Example Distance
A 2
B 2
C 2
D
![Page 88: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/88.jpg)
Example: 2-nearest neighbor based approach
The total distance from D to its two nearest neighbors (A, C) is 2 + 2 = 4.
[Figure: points A, B, C, D and their pairwise distances]
Example Distance
A 2
B 2
C 2
D 4
![Page 89: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/89.jpg)
Example: 2-nearest neighbor based approach
D is more likely to be an anomaly because it has the largest total distance to its two nearest neighbors.
[Figure: points A, B, C, D and their pairwise distances]
Example Distance
A 2
B 2
C 2
D 4
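The table above can be reproduced with a small Python sketch of the 2-nearest-neighbor score. The distances AB, AC, BC, AD and CD are the ones used in the example; the B-D distance is not given, so the value 3 below is an assumption (any value of at least 2 yields the same scores). The names `knn_score` and `most_anomalous` are illustrative.

```python
# Pairwise distances from the example.
distances = {
    ("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 1,
    ("A", "D"): 2, ("C", "D"): 2,
    ("B", "D"): 3,  # ASSUMPTION: not given in the example
}

def knn_score(example, distances, k=2):
    """Total distance from an example to its k nearest neighbors."""
    to_others = [d for pair, d in distances.items() if example in pair]
    return sum(sorted(to_others)[:k])

examples = ["A", "B", "C", "D"]
scores = {e: knn_score(e, distances) for e in examples}

# The example with the largest score is the most likely anomaly.
most_anomalous = max(scores, key=scores.get)
```

Summing the k smallest distances is one of several possible neighborhood scores; the distance to the k-th neighbor alone is another common choice.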
![Page 90: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/90.jpg)
Quiz
Which example is most likely an anomaly based on the 2-nearest neighbor approach?
[Figure: points A, B, C, D with edge lengths 1, 0.5, 1, 1 and 1.5]
Example Distance
A
B
C
D
![Page 91: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/91.jpg)
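Quiz
Answer: B is most likely an anomaly because it has the largest total distance to its two nearest neighbors.
[Figure: points A, B, C, D with edge lengths 1, 0.5, 1, 1 and 1.5]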
Example Distance
A 2
B 2.5
C 1.5
D 1.5
![Page 92: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/92.jpg)
Learning-based Anomaly Detection Approaches
Classification-based
Clustering-based
Support Vector Data Descriptor (SVDD)
Statistics-based
Neighborhood-based
Spectral-based
![Page 93: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/93.jpg)
6. Spectral-based Approach
Assume normal examples lie in a low-dimensional feature space and so can be reconstructed well from that space, whereas anomalies cannot.
![Page 94: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/94.jpg)
Example
Project the original feature vector into a 2D space and reconstruct it.
Projection can be done by taking the inner product between the feature vector and a projective vector.
original feature vector: (1, 1, 0, 1)
2D projection: (3.2, 0.2)
reconstruction: (0.9, 1.1, 0.1, 0.9)
![Page 95: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/95.jpg)
Example
Reconstruction error can be used as an anomaly score.
original - reconstruction = (1, 1, 0, 1) - (0.9, 1.1, 0.1, 0.9) = (0.1, -0.1, -0.1, 0.1)
error = $0.1^2 + (-0.1)^2 + (-0.1)^2 + 0.1^2 = 0.04$
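The project-reconstruct-compare pipeline can be sketched in plain Python. The first part recomputes the error for the vectors in the example. The second part uses a hypothetical pair of orthonormal projective vectors (the first two coordinate axes), which are an assumption for illustration and not the projective vectors behind the slide's numbers.

```python
def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def project(x, basis):
    """Low-dimensional coordinates: one inner product per projective vector."""
    return [dot(x, w) for w in basis]

def reconstruct(z, basis):
    """Map low-dimensional coordinates back to the original space."""
    dim = len(basis[0])
    return [sum(z_k * w[i] for z_k, w in zip(z, basis)) for i in range(dim)]

def reconstruction_error(x, x_hat):
    """Sum of squared differences, used as the anomaly score."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

# Vectors from the example: error is approximately 0.04.
slide_error = reconstruction_error([1, 1, 0, 1], [0.9, 1.1, 0.1, 0.9])

# ASSUMPTION: a made-up orthonormal basis spanning the first two axes.
basis = [[1, 0, 0, 0], [0, 1, 0, 0]]
inlier = [2, 3, 0, 0]    # lies in the 2D subspace
outlier = [2, 3, 4, 0]   # has a component outside the subspace

inlier_error = reconstruction_error(inlier, reconstruct(project(inlier, basis), basis))
outlier_error = reconstruction_error(outlier, reconstruct(project(outlier, basis), basis))
```

The point inside the subspace is reconstructed exactly, while the one with a component outside it keeps that component as reconstruction error.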
![Page 96: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/96.jpg)
Find Low-Dimensional Space using PCA
Principal Component Analysis (PCA) is commonly used to get projective vectors.
[Figure: data with projective vectors w1 and w2; source: https://qiita.com/bmj0114/items/db9145a707cb6ed13201]
![Page 97: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/97.jpg)
Example Result of PCA-based Approach
Abnormal network traffic flows have higher reconstruction errors.
![Page 98: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/98.jpg)
Outline
Background
Learning-based Detection Approaches
Evaluation Metrics
- detection error
- F1-score and AUC score
Challenges
![Page 99: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/99.jpg)
Detection Error
The detection error of a model is the fraction of examples it mis-detects.
- e.g. a normal example mis-detected as an anomaly
- e.g. an anomaly mis-detected as normal
Example: if there are 100 testing examples, and 10 of them are mis-detected, the detection error is 10/100 = 0.1.
![Page 100: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/100.jpg)
![Page 101: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/101.jpg)
10 spam emails
990 ham emails
![Page 102: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/102.jpg)
10 spam emails
990 ham emails
What is the detection error of this model?
[Figure legend: normal, spam]
![Page 103: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/103.jpg)
![Page 104: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/104.jpg)
Confusion Matrix
| | actual positive (spam) | actual negative (ham) |
|---|---|---|
| predicted positive | True Positive (TP) | False Positive (FP) |
| predicted negative | False Negative (FN) | True Negative (TN) |
![Page 105: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/105.jpg)
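Precision, Recall, F1-Score
Precision: the fraction of predicted positives that are truly positive, TP / (TP + FP)
Recall: the fraction of actual positives that are predicted positive, TP / (TP + FN)
F1-Score: the harmonic mean of precision and recall, 2 × Precision × Recall / (Precision + Recall)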
![Page 106: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/106.jpg)
Exercise
What is the confusion matrix of the detection model?
10 spam emails
990 ham emails
[Figure legend: normal, spam]
| | actual pos (spam) | actual neg (ham) |
|---|---|---|
| predicted pos (spam) | TP = ? | FP = ? |
| predicted neg (ham) | FN = ? | TN = ? |
![Page 107: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/107.jpg)
Exercise
What is the confusion matrix of the detection model?
10 spam emails
990 ham emails
[Figure legend: normal, spam]
| | actual pos (spam) | actual neg (ham) |
|---|---|---|
| predicted pos (spam) | TP = 0 | FP = 0 |
| predicted neg (ham) | FN = 10 | TN = 990 |
![Page 108: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/108.jpg)
Exercise
What are the precision, recall and F1-score?
| | actual pos (spam) | actual neg (ham) |
|---|---|---|
| predicted pos (spam) | TP = 0 | FP = 0 |
| predicted neg (ham) | FN = 10 | TN = 990 |
Precision =
Recall =
F1-Score =
![Page 109: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/109.jpg)
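Exercise
What are the precision, recall and F1-score?
| | actual pos (spam) | actual neg (ham) |
|---|---|---|
| predicted pos (spam) | TP = 0 | FP = 0 |
| predicted neg (ham) | FN = 10 | TN = 990 |
Precision = TP / (TP + FP) = 0 / 0
Recall = TP / (TP + FN) = 0 / (0 + 10)
F1-Score = 2 × Precision × Recall / (Precision + Recall) = ?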
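The spam exercise can be checked with a short Python sketch. It assumes, consistent with TP = 0 and FP = 0 above, a model that labels every email as ham. The helper names are illustrative. Precision is 0 / 0 here, so the sketch returns `None` for it. For F1 it uses the equivalent form 2TP / (2TP + FP + FN), which is one common convention for keeping the score defined when precision is not; other conventions exist.

```python
def confusion_matrix(actual, predicted):
    """Return (TP, FP, FN, TN), with 1 = positive (spam) and 0 = negative (ham)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

def safe_divide(numerator, denominator):
    """Return None instead of raising when the ratio is undefined."""
    return numerator / denominator if denominator else None

# 10 spam emails and 990 ham emails.
actual = [1] * 10 + [0] * 990
# ASSUMPTION: the model predicts "ham" for every email.
predicted = [0] * 1000

tp, fp, fn, tn = confusion_matrix(actual, predicted)
detection_error = (fp + fn) / len(actual)
precision = safe_divide(tp, tp + fp)          # 0 / 0, undefined
recall = safe_divide(tp, tp + fn)             # 0 / 10
f1 = safe_divide(2 * tp, 2 * tp + fp + fn)    # same as the harmonic mean when both are defined
```

The detection error comes out at 0.01 even though the model finds no spam at all, which is why precision, recall and F1 are reported alongside it.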
![Page 110: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/110.jpg)
Detection by Thresholding Anomaly Score
Many anomaly detection models output anomaly scores, and detection results are obtained by thresholding these scores.
| Example | Anomaly score | Threshold 0.5 | Detection result (1 = anomaly, 0 = normal) |
|---|---|---|---|
| A | 0.8 | 0.8 > 0.5 | 1 |
| B | 0.3 | 0.3 < 0.5 | 0 |
| C | 0.6 | 0.6 > 0.5 | 1 |
| D | 0.2 | 0.2 < 0.5 | 0 |
[Figure: the detection results determine the confusion matrix (TP, FP, FN, TN) and the F1 score]
![Page 111: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/111.jpg)
Exercise
What are the detection results based on the following thresholds?
| Example | Anomaly score | Detection result (threshold 0.5) | Detection result (threshold 0.7) | Detection result (threshold 0.25) |
|---|---|---|---|---|
| A | 0.8 | 1 | | |
| B | 0.3 | 0 | | |
| C | 0.6 | 1 | | |
| D | 0.2 | 0 | | |
![Page 112: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/112.jpg)
Exercise
Different thresholds can give different detection results, and thus different TP and FP counts.
| Example | Anomaly score | Detection result (threshold 0.5) | Detection result (threshold 0.7) | Detection result (threshold 0.25) |
|---|---|---|---|---|
| A | 0.8 | 1 | 1 | 1 |
| B | 0.3 | 0 | 0 | 1 |
| C | 0.6 | 1 | 0 | 1 |
| D | 0.2 | 0 | 0 | 0 |
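The exercise can be reproduced in Python with a one-line thresholding rule. The scores and thresholds are those of the exercise; the function name `detect` and the strict `>` comparison are illustrative choices (no score in the exercise equals a threshold, so the choice does not affect the result here).

```python
# Anomaly scores from the exercise.
scores = {"A": 0.8, "B": 0.3, "C": 0.6, "D": 0.2}

def detect(scores, threshold):
    """Flag an example as an anomaly (1) when its score exceeds the threshold."""
    return {name: int(score > threshold) for name, score in scores.items()}

# One detection result per threshold.
results = {t: detect(scores, t) for t in (0.5, 0.7, 0.25)}
```

Raising the threshold flags fewer examples and lowering it flags more, so TP and FP move together as the threshold changes.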
![Page 113: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/113.jpg)
ROC Curve
The ROC curve of a model shows its performance (true positive rate against false positive rate) under different thresholds.
Each point is the result of one threshold.
![Page 114: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/114.jpg)
Area Under Curve (AUC) Score
The AUC score is the area under the ROC curve. A good model has a higher AUC score.
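A minimal sketch of how ROC points and the AUC score can be computed from anomaly scores is given below. The scores are those of examples A to D from the thresholding exercise. The slides do not give ground-truth labels for those examples, so the labels here are hypothetical. The sketch also assumes all scores are distinct and uses the trapezoidal rule for the area.

```python
def roc_points(scores, labels):
    """ROC points (false positive rate, true positive rate).

    Lowering the threshold past one example at a time, from the highest
    score to the lowest, yields one point per threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.8, 0.3, 0.6, 0.2]   # examples A, B, C, D
labels = [1, 1, 0, 0]           # ASSUMPTION: A and B are true anomalies

points = roc_points(scores, labels)
auc_score = auc(points)
```

With these hypothetical labels, three of the four anomaly-normal pairs are ranked correctly (only B scores below C), which matches an AUC of 0.75.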
![Page 115: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/115.jpg)
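Summary
There are many metrics to evaluate the detection performance of a model.
Detection error is the most common but can be misleading when anomalies are rare: a model that labels all 1000 emails as ham has an error of only 10/1000 = 0.01 yet detects no spam.
A confusion matrix gives four numbers, which are hard to compare across models.
F1-score is a more robust measure but is based on a single threshold.
AUC score is the most robust of these measures because it integrates results over many thresholds.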
![Page 116: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/116.jpg)
Outline
Background
Learning-based Detection Approaches
Evaluation Metrics
Challenges
![Page 117: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/117.jpg)
Challenges in Anomaly Detection
Contextual Anomaly Detection
Collective Anomaly Detection
Other Technical Challenges
![Page 118: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/118.jpg)
Contextual Anomaly
![Page 119: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/119.jpg)
Collective Anomaly
![Page 120: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/120.jpg)
Collective Anomaly
![Page 124: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/124.jpg)
Exercise: Any Anomaly?
A customer is shopping on Amazon
- object 1: steel ball bearings
- object 2: black powder/charcoal
- object 3: battery connectors
- …
A customer who bought the above items together could be a bomb-maker!
![Page 125: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/125.jpg)
Other Technical Challenges
Hard to find a normal region.
Attackers may disguise anomalies.
Normal behavior may evolve over time.
Notion of anomaly is problem-dependent.
Not enough labeled data (especially, anomalous data).
![Page 126: Introduction to Anomaly Detection - uwyo.educlan/teach/rampe18_anomaly.pdf · - preliminary: data representation and visualization-six common anomaly detection approaches Evaluation](https://reader033.vdocuments.net/reader033/viewer/2022042909/5f3b9da76c029054e648a0f6/html5/thumbnails/126.jpg)
Q & A?