A System for Detecting Anomalies in Data Streams for Emergency Response Applications
Alec Pawling
University of Notre Dame
October 2, 2007
Outline
Overview
Proposed Research
WIPER
Detection and Alert System
Online Anomaly Detection via Clustering
Link Sampling and Anomalous Link Detection
Real-Time Data Source
Conclusion
Overview
Proposed Research
Fast, online anomaly detection in streaming sensor data
Non-relational data
Relational data
Real-time data aggregation and distribution to various system components
Motivation
Wireless Phone-Based Emergency Response System (WIPER)
Wireless Phone-Based Emergency Response System (WIPER)
Emergency Response System
Provide decision support to emergency response managers
Cell phones as sensors
Wireless Phone-Based Emergency Response System (WIPER)
Outline
Overview
Detection and Alert System
Online Anomaly Detection via Clustering
Problem Definition
Related Work
An Online Hybrid Clustering Algorithm
Datasets
Experimental Setup
Results
Proposed Research
Link Sampling and Anomalous Link Detection
Real-Time Data Source
Conclusion
Problem Definition
Problem:
How can we detect anomalies in streaming cell phone transaction data?
Challenges:
Lots of data
Limited time for detecting anomalies
Related Work
Proximity-Based Anomaly Detection
Makes no assumptions about the data distribution
Anomalous points are far from other points (specific definitions vary from application to application)
Computationally expensive
Clustering can be used to reduce computational complexity
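To make the cost concrete, here is a minimal Python sketch of one common proximity-based score: each point's distance to its nearest neighbor, with the largest distances treated as the strongest anomaly candidates. The function names are hypothetical, and this is not necessarily the exact 1-NN baseline used in the experiments. The brute-force comparison of every pair of points is what makes the approach O(n²).

```python
import math


def nn_distance_scores(points):
    """Distance from each point to its nearest other point (brute force, O(n^2))."""
    scores = []
    for i, p in enumerate(points):
        # Compare p against every other point: n - 1 distance computations per point.
        scores.append(min(math.dist(p, q) for j, q in enumerate(points) if j != i))
    return scores


def top_outliers(points, n):
    """Indices of the n points that are farthest from their nearest neighbor."""
    scores = nn_distance_scores(points)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:n]


points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]]
print(top_outliers(points, 1))  # [3]
```

Clustering reduces this cost by comparing a new point against a handful of cluster summaries instead of against every stored point.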
Related Work
Approaches to Data Clustering [Jain, Murty, and Flynn, 1999]:
Hierarchical Clustering
Iteratively split/merge clusters
Computationally expensive
Partitional Clustering
Divides the data into disjoint subsets
Relatively efficient
Assumes prior knowledge of the number of clusters; prone to converging to local optima
Incremental Clustering
Considers examples one at a time and updates the clusters
Efficient
Related Work
Leader Algorithm [Hartigan, 1975]
For each data example
Locate the closest cluster center.
If the distance between the example and the cluster center is less than a user-defined threshold:
Add the example to the cluster.
Otherwise, create a new cluster centered at theexample.
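The leader algorithm fits in a few lines. In this Python sketch, each cluster keeps the example that created it as its fixed center, Euclidean distance via `math.dist` is an assumption (the slide does not name a metric), `threshold` plays the role of the user-defined threshold, and the function name is illustrative.

```python
import math


def leader_clustering(examples, threshold):
    """One pass over the data; returns (cluster centers, members of each cluster)."""
    leaders = []  # cluster centers: the example that created each cluster
    members = []  # members[i] holds the examples assigned to leaders[i]
    for x in examples:
        if leaders:
            i = min(range(len(leaders)), key=lambda j: math.dist(x, leaders[j]))
            if math.dist(x, leaders[i]) < threshold:
                members[i].append(x)  # close enough: join the nearest cluster
                continue
        leaders.append(x)             # otherwise x becomes a new cluster center
        members.append([x])
    return leaders, members


leaders, members = leader_clustering(
    [[0, 0], [1, 0], [10, 10], [0, 1], [11, 10]], threshold=2.0
)
print(leaders)  # [[0, 0], [10, 10]]
```

A single pass is what makes the method attractive for streams; the price is that the clusters found depend on the order in which examples arrive.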
Related Work
Hybrid Clustering: a combination of two clustering algorithms
Cheu et al., 2004: use partitional algorithms to reduce the data set for hierarchical algorithms
Chipman and Tibshirani, 2006: combine bottom-up algorithms with top-down algorithms
An Online Hybrid Clustering Algorithm
For each example $\vec{x}$:
Find the closest cluster $C_i$
Let $\vec{\mu}_i$ be the centroid of $C_i$
Let $\vec{\sigma}_i$ be the vector of standard deviations of the features of $C_i$
If $d(\vec{x}, \vec{\mu}_i) < l\,|\vec{\sigma}_i|$, add $\vec{x}$ to $C_i$
Otherwise, add $\vec{x}$ to the set of unclustered examples
If there are $k \cdot m$ examples in the unclustered set:
Cluster the unclustered examples using k-means
For each cluster with $m$ or more examples: accept the cluster
For each cluster with fewer than $m$ examples: return its examples to the unclustered set
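A compact Python sketch of the hybrid algorithm follows. Several details are assumptions rather than statements from the slide: Euclidean distance for d, |σ_i| taken as the Euclidean norm of the per-feature population standard deviations, cluster statistics updated incrementally whenever an example joins, and a plain Lloyd's k-means with a seeded random initialization. Class and method names are hypothetical.

```python
import math
import random


class Cluster:
    """An accepted cluster, stored as running per-feature sums."""

    def __init__(self, points):
        self.n = 0
        self.total = [0.0] * len(points[0])
        self.total_sq = [0.0] * len(points[0])
        for p in points:
            self.add(p)

    def add(self, x):
        self.n += 1
        for j, value in enumerate(x):
            self.total[j] += value
            self.total_sq[j] += value * value

    def centroid(self):
        return [s / self.n for s in self.total]

    def sigma_norm(self):
        """Euclidean norm of the vector of per-feature population standard deviations."""
        mu = self.centroid()
        variances = [max(sq / self.n - m * m, 0.0) for sq, m in zip(self.total_sq, mu)]
        return math.sqrt(sum(variances))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k groups of points (some may be empty)."""
    centers = random.Random(seed).sample(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Move each center to its group's mean; keep the old center if the group is empty.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups


class HybridClusterer:
    def __init__(self, k, m, l):
        self.k, self.m, self.l = k, m, l  # names follow the slide's symbols
        self.clusters = []                # accepted clusters
        self.unclustered = []             # examples not yet in any cluster

    def process(self, x):
        """Handle one example; return True if it joined an existing cluster."""
        if self.clusters:
            best = min(self.clusters, key=lambda c: math.dist(x, c.centroid()))
            if math.dist(x, best.centroid()) < self.l * best.sigma_norm():
                best.add(x)
                return True
        self.unclustered.append(x)
        if len(self.unclustered) >= self.k * self.m:
            self._recluster()
        return False

    def _recluster(self):
        """Run k-means on the unclustered set; keep clusters with at least m examples."""
        leftover = []
        for group in kmeans(self.unclustered, self.k):
            if len(group) >= self.m:
                self.clusters.append(Cluster(group))
            else:
                leftover.extend(group)
        self.unclustered = leftover
```

The leader-style first stage keeps per-example cost low (one pass over the accepted clusters), while the occasional k-means batch is what creates new clusters without fixing their number in advance.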
Experimental Setup
Dataset:
Real-world data:
12 days of cell phone network transaction data
Discretized into 1-minute intervals
18,721 examples
Feature vector:
Timestamp: hour and minute
Number of times each service is used in the interval
5 services
Evaluation:
Compare the hybrid algorithm to 1-NN anomaly detection
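The feature vectors described above can be built with a short Python sketch. It assumes each raw transaction is a (datetime, service index) pair with the five services numbered 0 to 4 (the slides do not name the services), and it emits one vector [hour, minute, count_0, ..., count_4] per calendar minute that contains at least one transaction; whether the original dataset also kept empty minutes is not stated. The function name and sample records are hypothetical.

```python
from datetime import datetime

NUM_SERVICES = 5  # the slides report five services but do not name them


def minute_feature_vectors(transactions):
    """Turn (datetime, service_index) records into [hour, minute, count_0, ..., count_4]."""
    buckets = {}
    for ts, service in transactions:
        minute = ts.replace(second=0, microsecond=0)  # the 1-minute interval of this record
        counts = buckets.setdefault(minute, [0] * NUM_SERVICES)
        counts[service] += 1
    # One vector per non-empty minute, in time order.
    return [[m.hour, m.minute] + counts for m, counts in sorted(buckets.items())]


transactions = [
    (datetime(2007, 1, 1, 8, 30, 5), 0),
    (datetime(2007, 1, 1, 8, 30, 40), 0),
    (datetime(2007, 1, 1, 8, 30, 59), 3),
    (datetime(2007, 1, 1, 8, 31, 1), 1),
]
print(minute_feature_vectors(transactions))
# [[8, 30, 2, 0, 0, 1, 0], [8, 31, 0, 1, 0, 0, 0]]
```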
Results
Figure: Distribution of distances between outliers and their nearest neighbor (x-axis: pairwise distances, 0 to 2500; series: Full, Trial 2, Trial 5, Trial 8).
Proposed Research
New first-level clustering algorithm:
Deterministic, hierarchical
Additional analysis of clusters:
Movement of clusters
Rate at which examples are added to clusters
Outline
Overview
Detection and Alert System
Online Anomaly Detection via Clustering
Link Sampling and Anomalous Link Detection
Problem Definition
Related Work
Datasets
Implementation Details
Experimental Setup
Results
Conclusions
Proposed Research
Real-Time Data Source
Conclusion
Problem Definition
Problem:
How does sampling a graph (network) affect our ability to identify anomalous edges (links)?
Challenges:
Large graphs
Limited time
Limited memory
Related Work
Sampling Networks
“Subnets of Scale-Free Networks are not Scale-Free” [Stumpf et al., 2005]
Sampling a network often changes its characteristics in predictable ways [Lee et al., 2006]
Sampling from Streams
Sliding window: contains only the most recent items in the stream
Uniform sample [Vitter, 1985]: every item in the stream has an equal probability of being retained in the sample
Biased sample [Aggarwal, 2006]: a compromise between a sliding window and a uniform sample
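The three stream samplers can be sketched in Python as follows. The uniform sample uses reservoir sampling (Algorithm R, as presented by Vitter). The biased sample is a sketch of the replacement policy in Aggarwal's basic exponentially biased reservoir: every arriving item is inserted, and with probability equal to the fraction of the reservoir already filled it overwrites a randomly chosen resident item instead of growing the reservoir. Class names and the `capacity` parameter are illustrative, not taken from the slides.

```python
import random
from collections import deque


class SlidingWindow:
    """Keeps only the most recent `capacity` items."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)

    def add(self, item):
        self.items.append(item)


class UniformReservoir:
    """Algorithm R: once t >= capacity items are seen, each is kept with probability capacity / t."""

    def __init__(self, capacity, rng):
        self.capacity, self.rng = capacity, rng
        self.items, self.seen = [], 0

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


class BiasedReservoir:
    """Exponentially biased reservoir in the style of Aggarwal (2006); capacity is about 1 / lambda."""

    def __init__(self, capacity, rng):
        self.capacity, self.rng = capacity, rng
        self.items = []

    def add(self, item):
        # The new item always enters; with probability equal to the filled fraction it
        # overwrites a random resident item, otherwise the reservoir grows by one.
        filled = len(self.items) / self.capacity
        if self.rng.random() < filled:
            self.items[self.rng.randrange(len(self.items))] = item
        else:
            self.items.append(item)
```

The sliding window forgets everything older than its capacity, the uniform reservoir weights the whole history equally, and the biased reservoir keeps some old items while favoring recent ones.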
Related Work
Anomalous Link Detection [Rattigan and Jensen, 2005]
Goal: Identify “surprising” edges in a graph
Methods from the link prediction literature [Liben-Nowell and Kleinberg, 2003]
For each edge (u, v) in the graph, compute the proximity of u and v
Anomalous links have a proximity below some threshold
Two general approaches:
Neighborhood-based methods
Path-based methods
Related Work
Neighborhood-based methods. Let Γ(u) be the set of vertices that are connected to u by an edge.
Common neighbors: the number of neighbors shared by u and v
$|\Gamma(u) \cap \Gamma(v)|$
Jaccard’s coefficient: the probability that a randomly chosen neighbor of u or v is a neighbor of both u and v
$\frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}$
Path-based method
Rooted PageRank: the probability that a random walk starting at u will reach v if the walk fails at each step with some probability
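Both neighborhood measures, together with the thresholding step described on the previous slide, can be sketched in a few lines of Python. The sketch assumes an undirected edge list; because (u, v) is itself an edge, v is in Γ(u) and u is in Γ(v), so both endpoints count toward the union in Jaccard's coefficient. The threshold value and all names are illustrative.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Neighbor sets Γ(u) for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def anomalous_links(edges, proximity, threshold):
    """Edges whose endpoint proximity falls below the threshold."""
    adj = build_adjacency(edges)
    return [(u, v) for u, v in edges if proximity(adj, u, v) < threshold]


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(anomalous_links(edges, jaccard, 0.1))  # [('c', 'd')]
```

The triangle edges each share a neighbor, while the pendant edge ("c", "d") shares none, so it is the one flagged as surprising.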
Datasets
Cell phone network: transactions initiated by members of a single service provider
SMS: one day of text message transactions
Phone: one day of call transactions
Enron: snapshot of the Enron email server, containing emails to and from @enron.com addresses from May 10, 1999 to January 31, 2002
Dataset        Vertices    Transactions   Edges
SMS (1 day)    2,350,793   3,339,708      1,597,818
Call (1 day)   6,261,633   8,019,290      5,243,128
Enron          25,854      1,033,638      201,243
Implementation Details
Implementation is straightforward for common neighbors and Jaccard’s coefficient
Rooted PageRank is typically determined using the stationary distribution of a Markov chain
The stationary distribution is computed by repeated matrix multiplications
Matrices for the SMS and call datasets are too large to store in main memory
We use a series of random walks to approximate rooted PageRank
Bound the walk length using a geometric distribution
The total number of random walks is based on the average degree of the graph
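The random-walk approximation might look like the Python sketch below. It assumes an undirected graph stored as a dict of neighbor lists and uses the endpoint estimator: each walk from u stops before every step with probability `alpha` (so walk lengths are geometric), and the fraction of walks ending at v estimates v's rooted PageRank with restart probability `alpha`. The slides tie the number of walks to the graph's average degree but do not give the rule, so `num_walks` is a plain parameter here; all names are hypothetical.

```python
import random
from collections import Counter


def rooted_pagerank_mc(adj, u, alpha, num_walks, rng):
    """Monte Carlo estimate of rooted PageRank scores for nodes reachable from u."""
    ends = Counter()
    for _ in range(num_walks):
        node = u
        # Stop with probability alpha before each step; also stop at a node with no neighbors.
        while rng.random() >= alpha and adj[node]:
            node = rng.choice(adj[node])
        ends[node] += 1
    return {v: count / num_walks for v, count in ends.items()}


adj = {"a": ["b"], "b": ["a"]}
scores = rooted_pagerank_mc(adj, "a", alpha=0.5, num_walks=20000, rng=random.Random(42))
# The exact values for this two-node graph are 2/3 for "a" and 1/3 for "b".
```

Only the neighbor lists of visited nodes are touched, so memory stays proportional to the graph rather than to a dense transition matrix. The proximity of an edge (u, v) would then be `scores.get(v, 0.0)` from a run rooted at u.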
Experimental Setup
Three sampling methods: sliding window, uniform sampling, and biased sampling
Three anomalous link detection methods: common neighbors, Jaccard’s coefficient, and rooted PageRank
Sample sizes range from 10% to 90% of the transactions
Evaluate using Spearman’s rank correlation
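Spearman's rank correlation is the Pearson correlation of the ranks. A minimal Python sketch, assuming the comparison is between the anomaly scores one method assigns to the same set of edges on the full data and on a sample (the slides do not say exactly which edges are compared), with tied scores given their average rank. Function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sx * sy)


full_scores = [0.1, 0.5, 0.3, 0.9]    # hypothetical scores on the full data
sample_scores = [0.2, 0.4, 0.1, 0.8]  # hypothetical scores on a sample
print(spearman(full_scores, sample_scores))  # ≈ 0.8
```

A value near 1 means the sample preserves the ordering of edges from most to least surprising, which matters more for anomaly detection than preserving the raw scores.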
Results
Figure: Rank correlations for the call dataset. Left: Jaccard’s coefficient. Right: rooted PageRank. Each panel plots rank correlation (0.4 to 1) against the fraction of the data set sampled (0 to 1) for a uniform sample, a biased sample, and a sliding window.
Observations
Rooted PageRank performs better on smaller samples
Rooted PageRank is computationally expensive
Better to use Jaccard’s coefficient with larger samples.
Proposed Research
Extract and analyze city-level subgraphs
Investigate changes in the distribution of Jaccard’s coefficient over time
Outline
Overview
Detection and Alert System
Online Anomaly Detection via Clustering
Link Sampling and Anomalous Link Detection
Real-Time Data Source
Overview
Prototype Implementation
Experimental Setup
Results
Conclusions
Proposed Research
Conclusion
Overview
Motivation:
Use existing cell phone network as a sensor network
Advantages:
Cheap deployment
Disadvantages:
No control over the network
Goal:
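Receive transaction data from the cellular service provider
Summarize and distribute data in real time to clients: the Decision Support System (DSS), the Detection and Alert System (DAS), and the Simulation and Prediction System (SPS)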
Overview
Incoming data:
Time at which service was initiated
The network service used
Anonymized values indicating the people involved in using the service
Towers involved in providing the service
Outgoing data:
Stream of interval summaries
Each item in the stream consists of
A timestamp indicating the end of the interval
A vector containing the number of times each service was used in the interval
Clients specify interval length
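The outgoing stream can be sketched in Python under the prototype's assumption that transactions arrive in timestamp order. Each client gets its own summarizer with its requested interval length; when a transaction's timestamp passes the end of the current interval, the finished summary (end timestamp, per-service counts) is emitted, including all-zero summaries for empty intervals. Numeric timestamps in seconds, integer service indices, and all names are assumptions for illustration.

```python
class IntervalSummarizer:
    """Summaries of an in-order transaction stream for one client."""

    def __init__(self, interval, num_services, start=0.0):
        self.interval = interval
        self.num_services = num_services
        self.start = start
        self.k = 1  # index of the current interval, which ends at start + k * interval
        self.counts = [0] * num_services

    def end(self):
        return self.start + self.k * self.interval

    def add(self, timestamp, service):
        """Count one transaction; return summaries for any intervals it closes."""
        closed = []
        while timestamp >= self.end():
            closed.append((self.end(), self.counts))
            self.counts = [0] * self.num_services
            self.k += 1
        self.counts[service] += 1
        return closed


def summarize_for_clients(transactions, intervals, num_services):
    """Feed one transaction stream to several clients with different interval lengths."""
    summarizers = {name: IntervalSummarizer(length, num_services)
                   for name, length in intervals.items()}
    output = {name: [] for name in intervals}
    for timestamp, service in transactions:
        for name, summarizer in summarizers.items():
            output[name].extend(summarizer.add(timestamp, service))
    return output


out = summarize_for_clients([(0.5, 0), (1.2, 1), (1.7, 1), (3.1, 0)], {"a": 1.0, "b": 2.0}, 2)
print(out["b"])  # [(2.0, [1, 2])]
```

In this sketch an interval is closed only when a later transaction arrives; the prototype instead sends data from periodic tasks at the end of each interval, and an out-of-order transaction would be counted in the wrong interval here.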
Prototype Implementation
Ruby:
Interpreted language
Web-services support
Multi-threading support with large priority space
Assumption:
Data from service provider arrives in order
Periodic Task Model:
Periodic tasks: send data to clients
For each client, a task executes at the end of every interval
Deadline is the end of the next interval
Aperiodic tasks: maintain interval summaries
Experimental Setup
Setup:
2 to 24 clients
Task periods of 0.05, 0.06, 0.07, 0.08, 0.09 seconds
Constant transaction stream: 100 transactions/second
Four evaluation measures:
the rate of missed deadlines
the rate of skipped tasks
the average delay for the periodic tasks
the correctness of the data source output
Results
Observations:
The system fails (produces incorrect output) at a low utilization (≈ 0.26)
In many cases, tasks were released after their deadlines and skipped
Conclusion:
Periodic task model is too inflexible for this system
Proposed Research
Use the rate-based execution model [Jeffay and Goddard, 1999]
Parameterize with:
Maximum expected aperiodic task rate
Desired aperiodic task response time
When the aperiodic task rate exceeds the maximum expected rate:
Deadlines shift, response time decays
Remove the assumption that the transaction stream arrives in order
Sporadic tasks with dynamic release times to distribute summaries to clients
Minimize data loss, minimize delay
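The deadline rule behind "deadlines shift" can be sketched in Python. In the rate-based execution model, a task is characterized by (x, y, d): it is expected to be released at most x times in any interval of length y, and each release has relative deadline d. The first x releases get deadline t + d; each later release j gets max(t_j + d, D(j − x) + y), so releases that arrive faster than the expected rate have their deadlines pushed back instead of overloading the system. Reading x/y as the "maximum expected aperiodic task rate" and d as the "desired response time" is an assumed mapping; the function name is illustrative.

```python
def rbe_deadlines(release_times, x, y, d):
    """Absolute deadlines for successive releases of one rate-based execution task."""
    deadlines = []
    for j, t in enumerate(release_times):  # j is 0-based here
        if j < x:
            deadlines.append(t + d)
        else:
            # A release arriving ahead of the expected rate inherits a later deadline.
            deadlines.append(max(t + d, deadlines[j - x] + y))
    return deadlines


print(rbe_deadlines([0, 10, 20], x=1, y=10, d=10))  # [10, 20, 30]: releases at the expected rate
print(rbe_deadlines([0, 1, 2], x=1, y=10, d=10))    # [10, 20, 30]: a burst is spread out
```

Under a burst, the work is still done, but response time grows with the excess rate rather than causing missed deadlines and skipped tasks.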
Outline
Overview
Detection and Alert System
Online Anomaly Detection via Clustering
Link Sampling and Anomalous Link Detection
Real-Time Data Source
Conclusion
Summary of Proposed Research
Proposed Schedule
Summary of Proposed Research
Detection and Alert System
Online Anomaly Detection via Clustering
Extend the hybrid clustering algorithm into a streaming algorithm
Link Sampling and Anomalous Link Detection
Identify feasible methods for reducing graph data for online analysis
Identify graph features that can be quickly computed and allow the identification of anomalous behavior in graphs
Real-Time Data Source
Develop a real-time system for distributing summaries of streaming transaction data to clients
Handle out-of-order data arrival dynamically
Online minimization of dropped data and propagation delay
Published Papers
Online Anomaly Detection via Clustering:
Proceedings of the North American Association for Computational Social and Organization Science, 2006. (Best student paper.)
Computational and Mathematical Organization Theory. To appear.
Proposed Schedule
Detection and Alert System
Online Anomaly Detection via Clustering
New conference paper early in 2008
New journal paper late in 2008
Anomaly Detection in Graphs
Conference submission (SIAM) in October 2007
Additional conference submission early in 2008
Journal submissions in late 2008 or early 2009
Real-Time Data Source:
Conference submission in mid-2008 (describing a completely redesigned and rebuilt system)
Journal submission in early 2009
Dissertation Defense: March 2009
Acknowledgments
The material presented here is based in part upon work supported by the National Science Foundation, the DDDAS Program, under grant No. CNS-050348.
The committee:
Dr. Chaudhary
Dr. Chawla
Dr. Poellabauer
The outside chair:
Dr. Hachen
My advisor:
Dr. Madey
Questions?