LD-Sketch: A Distributed Sketching Design for Accurate and Scalable Anomaly Detection in Network Data Streams
Qun Huang and Patrick P. C. Lee
The Chinese University of Hong Kong, Hong Kong
INFOCOM’14
Motivation
Network traffic: a stream of (key, value) tuples
  • Keys: source IPs, five-tuple flows
  • Values: number of packets, payload bytes
Heavy keys: classical anomalies in network traffic
  • Heavy hitters: keys with large volume in one period
    • e.g., SLA violations
  • Heavy changers: keys with large volume change across two periods
    • e.g., DoS attacks, component failures
Goal
  • Identify heavy keys in real time
Challenges
Enormous key space
  • e.g., 5-tuple IPv4 flows are drawn from a key domain of size $2^{104}$
  • Per-key tracking is infeasible
Line-rate processing
  • A single machine fails to keep pace with the line rate
Seamless distributed detection
  • Apply single-machine detection in a distributed architecture
  • Open issue: how to achieve both scalability and accuracy?
Related Works
Counter-based techniques
  • Misra-Gries algorithm [Misra & Gries 82]; Lossy Counting [Manku et al. 02]; Space Saving [Metwally et al. 05]; Probabilistic Lossy Counting [Dimitropoulos et al. 08]
  • Only address heavy hitter detection on a single machine
Sketch-based techniques
  • Multi-stage filter [Estan et al. 03]; CGT [Cormode et al. 04]; Reversible Sketch [Schweller et al. 06]; SeqHash [Tian et al. 07]; Fast Sketch [Liu et al. 12]
  • Only work on a single machine
Distributed detection
  • [Cormode et al. 2005]
  • [Manjhi et al. 2005]
  • [Yi et al. 2009]
  • Only address heavy hitter detection
Our Work
LD-Sketch: a new sketching design for heavy key detection in a distributed architecture
A sketch technique for local detection
  • High accuracy
  • High speed
  • Low space complexity
A distributed detection scheme that not only achieves scalability but also improves accuracy
Experiments on real-world traces
Problem Formulation
Perform detection in each time period (epoch)
Input data: a stream of (key, value) tuples
True sum $S(x)$:
  • sum of the values of key $x$ in the time period
True change $D(x)$:
  • absolute value of the difference of $S(x)$ between the current and last epochs
Heavy hitters: all $x$ with $S(x) \ge \phi$
Heavy changers: all $x$ with $D(x) \ge \phi$
Problem: infeasible to track $S(x)$ and $D(x)$ in real time with limited memory
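For reference, the definitions on this slide can be written as an exact per-key computation, which is the approach that becomes infeasible at line rate. This is only a minimal Python sketch: the function names are hypothetical, and it assumes the threshold is called `phi` and that a key qualifies when its sum or change is at least `phi`.

```python
from collections import Counter

def true_sums(stream):
    """Exact per-key sums for one epoch; `stream` yields (key, value) tuples."""
    sums = Counter()
    for key, value in stream:
        sums[key] += value
    return sums

def heavy_hitters(sums, phi):
    """Keys whose true sum in the epoch is at least phi (assumed comparison)."""
    return {k for k, s in sums.items() if s >= phi}

def heavy_changers(prev_sums, curr_sums, phi):
    """Keys whose absolute change across two epochs is at least phi."""
    keys = set(prev_sums) | set(curr_sums)
    return {k for k in keys
            if abs(curr_sums.get(k, 0) - prev_sums.get(k, 0)) >= phi}

# Two tiny epochs of made-up traffic.
epoch1 = true_sums([("a", 5), ("b", 1), ("a", 4)])
epoch2 = true_sums([("a", 1), ("b", 1), ("c", 9)])
```

The memory cost here grows with the number of distinct keys, which is what the sketch-based design in the following slides avoids.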
Architecture
[Figure: data sources send traffic to remote sites; remote sites forward it to workers, each running local detection; the local detection results are combined by distributed detection into the final detection results]
Local Detection
LD-Sketch structure: $r$ rows, with $w$ buckets each
Update phase
  • For each data item $(x, v_x)$:
    • select a bucket for row $i$ by hashing key $x$ with function $h_i$
    • update the bucket with the data item
Detection phase
  • Examine the buckets and report heavy keys
[Figure: a key is hashed by $h_1, h_2, \ldots, h_r$ to one bucket in each row of the LD-Sketch]
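The update phase touches one bucket per row. The sketch below illustrates only that selection step in Python; the slides do not say which hash family is used, so salting a BLAKE2b digest with the row index is an assumption made for illustration, and `bucket_index` and `buckets_for` are hypothetical names.

```python
import hashlib

def bucket_index(key, row, width):
    """Column of `key` in `row`; salting with the row gives one hash per row."""
    digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % width

def buckets_for(key, rows, width):
    """The (row, column) buckets that a data item with this key updates."""
    return [(i, bucket_index(key, i, width)) for i in range(rows)]

# A key always maps to the same bucket in each of the 4 rows.
cells = buckets_for("10.0.0.1", rows=4, width=16)
```

Because the mapping is deterministic, the detection phase can later find, for any tracked key, the bucket it was hashed to in every row.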
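Inside a Bucket
Bucket $(i, j)$
  • Total sum: $V_{i,j}$
  • Array $A_{i,j}$ of (key, value) entries: (key 1, value), (key 2, value), ..., empty
  • Length of the array: $l_{i,j}$
  • Error: $e_{i,j}$
Expansion parameter: $T$
Basic ideas
  • Track significant keys in a bucket with array $A_{i,j}$
  • Increase length $l_{i,j}$ based on total sum $V_{i,j}$ and parameter $T$
  • Record error $e_{i,j}$ due to dropping insignificant keys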
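Update Bucket $(i, j)$ with $(x, v_x)$
Four cases
  • Case 1: $x \in A_{i,j}$
    • Update directly: $A_{i,j}[x] \leftarrow A_{i,j}[x] + v_x$
  • Case 2: $x \notin A_{i,j}$, but $A_{i,j}$ has empty slots
    • Insert key $x$ into $A_{i,j}$, and set $A_{i,j}[x] \leftarrow v_x$
  • Cases 3 & 4: $x \notin A_{i,j}$, and $A_{i,j}$ is full
    • Expansion number $k$, computed from $V_{i,j}$ and $T$
    • Based on $k$ and $l_{i,j}$:
      • Case 3: decrement keys in $A_{i,j}$
      • Case 4: expand $A_{i,j}$ dynamically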
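Decrement Keys
Case 3: Example
  • Bucket: $A_{i,j} = \{(y, 5)\}$, $l_{i,j} = 1$, $e_{i,j} = 2$
  • New data item: $(x, v_x)$
Procedure
  • Step 1: calculate decrement value $\hat{e} = \min\left(v_x, \min_{y \in A_{i,j}} A_{i,j}[y]\right)$
    • In the example: $\hat{e} = \begin{cases} 3, & \text{if } v_x = 3 \\ 5, & \text{if } v_x = 5 \\ 5, & \text{if } v_x = 8 \end{cases}$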
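Decrement Keys
Procedure (cont.)
  • Step 2: Update $e_{i,j} \leftarrow e_{i,j} + \hat{e}$
    • In the example: $e_{i,j} = \begin{cases} 5, & \text{if } v_x = 3 \\ 7, & \text{if } v_x = 5 \\ 7, & \text{if } v_x = 8 \end{cases}$
  • Step 3: Update $A_{i,j}$
    • $A_{i,j}[y] \leftarrow A_{i,j}[y] - \hat{e}$, for all $y \in A_{i,j}$
    • Remove all $y$ with $A_{i,j}[y] \le 0$
    • Insert key $x$ with $A_{i,j}[x] \leftarrow v_x - \hat{e}$ if $v_x > \hat{e}$
    • In the example (before: $A_{i,j} = \{(y, 5)\}$):
      • $v_x = 3$: after, $A_{i,j} = \{(y, 2)\}$
      • $v_x = 5$: after, $A_{i,j}$ is empty
      • $v_x = 8$: after, $A_{i,j} = \{(x, 3)\}$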
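The three-step decrement procedure can be checked against the example numbers on these two slides (a bucket holding key y with counter 5 and error 2). The Python sketch below is illustrative only: `decrement` is a hypothetical name, the array is modeled as a plain dict, and the decrement value is taken to be the smaller of the new value and the smallest tracked counter, which is the rule that matches all three example outcomes.

```python
def decrement(array, error, key, value):
    """Case 3: the array is full and `key` is not tracked.

    Returns the updated (array, error) without mutating the inputs.
    """
    # Step 1: decrement value = min(new value, smallest tracked counter).
    dec = min(value, min(array.values()))
    # Step 2: record the amount that is being dropped as error.
    error += dec
    # Step 3: decrement every counter and drop those that reach zero.
    kept = {k: v - dec for k, v in array.items() if v - dec > 0}
    # The new key is inserted only if something is left of its value.
    if value > dec:
        kept[key] = value - dec
    return kept, error

# The slide's example: bucket {y: 5} with error 2, new item (x, v_x).
after_3 = decrement({"y": 5}, 2, "x", 3)
after_5 = decrement({"y": 5}, 2, "x", 5)
after_8 = decrement({"y": 5}, 2, "x", 8)
```

With `v_x = 3` the existing key survives with a smaller counter, with `v_x = 5` both keys are dropped, and with `v_x = 8` the new key replaces the old one.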
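Dynamic Expansion
Case 4
  • Add new counters to $A_{i,j}$
  • Set $l_{i,j}$ to the new length
  • Insert key $x$ with $A_{i,j}[x] \leftarrow v_x$
Example
  • Before: $l_{i,j} = 5$, and $A_{i,j}$ holds $y_1, y_2, y_3, y_4, y_5$
  • After: $l_{i,j} = 11$, and $A_{i,j}$ holds $y_1, y_2, y_3, y_4, y_5$ and $x$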
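Putting the four cases together, a bucket update might look like the Python sketch below. Two formulas in it are assumptions that are not legible in these slides: the expansion number is taken as `k = V // T`, and the target length as `(k + 1) * (k + 2) - 1`. They are recalled from the published description of LD-Sketch and agree with the lengths 1, 5, and 11 that appear in the examples, but they should be checked against the paper. The class and attribute names are hypothetical.

```python
class Bucket:
    def __init__(self, T):
        self.T = T    # expansion parameter
        self.V = 0    # total sum of all values hashed to this bucket
        self.A = {}   # tracked keys -> counters
        self.l = 1    # capacity of A (assumed initial length)
        self.e = 0    # error accumulated from dropped keys

    def update(self, x, v):
        self.V += v
        if x in self.A:                          # case 1: key already tracked
            self.A[x] += v
        elif len(self.A) < self.l:               # case 2: empty slot available
            self.A[x] = v
        else:
            k = self.V // self.T                 # expansion number (assumed formula)
            target = (k + 1) * (k + 2) - 1       # target length (assumed formula)
            if target <= self.l:                 # case 3: decrement keys
                dec = min(v, min(self.A.values()))
                self.e += dec
                self.A = {y: c - dec for y, c in self.A.items() if c - dec > 0}
                if v > dec:
                    self.A[x] = v - dec
            else:                                # case 4: expand dynamically
                self.l = target
                self.A[x] = v

# Reproduce the slide's expansion from length 5 to length 11.
b = Bucket(T=10)
for key, value in [("y1", 4), ("y2", 6), ("y3", 1), ("y4", 1), ("y5", 1)]:
    b.update(key, value)
length_before = b.l
b.update("x", 7)
```

Under these assumptions the array only grows when the bucket's total sum crosses another multiple of `T`, so buckets that receive little traffic stay small.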
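Estimate True Sum or Change
Estimate the true sum of key $x$ in bucket $(i, j)$: a pair of values (a lower and an upper estimate)
Estimate the true change of key $x$ in bucket $(i, j)$
  • Estimate change: combine the pair of values from the bucket at the 1st epoch with the pair from the bucket at the 2nd epoch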
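The formulas on this slide did not survive extraction, so the following Python sketch is a hedged reconstruction rather than the slide's exact content. It assumes the lower estimate of a key's sum is its counter in the array (0 if the key is not tracked), the upper estimate adds the bucket's error, and the change estimate is the largest difference that these bounds allow across two epochs. Function names are hypothetical, and a bucket is modeled as an `(array, error)` pair.

```python
def estimate_sum(array, error, key):
    """(lower, upper) estimate of a key's true sum in one bucket."""
    lower = array.get(key, 0)
    return lower, lower + error

def estimate_change(bucket_prev, bucket_curr, key):
    """Upper estimate of the absolute change between two epochs.

    Each bucket argument is an (array, error) pair for the same bucket
    position in the previous and the current epoch.
    """
    low1, up1 = estimate_sum(*bucket_prev, key)
    low2, up2 = estimate_sum(*bucket_curr, key)
    return max(up1 - low2, up2 - low1)

pair_tracked = estimate_sum({"y": 2}, 5, "y")
pair_untracked = estimate_sum({"y": 2}, 5, "x")
change = estimate_change(({"y": 2}, 5), ({"y": 10}, 1), "y")
```

Since decrements only ever subtract from a counter and the subtracted amounts are added to the error, the true sum lies between the two values of the pair under these assumptions.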
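Identify Heavy Key
Key point: consider only keys tracked by buckets
Enumerate all buckets $(i, j)$
  • If $V_{i,j} \ge \phi$, check each key $x$ in $A_{i,j}$
Check key $x$ for heavy hitters
  • The estimated sum of $x$ must be at least $\phi$ for all rows
Check key $x$ for heavy changers
  • The estimated change of $x$ must be at least $\phi$ for all rows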
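The detection phase for heavy hitters can be sketched as follows. This Python example is an illustration under assumptions: each bucket is a dict with fields `"V"`, `"A"`, and `"e"`, the `column(key, i)` callback stands in for the row hash function, and the per-row test uses the upper estimate (counter plus error), which is a reconstruction of the condition lost from the slide.

```python
def heavy_hitters(sketch, column, phi):
    """sketch[i][j] is a bucket dict; column(key, i) gives the key's column in row i."""
    # Only keys tracked in a bucket whose total sum reaches phi are candidates.
    candidates = set()
    for row in sketch:
        for bucket in row:
            if bucket["V"] >= phi:
                candidates.update(bucket["A"])
    # A candidate is reported only if its upper estimate reaches phi in every row.
    reported = set()
    for x in candidates:
        uppers = []
        for i, row in enumerate(sketch):
            b = row[column(x, i)]
            uppers.append(b["A"].get(x, 0) + b["e"])
        if all(u >= phi for u in uppers):
            reported.add(x)
    return reported

# A hand-built 2x2 sketch; the column table replaces real hash functions.
columns = {"a": [0, 1], "b": [0, 0], "c": [1, 1]}
sketch = [
    [{"V": 12, "A": {"a": 10}, "e": 2}, {"V": 3, "A": {"c": 3}, "e": 0}],
    [{"V": 2, "A": {"b": 2}, "e": 0}, {"V": 13, "A": {"a": 10, "c": 3}, "e": 0}],
]
found = heavy_hitters(sketch, lambda key, i: columns[key][i], phi=10)
```

Key `c` shares a heavy bucket with `a` in the second row but fails the test in the first row, which shows how requiring agreement across rows filters false positives.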
Analysis
Parameter of the analysis
  • Maximum number of heavy keys
On accuracy
  • Zero false negative rate
  • Upper bound on the false positive rate
On complexity
  • Time complexity to update a data item
  • Time complexity to identify heavy keys
  • Space complexity
Distributed Detection
Goal
  • Scalability: reduce complexity
  • Accuracy: reduce false positive rate
Remote site
  • How to partition data streams
Final results
  • How to aggregate local detection results
[Figure: a remote site feeds workers running local detection; local detection results are combined into final detection results]
Remote Sites
Two-step partitioning of each data item $(x, v_x)$
  • Step 1: select $d$ workers based on $x$
  • Step 2: select one from the $d$ workers uniformly
For the same $x$, the same $d$ workers are selected in all remote sites
[Figure: a data item is mapped to a subset of the workers, and then to one worker within that subset]
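Two-step partitioning can be sketched as below. The slides do not specify how the workers are chosen from the key, so this Python example swaps in rendezvous-style hashing as one deterministic way to do it; `d` (workers per key), `q` (total workers), and both function names are assumptions made for illustration.

```python
import hashlib
import random

def candidate_workers(key, d, q):
    """Step 1: a deterministic choice of d out of q workers, based only on the key."""
    ranked = sorted(
        range(q),
        key=lambda w: hashlib.blake2b(f"{key}|{w}".encode(), digest_size=8).digest(),
    )
    return sorted(ranked[:d])

def dispatch(key, d, q, rng=random):
    """Step 2: pick one of the key's d workers uniformly at random."""
    return rng.choice(candidate_workers(key, d, q))

workers = candidate_workers("10.0.0.1", d=2, q=5)
chosen = dispatch("10.0.0.1", d=2, q=5)
```

Because step 1 depends only on the key, every remote site computes the same worker subset without coordination, while step 2 spreads a single key's traffic across that subset.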
Detection and Aggregation
Detection in workers
  • For key $x$, each of the $d$ selected workers expects to receive $1/d$ of the key's traffic
  • Perform local detection in each worker with a correspondingly scaled-down threshold
Aggregate results
  • For key $x$: if all of its selected workers report $x$ in the local detection, report $x$ as a heavy key
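The aggregation rule can be sketched in a few lines of Python. The names are hypothetical: `local_reports` maps a worker id to the set of keys it reported, and `workers_of(key)` returns the workers selected for that key during partitioning.

```python
def aggregate(local_reports, workers_of):
    """Report a key only if every worker selected for it reported it locally."""
    seen = set().union(*local_reports.values())
    return {
        x for x in seen
        if all(x in local_reports.get(w, set()) for w in workers_of(x))
    }

# Both keys are assigned to workers 0 and 1; worker 2 is not selected for "b".
reports = {0: {"a", "b"}, 1: {"a"}, 2: {"b"}}
assignment = {"a": [0, 1], "b": [0, 1]}
final = aggregate(reports, lambda x: assignment[x])
```

Requiring agreement from all selected workers is what lowers the false positive rate, and it is also why a key whose traffic was split unevenly can occasionally be missed.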
Analysis
Parameters of the analysis
  • Maximum number of heavy keys
  • Total number of workers
On accuracy
  • Reduces the false positive rate
  • Introduces a small false negative rate due to uneven partitioning
On complexity
  • Time complexity to update a data item
  • Time complexity to identify heavy keys
  • Space complexity
Experimental Results
Trace
  • 3G UMTS network in mainland China in December 2010
  • 1.1 billion packets, 600 GB of traffic
Approach
  • Local detection: compare LD-Sketch with CGT, SeqHash, and Fast Sketch, all of which are allocated the same amount of memory
  • Distributed detection: vary the distributed detection parameter
Metrics
  • Recall: (# of returned true heavy keys) / (# of true heavy keys)
  • Precision: (# of returned true heavy keys) / (# of returned keys)
  • Update throughput
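The two accuracy metrics follow directly from the definitions on this slide. In the Python sketch below, the function name is hypothetical, and returning 1.0 when a denominator is empty is a convention chosen here, not something the slides state.

```python
def recall_precision(returned, truth):
    """Recall and precision of a set of returned keys against the true heavy keys."""
    returned, truth = set(returned), set(truth)
    hits = len(returned & truth)          # returned keys that are truly heavy
    recall = hits / len(truth) if truth else 1.0
    precision = hits / len(returned) if returned else 1.0
    return recall, precision

# Made-up example: 3 keys returned, 4 truly heavy, 2 in common.
recall, precision = recall_precision({"a", "b", "c"}, {"a", "b", "d", "e"})
```

A scheme with zero false negatives has recall 1.0, so for such a scheme precision is the metric that separates good configurations from bad ones.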
Accuracy of Local Detection: Heavy Changer
  • LD-Sketch achieves 100% recall
  • LD-Sketch has slightly lower precision than CGT and SeqHash, but this can be improved with distributed detection
Accuracy of Distributed Detection: Heavy Changer
  • In one setting, the precision is similar to that of local detection
  • In the other setting, the precision increases significantly at the cost of a little recall
Throughput
  • LD-Sketch has slightly lower throughput than CGT and Fast Sketch in local detection
  • LD-Sketch scales linearly in distributed detection
[Figure panels: Local detection; Distributed detection]
Conclusions
Propose LD-Sketch, a sketching approach for real-time heavy key detection in a distributed architecture
  • Composed of local detection and distributed detection
Propose a sketch structure for local detection
  • High accuracy
  • Low complexity in space and time
  • Seamlessly deployed in a distributed architecture
Propose a distributed detection scheme
  • Reduces complexity
  • Improves accuracy