Page 1: Qun  Huang  and Patrick P. C. Lee The Chinese University of Hong Kong, Hong Kong INFOCOM’14

LD-Sketch: A Distributed Sketching Design for Accurate and Scalable Anomaly Detection in Network Data Streams

Qun Huang and Patrick P. C. Lee

The Chinese University of Hong Kong, Hong Kong

INFOCOM’14

Page 2: Motivation

Network traffic: a stream of (key, value) tuples
• Keys: source IPs, five-tuple flows
• Values: number of packets, payload bytes

Heavy keys: classical anomalies in network traffic
• Heavy hitters: keys with large volume in one period
  • e.g., SLA violations
• Heavy changers: keys with large volume change across two periods
  • e.g., DoS attacks, component failures

Goal:
• Identify heavy keys in real time

Page 3: Challenges

Enormous key space
• e.g., 5-tuple IPv4 flows are drawn from a key domain of size $2^{104}$ (32 + 32 bits of IP addresses, 16 + 16 bits of ports, 8 bits of protocol)
• Per-key tracking is infeasible

Line-rate processing
• A single machine fails to keep pace with the line rate

Seamless distributed detection
• Apply single-machine detection in a distributed architecture
• Open issue: how to achieve both scalability and accuracy?

Page 4: Related Work

Counter-based techniques
• Misra-Gries algorithm [Misra & Gries 82]; Lossy Counting [Manku et al. 02]; Space Saving [Metwally et al. 05]; Probabilistic Lossy Counting [Dimitropoulos et al. 08]
• Only address heavy hitter detection on a single machine

Sketch-based techniques
• Multi-stage filter [Estan et al. 03]; CGT [Cormode et al. 04]; Reversible Sketch [Schweller et al. 06]; SeqHash [Tian et al. 07]; Fast Sketch [Liu et al. 12]
• Only work on a single machine

Distributed detection
• [Cormode et al. 2005]; [Manjhi et al. 2005]; [Yi et al. 2009]
• Only address heavy hitter detection

Page 5: Our Work

LD-Sketch: a new sketching design for heavy key detection in a distributed architecture

A sketch technique for local detection
• High accuracy
• High speed
• Low space complexity

A distributed detection scheme that not only achieves scalability but also improves accuracy

Experiments on real-world traces

Page 6: Problem Formulation

Perform detection in each time period (epoch)

Input data: a stream of (key, value) tuples $(x, v_x)$

True sum $S(x)$:
• Sum of the values of key $x$ in the time period

True change $D(x)$:
• Absolute difference of $S(x)$ between the current and the last epochs: $D(x) = |S_{cur}(x) - S_{prev}(x)|$

Heavy hitters: all $x$ with $S(x) \ge \phi$
Heavy changers: all $x$ with $D(x) \ge \phi$

Problem: it is infeasible to track $S(x)$ and $D(x)$ for every key in real time with limited memory
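As a reference point, the definitions above can be computed exactly with one counter per key. The Python sketch below (function names are illustrative, not from the paper) does just that; it is the per-key tracking that the challenges slide calls infeasible at line rate, shown only to pin down $S(x)$, $D(x)$ and $\phi$.

```python
from collections import defaultdict


def true_sums(stream):
    """Exact S(x): sum of the values of each key in one epoch."""
    sums = defaultdict(int)
    for key, value in stream:
        sums[key] += value
    return dict(sums)


def heavy_hitters(sums, phi):
    """All keys x with S(x) >= phi."""
    return {x for x, s in sums.items() if s >= phi}


def heavy_changers(cur, prev, phi):
    """All keys x with D(x) = |S_cur(x) - S_prev(x)| >= phi."""
    keys = set(cur) | set(prev)
    return {x for x in keys if abs(cur.get(x, 0) - prev.get(x, 0)) >= phi}


prev = true_sums([("a", 10), ("b", 3), ("a", 2)])   # S_prev: a=12, b=3
cur = true_sums([("a", 4), ("b", 20), ("c", 1)])    # S_cur: a=4, b=20, c=1
print(sorted(heavy_hitters(cur, 8)))         # ['b']
print(sorted(heavy_changers(cur, prev, 8)))  # ['a', 'b']
```

Key a is not a heavy hitter in the current epoch ($S(a) = 4$) but is a heavy changer ($D(a) = |4 - 12| = 8$), which is why the two problems are tracked separately.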

Page 7: Architecture

[Figure: five data sources feed five remote sites; the remote sites forward data to three workers, each of which performs local detection; the local detection results are combined into the final detection results (distributed detection).]

Page 8: Local Detection

LD-Sketch structure: $r$ rows with $w$ buckets each; row $i$ has its own hash function $h_i$ ($h_1, h_2, \ldots, h_r$)

Update phase
• For each data item $(x, v_x)$: in each row $i$, select bucket $j = h_i(x)$ by hashing key $x$ with function $h_i$
• Update that bucket with the data item

Detection phase
• Examine the buckets and report heavy keys

Page 9: Inside a Bucket

Bucket $(i, j)$ holds:
• Total sum $V_{i,j}$
• Associative array $A_{i,j}$ of (key, value) entries (key 1 and its value, key 2 and its value, …, empty slots)
• Length $l_{i,j}$: the maximum number of keys $A_{i,j}$ can hold
• Error $e_{i,j}$
• Expansion parameter $T$

Basic ideas
• Track significant keys in a bucket with array $A_{i,j}$
• Increase the length $l_{i,j}$ based on the total sum $V_{i,j}$ and the parameter $T$
• Record the error $e_{i,j}$ caused by dropping insignificant keys

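Page 10: Update Bucket $(i, j)$ with $(x, v_x)$

Add $v_x$ to the total sum ($V_{i,j} \leftarrow V_{i,j} + v_x$); then one of four cases applies.

Case 1: $x \in A_{i,j}$
• Update directly: $A_{i,j}[x] \leftarrow A_{i,j}[x] + v_x$

Case 2: $x \notin A_{i,j}$ but $A_{i,j}$ has empty slots
• Insert key $x$ into $A_{i,j}$ and set $A_{i,j}[x] = v_x$

Cases 3 & 4: $x \notin A_{i,j}$ and $A_{i,j}$ is full
• Expansion number $k_{i,j} = \lfloor V_{i,j} / T \rfloor$
• Based on $l_{i,j}$ and $k_{i,j}$:
  • Case 3: $l_{i,j} \ge (k_{i,j}+1)(k_{i,j}+2) - 1$: decrement keys in $A_{i,j}$
  • Case 4: $l_{i,j} < (k_{i,j}+1)(k_{i,j}+2) - 1$: expand $A_{i,j}$ dynamically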
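The four cases can be sketched in Python as a single bucket class. This is a minimal illustration under stated assumptions, not the authors' implementation: the expansion parameter is called `T`, the initial length is taken as $l_{i,j} = 1$ (the value of $(k+1)(k+2)-1$ at $k = 0$), and $V_{i,j}$ is updated before the case is chosen.

```python
class Bucket:
    """One LD-Sketch bucket (i, j): total sum V, array A, length l, error e."""

    def __init__(self, T):
        self.T = T     # expansion parameter (name T assumed)
        self.V = 0     # total sum of all values hashed to this bucket
        self.A = {}    # associative array: key -> counter
        self.l = 1     # max number of keys in A; assumed (0+1)(0+2)-1 = 1 at start
        self.e = 0     # maximum estimation error of any key in this bucket

    @staticmethod
    def length_for(k):
        """Target length (k+1)(k+2)-1 for expansion number k."""
        return (k + 1) * (k + 2) - 1

    def update(self, x, v):
        self.V += v
        if x in self.A:                        # Case 1: key already tracked
            self.A[x] += v
        elif len(self.A) < self.l:             # Case 2: a free slot is available
            self.A[x] = v
        else:                                  # Cases 3 & 4: A is full
            k = self.V // self.T               # expansion number
            if self.l >= self.length_for(k):   # Case 3: decrement keys
                e_hat = min(v, min(self.A.values()))
                self.e += e_hat
                # Decrement all tracked keys and drop those that reach zero
                self.A = {y: c - e_hat for y, c in self.A.items() if c - e_hat > 0}
                if v - e_hat > 0:
                    self.A[x] = v - e_hat
            else:                              # Case 4: expand dynamically
                self.l = self.length_for(k)
                self.A[x] = v


b = Bucket(T=10)
b.update("y", 5)   # Case 2: A = {'y': 5}
b.update("x", 3)   # V = 8, k = 0 -> Case 3: e = 3, A = {'y': 2}
b.update("z", 4)   # V = 12, k = 1 -> Case 4: l = 5, A = {'y': 2, 'z': 4}
print(b.A, b.l, b.e)   # {'y': 2, 'z': 4} 5 3
```

In a full sketch, each data item would update one such bucket per row, namely bucket $h_i(x)$ in row $i$. Under these update rules every key keeps $A_{i,j}[x] \le S(x) \le A_{i,j}[x] + e_{i,j}$ (with $A_{i,j}[x] = 0$ for untracked keys), which is what the estimation step relies on.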

Page 11: Case 3 Example: Decrement Keys

• Bucket: $A_{i,j} = \{y: 5\}$, $l_{i,j} = 1$, $e_{i,j} = 2$
• New data item $(x, v_x)$ with $x \notin A_{i,j}$

Procedure
• Step 1: calculate the decrement value $\hat{e} = \min\{v_x, \min_{y \in A_{i,j}} A_{i,j}[y]\}$:

$$\hat{e} = \begin{cases} 3, & \text{if } v_x = 3 \\ 5, & \text{if } v_x = 5 \\ 5, & \text{if } v_x = 8 \end{cases}$$

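Page 12: Case 3 Example: Decrement Keys (cont.)

Procedure (cont.)
• Step 2: update the error, $e_{i,j} \leftarrow e_{i,j} + \hat{e}$:

$$e_{i,j} = \begin{cases} 5, & \text{if } v_x = 3 \\ 7, & \text{if } v_x = 5 \\ 7, & \text{if } v_x = 8 \end{cases}$$

• Step 3: update $A_{i,j}$:
  • $A_{i,j}[y] \leftarrow A_{i,j}[y] - \hat{e}$ for all $y \in A_{i,j}$
  • Remove all $y$ with $A_{i,j}[y] \le 0$
  • Insert key $x$ with $A_{i,j}[x] = v_x - \hat{e}$ if $v_x - \hat{e} > 0$

Result of Step 3 (before: $A_{i,j} = \{y: 5\}$):
• $v_x = 3$: $A_{i,j} = \{y: 2\}$
• $v_x = 5$: $A_{i,j}$ is empty
• $v_x = 8$: $A_{i,j} = \{x: 3\}$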
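Steps 1 to 3 of Case 3 can be checked against the example with a small Python function (the name `decrement` is hypothetical). It assumes $\hat{e}$ is the smaller of $v_x$ and the smallest counter in $A_{i,j}$, which reproduces the values above, and that the array is full, and therefore non-empty, whenever Case 3 runs.

```python
def decrement(A, e, x, v):
    """Case 3 of the bucket update for a new key x with value v.

    A: dict key -> counter (full, hence non-empty); e: current error e_{i,j}.
    Returns the new (A, e) without modifying the input dict.
    """
    e_hat = min(v, min(A.values()))        # Step 1: decrement value
    e = e + e_hat                          # Step 2: update the error
    # Step 3: decrement every tracked key and drop those that reach zero ...
    A = {y: c - e_hat for y, c in A.items() if c - e_hat > 0}
    # ... then insert x with its leftover value, if any
    if v - e_hat > 0:
        A[x] = v - e_hat
    return A, e


# The example from the slides: A = {y: 5}, l = 1, e = 2
for v_x in (3, 5, 8):
    print(v_x, decrement({"y": 5}, 2, "x", v_x))
# 3 ({'y': 2}, 5)
# 5 ({}, 7)
# 8 ({'x': 3}, 7)
```

Because $\hat{e}$ never exceeds the smallest counter, at least one key is removed whenever $x$ is inserted, so $A_{i,j}$ never grows beyond $l_{i,j}$ in this case.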

Page 13: Dynamic Expansion

Case 4: $l_{i,j} < (k_{i,j}+1)(k_{i,j}+2) - 1$
• Add new counters to $A_{i,j}$
• Set $l_{i,j} = (k_{i,j}+1)(k_{i,j}+2) - 1$
• Insert key $x$ with $A_{i,j}[x] = v_x$

Example
• Before: $l_{i,j} = 5$; $A_{i,j}$ is full with $y_1, \ldots, y_5$
• After: $l_{i,j} = 11$; $A_{i,j}$ holds $y_1, \ldots, y_5$ and $x$, with five empty slots left
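The length rule for Case 4 can be checked numerically: $(k+1)(k+2)-1$ gives 1 at $k = 0$ (the length in the Case 3 example) and grows from 5 to 11 when $k$ goes from 1 to 2, as in the expansion example. In this Python sketch, $k = \lfloor V_{i,j}/T \rfloor$; the function names and the values of `V` and `T` in the calls are made up for illustration.

```python
def expanded_length(k):
    """Length l_{i,j} = (k+1)(k+2) - 1 for expansion number k."""
    return (k + 1) * (k + 2) - 1


def case4_applies(l, V, T):
    """True if a full bucket should expand (Case 4) rather than decrement (Case 3)."""
    k = V // T  # expansion number floor(V / T)
    return l < expanded_length(k)


print([expanded_length(k) for k in range(4)])  # [1, 5, 11, 19]
print(case4_applies(l=5, V=25, T=10))           # True: k = 2, so l grows from 5 to 11
print(case4_applies(l=5, V=15, T=10))           # False: k = 1, and l = 5 already suffices
```

The length grows roughly quadratically in $V_{i,j}/T$, so a bucket that accumulates a large total sum keeps room for more significant keys instead of repeatedly decrementing them.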

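Page 14: Estimate True Sum or Change

Estimate $S(x)$ in bucket $(i, j)$: a pair of bounds
• Lower bound: $S^{low}_{i,j}(x) = A_{i,j}[x]$ if $x \in A_{i,j}$, and 0 otherwise
• Upper bound: $S^{up}_{i,j}(x) = S^{low}_{i,j}(x) + e_{i,j}$

Estimate $D(x)$ in bucket $(i, j)$
• Bucket at 1st epoch: bounds $S'^{low}_{i,j}(x)$ and $S'^{up}_{i,j}(x)$
• Bucket at 2nd epoch: bounds $S^{low}_{i,j}(x)$ and $S^{up}_{i,j}(x)$
• Estimate change: $D_{i,j}(x) = \max\{S^{up}_{i,j}(x) - S'^{low}_{i,j}(x),\ S'^{up}_{i,j}(x) - S^{low}_{i,j}(x)\}$
• Since the lower and upper bounds enclose the true sum in each epoch, $D_{i,j}(x)$ is an upper bound on the true change $D(x)$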
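Assuming each bucket is summarized by its array and error as a pair `(A, e)`, the two estimates can be written as follows (function names are hypothetical). The change bound is valid only because $S^{low}_{i,j}(x) \le S(x) \le S^{up}_{i,j}(x)$ holds in each epoch.

```python
def sum_bounds(A, e, x):
    """(S_low, S_up) for key x in a bucket with array A and error e."""
    low = A.get(x, 0)
    return low, low + e


def change_bound(cur, prev, x):
    """Upper bound on D(x) from one bucket.

    cur, prev: (A, e) of the same bucket in the current and previous epochs.
    """
    cur_low, cur_up = sum_bounds(cur[0], cur[1], x)
    prev_low, prev_up = sum_bounds(prev[0], prev[1], x)
    return max(cur_up - prev_low, prev_up - cur_low)


prev = ({"x": 10}, 2)            # previous epoch: 10 <= S'(x) <= 12
cur = ({"x": 30, "y": 1}, 3)     # current epoch:  30 <= S(x)  <= 33
print(change_bound(cur, prev, "x"))  # 23, so the true change lies in [18, 23]
```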

Page 15: Identify Heavy Keys

Key point: only consider keys tracked by the buckets
• Enumerate all buckets $(i, j)$
• If $V_{i,j} \ge \phi$, check each key $x \in A_{i,j}$

Check key $x$ for heavy hitters
• Report $x$ if $S^{up}_{i,h_i(x)}(x) \ge \phi$ for all rows $i$

Check key $x$ for heavy changers
• Report $x$ if $D_{i,h_i(x)}(x) \ge \phi$ for all rows $i$
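The identification step for heavy hitters can be sketched over rows of bucket dicts. Assumptions, all hypothetical: `bucket_of` stands in for $h_i$ using a CRC32 of the key salted with the row index, and `exact_sketch` fills buckets without ever decrementing ($e_{i,j} = 0$) purely to produce a small input; a real LD-Sketch would fill the buckets with the four-case update. Heavy changers follow the same pattern with $D_{i,h_i(x)}(x)$ in place of the upper bound.

```python
import zlib


def bucket_of(i, x, w):
    """Hypothetical hash h_i: CRC32 of the key salted with the row index."""
    return zlib.crc32(f"{i}:{x}".encode()) % w


def upper_bound(bucket, x):
    """S_up for key x in one bucket dict."""
    return bucket["A"].get(x, 0) + bucket["e"]


def identify_heavy_hitters(sketch, phi):
    """sketch[i][j] is a dict {'V': ..., 'A': {...}, 'e': ...} for bucket (i, j)."""
    w = len(sketch[0])
    # Candidates: keys tracked in any bucket whose total sum reaches phi
    candidates = set()
    for row in sketch:
        for b in row:
            if b["V"] >= phi:
                candidates.update(b["A"])
    # Report x only if its upper bound reaches phi in every row
    return {x for x in candidates
            if all(upper_bound(row[bucket_of(i, x, w)], x) >= phi
                   for i, row in enumerate(sketch))}


def exact_sketch(stream, r, w):
    """Illustration only: fills buckets without decrements (e = 0), so A is exact."""
    sketch = [[{"V": 0, "A": {}, "e": 0} for _ in range(w)] for _ in range(r)]
    for x, v in stream:
        for i in range(r):
            b = sketch[i][bucket_of(i, x, w)]
            b["V"] += v
            b["A"][x] = b["A"].get(x, 0) + v
    return sketch


stream = [("a", 50), ("b", 5), ("c", 30), ("a", 10), ("d", 12)]
sketch = exact_sketch(stream, r=2, w=4)
print(sorted(identify_heavy_hitters(sketch, phi=20)))  # ['a', 'c']
```

Keys that share a heavy bucket with a heavy key become candidates, but they are filtered out unless their upper bound reaches $\phi$ in every row.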

Page 16: Analysis

Notation: the bounds are expressed in terms of the maximum number of heavy keys

On accuracy
• Zero false negative rate
• An upper bound on the false positive rate

On complexity: bounds on
• Time complexity to update a data item
• Time complexity to identify heavy keys
• Space complexity

Page 17: Distributed Detection

[Figure: a remote site sends data to workers; each worker performs local detection, and the local detection results are combined into the final detection results.]

Goals
• Scalability: reduce complexity
• Accuracy: reduce the false positive rate

Remote sites
• How to partition data streams?

Final results
• How to aggregate local detection results?

Page 18: Remote Sites

Two-step partitioning of each data item $(x, v_x)$
• Step 1: select a fixed number of workers based on key $x$
• Step 2: select one of these workers uniformly at random

For the same $x$, the same workers are selected in all remote sites

[Figure: Step 1 maps a data item to a subset of the workers; Step 2 forwards it to one worker in that subset.]
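A minimal Python sketch of two-step partitioning follows. The slides do not say how Step 1 maps a key to workers, so the scheme used here (consecutive workers starting at a CRC32-derived offset), the function names and the `per_key` parameter are all assumptions; the one property kept from the slide is that Step 1 depends only on $x$, so every remote site picks the same workers.

```python
import random
import zlib


def select_workers(x, num_workers, per_key):
    """Step 1 (hypothetical scheme): derive per_key distinct workers from the key
    alone (requires per_key <= num_workers), so every remote site agrees."""
    start = zlib.crc32(str(x).encode()) % num_workers
    return [(start + k) % num_workers for k in range(per_key)]


def route(x, num_workers, per_key, rng=random):
    """Step 2: forward this data item to one of the selected workers, uniformly."""
    return rng.choice(select_workers(x, num_workers, per_key))


rng = random.Random(7)
print(select_workers("10.0.0.1", num_workers=8, per_key=3))
print([route("10.0.0.1", 8, 3, rng) for _ in range(5)])
```

Because Step 1 is deterministic and Step 2 is random, items of the same key spread over the same small set of workers at every site, and each of those workers expects roughly an equal share of the key's traffic.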

Page 19: Detection and Aggregation

Detection in workers
• For key $x$, each of the workers selected for $x$ expects to receive an equal fraction of $S(x)$
• Perform local detection in each worker with the threshold $\phi$ scaled down by the same fraction

Aggregate results
• For key $x$: if all of its selected workers report $x$ in their local detection, report $x$ as a heavy key
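The aggregation rule can be illustrated with made-up per-worker sums. All names and numbers are hypothetical, and `local_detect` uses exact local sums as a stand-in for each worker's LD-Sketch. Each key is assigned two workers, so each worker uses the threshold $\phi/2$.

```python
def local_detect(local_sums, threshold):
    """Stand-in for a worker's local detection: keys whose local sum reaches threshold."""
    return {x for x, s in local_sums.items() if s >= threshold}


def aggregate(reports, workers_of):
    """Report x only if every worker selected for x reported it locally."""
    candidates = set().union(*reports.values())
    return {x for x in candidates
            if all(x in reports.get(w, set()) for w in workers_of[x])}


phi, per_key = 80, 2
workers_of = {"a": [0, 1], "b": [1, 2], "d": [0, 2]}
local_sums = {                     # hypothetical per-worker shares of S(x)
    0: {"a": 55, "d": 85},
    1: {"a": 45, "b": 50},
    2: {"b": 5, "d": 5},
}
reports = {w: local_detect(s, phi / per_key) for w, s in local_sums.items()}
print(sorted(aggregate(reports, workers_of)))  # ['a']
```

Key a (total 100) is reported. Key b (total 55) passes worker 1's local threshold of 40 on its own, but worker 2 does not report it, so the false positive is filtered out. Key d (total 90) is heavy but missed because its 85/5 split is unfair, which is the kind of false negative the distributed analysis attributes to unfair partitioning.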

Page 20: Analysis

Notation: the bounds are expressed in terms of the maximum number of heavy keys and the total number of workers

On accuracy
• Reduces the false positive rate
• Introduces a small false negative rate due to unfair partitioning

On complexity: bounds on
• Time complexity to update a data item
• Time complexity to identify heavy keys
• Space complexity

Page 21: Experimental Results

Trace
• 3G UMTS network in mainland China, December 2010
• 1.1 billion packets, 600 GB of traffic

Approach
• Local detection: compare LD-Sketch with CGT, SeqHash, and Fast Sketch, all allocated the same amount of memory
• Distributed detection: vary the value of …

Metrics
• Recall: (# of returned true heavy keys) / (# of true heavy keys)
• Precision: (# of returned true heavy keys) / (# of returned keys)
• Update throughput
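The two accuracy metrics translate directly into set operations. In this Python sketch, the function names and the 1.0 value returned for an empty denominator are conventions chosen here, not taken from the slides.

```python
def recall(returned, true_heavy):
    """(# of returned true heavy keys) / (# of true heavy keys)."""
    if not true_heavy:
        return 1.0  # convention assumed here: nothing to find
    return len(returned & true_heavy) / len(true_heavy)


def precision(returned, true_heavy):
    """(# of returned true heavy keys) / (# of returned keys)."""
    if not returned:
        return 1.0  # convention assumed here: nothing reported
    return len(returned & true_heavy) / len(returned)


true_heavy = {"a", "b", "c", "d"}
returned = {"a", "b", "c", "d", "e"}   # no misses, one false positive
print(recall(returned, true_heavy), precision(returned, true_heavy))  # 1.0 0.8
```

A detector with zero false negatives, as claimed for local detection, always has recall 1.0; its false positives show up only in precision.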


Page 22: Accuracy of Local Detection: Heavy Changers

LD-Sketch achieves 100% recall

LD-Sketch has slightly lower precision than CGT and SeqHash, but its precision can be improved with distributed detection

Page 23: Accuracy of Distributed Detection: Heavy Changers

When …, the precision is similar to that of local detection

When …, the precision increases significantly at the cost of a little recall

Page 24: Throughput

[Figure panels: Local detection; Distributed detection]

Local detection: LD-Sketch has slightly lower throughput than CGT and Fast Sketch

Distributed detection: LD-Sketch scales linearly

Page 25: Conclusions

Propose LD-Sketch, a sketching approach for real-time heavy key detection in a distributed architecture
• Composed of local detection and distributed detection

Propose a sketch structure for local detection
• High accuracy
• Low complexity in space and time
• Seamlessly deployed in a distributed architecture

Propose a distributed detection scheme
• Reduces complexity
• Improves accuracy

