Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams
Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen
Computer Science Department, Northwestern University
Online Change Detection
• Network anomalies are common
– Flash crowds, failures, DoS, worms, …
Online Detection over Data Streams
• Data stream: key/update pairs (k, u)
– Heavy hitters (lots of prior work)
– Heavy changes
k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
– First to detect flow-level heavy changes in massive data streams at network traffic speeds
[Figure: H hash tables T_1, …, T_j, …, T_H, each with K buckets (0, 1, …, K-1); key k is mapped to bucket h_1(k), …, h_j(k), …, h_H(k) in the respective table]
Update (k, u): T_j[h_j(k)] += u (for all j)
Estimate v(S, k): sum of updates for key k
$$ v^{est}(S, k) = \operatorname{median}_{j} \left\{ \frac{T_j[h_j(k)] - \mathrm{sum}/K}{1 - 1/K} \right\} $$
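The update and estimate rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `KarySketch`, the salted built-in `hash` used as a stand-in for the hash functions h_j, and the default sizes are all hypothetical choices, and `total` plays the role of "sum" (the total of all updates).

```python
import random
from statistics import median


class KarySketch:
    """Minimal k-ary sketch: H hash tables with K buckets each."""

    def __init__(self, H=5, K=4096, seed=0):
        self.H, self.K = H, K
        self.tables = [[0] * K for _ in range(H)]
        self.total = 0  # "sum" in the estimate formula: total of all updates
        rng = random.Random(seed)
        # Assumption: a random salt per table stands in for the hash function h_j.
        self.salts = [rng.getrandbits(64) for _ in range(H)]

    def _bucket(self, j, key):
        """h_j(key): bucket index of an integer key in table j."""
        return hash((self.salts[j], key)) % self.K

    def update(self, key, u):
        """Update (k, u): T_j[h_j(k)] += u for all j."""
        for j in range(self.H):
            self.tables[j][self._bucket(j, key)] += u
        self.total += u

    def estimate(self, key):
        """Median over j of (T_j[h_j(k)] - sum/K) / (1 - 1/K)."""
        K = self.K
        return median(
            (self.tables[j][self._bucket(j, key)] - self.total / K) / (1 - 1 / K)
            for j in range(self.H)
        )


sketch = KarySketch()
sketch.update(42, 1000)            # one heavy key
for other in range(1000, 1200):    # 200 light keys
    sketch.update(other, 1)
print(sketch.estimate(42))         # close to 1000
```

Note that the sketch answers "what is the estimate for key k?" but offers no direct way to list the keys with large estimates, which is the problem the rest of the talk addresses.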
• Main problem
– Cannot efficiently report keys with heavy change
• Our contribution
– Determine the set of keys that have “large” estimates in the sketch
• Requires very little space
– E.g. 5 hash tables with 16 K buckets = 80 KB
– Fits in high-speed memory
Reverse Sketch Problem
Input:
– Sketch
– Threshold
Output: set of keys that hash to heavy buckets in a majority of (or all) hash tables
[Figure: hash tables 1 to 5 with “Heavy” buckets marked]
Outline
[Diagram, two stages:
 Streaming data recording: key, value → k-ary sketch
 Heavy change detection: k-ary sketch + change threshold → heavy change keys
 Labels: fast, slow]
Improve Heavy Change Detection
• Modular hashing
• IP mangling
• Reverse hashing algorithms
Taking Intersections
• Intersect A_1, A_2, A_3, A_4, A_5
H = 5, K = 2^12, #keys = 2^32 (IP addresses)
E[false positives] << 1
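A quick check of why the expected number of false positives is far below 1 with these parameters. The calculation assumes independent, uniform hash functions and a single heavy bucket per table; those assumptions and the variable names are not from the slides.

```python
from fractions import Fraction

H = 5             # number of hash tables
K = 2**12         # buckets per table
num_keys = 2**32  # key space (IP addresses)

# Assumption: a non-heavy key lands in the heavy bucket of one table with
# probability 1/K, independently across tables, so it survives the
# intersection of all H tables with probability (1/K)**H.
p_survives_all = Fraction(1, K) ** H
expected_false_positives = num_keys * p_survives_all

print(expected_false_positives)  # 1/268435456, i.e. 2**-28
```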
The problem with simple intersection
• Why is this difficult?
• Each set A_i can be very large!
H = 5, K = 2^12, #keys = 2^32 (IP addresses)
|A_1| = 2^32 / 2^12 = 2^20
• Solution: modular hashing
Modular hashing reduces the set size
[Figure: ordinary hashing. One hash function h() maps the whole 32-bit key (10010100 10101011 10010101 10100011) to a 12-bit bucket index (010 110 001 101)]
[Figure: modular hashing. The 32-bit key is split into four 8-bit words; hash functions h1(), h2(), h3(), h4() each map one word to 3 bits, and the four 3-bit outputs together form the 12-bit index 010 110 001 101]
Greatly reduces size of reverse mapped sets
– Per 8-bit word: 2^8 / 2^3 = 2^5 candidate values
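The following sketch shows one way modular hashing could look in code: each 8-bit word of the 32-bit key gets its own small hash function with a 3-bit output, and reverse mapping is done per word. The lookup-table construction, the function names, and the assumption that each table is perfectly balanced (exactly 2^5 = 32 words per 3-bit value) are illustrative choices, not details taken from the slides.

```python
import random


def make_word_hash(rng, in_bits=8, out_bits=3):
    """Random lookup table from in_bits-bit words to out_bits-bit values.

    Assumption: the table is balanced, so every output value has exactly
    2**in_bits / 2**out_bits preimages (32 with the sizes used here).
    """
    table = [w % (1 << out_bits) for w in range(1 << in_bits)]
    rng.shuffle(table)
    return table


def modular_hash(key, word_hashes, in_bits=8, out_bits=3):
    """Hash each word of the key separately and concatenate the results."""
    q = len(word_hashes)
    mask = (1 << in_bits) - 1
    bucket = 0
    for i, table in enumerate(word_hashes):
        shift = in_bits * (q - 1 - i)  # most significant word first
        word = (key >> shift) & mask
        bucket = (bucket << out_bits) | table[word]
    return bucket


def reverse_word(table, chunk):
    """All words that one word-level hash function maps to `chunk`."""
    return [w for w, h in enumerate(table) if h == chunk]


rng = random.Random(1)
word_hashes = [make_word_hash(rng) for _ in range(4)]
key = 0b10010100_10101011_10010101_10100011  # the 32-bit key shown above
bucket = modular_hash(key, word_hashes)       # 12-bit bucket index
first_chunk = bucket >> 9                     # 3-bit hash of the first word
candidates = reverse_word(word_hashes[0], first_chunk)
print(len(candidates))  # 32
```

Reverse mapping a whole bucket therefore needs four lists of 32 words each rather than one list of 2^20 keys.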
[Figure: hash tables 1 to 5 with heavy buckets b1, b2, b3, b4, b5]
A_1: 2^5 × 2^5 × 2^5 × 2^5
A_2: 2^5 × 2^5 × 2^5 × 2^5
Intersection: only 32 elements per partition
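Because each reverse-mapped set is a product of four per-word sets, the intersection across tables can be taken one word (partition) at a time. The sketch below illustrates this for the simple case of one heavy bucket per table; the helper names, the balanced lookup tables, and the representation of a bucket as a tuple of four 3-bit chunks are assumptions made for the example.

```python
import random
from itertools import product

IN_BITS, OUT_BITS, WORDS, H = 8, 3, 4, 5


def make_table(rng):
    """Balanced random map from 8-bit words to 3-bit chunks (32 preimages each)."""
    table = [w % (1 << OUT_BITS) for w in range(1 << IN_BITS)]
    rng.shuffle(table)
    return table


def split_words(key):
    """Split a 32-bit key into four 8-bit words, most significant first."""
    mask = (1 << IN_BITS) - 1
    return [(key >> (IN_BITS * (WORDS - 1 - i))) & mask for i in range(WORDS)]


def modular_hash(key, tables):
    """Bucket of `key`, kept as a tuple of four 3-bit chunks."""
    return tuple(t[w] for t, w in zip(tables, split_words(key)))


def reverse_intersection(sketch_tables, heavy_buckets):
    """Intersect the reverse-mapped sets word by word, then combine the words."""
    per_word = [set(range(1 << IN_BITS)) for _ in range(WORDS)]
    for tables, bucket in zip(sketch_tables, heavy_buckets):
        for i in range(WORDS):
            preimages = {w for w, h in enumerate(tables[i]) if h == bucket[i]}
            per_word[i] &= preimages  # each preimage set has only 32 elements
    keys = []
    for words in product(*(sorted(s) for s in per_word)):
        key = 0
        for w in words:
            key = (key << IN_BITS) | w
        keys.append(key)
    return keys


rng = random.Random(2)
sketch_tables = [[make_table(rng) for _ in range(WORDS)] for _ in range(H)]
heavy_key = (129 << 24) | (105 << 16) | (56 << 8) | 23  # 129.105.56.23
heavy_buckets = [modular_hash(heavy_key, tables) for tables in sketch_tables]
candidates = reverse_intersection(sketch_tables, heavy_buckets)
print(heavy_key in candidates)  # True
```

The work per table is four intersections of 32-element sets instead of one intersection of 2^20-element sets.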
Handling Multiple Intersections…
[Figure: hash tables 1 to 5 with two sets of heavy buckets b1, …, b5]
2^H different intersections
Much more difficult: need sophisticated reverse hashing algorithms (see tech report)
Problem: Too many collisions
129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ...
→ 7 . 4 . 0 . * (32 bits → 12 bits)
Solution: IP mangling
IP-mangling
Invertible Modular Linear Equation
f(x) ≡ a·x (mod n)
To be invertible: a and n must be relatively prime
• a is odd, chosen randomly (an odd a is relatively prime to any power-of-two n, e.g. n = 2^32 for IP addresses)
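A small sketch of mangling and un-mangling with such a function, assuming 32-bit keys so that n = 2^32. The function names and the way the odd multiplier is drawn are illustrative; the modular inverse uses Python's built-in three-argument `pow`, which accepts a negative exponent from Python 3.8 on.

```python
import random

N = 2**32  # assumption: keys are 32-bit IP addresses


def random_odd_multiplier(rng):
    """Random 32-bit multiplier forced to be odd, hence coprime to 2**32."""
    return rng.getrandbits(32) | 1


def mangle(x, a, n=N):
    """f(x) = a * x mod n."""
    return (a * x) % n


def unmangle(y, a, n=N):
    """Invert f by multiplying with the modular inverse of a."""
    a_inv = pow(a, -1, n)  # exists because gcd(a, n) == 1
    return (a_inv * y) % n


rng = random.Random(7)
a = random_odd_multiplier(rng)
ip = (129 << 24) | (105 << 16) | (56 << 8) | 23  # 129.105.56.23
mangled = mangle(ip, a)
print(unmangle(mangled, a) == ip)  # True
```

Because the mapping is a bijection on the key space, keys recovered by reverse hashing can be converted back to the original addresses.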
[Figure: comparison of Modular Hashing, Modular Hashing with IP Mangling, and Optimal Hashing]
Recap:
[Diagram, two stages:
 Streaming data recording: key → IP mangling → modular hashing → reversible k-ary sketch; value → stored value
 Heavy change detection: reversible k-ary sketch + change threshold → reverse hashing → reverse IP mangling → heavy change keys
 Annotations: $n^{1/\log\log n}$ and $\frac{\log n}{\log\log n}$]
Evaluation
• Traffic traces from Northwestern University edge router
– Average traffic of 7.5 GB per 5-minute interval
• Compared with ground truth
• 6 hash tables, 4K buckets each, 192 KB memory in total
• Up to 140 true heavy change keys in 1.5 seconds
– Over 95% TPP
– Less than 2% FPP
• All missing changes are due to boundary effects
Conclusions / Future Work
• Sketches: efficient summary structures
• Our contribution: reversible sketches
– Efficient online detection of keys with heavy changes
Work in Progress (see tech report)
• Improved reverse hashing
• Statistical guarantee on detection accuracy
• More advanced applications:
– Hierarchical change detection
– E.g. 129.105.100.* shows a big change!
See tech report for more!
http://list.cs.northwestern.edu
Thank you!