Improving Sketch Reconstruction Accuracy Using Linear Least Squares Method. Gene Moo Lee, Huiya Liu, Young Yoon, Yin Zhang, University of Texas at Austin. [email protected]. IMC 2005, Berkeley, CA, USA

Page 1: Improving Sketch Reconstruction Accuracy

Improving Sketch Reconstruction Accuracy Using Linear Least Squares Method

Gene Moo Lee, Huiya Liu, Young Yoon, Yin Zhang

University of Texas at Austin

[email protected]

IMC 2005, Berkeley, CA, USA

Page 2: Improving Sketch Reconstruction Accuracy

IMC’05

Roadmap

● Introduction to Sketch
● Problem Definition
● Our Approach
● Evaluation – Accuracy, Tolerance
● Conclusion and Future work

Page 3: Improving Sketch Reconstruction Accuracy

Sketch: a data structure

● A sketch is a “lossy” data structure used to summarize massive data streams
  ○ Avoids per-flow state maintenance
  ○ Uses constant memory
  ○ Needs only a small number of memory accesses

● We can use sketches for
  ○ Heavy-hitter detection, usage-based pricing, bandwidth provisioning, DoS attack detection

Page 4: Improving Sketch Reconstruction Accuracy

Sketch: a data structure

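Say we’ve got an update of (key k, value u).

Update (key, value): Tj[hj(k)] += u (for all j)

[Figure: the sketch as H hash tables T1, ..., Tj, ..., TH (rows), each with K buckets indexed 0 to K-1. Key k maps to buckets h1(k), ..., hj(k), ..., hH(k) and key k’ to h1(k’), ..., hj(k’), ..., hH(k’); the two keys share a bucket in row j, where hj(k) = hj(k’).]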
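The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `Sketch`, the modulus `P`, the random `(a, b)` hash family, and the restriction to integer keys are all assumptions made for the example.

```python
import random


class Sketch:
    """H hash tables of K counters each (hypothetical helper class)."""

    P = 2_147_483_647  # assumed prime modulus for the illustrative hash family

    def __init__(self, H, K, seed=0):
        self.H, self.K = H, K
        rng = random.Random(seed)
        # One random (a, b) pair per row: h_j(k) = ((a*k + b) mod P) mod K.
        self.params = [(rng.randrange(1, self.P), rng.randrange(self.P))
                       for _ in range(H)]
        self.tables = [[0] * K for _ in range(H)]

    def bucket(self, j, key):
        """Return h_j(key), the bucket index of an integer key in row j."""
        a, b = self.params[j]
        return ((a * key + b) % self.P) % self.K

    def update(self, key, value):
        """Apply T_j[h_j(key)] += value for every row j."""
        for j in range(self.H):
            self.tables[j][self.bucket(j, key)] += value


sketch = Sketch(H=3, K=8)
sketch.update(42, 5)
sketch.update(42, 2)
```

Each update touches exactly H counters, one per row, so memory stays at H × K counters no matter how many distinct keys appear in the stream.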

Page 5: Improving Sketch Reconstruction Accuracy

Point Estimation

Point estimation: key → value of the key?

Nontrivial because of collisions!

[Figure: the same sketch of H tables with K buckets each; keys k and k’ collide in row j, where hj(k) = hj(k’).]

Page 6: Improving Sketch Reconstruction Accuracy

Point Estimation

Countmin [5]: key → minj { Tj[hj(k)] }

[Figure: the same sketch; countmin reads the buckets h1(k), ..., hH(k) of key k, one per row, and takes the minimum.]

Can we do better than this?
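For reference, the countmin rule on this slide can be written as a one-line function. The sketch below is illustrative only: the two hash functions, the table size, and the three updates are made up for the example and do not come from the talk.

```python
def countmin_estimate(tables, hashes, key):
    """Return min over rows j of T_j[h_j(key)]."""
    return min(table[h(key)] for table, h in zip(tables, hashes))


# Two rows of four buckets with made-up hash functions (assumptions).
hashes = [lambda k: k % 4, lambda k: (k // 4) % 4]
tables = [[0] * 4 for _ in hashes]

# Feed three (key, value) updates into every row.
for key, value in [(1, 10), (5, 3), (2, 7)]:
    for table, h in zip(tables, hashes):
        table[h(key)] += value

estimate_key1 = countmin_estimate(tables, hashes, 1)  # true total is 10
estimate_key5 = countmin_estimate(tables, hashes, 5)  # true total is 3
```

With non-negative updates every bucket holds the key's own total plus whatever collided with it, so the minimum never underestimates. Here key 1 collides in both rows and is overestimated, while key 5 has one collision-free row and is recovered exactly.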

Page 7: Improving Sketch Reconstruction Accuracy

Our Approach: lsquare

● Say we have a sketch and a set of keys
  ○ We want to accurately estimate the accumulated values of those keys

● Construct a linear system Ax = b, based on the information the sketch provides

● Find the optimal solution using the least squares method [10, 13]

Page 8: Improving Sketch Reconstruction Accuracy

An example: constructing a sketch

● A sketch with H = 2, K = 3
  ○ h1(j) = j mod 3, h2(j) = (j XOR 3) mod 3

● Total update values for keys
  ○ U0 = 5, U1 = 4, U2 = 3, U3 = 9, U4 = 16
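The example sketch can be built directly from the two hash functions and the five totals given on this slide. The variable names below are chosen for illustration; only the hash functions and the values come from the slide.

```python
H, K = 2, 3

# The two hash functions from the slide: j mod 3 and (j XOR 3) mod 3.
hashes = [lambda j: j % 3, lambda j: (j ^ 3) % 3]

# Total update value per key: U0 .. U4.
updates = {0: 5, 1: 4, 2: 3, 3: 9, 4: 16}

# Add each key's total into one bucket per table.
tables = [[0] * K for _ in range(H)]
for key, value in updates.items():
    for table, h in zip(tables, hashes):
        table[h(key)] += value

print(tables)  # [[14, 20, 3], [14, 19, 4]]
```

The six bucket values are all the information the sketch retains, and they are the right-hand sides of the equations on the next slide.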

Page 9: Improving Sketch Reconstruction Accuracy

An example: building a linear system

● Now, we want to reconstruct the values of keys 3 and 4

First table: X3 + Y = 14, X4 + Y = 20, Y = 3

Second table: X3 + Y = 14, X4 + Y = 19, Y = 4

Here, Y is a single variable that captures the noise effect of the other keys (0, 1, and 2)
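Stacking the six bucket equations above gives the system in matrix form. The row order (first table, then second table) and the ordering of the unknowns are choices made for this illustration, and the objective shown is the standard least-squares one, which is assumed to match the formulation used in the talk.

```latex
\[
A x = b, \qquad
A = \begin{pmatrix}
1 & 0 & 1 \\
0 & 1 & 1 \\
0 & 0 & 1 \\
1 & 0 & 1 \\
0 & 1 & 1 \\
0 & 0 & 1
\end{pmatrix}, \quad
x = \begin{pmatrix} X_3 \\ X_4 \\ Y \end{pmatrix}, \quad
b = \begin{pmatrix} 14 \\ 20 \\ 3 \\ 14 \\ 19 \\ 4 \end{pmatrix},
\qquad
\hat{x} = \arg\min_x \lVert A x - b \rVert_2^2 .
\]
```

The system has six equations and three unknowns, so in general no exact solution exists and the least-squares solution is the one that minimizes the total squared mismatch across all buckets.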

Page 10: Improving Sketch Reconstruction Accuracy

An example: solving the linear system

lsquare: X3 = 10.5, X4 = 16

countmin: X3 = min{14, 14} = 14, X4 = min{20, 19} = 19

answer: U3 = 9, U4 = 16
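The lsquare numbers above can be reproduced by solving the normal equations (AᵀA)x = Aᵀb for the six bucket equations. The solver below is a small standard-library sketch written for this example (the function name `least_squares` is hypothetical, and it assumes A has full column rank); it is not the solver used in the talk.

```python
from fractions import Fraction


def least_squares(A, b):
    """Solve min ||Ax - b||^2 via the normal equations (A^T A) x = A^T b."""
    m, n = len(A), len(A[0])
    # Augmented matrix [A^T A | A^T b] in exact rational arithmetic.
    M = [[Fraction(sum(A[r][i] * A[r][j] for r in range(m))) for j in range(n)]
         + [Fraction(sum(A[r][i] * b[r] for r in range(m)))]
         for i in range(n)]
    # Gauss-Jordan elimination, swapping rows when a pivot is zero.
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


# Unknowns are ordered (X3, X4, Y); rows follow the first table, then the second.
A = [[1, 0, 1], [0, 1, 1], [0, 0, 1],
     [1, 0, 1], [0, 1, 1], [0, 0, 1]]
b = [14, 20, 3, 14, 19, 4]

x3, x4, y = (float(v) for v in least_squares(A, b))
print(x3, x4, y)  # 10.5 16.0 3.5
```

In practice a library routine such as `numpy.linalg.lstsq(A, b, rcond=None)` does the same job and is numerically more robust than forming the normal equations by hand.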

Page 11: Improving Sketch Reconstruction Accuracy

Evaluation - data sets

May 2002 [Bell02]

Feb 2004 [Tera04]

IP addresses with traffic amounts

Page 12: Improving Sketch Reconstruction Accuracy

Evaluation – lsquare vs countmin

[Figure: relative error (Y axis) of lsquare vs. countmin for the top 50 hitters (X axis)]

Lsquare is more accurate than countmin

Page 13: Improving Sketch Reconstruction Accuracy

Evaluation – Accuracy with Light hitters

[Figure: traffic amounts (Y axis) for the top 200 hitters (X axis): actual vs. countmin vs. lsquare]

Lsquare has good accuracy even for “light” hitters

Page 14: Improving Sketch Reconstruction Accuracy

Evaluation – Multiple noise variables

[Figure: relative error (Y axis) for the top 20 hitters (X axis), with 1 vs. 31 vs. 181 noise variables]

We can get better accuracy using more noise variables

Page 15: Improving Sketch Reconstruction Accuracy

Evaluation – Tolerance with limited memory

[Figure: average relative error (Y axis) of lsquare vs. countmin for each sketch configuration (X axis)]

Lsquare tolerates sketches with limited memory

Page 16: Improving Sketch Reconstruction Accuracy

Conclusion

● We propose a new method for point estimation in the sketch data structure
  ○ More accurate!
  ○ Tolerant of small sketches

● Future direction
  ○ Applying statistical inference in data streaming

Page 17: Improving Sketch Reconstruction Accuracy

Q&A

Thank you for your attention!

Questions?

Contact Info: [email protected]

Page 18: Improving Sketch Reconstruction Accuracy

Evaluation - Time Complexity

● In the experiment, lsquare took just 1 to 5 seconds
  ○ Running time is a function of the number of heavy hitters, which is relatively small

● Lots of room for further speedup
  ○ Exploiting sparsity

Page 19: Improving Sketch Reconstruction Accuracy

How to get the set of keys

● Countmin only computes the value of a single key individually, but we try to find values of a “set” of keys

● Set of keys can be obtained by

○ maintaining a priority queue

○ using reversible sketch

Page 20: Improving Sketch Reconstruction Accuracy

Evaluation – Error Metric

We use the relative error of each estimate and its average over all keys, where n is the number of IPs, Uest is the estimated value, and U is the real value.
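The formula itself did not survive in this transcript. A minimal sketch, assuming the usual definition |Uest − U| / U for one key and its mean over the n keys, applied to the two-key example from earlier in the talk (the function names are hypothetical):

```python
def relative_error(estimate, actual):
    """Assumed definition: |Uest - U| / U for a single key."""
    return abs(estimate - actual) / actual


def average_relative_error(estimates, actuals):
    """Mean of the per-key relative errors over the n keys."""
    errors = [relative_error(e, a) for e, a in zip(estimates, actuals)]
    return sum(errors) / len(errors)


# Values for keys 3 and 4 from the worked example.
actual = [9, 16]
lsquare = [10.5, 16]
countmin = [14, 19]

lsquare_error = average_relative_error(lsquare, actual)
countmin_error = average_relative_error(countmin, actual)
```

On this toy example the average relative error is about 0.08 for lsquare and about 0.37 for countmin, in line with the ordering reported on the evaluation slides.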