1 a deterministic algorithm for summarizing asynchronous streams over a sliding window costas busch...

58
1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura Iowa State University

Upload: issac-gillard

Post on 15-Dec-2015

221 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

1

A Deterministic Algorithm forSummarizing Asynchronous

Streamsover a Sliding Window

Costas BuschRensselaer Polytechnic Institute

Srikanta TirthapuraIowa State University

Page 2: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

2

Introduction

Algorithm

Analysis

Outline of Talk

Page 3: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

3

1 C

Time

1tData stream:

For simplicity assumeunit valued elements

2t 3t 4t 5t1v 2v 3v

4v 5v

Page 4: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

4

1 C

Current time

Most recent time window of durationW

Compute the sum of elements with time stamps in time window ],[ CWC

Goal:

1tData stream:

2t 3t 4t 5t1v 2v 3v

4v 5v

CtWCi

i

v

Page 5: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

5

Example I: All packets on a network link, maintain the number of different ip sources in the last one hour

Example II: Large database, continuously maintain averages and frequency moments

Page 6: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

6

Synchronous streamti: In ascending order

Asynchronous streamti: No order guaranteed

1tData stream:

2t 3t 4t 5t1v 2v 3v

4v 5v

Page 7: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

7

Why Asynchronous Data Streams?

NetworkSynchronous stream Asynchronous stream

Synchronous

SynchronousAsynchronous

Merge w/o control

Network delay & multi-path routing

Page 8: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

8

Processing Requirements:•One pass processing•Small workspace: poly-logarithmic in the size of data•Fast processing time per element•Approximate answers are ok

Page 9: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

9

Our results:

A deterministic data aggregation algorithm

Time:

W

BOlog

log

Space:

BWWBO

loglogloglog

SSX ||

Relative Error:

Page 10: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

10

Previous Work:

[Datar, Gionis, Indyk, Motwani. SIAM Journal on Computing, 2002]

Deterministic, Synchronous

[Tirthapura, Xu, Busch, PODC, 2006]

Randomized, Asynchronous

Merging buckets

Random sampling

Page 11: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

11

Introduction

Algorithm

Analysis

Outline of Talk

Page 12: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

12

1 C

Current time

Time

1t 2tData stream:

For simplicity assumeunit valued elements

3t 4t 5t 6t

Page 13: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

13

1 C

Current time

Most recent time window of durationW

1t 2tData stream:

3t 4t 5t 6t

Compute the sum of elements with time stamps in time window ],[ CWC

Goal:

Page 14: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

14

1

Divide time into periods of durationW

W W W W W

W W2 W3 W4

Page 15: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

15

1 W W2 W3 W4

The sliding window may span at most two time periods

C

Wsliding window

T

Page 16: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

16

1 W W2 W3 W4C

Wsliding window

leftS rightS

21 SSS

Sum can be written as two sub-sumsIn two time periods

T

Page 17: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

17

1 W W2 W3 W4C

Wsliding window

leftD rightD

Data structure that maintains an estimate ofIn left time period

leftS

TleftS

rightS

Page 18: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

18

1 WT

Without loss of Generality,Consider data structure in time period ],1[ W

leftS

leftD

leftD

Page 19: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

19

leftD

1D

2D

LD

Data structure consists of various levels

L2is an upper bound of the sum in a period

Page 20: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

20

1 W

Counts up to elements 12 i

0Time period

Bucket at Level 1i

Consider level iD

Page 21: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

21

1 W1

Increase counter value

Wt 11Stream:1t

Page 22: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

22

1 W2

Increase counter value

Wt 21Stream:1t 2t

Page 23: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

23

1 W

Wt 31

3

Stream:

Increase counter value

1t 2t 3t

Page 24: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

24

1 W12 1 i

Increase counter value

Wt i 12 11Stream:

1t 2t 3t 12 1it......

Page 25: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

25

Wt i 12 11Stream:

1t 2t 3t 12 1it...... 12 it

1 W

12

W1

2

W W

12 i

i2 i2

Counter threshold of reached 12 i

Split bucket

Page 26: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

26

Wt i 12 11Stream:

1t 2t 3t 12 1it...... 12 it

12

W1

2

W W

i2 i2

New buckets have threshold also 12 i

Page 27: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

27

21

12 1

Wt i

Stream:1t 2t 3t 12 1it...... 12 it

12 1it

12

W1

2

W W12 i i2

Increase appropriate bucket

Page 28: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

28

WtW

i 22 1

2Stream:

1t 2t 3t 12 1it...... 12 it12 1it

12

W1

2

W W12 i 12 i

Increase appropriate bucket

22 1it

Page 29: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

29

Stream:1t 2t 3t 12 1it...... 12 it

12 1it

12

W1

2

W W22 i 12 i

Increase appropriate bucket

22 1it32 1it

21

32 1

Wt i

Page 30: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

30

12

W1

2

W W

12

W

43W

14

3

W W

1x 12 i

i2 i2

1t ......mtStream:

Split bucket

21

2W

tW

m

Page 31: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

31

12

W

12

W

43W

14

3

W W

i2 i2

1t ......mtStream:

1x

Page 32: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

32

12

W

12

W

43W

14

3

W W12 i i2

1t ......mtStream: 1mt

Increase appropriate bucket

1x

43

12 1

Wt

Wm

Page 33: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

33

12

W

12

W

43W

14

3

W W

1x

12 i4x

1t ......mtStream: 1mt

12

W

43W

12

W

43W

85W

18

5

W

i2 i2

......mt

Split bucket

Page 34: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

34

12

W

14

3

W W

1x

4x

1t ......mtStream: 1mt

12

W

43W

85W

18

5

W

i2 i2

......mt

Page 35: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

35

1 W

12

W1

2

W W

12

W

43W

14

3

W W

12

W

43W

85W

18

5

W

12 i

12 i

12 i

1x

4x

2x 3x

122 ik

i x

Splitting Tree

Page 36: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

36

1 W

Leaf buckets of duration 1 are not split any further

1t 11 t2t 12 t

12 i

Max depth =

Wlog

Page 37: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

37

1 W

12 i

The initial bucket may be split intomany buckets

Leaf buckets

Page 38: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

38

1 W

12 i

Due to space limitations we only keep the last buckets

Wa log2

Leaf buckets

Page 39: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

39

1 WT

Suppose we want to find the sum of elements in time period ],[ WT

S

S

Page 40: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

40

1 WT

of splitting threshold

a

a

a

a

12

22

k2

12 k

Consider various levels

S

Page 41: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

41

1 WT

a

a

a

12

22

12 k

First level with a leaf bucketthat intersects timeline

a

k2

S

Page 42: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

42

1 WT

a

k2

Estimate of S:

1x 2x zx

zxxxX 21

Consider buckets on right of timelineaz

S

Page 43: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

43

1 WT

a

a

a

12

22

12 k

First level with a leaf bucketOn right timeline

a

k2

OR

S

Page 44: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

44

Introduction

Algorithm

Analysis

Outline of Talk

Page 45: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

45

Suppose that we use level in order to compute the estimate

12 i

1 WTS

Page 46: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

46

ktStream:

1 bb xx

lt rt

A data element is counted in the appropriate bucket

Consider splitting threshold level12 i

Page 47: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

47

kt

Stream: kt

We can assume that the element is placed in the respective bucket

lt rt

rkl ttt

Page 48: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

48

Stream: kt

We can assume that when bucket splits the element is placed in an arbitrary child bucket

lt rt

lt rtkt

2rl tt

12

rl tt

12 i

i2i2

Page 49: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

49

Stream: kt

lt rt

lt rtkt

2rl tt

12

rl tt

12 i

i2 i2

2rl

kl

tttt

If: GOOD!

Element counted in correct bucket

Page 50: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

50

Stream: kt

lt rt

lt rt2rl tt

12

rl tt

12 i

i2 i2

rkrl tt

tt

1

2If: BAD!

Element counted in wrong bucket

kt

Page 51: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

51

1 WT

ktConsider Leaf Buckets

If WtT k

1 W

GOOD!

S

Page 52: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

52

1 WT

ktConsider Leaf Buckets

If Ttk

1 W

BAD!

Element counted in wrong bucket

S

Page 53: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

53

|||| 21 ZZSX

:elements of left part counted on right

1 WT

ktConsider Leaf Buckets

1 W

S

1Z

2Z :elements of right part counted on left

Page 54: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

54

kt

1 W

kt1Z

elements of left part counted on right

T1 W

Must have been initially inserted in one of these buckets

Page 55: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

55

Since tree depth Wlog

)log2(|| 1 WOZ i

Page 56: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

56

Since tree depth Wlog

)log2(|| 1 WOZ i

Similarly, we can prove

)log2(|| 2 WOZ i

Therefore: )log2(|||||||| 21 WOZZSX i

Page 57: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

57

It can be proven

Wa log2

Since

)log2( WS i

Page 58: 1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura

58

It can be proven

Wa log2

Since

Combined with )log2(|| WOSX i

)log2( WS i

We obtain relative error : S

SX ||