BRAID: Stream Mining through Group Lag Correlations Yasushi Sakurai Spiros Papadimitriou Christos Faloutsos SIGMOD 2005

Page 1

BRAID: Stream Mining through Group Lag Correlations

Yasushi Sakurai Spiros Papadimitriou Christos Faloutsos

SIGMOD 2005

Page 2

Outline

Introduction
Proposed method
Experiments
Conclusions

Page 3

Introduction

Data streams and lag correlations:

For example: higher amounts of fluoride in water → fewer dental cavities some years later

Goal: monitor multiple numerical streams and determine which pairs are correlated with a lag, and the value of that lag

Page 4

Introduction

Given k numerical sequences X1, …, Xk, report all pairs Xi and Xj such that Xi follows Xj with lag l

Page 5

Introduction

Page 6

Introduction

This paper proposes BRAID, which handles data streams with the following properties:

Any-time processing, and fast

Nimble

Accurate

Small resource consumption

Page 7

Proposed method

Data stream X: {x1, …, xt, …, xn}, where xn is the most recent value

R(0): correlation of X and Y, which have the same length n, at zero lag

Pearson ρ Coefficient :
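The formula on this slide did not survive extraction. As a reminder, the standard Pearson coefficient for two sequences of length n can be written as below; the notation (bars for means) is an assumption and may differ from the slide.

```latex
\rho = \frac{\sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y})}
            {\sqrt{\sum_{t=1}^{n} (x_t - \bar{x})^2}\,
             \sqrt{\sum_{t=1}^{n} (y_t - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t,\quad
\bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t
```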

Page 8

Proposed method

For lag l, consider the common part of X and the shifted Y

Page 9

Proposed method

Page 10

Proposed method

R(l): the correlation coefficient when X is delayed by l

Score at lag l :
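The slide's formulas are missing from the transcript. One way to write R(l) is as the Pearson coefficient over the common part of X and Y shifted by l, as sketched below. The index convention (x_t paired with y_{t-l}) and the score being the absolute value of R(l) are assumptions, not confirmed by the surviving text.

```latex
R(l) = \frac{\sum_{t=l+1}^{n} (x_t - \bar{x})(y_{t-l} - \bar{y})}
            {\sqrt{\sum_{t=l+1}^{n} (x_t - \bar{x})^2}\,
             \sqrt{\sum_{t=1}^{n-l} (y_t - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n-l}\sum_{t=l+1}^{n} x_t,\quad
\bar{y} = \frac{1}{n-l}\sum_{t=1}^{n-l} y_t

\mathrm{score}(l) = |R(l)| \quad \text{(assumed)}
```

With l = 0 this reduces to the ordinary Pearson coefficient R(0).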

Page 11

Proposed method

For a large lag l ≈ n, the original and shifted time sequences have too few overlapping values to compute R(l)

Restrict the maximum lag m to n/2

Page 12

Proposed method

Naive solution:

At time n, access all values of X and Y

Compute R(l) for all lags l (= 0, 1, …)

Choose the earliest maximum score above the threshold r, or report no lag

The proposed solution is based on three major steps
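For reference, the naive solution described above can be sketched in a few lines of Python. This is only an illustration: the function names `lag_correlation` and `naive_lag`, the use of |R(l)| as the score, and the cap of n/2 on the lag are assumptions made for the sketch.

```python
import math

def lag_correlation(x, y, lag):
    """Pearson correlation between x[lag:] and y[:n-lag] (x delayed by `lag`)."""
    n = len(x)
    xs, ys = x[lag:], y[:n - lag]
    k = len(xs)
    if k < 2:
        return None
    mx, my = sum(xs) / k, sum(ys) / k
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return None  # correlation undefined for a constant segment
    return cov / math.sqrt(vx * vy)

def naive_lag(x, y, threshold):
    """Scan every lag up to n/2; return the earliest lag with the
    maximum score above `threshold`, or None if there is no such lag."""
    best_lag, best_score = None, threshold
    for lag in range(len(x) // 2 + 1):
        r = lag_correlation(x, y, lag)
        if r is None:
            continue
        score = abs(r)  # assumed definition of the score
        if score > best_score:  # strict '>' keeps the earliest lag on ties
            best_lag, best_score = lag, score
    return best_lag
```

Each call touches all n values for every candidate lag, which is the cost the proposed method is designed to avoid.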

Page 13

Proposed method

Some sufficient statistics are needed so that R can be computed easily:

$S_x(1,n) = \sum_{t=1}^{n} x_t$ : sum of X over length n

$S_{xx}(1,n) = \sum_{t=1}^{n} x_t^2$ : sum of squares of X over length n

$S_{xy}(l) = \sum_{t=l+1}^{n} x_t y_{t-l}$ : inner product of X and Y shifted by lag l

Page 14

Proposed method

R(l) is obtained :
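The formula itself is missing from the transcript. Expanding the Pearson coefficient over the n - l overlapping values gives the expression below; it is a derivation under the assumed notation S_x(i,j) = \sum_{t=i}^{j} x_t, S_{xx}(i,j) = \sum_{t=i}^{j} x_t^2 (likewise for Y) and S_{xy}(l) = \sum_{t=l+1}^{n} x_t y_{t-l}, and may not match the slide's layout exactly.

```latex
R(l) = \frac{S_{xy}(l) - \dfrac{S_x(l+1,n)\, S_y(1,n-l)}{n-l}}
            {\sqrt{S_{xx}(l+1,n) - \dfrac{S_x(l+1,n)^2}{n-l}}\;
             \sqrt{S_{yy}(1,n-l) - \dfrac{S_y(1,n-l)^2}{n-l}}}
```

The numerator uses \sum (x - \bar{x})(y - \bar{y}) = \sum xy - (\sum x)(\sum y)/k and each factor of the denominator uses \sum (x - \bar{x})^2 = \sum x^2 - (\sum x)^2/k, with k = n - l.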

Page 15

Proposed method

R(l) can be estimated at any point in time; only five sufficient statistics need to be tracked (the sums and sums of squares of X and Y, and their inner product)

It still takes linear time to compute the cross-correlation function between two sequences
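A minimal sketch of how the five statistics could be maintained incrementally for one fixed lag is shown below. The class name `LagStats`, its fields, and the buffer of the last `lag` values of Y are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from collections import deque

class LagStats:
    """Running sufficient statistics for one fixed lag (illustrative sketch)."""

    def __init__(self, lag):
        self.lag = lag
        self.buf = deque()  # values of Y still waiting for their partner in X
        self.k = 0          # number of overlapping pairs seen so far (n - lag)
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def update(self, x, y):
        """Consume one new tick (x_t, y_t) in O(1) time."""
        self.buf.append(y)
        if len(self.buf) > self.lag:
            y_old = self.buf.popleft()  # this is y_{t-lag}
            self.k += 1
            self.sx += x
            self.sxx += x * x
            self.sy += y_old
            self.syy += y_old * y_old
            self.sxy += x * y_old

    def correlation(self):
        """R(lag) from the five statistics, or None if undefined."""
        if self.k < 2:
            return None
        cov = self.sxy - self.sx * self.sy / self.k
        vx = self.sxx - self.sx ** 2 / self.k
        vy = self.syy - self.sy ** 2 / self.k
        if vx <= 0 or vy <= 0:
            return None
        return cov / math.sqrt(vx * vy)
```

Each update is constant time per lag, but tracking every lag from 0 to n/2 this way is still linear work per tick, which is the remaining cost noted above.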

Page 16

Proposed method

Keep track of only a geometric progression of the lag values: l = 0, 1, 2, 4, …, 2^i, …

Only O(log n) numbers to keep track of, instead of the O(n) that the naive solution requires

However, the space required still grows linearly with the length n
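To make the O(log n) count concrete, the probed lags can be enumerated as follows. The helper name `geometric_lags` is hypothetical and the base of 2 follows the progression listed above.

```python
def geometric_lags(max_lag):
    """Return the probed lags 0, 1, 2, 4, ... that do not exceed max_lag."""
    lags = [0]
    lag = 1
    while lag <= max_lag:
        lags.append(lag)
        lag *= 2  # geometric progression with ratio 2
    return lags
```

For a stream of one million ticks with maximum lag m = 500,000, `geometric_lags(500_000)` yields only 20 lags to track.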

Page 17

Proposed method

To compute R(l) at any time, a sliding window of size l must be kept; with maximum lag m = n/2, this needs O(n) space

Instead of operating only on the original time sequences, we also compute their smoothed versions, by computing the means of non-overlapping windows

Page 18

Proposed method

Window size: powers of g = 2

X: original time sequence

Axh: smoothed version of X with windows of length 2^h

Ax0: the original sequence; Ax1: consists of n/2 ticks; etc.

The sufficient statistics of Axh need to be computed only every 2^h time ticks

At time n, there are O(log n) levels, and the sufficient statistics are computed for each level
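The smoothing hierarchy above can be sketched as a streaming computation in which each level averages pairs of values from the level below. The function name `smooth_levels` and the `pending` buffer are assumptions for illustration; the sketch stores every level in full for clarity, which a space-conscious implementation would not do.

```python
def smooth_levels(values, max_level):
    """Return levels[h] = means of non-overlapping windows of length 2**h."""
    levels = [[] for _ in range(max_level + 1)]
    pending = [None] * max_level  # first half of an unfinished pair, per level
    for v in values:
        h, cur = 0, float(v)
        while True:
            levels[h].append(cur)
            if h == max_level:
                break
            if pending[h] is None:
                pending[h] = cur  # wait for the second half of the pair
                break
            # Two values are available at level h: emit their mean one level up.
            cur = (pending[h] + cur) / 2
            pending[h] = None
            h += 1
    return levels
```

Level h receives a new value only once every 2^h ticks, which matches the update frequency stated for the sufficient statistics of Axh.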

Page 19
Page 20

Proposed method

In contrast with small lags, the larger ones are sparse

Use a cubic spline to interpolate the missing correlation coefficients
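As an illustration of the interpolation step, the sketch below fits a natural cubic spline through correlation values known at a few lags and evaluates it at lags in between. The natural boundary condition (zero second derivative at both ends) and the name `natural_cubic_spline` are assumptions; the slides do not say which spline variant is used.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return a function interpolating (xs, ys) with a natural cubic spline.
    xs must be strictly increasing and contain at least two knots."""
    xs, ys = list(xs), list(ys)
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    M = [0.0] * (n + 1)  # second derivatives at the knots; M[0] = M[n] = 0
    if n >= 2:
        # Tridiagonal system for the interior second derivatives M[1..n-1].
        a = [h[i - 1] for i in range(1, n)]
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        c = [h[i] for i in range(1, n)]
        d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n)]
        for k in range(1, n - 1):  # forward elimination (Thomas algorithm)
            w = a[k] / b[k - 1]
            b[k] -= w * c[k - 1]
            d[k] -= w * d[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = d[-1] / b[-1]
        for k in range(n - 3, -1, -1):  # back substitution
            sol[k] = (d[k] - c[k] * sol[k + 1]) / b[k]
        M[1:n] = sol

    def evaluate(x):
        # Locate the interval [xs[i], xs[i+1]] containing x (clamped at the ends).
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        left, right = xs[i + 1] - x, x - xs[i]
        return ((M[i] * left ** 3 + M[i + 1] * right ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - M[i] * h[i] / 6.0) * left
                + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6.0) * right)

    return evaluate
```

For example, with made-up correlation values at lags 0, 1, 2, 4 and 8, `natural_cubic_spline([0, 1, 2, 4, 8], values)(3)` would give an estimate for the untracked lag 3.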

Page 21

Proposed method

Axh(t): window average at time tick t for level h

Ax0(t) ≡ xt

Page 22

Proposed method

Sufficient statistics:

Page 23
Page 24

EXPERIMENTS

Page 25

EXPERIMENTS

Page 26

EXPERIMENTS

Page 27

Conclusion

BRAID is proposed to detect lag correlations on streaming data:

At any time

Low resource consumption

High accuracy