realtime analytics with cassandra

Realtime Analytics with Cassandra Acunu Analytics Tom Wilkie, Acunu 21st August 2012

Upload: acunu

Post on 24-Jan-2015


DESCRIPTION

My talk at NoSQL Now 2012

TRANSCRIPT

Page 1: Realtime Analytics with Cassandra

Realtime Analytics with Cassandra

Acunu Analytics

Tom Wilkie, Acunu

21st August 2012

Page 2: Realtime Analytics with Cassandra

Analytics

• Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics
• What's it good for?

2

Page 3: Realtime Analytics with Cassandra

Analytics

• Motivation / alternatives

• What is it?
• How does it work?
• Approximate Analytics
• What's it good for?

3

Page 4: Realtime Analytics with Cassandra

Analytics

Why bother?

“Companies that can harness big data will trample data incompetents”

The Economist, May 26th 2011

4

Page 5: Realtime Analytics with Cassandra

Analytics

time          page                           session id    duration
...           ...                            ...           ...
14:58:03.234  /index.html                    248.180.3.40  175
14:58:03.409  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.409  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.409  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52

5

Page 6: Realtime Analytics with Cassandra

Analytics

Live & historical aggregates... Trends... Drill downs and roll ups

Combining “big” and “real-time” is hard

6

Page 7: Realtime Analytics with Cassandra

Analytics

7

Solution Con

Scalability $$$

Not realtime

Spartan query semantics => complex, DIY solutions

Page 8: Realtime Analytics with Cassandra

Analytics

• Motivation / alternatives
• What is it?

• How does it work?
• Approximate Analytics
• What's it good for?

8

Page 9: Realtime Analytics with Cassandra

Analytics

• Aggregate incrementally, on the fly
• Store live + historical aggregates

[Diagram: Click stream, Sensor data, etc → events → Acunu Analytics → counter updates]

Page 10: Realtime Analytics with Cassandra

Analytics

{
  time : TIME(HOUR; MIN; SEC),
  page : PATH(/),
  category : STRING,
  loadTime : LONG
}

{
  select : ["COUNT", "AVG(loadTime)"],
  where : "time, ?path",
  group : "time, ?category"
}
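The schema and query above can be read as "keep COUNT and AVG(loadTime) up to date per time bucket and category as events arrive". A minimal sketch of that idea follows. It is not Acunu's implementation: the class and function names, the hour-level bucketing, and the sample events (including the category values) are assumptions made for illustration.

```python
from collections import defaultdict


class RunningAggregate:
    """Holds COUNT and the running sum needed for AVG(loadTime) for one group."""

    def __init__(self):
        self.count = 0
        self.total_load_time = 0

    def update(self, load_time):
        # One event costs two additions; no earlier events are re-read.
        self.count += 1
        self.total_load_time += load_time

    @property
    def avg(self):
        return self.total_load_time / self.count if self.count else None


def hour_bucket(time_str):
    """Truncate 'HH:MM:SS' to its hour, e.g. '14:58:03' -> '14:00'."""
    return time_str[:2] + ":00"


# One aggregate per (hour, category) group, created on first use.
aggregates = defaultdict(RunningAggregate)


def ingest(event):
    key = (hour_bucket(event["time"]), event["category"])
    aggregates[key].update(event["loadTime"])


# Made-up sample events, shaped like the schema on this slide.
events = [
    {"time": "14:58:03", "page": "/index.html", "category": "home", "loadTime": 175},
    {"time": "14:58:04", "page": "/docs/a.txt", "category": "docs", "loadTime": 52},
    {"time": "14:59:10", "page": "/docs/b.txt", "category": "docs", "loadTime": 48},
]
for event in events:
    ingest(event)

docs = aggregates[("14:00", "docs")]
print(docs.count, docs.avg)
```

Storing the sum next to the count is what lets the average be updated incrementally; the query then only reads two numbers per group.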

10

Page 11: Realtime Analytics with Cassandra

Analytics

11

Dashboard UI

Page 12: Realtime Analytics with Cassandra

Analytics

• Motivation / alternatives
• What is it?
• How does it work?

• Approximate Analytics
• What's it good for?

12

Page 13: Realtime Analytics with Cassandra

Analytics

count grouped by ... day

count distinct (session)

count ... geography

... browser

avg(duration)

13

Page 14: Realtime Analytics with Cassandra

Analytics

Data Definition

time : TIME(HOUR; MIN; SEC),
cust_id : LONG,
session_id : LONG,
geography : STRING,
browser : STRING,
load_time : LONG

Query Patterns

{
  select: "COUNT",
  patterns: [
    { where : "?time", group : "?time" },
    { where : "", group : "geography" },
    { where : "", group : "browser" }
  ]
},
{
  select: ["COUNT_DISTINCT(session_id)", "AVG(load_time)"],
  where: "time",
  group: ""
}

14

Page 15: Realtime Analytics with Cassandra

Analytics

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3221 :00→22 :01→19 :02→104 ...

... ...

UK all→228 user01→1 user14→12 user99→7 ...

US all→354 user01→4 user04→8 user56→17 ...

...

UK, 22:00 all→1904 ...

∅ all→87314 UK→238 US→354 ...

{
  cust_id: user01,
  session_id: 102,
  geography: UK,
  browser: IE,
  time: 22:02,
}

15

Page 16: Realtime Analytics with Cassandra

Analytics

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3222 :00→22 :01→19 :02→105 ...

... ...

UK all→229 user01→2 user14→12 user99→7 ...

US all→354 user01→4 user04→8 user56→17 ...

...

UK, 22:00 all→1905 ...

∅ all→87315 UK→239 US→354 ...

16

{
  cust_id: user01,
  session_id: 102,
  geography: UK,
  browser: IE,
  time: 22:02,
}
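The difference between the counter rows before and after this event can be reproduced with a few increments. The sketch below is an interpretation of the slides, not Acunu's code: the nested-dictionary layout, the `apply_event` helper and the `ROOT` name for the ∅ row are assumptions, and only the columns that change are seeded.

```python
from collections import defaultdict

ROOT = "∅"  # assumed name for the row that holds the ungrouped totals

# Counter rows: row key -> column -> count, seeded with the values shown
# before the event arrives.
counters = defaultdict(lambda: defaultdict(int))
counters["22:00"].update({"all": 3221, ":02": 104})
counters["UK"].update({"all": 228, "user01": 1})
counters["UK, 22:00"].update({"all": 1904})
counters[ROOT].update({"all": 87314, "UK": 238})


def apply_event(event):
    """Increment every counter that this event contributes to."""
    hour, minute = event["time"].split(":")
    hour_row = f"{hour}:00"
    geo = event["geography"]
    increments = [
        (hour_row, "all"),            # events in this hour
        (hour_row, f":{minute}"),     # events in this minute of the hour
        (geo, "all"),                 # events for this geography
        (geo, event["cust_id"]),      # events per user within the geography
        (f"{geo}, {hour_row}", "all"),  # geography and hour combined
        (ROOT, "all"),                # grand total
        (ROOT, geo),                  # total per geography
    ]
    for row, column in increments:
        counters[row][column] += 1
    return increments


apply_event({
    "cust_id": "user01",
    "session_id": 102,
    "geography": "UK",
    "browser": "IE",
    "time": "22:02",
})

print(counters["22:00"]["all"], counters["22:00"][":02"])  # 3222 105
print(counters["UK"]["all"], counters["UK"]["user01"])     # 229 2
print(counters["UK, 22:00"]["all"])                        # 1905
print(counters[ROOT]["all"], counters[ROOT]["UK"])         # 87315 239
```

Each event touches a fixed, small number of counters, so the write cost does not depend on how much history has already been stored.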

Page 17: Realtime Analytics with Cassandra

Analytics

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3221 :00→22 :01→19 :02→104 ...

... ...

UK all→228 user01→1 user14→12 user99→7 ...

US all→354 user01→4 user04→8 user56→17 ...

...

UK, 22:00 all→1904 ...

∅ all→87314 UK→238 US→354 ...

17

Page 18: Realtime Analytics with Cassandra

Analytics

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3222 :00→22 :01→19 :02→105 ...

... ...

UK all→229 user01→2 user14→12 user99→7 ...

US all→354 user01→4 user04→8 user56→17 ...

...

UK, 22:00 all→1905 ...

∅ all→87315 UK→239 US→354 ...

18

where time 21:00-22:00 count(*)

Page 19: Realtime Analytics with Cassandra

Analytics

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3222 :00→22 :01→19 :02→105 ...

... ...

UK all→229 user01→2 user14→12 user99→7 ...

US all→354 user01→4 user04→8 user56→17 ...

...

UK, 22:00 all→1905 ...

∅ all→87315 UK→239 US→354 ...

19

where time 21:00-22:00 count(*)

where time 22:00-23:00, group by minute

Page 20: Realtime Analytics with Cassandra

Analytics

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3222 :00→22 :01→19 :02→105 ...

... ...

UK all→229 user01→2 user14→12 user99→7 ...

US all→354 user01→4 user04→8 user56→17 ...

...

UK, 22:00 all→1905 ...

∅ all→87315 UK→239 US→354 ...

20

where time 21:00-22:00 count(*)

where time 22:00-23:00, group by minute

where geography=UK, group all by user

Page 21: Realtime Analytics with Cassandra

Analytics

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3222 :00→22 :01→19 :02→105 ...

... ...

UK all→229 user01→2 user14→12 user99→7 ...

US all→354 user01→4 user04→8 user56→17 ...

...

UK, 22:00 all→1905 ...

∅ all→87315 UK→239 US→354 ...

21

where time 21:00-22:00 count(*)

where time 22:00-23:00, group by minute

where geography=UK, group all by user

count all

Page 22: Realtime Analytics with Cassandra

Analytics

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3222 :00→22 :01→19 :02→105 ...

... ...

UK all→229 user01→2 user14→12 user99→7 ...

US all→354 user01→4 user04→8 user56→17 ...

...

UK, 22:00 all→1905 ...

∅ all→87315 UK→239 US→354 ...

22

where time 21:00-22:00 count(*)

where time 22:00-23:00, group by minute

where geography=UK, group all by user

count all

group all by geo
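Each of the queries above can be answered by reading a single counter row. The sketch below shows one way to read them, assuming the row layout from the slides. The `count` and `group` helpers and the `ROOT` name are hypothetical, the mapping of queries to rows is an interpretation of the slides, and only the columns visible on the slide are included (the elided "..." columns are left out).

```python
ROOT = "∅"  # assumed name for the row that holds the ungrouped totals

rows = {
    "21:00": {"all": 1345, ":00": 45, ":01": 62, ":02": 87},
    "22:00": {"all": 3222, ":00": 22, ":01": 19, ":02": 105},
    "UK": {"all": 229, "user01": 2, "user14": 12, "user99": 7},
    "US": {"all": 354, "user01": 4, "user04": 8, "user56": 17},
    "UK, 22:00": {"all": 1905},
    ROOT: {"all": 87315, "UK": 239, "US": 354},
}


def count(row):
    """count(*) for a row: read its pre-aggregated 'all' column."""
    return rows[row]["all"]


def group(row):
    """Grouped counts for a row: every column except the 'all' total."""
    return {column: n for column, n in rows[row].items() if column != "all"}


print(count("21:00"))  # where time 21:00-22:00 count(*)
print(group("22:00"))  # where time 22:00-23:00, group by minute
print(group("UK"))     # where geography=UK, group all by user
print(count(ROOT))     # count all
print(group(ROOT))     # group all by geo
```

Because the aggregation happened at write time, every query here is a single row lookup, however many raw events sit behind it.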

Page 23: Realtime Analytics with Cassandra

Analytics

• Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics

• What's it good for?

23

Page 24: Realtime Analytics with Cassandra

Analytics

Approximate Analytics

Exact

Large Scale

Real-time

24

Page 25: Realtime Analytics with Cassandra

Analytics

Count Distinct

Plan A: keep a list of all the things you’ve seen, count them at query time

Quick to update
... but at scale ...
Takes lots of space
Takes a long time to query
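Plan A can be sketched in a few lines. This is only an illustration: the `ExactDistinct` class is hypothetical, and it uses a set rather than a plain list so that duplicates collapse on insert, but the stored state still grows with the number of distinct items.

```python
class ExactDistinct:
    """Exact distinct count: remember every item ever seen."""

    def __init__(self):
        self.seen = set()

    def add(self, item):
        # Cheap per update, but the set keeps one entry per distinct item.
        self.seen.add(item)

    def count(self):
        return len(self.seen)


sessions = ExactDistinct()
for session_id in [102, 103, 102, 104, 103]:
    sessions.add(session_id)

print(sessions.count())      # 3
print(len(sessions.seen))    # 3 entries stored for 3 distinct sessions
```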

25

Page 26: Realtime Analytics with Cassandra

Analytics

Approximate Distinct

item  hash            leading zeroes  max so far
x     00101001110...  2               2
y     11010100111...  0               2
z     00011101011...  3               3
...

max so far = max # leading zeroes seen so far

... to see a max of M takes about 2^M items
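The three example hashes can be worked through directly. The sketch below uses the bit strings as printed on the slide, truncated at the "..."; the `leading_zeroes` helper is a hypothetical name, and the final `2 ** max_so_far` is the rough single-estimator guess implied by "a max of M takes about 2^M items".

```python
def leading_zeroes(bits):
    """Number of '0' characters before the first '1' in a bit string."""
    return len(bits) - len(bits.lstrip("0"))


# Hash prefixes as shown on the slide (the real hashes are longer).
hashes = {
    "x": "00101001110",
    "y": "11010100111",
    "z": "00011101011",
}

max_so_far = 0
trace = []
for item, bits in hashes.items():
    zeroes = leading_zeroes(bits)
    max_so_far = max(max_so_far, zeroes)
    trace.append((item, zeroes, max_so_far))

estimate = 2 ** max_so_far

print(trace)     # [('x', 2, 2), ('y', 0, 2), ('z', 3, 3)]
print(estimate)  # 8
```

The only state kept is one small integer, which is why this scales. A single maximum is a very noisy estimator, though: one lucky hash with many leading zeroes inflates the estimate.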

26

Page 27: Realtime Analytics with Cassandra

Analytics

Approximate Distinct

to reduce variance, average over m = 2^k sub-streams

item  hash            index, zeroes  max so far
x     00101001110...  0, 0           0,0,0,0
y     11010100111...  3, 1           0,0,1,0
z     00011101011...  0, 1           1,0,1,0
...

take the harmonic mean
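A minimal sketch of the sub-stream idea follows. The slides do not name the algorithm, so this borrows the HyperLogLog formulation as an assumption: the bias constant `alpha`, the small-range correction, the "leading zeroes + 1" rank convention, and the choice of SHA-1 as the hash are not taken from the talk. The `ApproxDistinct` class name is hypothetical.

```python
import hashlib
import math


class ApproxDistinct:
    """Approximate distinct counter over m = 2**k sub-streams (k >= 7)."""

    def __init__(self, k=10):
        self.k = k
        self.m = 1 << k
        self.registers = [0] * self.m  # max rank seen per sub-stream

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")   # use 64 bits of the hash
        rest_bits = 64 - self.k
        index = h >> rest_bits                   # first k bits pick the sub-stream
        rest = h & ((1 << rest_bits) - 1)        # remaining bits
        rank = rest_bits - rest.bit_length() + 1  # leading zeroes + 1
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # HyperLogLog constant, for m >= 128
        # alpha * m * (harmonic mean of 2**register over the m sub-streams)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        empty = self.registers.count(0)
        if raw <= 2.5 * m and empty:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / empty)
        return raw


sketch = ApproxDistinct(k=10)
for i in range(10_000):
    sketch.add(f"session-{i}")
    sketch.add(f"session-{i}")  # duplicates never raise a register

print(round(sketch.estimate()))
```

With m = 1024 registers the expected relative error is roughly 1.04 / sqrt(m), about 3%, while the state stays at 1024 small integers regardless of how many items are added.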

27

Page 28: Realtime Analytics with Cassandra

Analytics

• Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics
• What's it good for?

28

Page 29: Realtime Analytics with Cassandra

Analytics

Was it worth it?

29

Page 30: Realtime Analytics with Cassandra

Analytics

What’s Coming?

• Ad Hoc: same queries, but without the need to pre-define them

• Geolocation: support for location-based events and queries

• Drill down: see the events that make up any given aggregate

30

Page 31: Realtime Analytics with Cassandra

Analytics

• Motivation / alternatives
• What is it?
• How does it work?
• Approximate Analytics
• What's it good for?

31

Page 32: Realtime Analytics with Cassandra

Analytics

Manufacturing

Systems Monitoring

Financial Services

Social Media Ad Analytics

Oil + Gas

Page 33: Realtime Analytics with Cassandra

Analytics

“Up and running in about 4 hours”

“We found out a competitor was scraping our data”

“We keep discovering use cases we hadn’t thought of”

Page 34: Realtime Analytics with Cassandra

Analytics

Page 35: Realtime Analytics with Cassandra

Analytics

www.acunu.com @acunu

Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos are trademarks of the Apache Software Foundation.

35