acunu analytics

Acunu Analytics: Simple, powerful, real-time. Andrew Byde, Principal Scientist. Tuesday, 27 March 2012

Upload: acunu

Post on 20-Jan-2015

TRANSCRIPT

Page 1: Acunu Analytics

Acunu Analytics: Simple, powerful, real-time

Andrew Byde, Principal Scientist

Tuesday, 27 March 2012

Page 2: Acunu Analytics

Making big data useful

time          page                           session id    duration
...           ...                            ...           ...
14:58:03.234  /index.html                    248.180.3.40  898
14:58:03.234  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.234  /docs/access/chapter8.txt      99.1.10.178   52
...           ...                            ...           ...

x billions

How do we turn this ...

Page 3: Acunu Analytics

into this...

Page 4: Acunu Analytics

or this...

Page 5: Acunu Analytics

or this...

Page 6: Acunu Analytics

• SQL + materialised views

Page 7: Acunu Analytics

• SQL + materialised views

... would be nice if it scaled

Page 8: Acunu Analytics

• Hadoop/Map-Reduce can do anything

Page 9: Acunu Analytics

• Hadoop/Map-Reduce can do anything

Not real-time

Inefficient re-computation

Page 10: Acunu Analytics

• Hadoop/Map-Reduce can do anything

Not real-time

Inefficient re-computation

(100TB on a 100 node cluster is > 3 hours)
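As a rough check on the figure above, the sketch below works out the per-node scan rate that a 3-hour full pass would imply. It assumes decimal terabytes and an even split of the data across nodes; neither assumption comes from the slides.

```python
# Back-of-envelope check: what per-node scan rate does "100 TB on 100 nodes
# takes more than 3 hours" imply? Assumes decimal units (1 TB = 10**12 bytes)
# and data spread evenly across nodes; both are assumptions, not from the talk.

TOTAL_BYTES = 100 * 10**12
NODES = 100
HOURS = 3

bytes_per_node = TOTAL_BYTES // NODES      # 1 TB per node
seconds = HOURS * 3600                     # 10,800 s
implied_rate_mb_s = bytes_per_node / seconds / 10**6

print(f"{bytes_per_node / 10**12:.0f} TB per node")
print(f"{implied_rate_mb_s:.1f} MB/s per node to finish in {HOURS} hours")
```

Under these assumptions, any sustained per-node rate below roughly 93 MB/s pushes a full recomputation past three hours.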

Page 11: Acunu Analytics

• Cassandra counters are pretty cool

Page 12: Acunu Analytics

• Cassandra counters are pretty cool

but the query semantics are spartan

=> DIY solutions

Page 13: Acunu Analytics

Acunu Analytics

• Simple, real-time, incremental analytics

• push processing into ingest phase

event → AA → counter updates → Cassandra

Tuesday, 27 March 2012

Page 14: Acunu Analytics

Acunu Analytics

• Event template, e.g.,

• specifies “blow-up” strategy according to supported queries

select : ["COUNT", "AVG(loadTime)"],
type : {
  time : [TIME(HOUR; MIN; SEC), ?, 0],
  page : PATH(/),
  loadTime : [LONG, 0, 0]
}
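One way to read the template's bucketing types is sketched below. It assumes that PATH(/) counts an event against every prefix of its path and that TIME(HOUR; MIN; SEC) counts it at each listed granularity. The helper names `path_buckets` and `time_buckets` are hypothetical and are not part of Acunu Analytics.

```python
# Hypothetical helpers: one possible reading of PATH(/) and TIME(HOUR; MIN; SEC)
# bucketing. Names and behaviour are assumptions, not the Acunu Analytics API.

def path_buckets(path, sep="/"):
    """Return every prefix of a path, from the root down to the full path."""
    parts = [p for p in path.split(sep) if p]
    buckets = [sep]
    for i in range(1, len(parts) + 1):
        buckets.append(sep + sep.join(parts[:i]))
    return buckets


def time_buckets(hour, minute, second):
    """Return the hour, minute and second buckets that contain a timestamp."""
    return [
        f"{hour:02d}",
        f"{hour:02d}:{minute:02d}",
        f"{hour:02d}:{minute:02d}:{second:02d}",
    ]


print(path_buckets("/docs/access/chapter8.txt"))
print(time_buckets(14, 58, 3))
```

Each bucket returned here would correspond to one counter to update at ingest, which is the "blow-up" the template controls.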

Page 15: Acunu Analytics

Acunu Analytics

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3221 :00→22 :01→19 :02→104 ...

... ...

click all→228 user01→1 user14→12 user99→7 ...

open all→354 user01→4 user04→8 user56→17 ...

...

click, 22:00 all→1904 ...

∅ all→87314 click→238 open→354 ...

type : { time : TIME(HOUR; MIN), category : STRING, user : STRING}

Page 16: Acunu Analytics

Acunu Analytics

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3221 :00→22 :01→19 :02→104 ...

... ...

click all→228 user01→1 user14→12 user99→7 ...

open all→354 user01→4 user04→8 user56→17 ...

...

click, 22:00 all→1904 ...

∅ all→87314 click→238 open→354 ...

(22:02, “click”, user01)

type : { time : TIME(HOUR; MIN), category : STRING, user : STRING}

Page 17: Acunu Analytics

Acunu Analytics

(22:02, “click”, user01)

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3222 :00→22 :01→19 :02→105 ...

... ...

click all→229 user01→2 user14→12 user99→7 ...

open all→354 user01→4 user04→8 user56→17 ...

...

click, 22:00 all→1905 ...

∅ all→87315 click→239 open→354 ...

type : { time : TIME(HOUR; MIN), category : STRING, user : STRING}
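The update shown on this slide can be sketched as a handful of counter increments per event. This is a minimal in-memory illustration, assuming a `counters[row][column]` layout that mirrors the table above. The `ingest` function and the key format are hypothetical, and a plain dictionary stands in for Cassandra counter columns.

```python
from collections import defaultdict

# counters[row][column] -> count. Row keys are tuples of fixed dimension values;
# the empty tuple plays the role of the "∅" row on the slide.
counters = defaultdict(lambda: defaultdict(int))


def ingest(hour, minute, category, user):
    """Increment every counter that one (time, category, user) event touches."""
    hour_key = f"{hour:02d}:00"
    minute_key = f":{minute:02d}"
    updates = [
        ((), "all"),
        ((), category),
        ((hour_key,), "all"),
        ((hour_key,), minute_key),
        ((category,), "all"),
        ((category,), user),
        ((category, hour_key), "all"),
    ]
    for row, column in updates:
        counters[row][column] += 1
    return len(updates)


ingest(22, 2, "click", "user01")
for row, columns in counters.items():
    print(row or "∅", dict(columns))
```

A single event touches only the rows and columns that match its own values, so a "click" leaves every "open" counter unchanged.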

Page 18: Acunu Analytics

Acunu Analytics

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3222 :00→22 :01→19 :02→105 ...

... ...

click all→229 user01→2 user14→12 user99→7 ...

open all→354 user01→4 user04→8 user56→17 ...

...

click, 22:00 all→1905 ...

∅ all→87315 click→239 open→354 ...

Pre-assembled queries, e.g. ...

count all

group all by category

group all by user, where category=click

for 22:00-23:00, group by minute
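With counters laid out this way, each of the queries listed above reduces to reading one row. The sketch below assumes the same `counters[row][column]` layout as the table. The numbers are made up for illustration, and `total` and `group_by` are hypothetical names rather than an Acunu Analytics API.

```python
# Hypothetical counter store, laid out as counters[row][column] -> count.
# The numbers are made up for illustration.
counters = {
    (): {"all": 10, "click": 6, "open": 4},
    ("click",): {"all": 6, "user01": 2, "user14": 4},
    ("open",): {"all": 4, "user01": 1, "user04": 3},
    ("22:00",): {"all": 7, ":00": 2, ":01": 1, ":02": 4},
}


def total(row=()):
    """'count all': a single counter lookup."""
    return counters[row]["all"]


def group_by(row=()):
    """Group by the next dimension: every column of the row except 'all'."""
    return {k: v for k, v in counters[row].items() if k != "all"}


print(total())                 # count all
print(group_by())              # group all by category
print(group_by(("click",)))    # group all by user, where category=click
print(group_by(("22:00",)))    # for 22:00-23:00, group by minute
```

No scan or aggregation happens at query time in this sketch; the work was done when the events were ingested.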

Page 19: Acunu Analytics

Summary

• Simple, real-time, incremental analytics

• work done on ingest

• sum, count, distinct, avg, stddev, min-max etc

• time + hierarchy bucketing

• efficient ‘group’ semantics

• works with Apache Cassandra
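Several of the aggregates listed above can be kept up to date one event at a time, which is what makes doing the work on ingest practical. The sketch below is a generic running-sums approach and is not taken from Acunu Analytics; the class name `RunningStats` is hypothetical. Distinct counts need a set or an approximate sketch and are not covered here.

```python
import math


class RunningStats:
    """Aggregates updated one value at a time, without storing the values."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0
        self.minimum = math.inf
        self.maximum = -math.inf

    def add(self, x):
        self.count += 1
        self.total += x
        self.total_sq += x * x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def avg(self):
        return self.total / self.count

    def stddev(self):
        # Population standard deviation from the running sums.
        mean = self.avg()
        return math.sqrt(max(self.total_sq / self.count - mean * mean, 0.0))


stats = RunningStats()
for duration in (898, 1234, 52):   # durations from the sample log rows
    stats.add(duration)
print(stats.count, stats.avg(), stats.minimum, stats.maximum)
```

The sum-of-squares form is simple but can lose precision on large values; Welford's method is a common, more stable alternative.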

Page 20: Acunu Analytics

Early Access Program

[email protected]

Page 21: Acunu Analytics

Page 22: Acunu Analytics

count

Page 23: Acunu Analytics

count distinct

(session)

count

Page 24: Acunu Analytics

count distinct

(session)

count

avg(duration)

Page 25: Acunu Analytics

count

grouped by ...

day

count

distinct (session)

count

avg(duration)

Page 26: Acunu Analytics

count

grouped by ...

day

count

distinct (session)

count ... geography

avg(duration)

Page 27: Acunu Analytics

count

grouped by ...

day

count

distinct (session)

count ... geography

... browser

avg(duration)
