acunu analytics

Post on 20-Jan-2015

Acunu Analytics

Simple, powerful, real-time

Andrew Byde, Principal Scientist

Tuesday, 27 March 2012

Making big data useful

time          page                           session id    duration
...           ...                            ...           ...
14:58:03.234  /index.html                    248.180.3.40  898
14:58:03.234  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.234  /docs/access/chapter8.txt      99.1.10.178   52
...           ...                            ...           ...

x billions

How do we turn this ...

into this...

or this...

or this...

• SQL + materialised views

... would be nice if it scaled

• Hadoop/Map-Reduce can do anything

Not real-time

Inefficient re-computation

(processing 100 TB on a 100-node cluster takes more than 3 hours)
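The figure above can be sanity-checked with simple arithmetic. This is a rough sketch, assuming the job is bound by sequential read throughput and the data is spread evenly over the nodes. The 90 MB/s per-node rate is an illustrative assumption, not a number from the talk.

```python
def scan_hours(total_bytes: float, nodes: int, bytes_per_sec_per_node: float) -> float:
    """Hours for every node to read its equal share of the data once."""
    per_node_bytes = total_bytes / nodes
    return per_node_bytes / bytes_per_sec_per_node / 3600.0


TB = 10**12
MB = 10**6

# Assumed sustained read rate per node (hypothetical): 90 MB/s.
hours = scan_hours(100 * TB, 100, 90 * MB)
print(f"{hours:.2f} hours for one full pass")
```

Under these assumptions each node has to read 1 TB, so any per-node rate below roughly 93 MB/s puts a single pass above 3 hours, and that cost is paid again on every re-computation.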

• Cassandra counters are pretty cool

but the query semantics are spartan

=> DIY solutions

Acunu Analytics

• Simple, real-time, incremental analytics

• push processing into ingest phase

event → AA → counter updates → Cassandra

• Event template, e.g.,

    select : ["COUNT", "AVG(loadTime)"],
    type : {
        time : [TIME(HOUR; MIN; SEC), ?, 0],
        page : PATH(/),
        loadTime : [LONG, 0, 0]
    }

• specifies “blow-up” strategy according to supported queries
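One way to read the “blow-up” idea is that each field of an incoming event is expanded into every bucket a later query might ask for. The sketch below assumes that TIME(HOUR; MIN; SEC) means truncating the timestamp to each listed granularity and that PATH(/) means every prefix of the path split on "/". Both readings, and the function names, are assumptions for illustration rather than Acunu's actual implementation.

```python
from datetime import datetime


def time_buckets(ts: datetime) -> list[str]:
    """Truncate a timestamp to hour, minute and second granularity."""
    return [
        ts.strftime("%H:00"),
        ts.strftime("%H:%M"),
        ts.strftime("%H:%M:%S"),
    ]


def path_buckets(path: str, sep: str = "/") -> list[str]:
    """Every ancestor prefix of a path, from the root down to the full path."""
    parts = [p for p in path.split(sep) if p]
    return [sep] + [sep + sep.join(parts[: i + 1]) for i in range(len(parts))]


event_time = datetime(2012, 3, 27, 14, 58, 3)
print(time_buckets(event_time))
print(path_buckets("/docs/access/chapter8.txt"))
```

Under this reading, one event touches one counter per bucket, so the write cost grows with the number of granularities and the path depth while each supported query becomes a direct lookup.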

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3221 :00→22 :01→19 :02→104 ...

... ...

click all→228 user01→1 user14→12 user99→7 ...

open all→354 user01→4 user04→8 user56→17 ...

...

click, 22:00 all→1904 ...

∅ all→87314 click→238 open→354 ...

type : { time : TIME(HOUR; MIN), category : STRING, user : STRING}

Incoming event: (22:02, “click”, user01)

21:00 all→1345 :00→45 :01→62 :02→87 ...

22:00 all→3222 :00→22 :01→19 :02→105 ...

... ...

click all→229 user01→2 user14→12 user99→7 ...

open all→354 user01→4 user04→8 user56→17 ...

...

click, 22:00 all→1905 ...

∅ all→87315 click→239 open→354 ...
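The update shown above can be sketched as a handful of counter increments per event. This is a minimal in-memory sketch, assuming rows are keyed by the fixed dimension values and columns by "all" or one further dimension value. The `ingest` function and the dictionary layout are hypothetical; a real deployment would issue counter updates to Cassandra instead.

```python
from collections import defaultdict

# counters[row_key][column] -> count; a row key is a tuple of fixed dimension values.
counters: dict[tuple, dict[str, int]] = defaultdict(lambda: defaultdict(int))

# Seed a few "before" values copied from the table.
counters[()].update({"all": 87314, "click": 238})
counters[("22:00",)].update({"all": 3221, ":02": 104})
counters[("click",)].update({"all": 228, "user01": 1})
counters[("click", "22:00")].update({"all": 1904})


def ingest(hour: str, minute: str, category: str, user: str) -> None:
    """Increment every counter the event contributes to."""
    updates = [
        ((), "all"),                # empty row: grand total
        ((), category),             # empty row: total per category
        ((hour,), "all"),           # hour row: total for the hour
        ((hour,), minute),          # hour row: per-minute breakdown
        ((category,), "all"),       # category row: total for the category
        ((category,), user),        # category row: per-user breakdown
        ((category, hour), "all"),  # category + hour row
    ]
    for row, column in updates:
        counters[row][column] += 1


ingest("22:00", ":02", "click", "user01")
print(counters[("22:00",)]["all"], counters[("click",)]["user01"])
```

Only rows that match the event's hour and category change, so the cost per event is a small, fixed number of increments regardless of how much data has already been ingested.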

Pre-assembled queries, e.g. ...

count all

group all by category

group all by user, where category=click

for 22:00-23:00, group by minute
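With counters laid out this way, each of the queries listed above reduces to reading one row. The sketch below uses a plain dictionary holding a subset of the values from the table. The helper names `count_all` and `group_by` are hypothetical and only illustrate the lookup pattern, not the product's query API.

```python
# A subset of the counter rows from the table, keyed by fixed dimension values.
counters = {
    (): {"all": 87315, "click": 239},
    ("click",): {"all": 229, "user01": 2, "user14": 12, "user99": 7},
    ("22:00",): {"all": 3222, ":00": 22, ":02": 105},
}


def count_all() -> int:
    """count all: the grand total lives in the empty row."""
    return counters[()]["all"]


def group_by(row: tuple) -> dict[str, int]:
    """Return the per-value breakdown stored in a row, without its total."""
    return {col: n for col, n in counters[row].items() if col != "all"}


print(count_all())            # count all
print(group_by(()))           # group all by category
print(group_by(("click",)))   # group all by user, where category=click
print(group_by(("22:00",)))   # for 22:00-23:00, group by minute
```

No aggregation happens at query time: the grouping was already materialised during ingest, which is what makes the answers available in real time.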

Summary

• Simple, real-time, incremental analytics

• work done on ingest

• sum, count, distinct, avg, stddev, min-max etc

• time + hierarchy bucketing

• efficient ‘group’ semantics

• works with Apache Cassandra
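Several of the aggregates listed above can be maintained incrementally from a few running values. The sketch below is one common way to do it, assuming count, sum and sum of squares are stored per bucket; it is not necessarily how Acunu Analytics computes them, and distinct counts need a different structure that is not shown here. The class name `RunningStats` is hypothetical, and the sample durations are the three values from the log excerpt.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Aggregates that can be updated one event at a time."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, x: float) -> None:
        self.count += 1
        self.total += x
        self.total_sq += x * x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def avg(self) -> float:
        return self.total / self.count

    @property
    def stddev(self) -> float:
        # Population standard deviation: sqrt(E[x^2] - E[x]^2),
        # clamped at zero to absorb floating-point rounding.
        mean = self.avg
        return math.sqrt(max(self.total_sq / self.count - mean * mean, 0.0))


stats = RunningStats()
for duration in (898, 1234, 52):
    stats.add(duration)
print(stats.count, stats.avg, stats.minimum, stats.maximum)
```

The sum-of-squares form maps directly onto additive counters, at the price of some numerical instability for large values; Welford's method is more stable but is not a pure sum.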

Early Access Program

analytics@acunu.com

count

count distinct (session)

avg(duration)

grouped by ...

... day

... geography

... browser
