How we built analytics from scratch (in seven easy steps)

Jodi Moran, Co-founder & CTO

Posted on 21-Oct-2014


Plumbee: social casino games


Plumbee’s growth


Oct 2011

• 3 founders & 3 founding employees

• 0 in data

March 2012

• Mirrorball Slots on Facebook launch

• 15 staff

• 0 in data

Dec 2012

• Mirrorball Slots on iOS beta launch

• 29 staff

• 4 in data

Today

• 1.2M MAU

• 250K DAU

• 39 staff

• 5 in data

“Build, measure, learn”

Timing and targeting of offers

Balancing of the virtual economy

Creation of engaging features

Cost-effective acquisition

Goals

Never say “we don’t have that data”

Breadth of data use

Depth of data use

Agile data use

Scalable foundation for the future

In the beginning…

Step #1: 3rd party analytics

Blank slate

No time

No bandwidth

No experience


Third-party analytics

• Low opportunity cost

• Full stack solution

• Lots of choices

• Get useful data to everyone fast

Step #2: Collect everything

3rd party systems lack flexibility

Want to own the data

Don’t know what we want to know

Analytics is strategic


What is everything?

• State-changing calls from client to server

• Changes of state

• State-changing calls from client to third parties (Facebook)

Yes, this is a lot of data: 450m events (45 GB compressed) per day.

Using Amazon Web Services makes this possible.
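
As a rough illustration of what "collect everything" could look like in code, the sketch below wraps a state-changing server handler so that both the call and the resulting change of state are logged as JSON events (JSON is the event storage format mentioned later in the talk). The names `record_event`, `state_changing`, `spin`, the in-memory `EVENT_LOG` and the event fields are all assumptions made for illustration, not Plumbee's implementation. The last function works through the slide's figures: 45 GB across 450m events comes to roughly 100 bytes per compressed event, taking 1 GB as 10^9 bytes.

```python
import json
import time
from typing import Any, Callable, Dict, List

# Stand-in for a durable event store (hypothetical; a real system would
# ship these lines off the server rather than keep them in memory).
EVENT_LOG: List[str] = []

State = Dict[str, Any]


def record_event(kind: str, name: str, payload: Dict[str, Any]) -> None:
    """Serialise one event as a JSON line and append it to the log."""
    event = {"ts": time.time(), "kind": kind, "name": name, "payload": payload}
    EVENT_LOG.append(json.dumps(event, sort_keys=True))


def state_changing(handler: Callable[..., State]) -> Callable[..., State]:
    """Wrap a handler so the call and the state it changed are both logged."""
    def wrapper(state: State, **params: Any) -> State:
        record_event("call", handler.__name__, params)
        new_state = handler(state, **params)
        # Log only the keys whose values differ from the previous state.
        changed = {k: v for k, v in new_state.items() if state.get(k) != v}
        record_event("state_change", handler.__name__, changed)
        return new_state
    return wrapper


@state_changing
def spin(state: State, bet: int) -> State:
    """Hypothetical game call: deduct the bet from the player's coins."""
    return {**state, "coins": state["coins"] - bet}


def bytes_per_event(total_bytes: float, events: float) -> float:
    """Average size of one event, given a daily volume and event count."""
    return total_bytes / events
```

Because the wrapper sits around the handler, a new feature gets logged without anyone writing instrumentation for it, which is the property the talk goes on to highlight.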


Why we like it

No need:

– To test instrumentation

– To add instrumentation of new features

– To touch transactional databases

– To worry we won’t have the data

Easy and fast to implement

... but we still miss things.

Step #3: Elastic MapReduce & Hive

Lots and lots of data

Need access

Data is unstructured

No time to build structure


The secret to success

The right analyst

Technical skills

Unstructured data

Data architecture

Step #4: Google Spreadsheets

Only access via SQL

Lack of visibility

Want data to be everyday

Step #5: In-house split testing

Want to know what worked

Can’t separate factors

Want flexibility


It’s easy to serve experiments…

• Server-side random assignment of users

• Second tier allows deep tests (bonus: canary deployments)

• Tool for configuration-only tests

• Test & variant pairs attached to every analytics event
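
The first and last bullets above can be sketched in a few lines. The slide says only that assignment is random and server-side; the version below swaps in hash-based assignment, which behaves pseudo-randomly but gives the same answer on every server without storing anything. That choice, the `TESTS` configuration and all function names are assumptions for illustration.

```python
import hashlib
from typing import Dict, List

# Hypothetical experiment configuration: test name -> variant names.
TESTS: Dict[str, List[str]] = {
    "daily_bonus_size": ["control", "double"],
    "lobby_layout": ["control", "grid", "carousel"],
}


def assign_variant(user_id: str, test: str, variants: List[str]) -> str:
    """Map a user to one variant of a test, stably across calls and servers."""
    # Hashing test and user together keeps assignments independent across tests.
    digest = hashlib.sha256(f"{test}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def assignments(user_id: str) -> Dict[str, str]:
    """Return the user's variant for every configured test."""
    return {
        test: assign_variant(user_id, test, variants)
        for test, variants in TESTS.items()
    }


def tag_event(event: dict, user_id: str) -> dict:
    """Return a copy of an analytics event with test/variant pairs attached."""
    return {**event, "user_id": user_id, "tests": assignments(user_id)}
```

Attaching the pairs to every event means any metric derived from the event stream can later be split by variant without a separate join against an assignment table.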

… but it’s hard to analyse experiments

Web analytics

Conversion rate

Binomial distribution

Simple tests

• Measuring variables that don’t satisfy “conversion rate” assumptions

• The need for an Overall Evaluation Criterion
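
For reference, the "simple" web-analytics case named above can be sketched as a two-proportion z-test: each user either converts or does not, so the counts are binomial and a normal approximation holds for large samples. This is the standard textbook test, offered as a sketch; the talk does not say which tests Plumbee ran, and the function name is an assumption.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    conv_a: int, n_a: int, conv_b: int, n_b: int
) -> Tuple[float, float]:
    """Compare conversion rates of variants A and B.

    Returns (z, two-sided p-value) using the pooled-variance normal
    approximation to the binomial distribution.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

The test relies on a yes/no outcome per user. A metric that is not a yes/no outcome falls outside these assumptions, which is the difficulty the slide points to.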

Step #6: Automation & optimization

All data processing is manual

This is getting expensive

And it takes a long time to run


(Basic) optimization

• Spot instances

• Output compression with snappy

• Python streaming jobs

• There’s a lot more we could do…
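
To make "Python streaming jobs" concrete, here is a minimal sketch in the Hadoop Streaming style: a mapper that reads raw JSON event lines and emits tab-separated key/value pairs, and a reducer that sums the values for each key from sorted input. The `name` field and the choice of counting events by name are assumptions for illustration; the talk does not describe what its jobs computed.

```python
import json
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'event_name<TAB>1' for every parseable JSON event line."""
    for line in lines:
        try:
            event = json.loads(line)
        except ValueError:
            continue  # skip malformed lines rather than fail the whole job
        if isinstance(event, dict) and event.get("name") is not None:
            yield f"{event['name']}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"
```

In a real streaming job each function would be fed `sys.stdin` and its output printed line by line, with the framework doing the sort between the two phases. Keeping them as plain generators makes them easy to test locally before paying for a cluster run.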

Step #7: Relational data mart

Expensive Hive clusters

Queries take a long time to run

Hive functionality is limited


Why Hive AND a traditional database?

15 GB of aggregates

20 TB total
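
The split suggested by these two figures is that the raw events stay in Hive while a much smaller set of aggregates (15 GB against 20 TB, well under 0.1% of the total) goes into a relational database for fast, everyday queries. The sketch below shows that pattern with Python's built-in `sqlite3` standing in for the data mart. The table layout, the event tuple shape and the function names are assumptions for illustration, and the talk does not name the database Plumbee used.

```python
import sqlite3
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical raw event shape: (day, user_id, event_name).
Event = Tuple[str, str, str]


def daily_aggregates(events: Iterable[Event]) -> Counter:
    """Collapse raw events into a count per (day, user, event name)."""
    return Counter(events)


def load_mart(conn: sqlite3.Connection, aggregates: Counter) -> None:
    """Write the aggregates into a small relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_event_counts ("
        " day TEXT, user_id TEXT, event_name TEXT, n INTEGER,"
        " PRIMARY KEY (day, user_id, event_name))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_event_counts VALUES (?, ?, ?, ?)",
        [(day, user, name, n) for (day, user, name), n in aggregates.items()],
    )
    conn.commit()


def daily_active_users(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Example mart query: distinct users seen on each day."""
    return conn.execute(
        "SELECT day, COUNT(DISTINCT user_id) FROM daily_event_counts"
        " GROUP BY day ORDER BY day"
    ).fetchall()
```

Queries like `daily_active_users` then run against a table small enough for an ordinary database, while anything that needs event-level detail still goes back to the full data set.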


Plumbee analytics today

Goals

Never say “we don’t have that data”

Breadth of data use

Depth of data use

Agile data use

Scalable foundation for the future

The results: average daily spenders

[Chart: average daily spenders by month]

But we have tons to do.

Engineering

• Replace our custom event aggregators with Flume

• Replace pull-based Hive & Python streaming jobs with Cascading + JVM-based languages

• Change event storage from JSON to Avro

• Better dashboards and tools

• Consider in-memory processing, e.g. Spark/Shark

• Toward “big data”

Analysis

• More “actionable”, less “interesting”

• Continuous optimization: split / multivariate testing, multi-armed bandit

• Better predictive models

• Clustering, segmentation, personalization

• Toward “data science”

Questions? Get in touch!

Jodi Moran

[email protected]

@jodi_p_moran

www.facebook.com/jodipmoran

www.linkedin.com/in/jmoran

jobs.plumbee.com

www.plumbee.com

apps.facebook.com/mirrorballslots