social analytics on mongodb at mongonyc

Social Analytics with MongoDB @BuddyMedia

Upload: patrick-stokes

Post on 28-Nov-2014

Page 1: Social Analytics on MongoDB at MongoNYC

Social Analytics with MongoDB

@BuddyMedia

Page 2: Social Analytics on MongoDB at MongoNYC

Disclaimer

+= maybe not the best deck in the world

Page 3: Social Analytics on MongoDB at MongoNYC

What is MongoDB?

• Document Store.
• Schemaless.
• High performance.

Page 4: Social Analytics on MongoDB at MongoNYC

Why MongoDB?

• Months of testing
  – Data Types
  – Horizontal Scaling
  – Replication
  – Querying
  – Atomicity
  – Concurrency

Page 5: Social Analytics on MongoDB at MongoNYC

Everything in that last slide was a LIE.

Page 6: Social Analytics on MongoDB at MongoNYC

Same reason most of you do.

• It’s new and cool and we wanted to check it out.

• We become cool by association.
• But mostly because we like learning new things.

Page 7: Social Analytics on MongoDB at MongoNYC

That last slide was kind of a lie too.

• We started with Cassandra.
• Cassandra was written by Facebook and Facebook is really cool, we wanted to be as cool as them.

Page 8: Social Analytics on MongoDB at MongoNYC

Why Not Cassandra?

• Thrift.
  – “Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, and OCaml.”

• Eff that. We’re a startup.

Page 9: Social Analytics on MongoDB at MongoNYC

So MongoDB it Was.

Also, MongoDB Happened to be in NYC. We are in NYC. NYC is Cool.

Proof that NYC is cool.

Page 10: Social Analytics on MongoDB at MongoNYC

What You Should Know

• MongoDB is not relational.
• It’s also not schemaless even though they love to say that (applications always have schemas/data models).
• Right tool for right job.
  – Logging
  – Queues
  – Aggregate Analytics
• Don’t get confused with ORM.
• Return what you need.
• Don’t worry about document size limits.

Page 11: Social Analytics on MongoDB at MongoNYC

Aggregate Analytics

• Lots of “Stuff” happens at Buddy Media.
• Need to keep track of it all.
• Need it to be real time.
• Need to be able to group it by various levels and resolutions.
• Need to be able to create new metrics on the fly.
• Write heavy, Read light.

Page 12: Social Analytics on MongoDB at MongoNYC

What does it look like?

Event → Queue → Processor → Metric

Page 13: Social Analytics on MongoDB at MongoNYC

Architecture

Page 14: Social Analytics on MongoDB at MongoNYC

The Event Listener

• Node.js is the perfect event listener.
  – Evented IO like Twisted or Event Machine.
  – 2 days of development (maybe ~100 lines of JS).
  – 0 lost events.
  – 0 downtime.
  – Just don’t upgrade.

Page 15: Social Analytics on MongoDB at MongoNYC

Raw Event

A Pageview

{
    "_id" : ObjectId("4d8d0df101cddf2e6e0027af"),
    "created_date" : "2010-07-26 20:15:01",
    "data" : {
        "client_id" : "1034",
        "page_id" : "175"
    },
    "status" : {
        "state" : 0,
        "updated" : "2011-04-12 10:15:15"
    },
    "type" : "pageview"
}

Page 16: Social Analytics on MongoDB at MongoNYC

Processing

• 3 resolutions
  – Minute
  – Hour
  – Day
• 1 event = 3 metric updates * number of groupings.

"pageview": {
    "metrics": [
        { "name": "client.pageviews", "key": "client_id" },
        { "name": "page.pageviews", "key": "page_id" }
    ]
}
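The fan-out rule above (one event, three resolutions, one update per grouping) can be sketched in Python. This is a minimal illustration under stated assumptions, not Buddy Media's processor: the names `bucket_start` and `expand_event` are hypothetical, and truncating a timestamp to the start of its minute, hour, or day is an assumption about how buckets are chosen.

```python
from datetime import datetime

# Event-type configuration, copied from the slide.
CONFIG = {
    "pageview": {
        "metrics": [
            {"name": "client.pageviews", "key": "client_id"},
            {"name": "page.pageviews", "key": "page_id"},
        ]
    }
}

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
PERIODS = ("minute", "hour", "day")


def bucket_start(created_date, period):
    """Truncate a timestamp string to the start of its minute, hour, or day.

    Assumption: buckets are aligned to calendar boundaries.
    """
    ts = datetime.strptime(created_date, DATE_FORMAT)
    if period == "minute":
        ts = ts.replace(second=0)
    elif period == "hour":
        ts = ts.replace(minute=0, second=0)
    elif period == "day":
        ts = ts.replace(hour=0, minute=0, second=0)
    else:
        raise ValueError("unknown period: %r" % period)
    return ts.strftime(DATE_FORMAT)


def expand_event(event, config=CONFIG):
    """Return one (name, period, start_date, aggregate_key) tuple per update."""
    updates = []
    for metric in config[event["type"]]["metrics"]:
        for period in PERIODS:
            updates.append((
                metric["name"],
                period,
                bucket_start(event["created_date"], period),
                event["data"][metric["key"]],
            ))
    return updates
```

With the two groupings configured for a pageview, a single raw event expands into 3 * 2 = 6 metric updates.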

Page 17: Social Analytics on MongoDB at MongoNYC

Creating a Metric

A pageview happened and I want to update metrics for the client the page belongs to.

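# Find (or create) the metric document for this name, period, and time
# bucket, then add 1 to the counter for client 1034.
metrics.update(
    {
        'name': 'client.pageview',
        'period': 'minute',
        'start_date': '2010-05-12 12:50:00'
    },
    { '$inc': { 'aggregates.1034': 1 } },
    upsert=True
);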

Page 18: Social Analytics on MongoDB at MongoNYC

Completed Metric

{
    "_id" : ObjectId("4da45cf6306a22719829b71b"),
    "aggregates" : {
        "1034" : 11
    },
    "end_date" : "2010-05-12 12:54:59",
    "name" : "client.pageview",
    "period" : "minute",
    "start_date" : "2010-05-12 12:50:00",
    "total" : 11
}

Page 19: Social Analytics on MongoDB at MongoNYC

What about another client?

If a second pageview comes in for a different client, we end up updating the exact same record. Thus our last metric becomes:

{
    "_id" : ObjectId("4da45cf6306a22719829b71b"),
    "aggregates" : {
        "1034" : 11,
        "1213" : 1
    },
    "end_date" : "2010-05-12 12:54:59",
    "name" : "client.pageview",
    "period" : "minute",
    "start_date" : "2010-05-12 12:50:00",
    "total" : 12
}
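To see why a second client lands in the same record, here is a small in-memory stand-in for the upsert, written in Python. The helper `upsert_inc` is hypothetical and only mimics two documented MongoDB behaviors: an upsert seeds a new document from the query's equality fields, and `$inc` on a missing field starts counting from zero. It does not maintain `end_date` or `total`, since the slides do not show how those fields are written.

```python
def upsert_inc(collection, query, field, amount=1):
    """Mimic update(query, {'$inc': {field: amount}}, upsert=True).

    `collection` is a plain list of dicts standing in for a MongoDB
    collection; `field` may be a dotted path such as 'aggregates.1034'.
    """
    for doc in collection:
        if all(doc.get(k) == v for k, v in query.items()):
            break  # found the existing metric document
    else:
        # No match: the upsert creates a document from the query fields.
        doc = dict(query)
        collection.append(doc)

    # Walk the dotted path, creating sub-documents as needed.
    *parents, leaf = field.split(".")
    target = doc
    for part in parents:
        target = target.setdefault(part, {})
    target[leaf] = target.get(leaf, 0) + amount
    return doc


metrics = []
bucket = {
    "name": "client.pageview",
    "period": "minute",
    "start_date": "2010-05-12 12:50:00",
}
upsert_inc(metrics, bucket, "aggregates.1034")  # first pageview creates the doc
upsert_inc(metrics, bucket, "aggregates.1213")  # other client, same doc
```

Because the query only names the metric, the period, and the bucket start, every client that generates a pageview in that window increments a key inside the same document.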

Page 20: Social Analytics on MongoDB at MongoNYC

Some Queries

1. Get pageviews for all clients that occurred on May 12 between 12:50 and 12:51

db.metrics.find({
    name: "client.pageview",
    period: "minute",
    start_date: "2010-05-12 12:50:00"
});

1 Document, n entries.

2. Get pageviews for client 1034 that occurred on May 12 between 12:50 and 12:51

db.metrics.find({
    name: "client.pageview",
    period: "minute",
    start_date: "2010-05-12 12:50:00"
}, { "aggregates.1034": 1 });

1 Document, 1 entry.
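The difference between "n entries" and "1 entry" comes from the projection argument. A rough in-memory sketch in Python, assuming a list of dicts in place of the collection (the helpers `project` and `find` are hypothetical and support only equality matches and a single dotted projection path):

```python
def project(doc, path):
    """Return a copy of doc holding only _id and the value at a dotted path."""
    out = {"_id": doc["_id"]} if "_id" in doc else {}
    src, dst = doc, out
    parts = path.split(".")
    for part in parts[:-1]:
        if not isinstance(src.get(part), dict):
            return out  # path is missing, nothing more to include
        src = src[part]
        dst = dst.setdefault(part, {})
    if parts[-1] in src:
        dst[parts[-1]] = src[parts[-1]]
    return out


def find(collection, query, projection_path=None):
    """Equality-match documents, optionally trimmed to one projected field."""
    matches = [
        doc for doc in collection
        if all(doc.get(k) == v for k, v in query.items())
    ]
    if projection_path is None:
        return matches
    return [project(doc, projection_path) for doc in matches]
```

Without a projection the caller receives the whole document, including every client's counter; with `"aggregates.1034"` only that client's counter comes back.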

Page 21: Social Analytics on MongoDB at MongoNYC

More Queries

1. Get pageviews for all clients that occurred on May 12 and graph by hour.

db.metrics.find({
    name: "client.pageview",
    period: "hour",
    start_date: /^2010-05-12/
});

24 Documents, n entries.

2. Get pageviews for client 1034 that occurred on May 12 and graph by minute.

db.metrics.find({
    name: "client.pageview",
    period: "minute",
    start_date: /^2010-05-12/
}, { "aggregates.1034": 1 });

1440 Documents, 1 entry.
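Turning those result sets into graph points is mostly a matter of sorting by `start_date`. A minimal Python sketch, assuming the documents have already been fetched into a list of dicts; the function name `series` and the choice to sum `aggregates` when no client is given are assumptions made for illustration:

```python
def series(collection, name, period, date_prefix, client_id=None):
    """Return sorted (start_date, count) points for one metric and date.

    The date match is a simple string prefix test on start_date. With
    client_id=None the count is the sum over all clients in the bucket.
    """
    points = []
    for doc in collection:
        if doc["name"] != name or doc["period"] != period:
            continue
        if not doc["start_date"].startswith(date_prefix):
            continue
        aggregates = doc.get("aggregates", {})
        if client_id is None:
            count = sum(aggregates.values())
        else:
            count = aggregates.get(client_id, 0)
        points.append((doc["start_date"], count))
    # 'YYYY-MM-DD HH:MM:SS' strings sort chronologically.
    return sorted(points)
```

Since dates are stored as fixed-width strings, a plain lexical sort is enough to put the buckets in time order.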

Page 22: Social Analytics on MongoDB at MongoNYC

Let’s take a peek.

Page 23: Social Analytics on MongoDB at MongoNYC

@patr1cks
@buddymedia