Taming Social Media with MongoDB

Post on 20-Jun-2015


DESCRIPTION

Slides from presentation given at MongoDC on June 26, 2012.

TRANSCRIPT

Taming Social Media with MongoDB

Danny Holloway
danny@thehumangeo.com

June 26, 2012


Overview

• Introduction
• Social Media Challenges
• MongoDB Setup
• Collecting Tweets
• Querying Tweets
• Accessing the Data
• Finding Most Active Tweeter
• Lessons Learned
• Building an Interface
• Demo


Introduction

• Built a tool to collect tweets over Australia and interact with them on a map

• Working at HumanGeo
– Building tools and services for geospatial analysis of Big Data
– Using MongoDB for horizontally scalable storage and geospatial analysis


Social Media Challenges

• No control over data
– “Consumers of Tweets should tolerate the addition of new fields and variance in ordering of fields with ease.” - Twitter
• High Volume
– ~17k tweets in a day or 6.2M per year with exact coordinates in Australia
– Record high of >25k tweets per second or >788B per year around the world - Twitter
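Twitter's guidance about new fields suggests reading tweets defensively. The following is a minimal sketch, assuming tweets arrive as decoded JSON dictionaries with Twitter's GeoJSON-style `coordinates` field; the helper name `extract_point` is hypothetical and does not come from the slides.

```python
def extract_point(tweet):
    """Return (lon, lat) for a tweet with exact coordinates, else None.

    Unknown extra fields are ignored and missing fields never raise,
    so the caller keeps working if the payload gains or loses keys.
    """
    geo = tweet.get("coordinates") or {}
    pair = geo.get("coordinates")
    if geo.get("type") != "Point" or not pair or len(pair) != 2:
        return None
    lon, lat = pair  # Twitter stores the pair as [longitude, latitude]
    return (lon, lat)
```

Because the lookup uses `dict.get` throughout, a tweet with no location and a tweet with unfamiliar extra keys both pass through without special handling.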


MongoDB Setup

• Create database
• Create capped collections
• Create indexes
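The three setup steps might look like the sketch below when done through PyMongo. The collection name `tweets`, the 1 GiB cap, the choice of indexed fields, and the `created_at_dt` field name are all assumptions for illustration; the slides do not state them. MongoDB creates the database itself lazily on first write, so there is no explicit "create database" call.

```python
def setup_collections(db, size_bytes=1024 ** 3):
    """Create a capped 'tweets' collection and the indexes used for queries.

    `db` is expected to behave like a pymongo Database object.
    """
    if "tweets" not in db.list_collection_names():
        # A capped collection has a fixed size and discards its oldest documents.
        db.create_collection("tweets", capped=True, size=size_bytes)
    tweets = db["tweets"]
    # "2d" is the value of pymongo.GEO2D; the field holds [lon, lat] pairs.
    tweets.create_index([("coordinates.coordinates", "2d")])
    tweets.create_index([("user.screen_name", 1)])
    # created_at_dt is a hypothetical field holding a parsed datetime.
    tweets.create_index([("created_at_dt", -1)])
    return tweets


def connect_and_setup(uri="mongodb://localhost:27017", db_name="twitter"):
    """Connect to a server (URI and database name are assumptions) and run setup."""
    from pymongo import MongoClient  # third-party: pip install pymongo

    return setup_collections(MongoClient(uri)[db_name])
```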


Collecting Tweets

• Using tweetstream to collect tweets over Australia from the statuses/filter endpoint

• Insert results into collections
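A rough sketch of the collection loop follows. It assumes the stream is any iterable of decoded tweet dictionaries (which is what a streaming client such as tweetstream yields) and that the collection behaves like a PyMongo collection. The bounding-box values are an approximation of Australia chosen for illustration, not figures from the slides, and both function names are hypothetical.

```python
# Approximate bounding box for Australia as (sw_lon, sw_lat, ne_lon, ne_lat).
# These numbers are an assumption, not taken from the talk.
AUSTRALIA_BBOX = (112.9, -43.7, 153.7, -10.6)


def locations_param(bbox):
    """Format a box for statuses/filter: 'sw_lon,sw_lat,ne_lon,ne_lat'."""
    return ",".join(str(value) for value in bbox)


def store_tweets(stream, collection):
    """Insert every tweet that carries exact coordinates; return the count."""
    stored = 0
    for tweet in stream:
        # Skip delete notices, limit notices, and tweets without a point.
        if tweet.get("coordinates"):
            collection.insert_one(tweet)
            stored += 1
    return stored
```

The `locations` value lists the southwest corner first, longitude before latitude, which matches the ordering MongoDB expects for its geospatial index.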


Collecting Tweets (cont)

• Augment results for better queries
– Twitter provides date strings like "Wed Jun 13 23:17:58 +0000 2012"
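One way to augment a tweet is to parse the date string into a real datetime before inserting, so that range queries and sorting work on dates instead of text. This is a sketch using only the standard library; the field name `created_at_dt` is an assumption, not the name used in the talk.

```python
from datetime import datetime

# Matches strings such as "Wed Jun 13 23:17:58 +0000 2012".
TWITTER_DATE_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def add_parsed_date(tweet):
    """Add a timezone-aware datetime next to Twitter's created_at string."""
    tweet["created_at_dt"] = datetime.strptime(
        tweet["created_at"], TWITTER_DATE_FORMAT
    )
    return tweet
```

The original `created_at` string is left in place, so nothing from Twitter's payload is lost.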


Querying Tweets

• Get all of the latest tweets

• Get all the tweets from a user
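These two queries could be written with PyMongo roughly as follows. The sketch assumes a capped collection (so reverse natural order returns the newest inserts first) and the hypothetical `created_at_dt` field for sorting a user's tweets; the function names and the default limit are illustrative.

```python
def latest_tweets(collection, limit=100):
    """Return the most recently inserted tweets, newest first."""
    # In a capped collection, natural order is insertion order.
    return list(collection.find().sort("$natural", -1).limit(limit))


def tweets_by_user(collection, screen_name, limit=100):
    """Return one user's tweets, newest first."""
    cursor = collection.find({"user.screen_name": screen_name})
    return list(cursor.sort("created_at_dt", -1).limit(limit))
```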


Querying Tweets (cont)

• Get tweets near a point

• Get tweets within a bounding box
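The geospatial queries can be expressed as plain filter documents. This sketch assumes a `2d` index on `coordinates.coordinates`, the field where Twitter stores the `[longitude, latitude]` pair. It uses the `$geoWithin` operator; MongoDB servers from the time of the talk used the older `$within` spelling instead. The function names are hypothetical.

```python
GEO_FIELD = "coordinates.coordinates"  # [lon, lat] pair inside each tweet


def near_query(lon, lat):
    """Filter that returns tweets ordered from nearest to farthest."""
    return {GEO_FIELD: {"$near": [lon, lat]}}


def bbox_query(sw_lon, sw_lat, ne_lon, ne_lat):
    """Filter that matches tweets inside a rectangular bounding box."""
    box = [[sw_lon, sw_lat], [ne_lon, ne_lat]]
    return {GEO_FIELD: {"$geoWithin": {"$box": box}}}
```

Either document can be passed straight to `collection.find(...)`, for example `collection.find(near_query(151.21, -33.87)).limit(50)`.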


Accessing the Data

• Using Bottle to create a RESTful API
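A small Bottle application exposing the stored tweets might be structured like this. The route paths, the 100-document limit, and the helper names are assumptions for illustration and are not taken from the talk. Bottle is imported inside `build_app` so that the JSON helper can be used on its own.

```python
import json


def to_json(tweets):
    """Serialize tweets; default=str covers ObjectId and datetime values."""
    return json.dumps({"tweets": list(tweets)}, default=str)


def build_app(collection):
    """Return a Bottle app serving tweets from a pymongo-style collection."""
    from bottle import Bottle, response  # third-party: pip install bottle

    app = Bottle()

    @app.get("/tweets/latest")
    def latest():
        response.content_type = "application/json"
        return to_json(collection.find().sort("$natural", -1).limit(100))

    @app.get("/tweets/user/<screen_name>")
    def by_user(screen_name):
        response.content_type = "application/json"
        query = {"user.screen_name": screen_name}
        return to_json(collection.find(query).limit(100))

    return app
```

Calling `build_app(collection).run(host="localhost", port=8080)` would start Bottle's development server. Passing `default=str` matters because MongoDB documents contain `_id` values that `json.dumps` cannot serialize on its own.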


Finding Most Active Tweeter

• Calculate tweet count for each user and return tweets for that user
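One possible way to do this is with the aggregation pipeline, sketched below. This is a swapped-in technique: the slides do not say which mechanism was used, and the aggregation framework only became generally available in MongoDB 2.2, so the talk may well have relied on map-reduce or `group` instead. The function names are hypothetical.

```python
def most_active_pipeline():
    """Pipeline that counts tweets per user and keeps the top user."""
    return [
        {"$group": {"_id": "$user.screen_name", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
        {"$limit": 1},
    ]


def most_active_tweets(collection):
    """Return (screen_name, tweets) for the user with the most tweets."""
    top = next(iter(collection.aggregate(most_active_pipeline())), None)
    if top is None:  # empty collection
        return None, []
    tweets = list(collection.find({"user.screen_name": top["_id"]}))
    return top["_id"], tweets
```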


Lessons Learned

• Use Longitude, Latitude ordering for coordinates

• Default index value range is exclusive of upper bound

• Twitter has bugs too
• Making your own maps isn’t hard (it can take some time)
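The first two lessons can be captured in a small guard applied before inserting a point. This sketch assumes the default `2d` index bounds, which run from -180 inclusive to 180 exclusive, so a longitude of exactly 180 is rejected by the index. The helper name `to_lon_lat` is hypothetical.

```python
def to_lon_lat(lat, lon):
    """Convert a (lat, lon) pair to the [lon, lat] order MongoDB expects.

    Raises ValueError for values outside the default 2d index bounds,
    where the upper bound of 180 is exclusive.
    """
    if not (-180 <= lon < 180):
        raise ValueError("longitude %r is outside [-180, 180)" % (lon,))
    if not (-90 <= lat <= 90):
        raise ValueError("latitude %r is outside [-90, 90]" % (lat,))
    return [lon, lat]
```

For example, `to_lon_lat(-33.87, 151.21)` returns `[151.21, -33.87]`, while a longitude of 180 raises instead of failing later at insert time.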


Building an Interface

• Dust JavaScript templating library
• Leaflet JavaScript interactive map library
• jQuery JavaScript library
• TileStream map tile server
