2013 Building and Improving Products with Hadoop Matthew Rathbone


Page 1: Building and Improving Products with  Hadoop Matthew Rathbone

2013

Building and Improving Products with Hadoop

Matthew Rathbone

What is Foursquare

Foursquare helps you explore the world around you.

Meet up with friends, discover new places, and save money using your phone.

• 4 billion check-ins
• 35 million users
• 50 million points of interest (POI)
• 150 employees
• 1 TB+ of data a day

FIRST, A STORY

http://www.flickr.com/photos/shannonpatrick17

The Right Tool for the Job

• Nginx – Serving static files

• Perl – Regular expressions

• XML – Frustrating people

• Hadoop (Map Reduce) – Counting

COUNTING – WHAT IS IT GOOD FOR?

http://www.flickr.com/photos/blaahhi/

Statistically Improbable Phrases

SIPS use cases

• menu extraction
• sentiment analysis
• venue ratings
• specific recommendations
• search indexing
• pricing data
• facility information

How is SIPS built?

Basically lots of counting.

SIPS

• Tokenize data with a language model (into N-grams)
  • built using tips, shouts, menu items, likes, etc.
• Apply a TF-IDF algorithm (term frequency, inverse document frequency)
  • Global phrase count
  • Local phrase count (in a venue)
  • Some filtering and ranking
• Re-compute & deploy nightly
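
The counting steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not Foursquare's pipeline: it assumes each venue is one "document", uses a simple regex tokenizer in place of a real language model, and takes the "global phrase count" to mean the number of venues a phrase appears in. The names `ngrams` and `score_phrases` are hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(text, max_n=2):
    """Split text into lowercase word n-grams of length 1..max_n."""
    words = re.findall(r"[a-z']+", text.lower())
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    ]


def score_phrases(tips_by_venue, max_n=2):
    """Return {venue: [(phrase, score), ...]} ranked by TF-IDF, best first."""
    # Local phrase count: how often each phrase occurs within one venue.
    local = {
        venue: Counter(p for tip in tips for p in ngrams(tip, max_n))
        for venue, tips in tips_by_venue.items()
    }
    # Global count (assumption): the number of venues each phrase appears in.
    venues_with = Counter(p for counts in local.values() for p in counts)
    n_venues = len(local)

    ranked = {}
    for venue, counts in local.items():
        total = sum(counts.values()) or 1
        scores = {
            p: (c / total) * math.log(n_venues / venues_with[p])
            for p, c in counts.items()
        }
        # Ranking: highest score first, ties broken alphabetically.
        ranked[venue] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked
```

A phrase that shows up at every venue (such as "great") scores zero, while a phrase concentrated in one venue rises to the top of that venue's list. In a MapReduce job each `Counter` would become its own counting pass.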

WHY USE HADOOP?

http://www.flickr.com/photos/dbrekke/

SIPS – Without Hadoop

Potential Problems
• Database Query Throttling
• Venues are out of sync
• Altering the algorithm could take forever to populate for all venues
• Where would you store the results?
• What about debug data?
• Does it scale to 10x, 100x?
• What about other, similar workflows?

SIPS – Hadoop Benefits

• Quick Deployment

• Modular & Reusable

• Arbitrarily complex combination of many datasets

• Every step of the workflow creates value

Apple Store - Downtown San Francisco

1 tip mentions "haircuts"

A search for "haircuts" in "san francisco" returns the Apple Store???

Fixed by looking at the % of a venue's tips that mention the phrase, and at overall frequency

“Hey Apple, how bout less shiny pizzazz and fancy haircuts and more fix-my-f!@#$-imac”
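
A filter of this kind might look like the sketch below. It is only an illustration of the idea on the slide: the function name and both thresholds are hypothetical, and "overall frequency" is read here as the absolute number of tips that mention the phrase.

```python
def phrase_is_representative(phrase, tips, min_share=0.05, min_mentions=3):
    """True when enough of a venue's tips mention the phrase.

    min_share and min_mentions are made-up thresholds for illustration.
    """
    if not tips:
        return False
    needle = phrase.lower()
    mentions = sum(1 for tip in tips if needle in tip.lower())
    # Require both a minimum number of mentions and a minimum share of tips.
    return mentions >= min_mentions and mentions / len(tips) >= min_share
```

With one "haircuts" tip out of a hundred, the Apple Store fails both checks and drops out of the search results, while a barber shop whose tips mostly mention haircuts passes.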

Data & Modularity

ACTUALLY, IT’S A BIT MORE COMPLICATED

http://www.flickr.com/photos/bfishadow

These benefits require infrastructure

Dependency Management

Many options:
• Oozie (Apache)
• Azkaban (LinkedIn)
• Luigi (Spotify, we <3 this)
• Hamake (Codeminders)
• Chronos (Airbnb)
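
At their core, all of these tools run jobs in an order that respects their dependencies. The sketch below shows that one idea using Python's standard `graphlib` module (Python 3.9+). It is not how any of the listed tools is configured, and the job names in `WORKFLOW` are hypothetical stand-ins for a nightly SIPS run.

```python
from graphlib import TopologicalSorter

# Hypothetical nightly workflow: each job maps to the jobs it depends on.
WORKFLOW = {
    "ingest_tips": set(),
    "tokenize": {"ingest_tips"},
    "global_counts": {"tokenize"},
    "local_counts": {"tokenize"},
    "tf_idf": {"global_counts", "local_counts"},
    "deploy": {"tf_idf"},
}


def run_order(workflow):
    """Return the jobs in an order that respects every dependency."""
    return list(TopologicalSorter(workflow).static_order())
```

A real scheduler adds what this leaves out: retries, skipping jobs whose output already exists, and running independent jobs (here the two counting jobs) in parallel.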

Database / Log Ingestion

• Sqoop
• Mongo-Hadoop
• Kafka
• Flume
• Scribe
• etc

MapReduce Friendly Datastore

A few obvious ones:
• HBase
• Cassandra
• Voldemort

We built our own; it’s very similar to Voldemort and uses the HFile API

Getting started without all that stuff

Components you likely don’t have

The best way to start

Don’t use Hadoop.

*but pretend you do
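
One way to read "pretend you do" is to write the first version as map and reduce functions and run them in memory on a sample of data. The sketch below assumes that reading; `run_local`, `map_tip`, and `reduce_counts` are hypothetical names, not part of Hadoop or of the talk.

```python
from collections import defaultdict


def map_tip(tip):
    """Mapper: emit (word, 1) for every word in a tip."""
    for word in tip.lower().split():
        yield word, 1


def reduce_counts(word, values):
    """Reducer: sum the counts emitted for one word."""
    return word, sum(values)


def run_local(records, mapper, reducer):
    """Run a map, shuffle, reduce pass in memory, with no cluster."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))
```

For example, `run_local(["great pizza", "Great coffee"], map_tip, reduce_counts)` counts "great" twice. Because the mapper and reducer have no dependency on the runner, they can later be ported to a real MapReduce job with little change to the logic.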

Other reasons to not use Hadoop

• Your idea might not be very good

• Hadoop will slow you down to start with

• You don’t have enough infrastructure yet
  • build it when you need it

• V1 might not be that complex

• V1 could be a spreadsheet

SIPS

Version 1
• Off the shelf language model
• A subset of Venues & Tips
• Did not use Map Reduce
• Did not push to production at all

SIPS

Version 2
• Started building our own language model
• Rewritten as a Map Reduce
• Manually loaded data to production
• Filters for English data only.

Tweak, improve, etc

SIPS

Version 3

• Incorporated more data sources into our language model

• Deployment to KV store (auto)

• Incorporated lots of debug output

• Language pipeline also feeds sentiment analysis

Now we’re in the perfect place to iterate & improve

…to explore data

In Summary

• Hadoop is good for counting, so use it for counting

• Move quickly whenever possible and don’t worry about automation

• Bring in new production services as you need them

• Freedom!

[email protected]

@rathboma

Bonus: http://hadoopweekly.com from my colleague, Joe Crobak (presenting later!)