modeling for performance

Data Modeling for Performance Mongo Boulder January 21, 2010 Michael Dwan Snapjoy

DESCRIPTION

Mongo Boulder talk by Michael Dwan

TRANSCRIPT

Page 1: Modeling for Performance

Data Modeling for Performance

Mongo Boulder, January 21, 2010

Michael Dwan, Snapjoy

Page 2: Modeling for Performance

i’m michael dwan

@michaeldwan on the twitter

Page 3: Modeling for Performance

the project

Company X

Page 4: Modeling for Performance

application spec

• find business details (web + api)

• search by category/keyword + geo (web + api)

• update (api)

Page 5: Modeling for Performance

why is this interesting?

15,000,000 businesses

30,000 partners

100,000 geo areas

2,300 categories

2,000,000 requests daily

24,000,000 urls in sitemap

100,000,000 tags

Page 6: Modeling for Performance

updates

• infrequent changes

• monthly updates w/ 12M monthly changes

• “zero downtime”

Page 7: Modeling for Performance

the problem

mo’ data, mo’ problems

Page 8: Modeling for Performance

complexity

Page 9: Modeling for Performance

businesses

phone_numbers

businesses_phone_numbers

cities

states

zips

neighborhoods

businesses_neighborhoods

tags

taggings

assets

users

categories

categorizations

providers mappings

Page 10: Modeling for Performance

architecture


Page 11: Modeling for Performance

read performance

Page 12: Modeling for Performance

solr

downtime

Page 13: Modeling for Performance

solr getting fussy

Page 14: Modeling for Performance

migrations

downtime

Page 15: Modeling for Performance

the solution

Page 16: Modeling for Performance

> gem install acts_as_web_scale

Page 17: Modeling for Performance
Page 18: Modeling for Performance
Page 19: Modeling for Performance

a business...

{
  "_id" : ObjectId("4ce838ef4a882579960001b9"),
  "name" : "Acme Glass Co",
  "tagline" : "Your trusty glass hole",
  "description" : "Glass repair...",
  "hours" : "Mon Fri 8 5",
  "url" : "http://acmeglasshole.biz"
}

Page 21: Modeling for Performance

a business... has many phone numbers

{
  "_id" : ObjectId("4ce838ef4a882579960001b9"),
  "name" : "Acme Glass Co",
  "tagline" : "Your trusty glass hole",
  "description" : "Glass repair...",
  "hours" : "Mon Fri 8 5",
  "url" : "http://acmeglasshole.biz",
  "phone_numbers" : [ "5035550091", "8005555456" ]
}

Page 23: Modeling for Performance

a business... has coordinates

{
  "_id" : ObjectId("4ce838ef4a882579960001b9"),
  "name" : "Acme Glass Co",
  "tagline" : "Your trusty glass hole",
  "description" : "Glass repair...",
  "hours" : "Mon Fri 8 5",
  "url" : "http://acmeglasshole.biz",
  "phone_numbers" : [ "5035550091", "8005555456" ],
  "coordinates" : [ 45.559294, -122.644053 ]
}

Page 25: Modeling for Performance

a business... has many tags

{
  "_id" : ObjectId("4ce838ef4a882579960001b9"),
  "name" : "Acme Glass Co",
  "tagline" : "Your trusty glass hole",
  "description" : "Glass repair...",
  "hours" : "Mon Fri 8 5",
  "url" : "http://acmeglasshole.biz",
  "phone_numbers" : [ "5035550091", "8005555456" ],
  "coordinates" : [ 45.559294, -122.644053 ],
  "tags" : [ "glass", "mirrors", "flat glass" ]
}

Page 27: Modeling for Performance

a business... has an address

{
  "_id" : ObjectId("4ce838ef4a882579960001b9"),
  "name" : "Acme Glass Co",
  "tagline" : "Your trusty glass hole",
  "description" : "Glass repair...",
  "hours" : "Mon Fri 8 5",
  "url" : "http://acmeglasshole.biz",
  "phone_numbers" : [ "5035550091", "8005555456" ],
  "coordinates" : [ 45.559294, -122.644053 ],
  "tags" : [ "glass", "mirrors", "flat glass" ],
  "location" : {
    "street_address" : "2035 NE Alberta St"
  }
}

Page 28: Modeling for Performance

belongs to?

Page 29: Modeling for Performance

a state

{
  "_id" : ObjectId("4ce82937961552247900000f"),
  "name" : "Illinois",
  "slug" : "il",
  ...
}

Page 32: Modeling for Performance

a business... belongs to a state

{
  "_id" : ObjectId("4ce838ef4a882579960001b9"),
  "name" : "Acme Glass Co",
  "tagline" : "Your trusty glass hole",
  "description" : "Glass repair...",
  "hours" : "Mon Fri 8 5",
  "url" : "http://acmeglasshole.biz",
  "phone_numbers" : [ "5035550091", "8005555456" ],
  "coordinates" : [ 45.559294, -122.644053 ],
  "tags" : [ "glass", "mirrors", "flat glass" ],
  "location" : {
    "street_address" : "2035 NE Alberta St",
    "state" : {
      "_id" : ObjectId("4ce829379615522479000026"),
      "meta" : { "slug" : "or" },
      "display_name" : "Oregon"
    }
  }
}
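The embedded state is a trimmed copy of the state document: its _id, a slug under meta, and a display name. A minimal sketch of how such a copy might be built follows. The helper name embedRef and the Oregon source document are assumptions for illustration (the deck only shows the Illinois state document), and a plain string stands in for the ObjectId.

```javascript
// Hypothetical helper: build the small embedded copy of a referenced document.
// Only the fields needed to render and query a business are copied.
function embedRef(doc, displayName) {
  return {
    _id: doc._id,                        // kept so the business can be queried by state id
    meta: { slug: doc.slug },            // slug is used to build urls
    display_name: displayName || doc.name
  };
}

// Assumed source document, shaped like the Illinois example in the deck.
const oregon = { _id: "4ce829379615522479000026", name: "Oregon", slug: "or" };

const business = {
  name: "Acme Glass Co",
  location: { street_address: "2035 NE Alberta St" }
};
business.location.state = embedRef(oregon);

console.log(JSON.stringify(business.location.state));
```

The trade-off of this shape is that a renamed state has to be rewritten in every business that embeds it, which fits the "infrequent changes" profile described earlier in the deck.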

Page 34: Modeling for Performance

a business... belongs to a city

{
  "_id" : ObjectId("4ce838ef4a882579960001b9"),
  "name" : "Acme Glass Co",
  "tagline" : "Your trusty glass hole",
  "description" : "Glass repair...",
  "hours" : "Mon Fri 8 5",
  "url" : "http://acmeglasshole.biz",
  "phone_numbers" : [ "5035550091", "8005555456" ],
  "coordinates" : [ 45.559294, -122.644053 ],
  "tags" : [ "glass", "mirrors", "flat glass" ],
  "location" : {
    "street_address" : "2035 NE Alberta St",
    "state" : {
      "_id" : ObjectId("4ce829379615522479000026"),
      "meta" : { "slug" : "or" },
      "display_name" : "Oregon"
    },
    "city" : {
      "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
      "meta" : { "slug" : "portland" },
      "display_name" : "Portland, OR"
    }
  }
}

Page 36: Modeling for Performance

a business... belongs to a zip code

{
  "_id" : ObjectId("4ce838ef4a882579960001b9"),
  "name" : "Acme Glass Co",
  "tagline" : "Your trusty glass hole",
  "description" : "Glass repair...",
  "hours" : "Mon Fri 8 5",
  "url" : "http://acmeglasshole.biz",
  "phone_numbers" : [ "5035550091", "8005555456" ],
  "coordinates" : [ 45.559294, -122.644053 ],
  "tags" : [ "glass", "mirrors", "flat glass" ],
  "location" : {
    "street_address" : "2035 NE Alberta St",
    "state" : {
      "_id" : ObjectId("4ce829379615522479000026"),
      "meta" : { "slug" : "or" },
      "display_name" : "Oregon"
    },
    "city" : {
      "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
      "meta" : { "slug" : "portland" },
      "display_name" : "Portland, OR"
    },
    "zip" : {
      "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"),
      "display_name" : "97211"
    }
  }
}

Page 37: Modeling for Performance

many-to-many?

Page 38: Modeling for Performance

a category

{
  "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
  "name" : "Auto Glass",
  "slug" : "3063-auto-glass",
  "tags" : [ "windshields" ],
  ...
}

Page 41: Modeling for Performance

a business... belongs to many categories

{
  "_id" : ObjectId("4ce838ef4a882579960001b9"),
  "name" : "Acme Glass Co",
  "tagline" : "Your trusty glass hole",
  "description" : "Glass repair...",
  "hours" : "Mon Fri 8 5",
  "url" : "http://acmeglasshole.biz",
  "phone_numbers" : [ "5035550091", "8005555456" ],
  "coordinates" : [ 45.559294, -122.644053 ],
  "tags" : [ "glass", "mirrors", "flat glass" ],
  "location" : {
    "street_address" : "2035 NE Alberta St",
    "state" : {
      "_id" : ObjectId("4ce829379615522479000026"),
      "meta" : { "slug" : "or" },
      "display_name" : "Oregon"
    },
    "city" : {
      "_id" : ObjectId("4ce82abdd3dfaa10f8006faa"),
      "meta" : { "slug" : "portland" },
      "display_name" : "Portland, OR"
    },
    "zip" : {
      "_id" : ObjectId("4ce82c29d3dfaa116b006dfa"),
      "display_name" : "97211"
    }
  },
  "categories" : [
    {
      "_id" : ObjectId("4ce82e50d3dfaa16360004f2"),
      "meta" : {
        "slug" : "282-glass",
        "tags" : [ "windows" ]
      },
      "display_name" : "Glass"
    },
    {
      "_id" : ObjectId("4ce82e64d3dfaa16360014eb"),
      "meta" : {
        "slug" : "3063-auto-glass",
        "tags" : [ "windshields" ]
      },
      "display_name" : "Auto Glass"
    }
  ]
}

Page 42: Modeling for Performance

queries & indexes

know what you want

Page 43: Modeling for Performance

#1 find a business

I want *that* one

Page 44: Modeling for Performance

find a business

// single business
db.businesses.findOne({
  _id: ObjectId("4ce838ef4a882579960001b9")
})

Page 45: Modeling for Performance

#2 find by location

Businesses in San Francisco, CA

Page 48: Modeling for Performance

find businesses by state/city/zip

// find all within state
db.businesses.find({
  "location.state._id": ObjectId("4ce82937961552247900000f")
})

// find all within city
db.businesses.find({
  "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95")
})

// find all within zip
db.businesses.find({
  "location.zip._id": ObjectId("4ce82b5ed3dfaa116b0026f0")
})

Page 49: Modeling for Performance

indexes

// the indexes
db.businesses.ensureIndex({"location.city._id": 1})
db.businesses.ensureIndex({"location.zip._id": 1})

skip “location.state._id” -- only 51 possibilities

1.5GB each

Page 50: Modeling for Performance

#3 find by category

Businesses in the Auto Repair category

Page 51: Modeling for Performance

businesses by category

// find by category id
db.businesses.find({
  "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
})

// the index
db.businesses.ensureIndex({
  "categories._id": 1
})
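The same dotted-path syntax reaches into a single embedded document ("location.city._id") and into every element of an embedded array ("categories._id"). The plain JavaScript below is a simplified imitation of that matching rule, not MongoDB's implementation; valuesAtPath and matches are hypothetical helper names, and string ids stand in for ObjectIds.

```javascript
// Collect every value reachable at a dotted path, fanning out over arrays.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const item of current) {
      // An array contributes each of its elements as a candidate.
      const candidates = Array.isArray(item) ? item : [item];
      for (const c of candidates) {
        if (c !== null && typeof c === "object" && key in c) next.push(c[key]);
      }
    }
    current = next;
  }
  return current;
}

// A document matches when any reachable value (or array member) equals the target.
function matches(doc, path, value) {
  return valuesAtPath(doc, path).some(
    v => (Array.isArray(v) ? v.includes(value) : v === value)
  );
}

const business = {
  location: { city: { _id: "4ce82abdd3dfaa10f8006faa" } },
  categories: [
    { _id: "4ce82e50d3dfaa16360004f2", display_name: "Glass" },
    { _id: "4ce82e64d3dfaa16360014eb", display_name: "Auto Glass" }
  ]
};

console.log(matches(business, "categories._id", "4ce82e64d3dfaa16360014eb"));
console.log(matches(business, "location.city._id", "4ce82abdd3dfaa10f8006faa"));
```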

Page 52: Modeling for Performance

#4 - find by category + location

Businesses in the Plumbing category in Chicago, IL

Page 53: Modeling for Performance

businesses by category + city

// find by city id and category id
db.businesses.find({
  "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"),
  "categories._id": ObjectId("4ce82e50d3dfaa16360004f2")
})

Page 54: Modeling for Performance

which index should we use?

// city id
{"location.city._id": 1}

~ or ~

// category id
{"categories._id": 1}

answer: both suck

we need a compound index

Page 55: Modeling for Performance

which order?

db.businesses.ensureIndex({
  "location.city._id" : 1,
  "categories._id" : 1
})

~ or ~

db.businesses.ensureIndex({
  "categories._id" : 1,
  "location.city._id" : 1
})

answer: cities, then categories

35,000 cities & 2,500 categories

create one for zip codes and categories too!
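The slide gives only the two counts. One plausible reading is that the field with more distinct values narrows the candidate set faster, so it leads the index. The arithmetic below is a rough sketch of that reading. It assumes businesses are spread evenly and counts each business once per field, which understates the per-category figure because a business can belong to several categories.

```javascript
// Figures taken from the deck.
const businesses = 15000000;
const cities = 35000;
const categories = 2500;

// Average number of documents sharing one value, under an even-spread assumption.
function avgPerValue(total, distinctValues) {
  return total / distinctValues;
}

const perCity = avgPerValue(businesses, cities);         // roughly 429 businesses per city
const perCategory = avgPerValue(businesses, categories); // 6000 businesses per category

console.log(Math.round(perCity), perCategory);
```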

Page 56: Modeling for Performance

don’t we have 2 indexes on city id?

answer: yes

{"location.city._id" : 1}

{"location.city._id" : 1, "categories._id" : 1}

db.businesses.dropIndex("location.city._id_1")
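The single-field index can be dropped because its key is the leading key of the compound index, so a query on city id alone can use the compound one. A small sketch of that prefix check follows. The function name isPrefixIndex is hypothetical, and the rule is simplified: it ignores index options such as unique or sparse.

```javascript
// True when the keys of `shorter` are the leading keys of `longer`,
// in the same order and with the same sort direction.
function isPrefixIndex(shorter, longer) {
  const a = Object.entries(shorter);
  const b = Object.entries(longer);
  if (a.length > b.length) return false;
  return a.every(([field, direction], i) =>
    b[i][0] === field && b[i][1] === direction
  );
}

const cityOnly = { "location.city._id": 1 };
const categoryOnly = { "categories._id": 1 };
const cityThenCategory = { "location.city._id": 1, "categories._id": 1 };

console.log(isPrefixIndex(cityOnly, cityThenCategory));     // redundant, safe to drop
console.log(isPrefixIndex(categoryOnly, cityThenCategory)); // not a prefix, still needed
```

By the same check, the category-only index from query #3 is not covered by the city-first compound index and has to stay.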

Page 57: Modeling for Performance

#5 - find by keyword

“something awesome” in Boulder, CO

Page 58: Modeling for Performance

find businesses in city by keyword

{
  "_id" : ObjectId("4ce838ef4a882579960001b9"),
  "name" : "Acme Glass Co",
  "keywords" : [ "glass", "repair", "acme", ... ]
}

db.businesses.ensureIndex({
  "location.city._id": 1,
  "keywords": 1
})

db.businesses.find({
  "location.city._id": ObjectId("4ce82aa0d3dfaa10f8004a95"),
  "keywords": /glass/i
})
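The deck does not say how the keywords array was populated. One plausible approach, sketched here as an assumption, is to lowercase and tokenize the searchable text fields when a business is saved; buildKeywords is a hypothetical helper.

```javascript
// Hypothetical helper: turn free text into a deduplicated list of lowercase tokens.
function buildKeywords(...texts) {
  const seen = new Set();
  for (const text of texts) {
    // Split on anything that is not a letter or digit.
    for (const word of text.toLowerCase().split(/[^a-z0-9]+/)) {
      if (word.length > 1) seen.add(word); // skip empty strings and single characters
    }
  }
  return [...seen];
}

console.log(buildKeywords("Acme Glass Co", "Glass repair..."));
```

If keywords are stored lowercased like this, an exact match such as "keywords": "glass" can use the compound index directly, whereas an unanchored case-insensitive pattern like /glass/i generally cannot narrow the index scan on the keyword part.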

Page 59: Modeling for Performance

chat with Kyle Banker

me: we’re switching from postgres+solr to mongo

kyle: oh wow, you can replace solr with mongo?

me: with some creativity

kyle: seems like it’d still be hard to get just right

me: it works well

kyle: gotcha

Page 60: Modeling for Performance

i was wrong, kyle was right

Page 61: Modeling for Performance

I’ll never leave you again

...until MongoDB supports full text later this year:)


Page 62: Modeling for Performance

aggregation

map/reduce to the rescue

Page 63: Modeling for Performance

sitemaps

big list of every url

Page 64: Modeling for Performance

sitemaps

• xml files containing each unique url ~ 24M

• 50,000 urls per file, about 500 files

• urls are generated from live data

• http://companyx.com/sitemaps/1.xml
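The file count follows from the two numbers in the list above. A short sketch of the arithmetic, with hypothetical helper names, and with the url pattern taken from the slide:

```javascript
const URLS_PER_FILE = 50000; // per-file figure from the slide

// Number of sitemap files needed for a given url count.
function sitemapFileCount(totalUrls) {
  return Math.ceil(totalUrls / URLS_PER_FILE);
}

// Address of the nth sitemap file, following the pattern on the slide.
function sitemapUrl(fileNumber) {
  return "http://companyx.com/sitemaps/" + fileNumber + ".xml";
}

console.log(sitemapFileCount(24000000)); // 480, i.e. "about 500 files"
console.log(sitemapUrl(1));
```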

Page 65: Modeling for Performance

partition by consistent hash

>> "hello!".hash % 6 #=> 5

>> "/ny/new-york/c/apartments".hash % 6 #=> 5

returns an integer from 0 up to, but not including, the number specified
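The slide uses Ruby's String#hash. A partition that is stored and looked up later needs a hash that gives the same answer in every process, which Ruby's built-in string hash does not guarantee in current Ruby versions. The sketch below swaps in FNV-1a, a simple deterministic hash, as a stand-in; that choice and the function names are assumptions, not what the talk used.

```javascript
// FNV-1a, 32-bit, computed over the string's UTF-16 code units.
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32 bits
  }
  return hash >>> 0;
}

// Map a url to one of `partitions` buckets, numbered 0 to partitions - 1.
function partitionFor(url, partitions) {
  return fnv1a(url) % partitions;
}

console.log(partitionFor("/ny/new-york/c/apartments", 6));
```

The same url always lands in the same bucket, so a sitemap file can be regenerated without tracking which urls it held before.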

Page 66: Modeling for Performance

map/reduce

1. map each url in the site to a partition

2. reduce all partitions to a single document containing all urls in that partition

3. save to a permanent collection
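The three steps above can be sketched in plain JavaScript. The map and reduce functions are written in the shape MongoDB's map/reduce expects, but the driver here is a tiny in-memory stand-in for the database, and partitionFor is a hypothetical placeholder for whatever stable hash is used.

```javascript
const PARTITIONS = 6;

// Placeholder hash (an assumption): any deterministic function of the url works.
function partitionFor(url) {
  let sum = 0;
  for (let i = 0; i < url.length; i++) sum += url.charCodeAt(i);
  return sum % PARTITIONS;
}

// 1. map: emit one small value per url, keyed by its partition.
function map(url, emit) {
  emit(partitionFor(url), { total: 1, urls: [url] });
}

// 2. reduce: merge all values for one partition. The output has the same
// shape as the input, so partial results can be reduced again.
function reduce(key, values) {
  const result = { total: 0, urls: [] };
  for (const value of values) {
    result.total += value.total;
    for (const url of value.urls) result.urls.push(url);
  }
  result.urls.sort();
  return result;
}

// 3. driver: group emitted values by key and build one document per partition.
function buildSitemaps(urls) {
  const groups = new Map();
  for (const url of urls) {
    map(url, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const documents = [];
  for (const [key, values] of groups) {
    documents.push({ _id: key, value: reduce(key, values) });
  }
  return documents;
}

console.log(JSON.stringify(buildSitemaps(["/il/chicago/c/pizza", "/ny/new-york/c/apartments"])));
```

Keeping a running total inside each value, rather than counting the array at the end, is what lets reduce be applied to its own earlier output.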

Page 67: Modeling for Performance

map

url                                       partition
/il/chicago/c/pizza                       4
/ny/new-york/c/apartments                 1
/nd/rugby/c/apartments                    6
/14076500-bayside-marina                  2
/13401000-comtrak-logistics-inc           3
/12347500-allstate-auto-insurance         1
/il/downers-grove/c/computer-web-design   6
/1009500-heidelberg-lodges                5
/mn/redwood-falls/c/food-service          4
/14077000-bank-of-america                 5
/mn/savage/c/audio-visual-equipment       1
...

partitions: 1, 2, 3, 4, 5, 6

Page 68: Modeling for Performance

reduce

{
  "total" : 2,
  "urls" : [
    "/12347500-allstate-auto-insurance",
    "/ny/new-york/c/apartments"
  ]
}

{
  "total" : 1,
  "urls" : [
    "/mn/savage/c/audio-visual-equipment"
  ]
}

{
  "_id" : 1,
  "value" : {
    "total" : 3,
    "urls" : [
      "/12347500-allstate-auto-insurance",
      "/mn/savage/c/audio-visual-equipment",
      "/ny/new-york/c/apartments"
    ]
  }
}

Page 69: Modeling for Performance

usage

db.sitemaps.findOne({_id:1}).value.urls

[ "/12347500-allstate-auto-insurance", "/mn/savage/c/audio-visual-equipment", "/ny/new-york/c/apartments"]

Page 70: Modeling for Performance

wrap up

Page 71: Modeling for Performance

2 months later

115ms average response times

Page 72: Modeling for Performance

thank you

@michaeldwan