Optimizing MongoDB: Lessons Learned at Localytics
DESCRIPTION
MongoDB optimizations done at Localytics to improve throughput while reducing cost.
TRANSCRIPT
Optimizing MongoDB:
Lessons Learned at Localytics
Benjamin Darfler
MongoBoston - September 2011
Introduction
• Benjamin Darfler
o @bdarfler
o http://bdarfler.com
o Senior Software Engineer at Localytics
• Localytics
o Real time analytics for mobile applications
o 100M+ datapoints a day
o More than 2x growth over the past 4 months
o Heavy users of Scala, MongoDB and AWS
• This Talk
o Revised and updated from MongoNYC 2011
MongoDB at Localytics
• Use cases
o Anonymous loyalty information
o De-duplication of incoming data
• Scale today
o Hundreds of GBs of data per shard
o Thousands of ops per second per shard
• History
o In production for ~8 months
o Increased load 10x in that time
o Reduced shard count by more than a half
Disclaimer
These steps worked for us and our data.
We verified them by testing early and often.
You should too.
Quick Poll
• Who is using MongoDB in production?
• Who is deployed on AWS?
• Who has a sharded deployment?
o More than 2 shards?
o More than 4 shards?
o More than 8 shards?
Optimizing Our Data
Documents and Indexes
Shorten Names
Before
{super_happy_fun_awesome_name:"yay!"}
After
{s:"yay!"}
• Significantly reduced document size
Use BinData for uuids/hashes
Before
{u:"21EC2020-3AEA-1069-A2DD-08002B30309D"}
After
{u:BinData(0, "...")}
• Used BinData type 0, least overhead
• Reduced data size by more than 2x over the UUID string
• Reduced index size on the field
Override _id
Before
{_id:ObjectId("..."), u:BinData(0, "...")}
After
{_id:BinData(0, "...")}
• Reduced data size
• Eliminated an index
• Warning: Locality - more on that later
Pre-aggregate
Before
{u:BinData(0, "..."), k:BinData(0, "abc")}
{u:BinData(0, "..."), k:BinData(0, "abc")}
{u:BinData(0, "..."), k:BinData(0, "def")}
After
{k:BinData(0, "abc"), c:2}
{k:BinData(0, "def"), c:1}
• Actually kept data in both forms
• Fewer records meant smaller indexes
Prefix Indexes
Before
{k:BinData(0, "...")} // indexed
After
{
  p:BinData(0, "..."), // prefix of k, indexed
  s:BinData(0, "...")  // suffix of k, not indexed
}
• Reduced index size
• Warning: Prefix must be sufficiently unique
• Would be nice to have it built in - SERVER-3260
Sparse Indexes
Create a sparse index
db.collection.ensureIndex({middle:1}, {sparse:true});
Only indexes documents that contain the field
{u:BinData(0, "abc"), first:"Ben", last:"Darfler"}
{u:BinData(0, "abc"), first:"Mike", last:"Smith"}
{u:BinData(0, "abc"), first:"John", middle:"F", last:"Kennedy"}
• Fewer records meant smaller indexes
• New in 1.8
Upgrade to {v:1} indexes
• Up to 25% smaller
• Up to 25% faster
• New in 2.0
• Must reindex after upgrade
Optimizing Our Queries
Reading and Writing
You are using an index, right?
Create an index
db.collection.ensureIndex({user:1});
Ensure you are using it
db.collection.find(query).explain();
Hint that it should be used if it's not
db.collection.find({user:u, foo:d}).hint({user:1});
• I've seen the wrong index used before
o open a bug if you see this happen
Only as much as you need
Before
db.collection.find();
After
db.collection.find().limit(10);
db.collection.findOne();
• Reduced bytes on the wire
• Reduced bytes read from disk
• Result cursor streams data but in large chunks
Only what you need
Before
db.collection.find({u:BinData(0, "...")});
After
db.collection.find({u:BinData(0, "...")}, {field:1});
• Reduced bytes on the wire
• Necessary to exploit covering indexes
Covering Indexes
Create an index
db.collection.ensureIndex({first:1, last:1});
Query for data only in the index
db.collection.find({last:"Darfler"}, {_id:0, first:1, last:1});
• Can service the query entirely from the index
• Eliminates having to read the data extent
• Explicitly exclude _id if it's not in the index
• New in 1.8
Prefetch
Before
db.collection.update({u:BinData(0, "...")}, {$inc:{c:1}});
After
db.collection.find({u:BinData(0, "...")});
db.collection.update({u:BinData(0, "...")}, {$inc:{c:1}});
• Prevents holding a write lock while paging in data
• Most updates fit this pattern anyhow
• Less necessary with yield improvements in 2.0
Optimizing Our Disk
Fragmentation
Inserts
[Diagram: documents doc1-doc5 written one after another into the extent]
Deletes
[Diagram: deleting a document leaves a hole between its neighbors in the extent]
Updates
[Diagram: an update that grows a document moves it to the end of the extent, leaving a hole behind]
Updates can be in place if the document doesn't grow
Reclaiming Freespace
[Diagram: a newly inserted doc6 reuses the hole left by the deleted doc3]
Memory Mapped Files
[Diagram: the extent is divided into pages, each page spanning several documents]
Data is mapped into memory a full page at a time
Fragmentation
RAM used to be filled with useful data.
Now it contains useless space or useless data.
Inserts used to cause sequential writes.
Now inserts cause random writes.
Fragmentation Mitigation
• Automatic Padding
o MongoDB auto-pads records
o Manual tuning scheduled for 2.2
• Manual Padding
o Pad arrays that are known to grow
o Pad with a BinData field, then remove it
• Free list improvements in 2.0, more scheduled for 2.2
Fragmentation Fixes
• Repair
o db.repairDatabase();
o Run on secondary, swap with primary
o Requires 2x disk space
• Compact
o db.collection.runCommand("compact");
o Run on secondary, swap with primary
o Faster than repair
o Requires minimal extra disk space
o New in 2.0
• Repair, compact and import remove padding
Optimizing Our Keys
Index and Shard
B-Tree Indexes - hash/uuid key
Hashes/UUIDs randomly distribute across the whole b-tree
B-Tree Indexes - temporal key
Keys with a temporal prefix (i.e. ObjectId) are right aligned
Migrations - hash/uuid shard key
[Diagram: Shard 1 holds Chunk 1 (k: 1 to 5) and Chunk 2 (k: 6 to 9) with their documents interleaved on disk; migrating Chunk 1 to Shard 2 must read its documents from scattered positions]
Hash/uuid shard key
• Distributes read/write load evenly across nodes
• Migrations cause random I/O and fragmentation
o Makes it harder to add new shards
• Pre-split
o db.runCommand({split:"db.collection", middle:{_id:99}});
• Pre-move
o db.adminCommand({moveChunk:"db.collection", find:{_id:5}, to:"s2"});
• Turn off balancer
o db.settings.update({_id:"balancer"}, {$set:{stopped:true}}, true);
Migrations - temporal shard key
[Diagram: Shard 1 holds Chunk 1 (k: 1 to 5) and Chunk 2 (k: 6 to 9) laid out contiguously on disk; migrating Chunk 1 to Shard 2 is a sequential read]
Temporal shard key
• Can cause hot chunks
• Migrations are less destructive
o Makes it easier to add new shards
• Include a temporal prefix in your shard key
o {day: ..., id: ...}
• Choose prefix granularity based on insert rate
o low 100s of chunks (64MB) per "unit" of prefix
o i.e. 10 GB per day => ~150 chunks per day
Optimizing Our Deployment
Hardware and Configuration
Elastic Compute Cloud
• Noisy Neighbor
o Used largest instance in a family (m1 or m2)
• Used m2 family for mongods
o Best RAM to dollar ratio
• Used micros for arbiters and config servers
Elastic Block Storage
• Noisy Neighbor
o Netflix claims to only use 1TB disks
• RAID'ed our disks
o Minimum of 4-8 disks
o Recommended 8-16 disks
o RAID0 for write heavy workload
o RAID10 for read heavy workload
Pathological Test
• What happens when data far exceeds RAM?
o 10:1 read/write ratio
o Reads evenly distributed over entire key space
One Mongod
[Graph: throughput over time, collapsing once the index falls out of RAM]
• One mongod on the host
o Throughput drops more than 10x
Many Mongods
[Graph: throughput over time for one of 16 mongods; the drop when the index falls out of RAM is far smaller]
• 16 mongods on the host
o Throughput drops less than 3x
o Graph is for one shard, multiply by 16x for total
Sharding within a node
• One read/write lock per mongod
o Ticket for lock per collection - SERVER-1240
o Ticket for lock per extent - SERVER-1241
• For in-memory workload
o Shard per core
• For out-of-memory workload
o Shard per disk
• Warning: Must have shard key in every query
o Otherwise scatter gather across all shards
o Requires manually managing secondary keys
• Less necessary in 2.0 with yield improvements
Reminder
These steps worked for us and our data.
We verified them by testing early and often.
You should too.
Questions?
@bdarfler
http://bdarfler.com