Optimizing MongoDB: Lessons Learned at Localytics
DESCRIPTION
MongoDB optimizations done at Localytics to improve throughput while reducing cost.
TRANSCRIPT
Optimizing MongoDB:
Lessons Learned at Localytics
Benjamin Darfler
MongoBoston - September 2011
Introduction
• Benjamin Darfler
o @bdarfler
o http://bdarfler.com
o Senior Software Engineer at Localytics
• Localytics
o Real time analytics for mobile applications
o 100M+ datapoints a day
o More than 2x growth over the past 4 months
o Heavy users of Scala, MongoDB and AWS
• This Talk
o Revised and updated from MongoNYC 2011
MongoDB at Localytics
• Use cases
o Anonymous loyalty information
o De-duplication of incoming data
• Scale today
o Hundreds of GBs of data per shard
o Thousands of ops per second per shard
• History
o In production for ~8 months
o Increased load 10x in that time
o Reduced shard count by more than a half
Disclaimer
These steps worked for us and our data.
We verified them by testing early and often.
You should too.
Quick Poll
• Who is using MongoDB in production?
• Who is deployed on AWS?
• Who has a sharded deployment?
o More than 2 shards?
o More than 4 shards?
o More than 8 shards?
Optimizing Our Data
Documents and Indexes
Shorten Names
Before
{super_happy_fun_awesome_name:"yay!"}
After
{s:"yay!"}
• Significantly reduced document size
Use BinData for uuids/hashes
Before
{u:"21EC2020-3AEA-1069-A2DD-08002B30309D"}
After
{u:BinData(0, "...")}
• Used BinData type 0, least overhead
• Reduced data size by more than 2x over the UUID string
• Reduced index size on the field
Override _id
Before
{_id:ObjectId("..."), u:BinData(0, "...")}
After
{_id:BinData(0, "...")}
• Reduced data size
• Eliminated an index
• Warning: Locality - more on that later
Pre-aggregate
Before
{u:BinData(0, "..."), k:BinData(0, "abc")}
{u:BinData(0, "..."), k:BinData(0, "abc")}
{u:BinData(0, "..."), k:BinData(0, "def")}
After
{k:BinData(0, "abc"), c:2}
{k:BinData(0, "def"), c:1}
• Actually kept data in both forms
• Fewer records meant smaller indexes
Prefix Indexes
Before
{k:BinData(0, "...")} // indexed
After
{
  p:BinData(0, "..."), // prefix of k, indexed
  s:BinData(0, "...")  // suffix of k, not indexed
}
• Reduced index size
• Warning: Prefix must be sufficiently unique
• Would be nice to have it built in - SERVER-3260
Sparse Indexes
Create a sparse index
db.collection.ensureIndex({middle:1}, {sparse:true});
Only indexes documents that contain the field
{u:BinData(0, "abc"), first:"Ben", last:"Darfler"}
{u:BinData(0, "abc"), first:"Mike", last:"Smith"}
{u:BinData(0, "abc"), first:"John", middle:"F", last:"Kennedy"}
• Fewer records meant smaller indexes
• New in 1.8
Upgrade to {v:1} indexes
• Up to 25% smaller
• Up to 25% faster
• New in 2.0
• Must reindex after upgrade
Optimizing Our Queries
Reading and Writing
You are using an index, right?
Create an index
db.collection.ensureIndex({user:1});
Ensure you are using it
db.collection.find(query).explain();
Hint that it should be used if it's not
db.collection.find({user:u, foo:d}).hint({user:1});
• I've seen the wrong index used before
o open a bug if you see this happen
Only as much as you need
Before
db.collection.find();
After
db.collection.find().limit(10);
db.collection.findOne();
• Reduced bytes on the wire
• Reduced bytes read from disk
• Result cursor streams data but in large chunks
Only what you need
Before
db.collection.find({u:BinData(0, "...")});
After
db.collection.find({u:BinData(0, "...")}, {field:1});
• Reduced bytes on the wire
• Necessary to exploit covering indexes
Covering Indexes
Create an index
db.collection.ensureIndex({first:1, last:1});
Query for data only in the index
db.collection.find({last:"Darfler"}, {_id:0, first:1, last:1});
• Can service the query entirely from the index
• Eliminates having to read the data extent
• Explicitly exclude _id if it's not in the index
• New in 1.8
Prefetch
Before
db.collection.update({u:BinData(0, "...")}, {$inc:{c:1}});
After
db.collection.find({u:BinData(0, "...")});
db.collection.update({u:BinData(0, "...")}, {$inc:{c:1}});
• Prevents holding a write lock while paging in data
• Most updates fit this pattern anyhow
• Less necessary with yield improvements in 2.0
Optimizing Our Disk
Fragmentation
Inserts
[Diagram: documents doc1-doc5 written one after another into the extent]
Deletes
[Diagram: deleting a document leaves a hole between its neighbors in the extent]
Updates
[Diagram: an update that grows a document moves it to the end of the extent, leaving a hole behind]
Updates can be in place if the document doesn't grow
Reclaiming Freespace
[Diagram: a newly inserted doc6 reuses the hole left by the deleted doc3]
Memory Mapped Files
[Diagram: the extent is divided into pages, each page spanning several documents]
Data is mapped into memory a full page at a time
Fragmentation
RAM used to be filled with useful data.
Now it contains useless space or useless data.
Inserts used to cause sequential writes.
Now inserts cause random writes.
Fragmentation Mitigation
• Automatic Padding
o MongoDB auto-pads records
o Manual tuning scheduled for 2.2
• Manual Padding
o Pad arrays that are known to grow
o Pad with a BinData field, then remove it
• Free list improvements in 2.0, more scheduled for 2.2
Fragmentation Fixes
• Repair
o db.repairDatabase();
o Run on secondary, swap with primary
o Requires 2x disk space
• Compact
o db.collection.runCommand("compact");
o Run on secondary, swap with primary
o Faster than repair
o Requires minimal extra disk space
o New in 2.0
• Repair, compact and import remove padding
Optimizing Our Keys
Index and Shard
B-Tree Indexes - hash/uuid key
Hashes/UUIDs randomly distribute across the whole b-tree
B-Tree Indexes - temporal key
Keys with a temporal prefix (i.e. ObjectId) are right aligned
Migrations - hash/uuid shard key
[Diagram: Shard 1 holds Chunk 1 (k: 1 to 5) and Chunk 2 (k: 6 to 9) with their documents interleaved on disk; migrating Chunk 1 to Shard 2 must read its documents from scattered positions]
Hash/uuid shard key
• Distributes read/write load evenly across nodes
• Migrations cause random I/O and fragmentation
o Makes it harder to add new shards
• Pre-split
o db.runCommand({split:"db.collection", middle:{_id:99}});
• Pre-move
o db.adminCommand({moveChunk:"db.collection", find:{_id:5}, to:"s2"});
• Turn off balancer
o db.settings.update({_id:"balancer"}, {$set:{stopped:true}}, true);
Migrations - temporal shard key
[Diagram: Shard 1 holds Chunk 1 (k: 1 to 5) and Chunk 2 (k: 6 to 9) laid out contiguously on disk; migrating Chunk 1 to Shard 2 is a sequential read]
Temporal shard key
• Can cause hot chunks
• Migrations are less destructive
o Makes it easier to add new shards
• Include a temporal prefix in your shard key
o {day: ..., id: ...}
• Choose prefix granularity based on insert rate
o low 100s of chunks (64MB) per "unit" of prefix
o i.e. 10 GB per day => ~150 chunks per day
Optimizing Our Deployment
Hardware and Configuration
Elastic Compute Cloud
• Noisy Neighbor
o Used largest instance in a family (m1 or m2)
• Used m2 family for mongods
o Best RAM to dollar ratio
• Used micros for arbiters and config servers
Elastic Block Storage
• Noisy Neighbor
o Netflix claims to only use 1TB disks
• RAID'ed our disks
o Minimum of 4-8 disks
o Recommended 8-16 disks
o RAID0 for write heavy workload
o RAID10 for read heavy workload
Pathological Test
• What happens when data far exceeds RAM?
o 10:1 read/write ratio
o Reads evenly distributed over entire key space
One Mongod
[Graph: throughput over time, collapsing once the index falls out of RAM]
• One mongod on the host
o Throughput drops more than 10x
Many Mongods
[Graph: throughput over time for one of 16 mongods; the drop when the index falls out of RAM is far smaller]
• 16 mongods on the host
o Throughput drops less than 3x
o Graph is for one shard, multiply by 16x for total
Sharding within a node
• One read/write lock per mongod
o Ticket for lock per collection - SERVER-1240
o Ticket for lock per extent - SERVER-1241
• For in-memory workload
o Shard per core
• For out-of-memory workload
o Shard per disk
• Warning: Must have shard key in every query
o Otherwise scatter gather across all shards
o Requires manually managing secondary keys
• Less necessary in 2.0 with yield improvements
Reminder
These steps worked for us and our data.
We verified them by testing early and often.
You should too.
Questions?
@bdarfler
http://bdarfler.com