Capacity Planning For Your Growing MongoDB Cluster

Solution Architect, MongoDB

Sam Weaver

Capacity Planning:

Deploying MongoDB

#mongodb

Capacity Planning

• Why is it important?

• What is it?

• When is it important?

• How is it actually done?

Prepping for launch

• You’ve written your application

• The code is good

• You’re looking to launch soon

• How do I deploy?

Questions to ask yourself

• Instance types

– Standalone?

– Replica set?

– Sharded?

• Architecture

• Size of machines

– Machines cost money

– Size of machines may affect instance types required

• What are the consequences of not planning?

Why does it matter?

Why

• Once we launch, we don't want avoidable downtime caused by poorly selected hardware

• As our success grows we want to stay in front of the demand curve

• We want to meet business' and users' expectations

• We want to keep our jobs

What is Capacity Planning?

Requirements

Resources

Requirements

• Availability

– Planning for a crash

– Planning for binary upgrades

– Planning for hardware maintenance

• Throughput

– X many users at any one time

– Bulk loads vs. random access

• Responsiveness

– SLA of x ms per page load

– Amazon, Google study

CPU

• Non-indexed Data

• Sorting

• Aggregation

– Map/Reduce

– Framework

• Data

– Fields

– Nesting

– Arrays/Embedded-Docs

Network

• Latency

– WriteConcern

– ReadPreference

– Batching

• Throughput

– Update/Write Patterns

– Reads/Queries

Understand memory usage for MongoDB

• Data & indexes are memory-mapped into the virtual address space

• Data accessed is paged into RAM

• OS evicts least recently used page

• More frequently used pages stay in RAM

Identify your working set

Number of active users on the system at any one time

Number of distinct pages accessed per second

=

Working Set

Working Set

4 distinct pages per second

[Diagram: working set pages placed across RAM and disk]

• Worst case 4 disk accesses

• Worst case disk access on every op
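
The two worst cases can be reproduced with a small simulation. This is only a sketch under stated assumptions: it models RAM as a least-recently-used (LRU) cache of whole pages, and the function name `count_page_faults` and the cyclic access pattern are illustrative choices, not anything MongoDB or the OS exposes.

```python
from collections import OrderedDict


def count_page_faults(accesses, ram_pages):
    """Count disk accesses (page faults) for an LRU cache holding ram_pages pages."""
    resident = OrderedDict()  # pages currently in RAM, least recently used first
    faults = 0
    for page in accesses:
        if page in resident:
            resident.move_to_end(page)  # page was used again, so it stays "hot"
        else:
            faults += 1  # page must be read from disk
            resident[page] = True
            if len(resident) > ram_pages:
                resident.popitem(last=False)  # evict the least recently used page
    return faults


# Hypothetical workload: a working set of 4 distinct pages, touched in a cycle.
working_set = [0, 1, 2, 3]
accesses = working_set * 10  # 40 operations in total

fits = count_page_faults(accesses, ram_pages=4)       # RAM holds the whole working set
too_small = count_page_faults(accesses, ram_pages=3)  # RAM is one page short
print(fits, too_small)
```

With room for all 4 pages, only the first touch of each page goes to disk (4 accesses). With room for 3 pages, this access pattern evicts each page just before it is needed again, so every one of the 40 operations goes to disk.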

Memory & Storage

[Chart: MOPs and PFs]

Memory

• Working set affected by

–Sorting

–Aggregation

–Connections


Working Set Estimator

"workingSet" : {

"note" : "thisIsAnEstimate",

"pagesInMemory" : <num>,

"computationTimeMicros" : <num>,

"overSeconds" : <num>

}

Number of unique pages the server needed in the last 15 minutes. Use this to see if you are growing out of RAM.
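
One way to use the estimate is to turn `pagesInMemory` into bytes and compare it with the RAM on the machine. The sketch below takes the `workingSet` document as a plain dict, however it was retrieved. The 4 KiB page size, the 80% headroom figure, the helper names, and the sample numbers are all assumptions made for illustration.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB memory pages
GIB = 1024 ** 3


def working_set_bytes(working_set_doc, page_size=PAGE_SIZE_BYTES):
    """Convert the estimator's page count into bytes."""
    return working_set_doc["pagesInMemory"] * page_size


def fits_in_ram(working_set_doc, ram_bytes, headroom=0.8):
    """True if the estimated working set fits in a share (headroom) of RAM."""
    return working_set_bytes(working_set_doc) <= ram_bytes * headroom


# Made-up sample shaped like the estimator output; not real measurements.
sample = {
    "note": "thisIsAnEstimate",
    "pagesInMemory": 1_000_000,
    "computationTimeMicros": 25_000,
    "overSeconds": 900,
}

print(working_set_bytes(sample) / GIB)  # roughly 3.8 GiB
print(fits_in_ram(sample, 8 * GIB))     # comfortable on an 8 GiB machine
print(fits_in_ram(sample, 4 * GIB))     # too tight on a 4 GiB machine
```

Tracking this number over time, rather than reading it once, is what shows whether the working set is approaching the RAM limit.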

Storage

• Different storage types have different IOPs

– Spinning disk

• 7,200 rpm SATA: 75-100 IOPs

– SSD

• 9,000-120,000 IOPs

– EBS

• 100 IOPs

– Provisioned EBS

• 2,000 IOPs

• Work out how much data you need to write per time frame.

• MongoDB writes to a journal, and data files are flushed to disk.

• Replication adds oplog considerations
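
A rough way to work out the write load is to convert bytes written per second into I/O operations and compare that with the device figures above. This is a back-of-the-envelope sketch: the 4 KiB I/O size and the 2x factor (standing in for journal plus data file writes) are assumptions, the helper names are hypothetical, and the table uses the low end of each range quoted on the slide.

```python
import math

# Low-end IOPs figures taken from the list above.
DEVICE_IOPS = {
    "spinning disk (SATA)": 75,
    "EBS": 100,
    "provisioned EBS": 2000,
    "SSD": 9000,
}


def required_write_iops(bytes_per_second, io_size_bytes=4096, write_amplification=2.0):
    """Estimate write IOPs; io_size_bytes and write_amplification are assumptions."""
    return math.ceil(bytes_per_second * write_amplification / io_size_bytes)


def devices_that_keep_up(iops_needed):
    """List the storage types whose quoted IOPs cover the estimated need."""
    return [name for name, iops in DEVICE_IOPS.items() if iops >= iops_needed]


needed = required_write_iops(1_000_000)  # hypothetical load: 1 MB written per second
print(needed, devices_that_keep_up(needed))
```

Real devices and real write patterns vary widely, so an estimate like this is a starting point for testing, not a replacement for measuring with the tools covered later.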

Using this information

• Plan hardware to hold the working set + indexes

• Allow room to grow

• If working set is larger than RAM and you can’t reasonably add more resources, then shard

– Don’t shard too early

– Lots of little instances vs. a few big instances

• Think about architecture

– Local disk or central storage

– Don’t be surprised to have x copies of the data with x nodes
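
These rules of thumb can be written down as simple arithmetic. The sketch below is illustrative only: the 1.5x growth factor, the helper names, and the sample sizes are assumptions, not recommendations from the talk.

```python
GIB = 1024 ** 3


def ram_needed_bytes(working_set_bytes, index_bytes, growth_factor=1.5):
    """RAM to hold the working set plus indexes, with room to grow (assumed factor)."""
    return int((working_set_bytes + index_bytes) * growth_factor)


def total_disk_bytes(data_bytes, nodes):
    """Every node holds a full copy, so x nodes means x copies of the data."""
    return data_bytes * nodes


def should_consider_sharding(ram_needed, max_ram_per_node):
    """Shard only when one node can no longer reasonably hold what must fit in RAM."""
    return ram_needed > max_ram_per_node


# Hypothetical deployment: 40 GiB working set, 10 GiB indexes, 200 GiB data, 3 nodes.
ram = ram_needed_bytes(40 * GIB, 10 * GIB)
disk = total_disk_bytes(200 * GIB, nodes=3)
print(ram // GIB, disk // GIB)
print(should_consider_sharding(ram, max_ram_per_node=64 * GIB))
```

Keeping the sharding check as the last step mirrors the advice not to shard too early: first see whether a bigger machine is a reasonable option.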

Development to production

• Don’t be surprised by:

– More data = more/larger indexes

– Indexes make your working set bigger

• Replication adds a network overhead

• Journal has different access patterns

What tools are there to help me?

IOStat

MongoStat

MongoPerf

• Measure amount of data written to device per second

MongoDB Management Service

• Free Cloud or On-Premise based management tool

– Monitoring

– Automation

– Backup

Scaling for capacity – MMS automation

Capacity Planning: When

• When?

– Before it's too late!

– Iterative process

Start → Launch → Version 2

Repeat (continuously)

• Repeat Testing

• Repeat Evaluations

• Repeat Deployment

What is failure?

• We have failed at Capacity Planning when our resources don’t meet our requirements

• Because our requirements can have many dimensions, we may exceed our requirements in one characteristic but not meet them in another

• This means that we can spend many $$$ and still fail!
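
The point about dimensions is easy to see when requirements and resources are compared one characteristic at a time. In this minimal sketch the dimension names, the numbers, and the helper `unmet_requirements` are all hypothetical.

```python
def unmet_requirements(requirements, resources):
    """Return the dimensions where the provisioned resource is below the requirement."""
    return sorted(
        name
        for name, needed in requirements.items()
        if resources.get(name, 0) < needed
    )


# Hypothetical plan: generous on RAM and network, short on disk throughput.
requirements = {"ram_gib": 64, "write_iops": 2000, "network_mbps": 500}
resources = {"ram_gib": 256, "write_iops": 100, "network_mbps": 1000}

gaps = unmet_requirements(requirements, resources)
print(gaps)
```

Here RAM is four times what is required, yet the plan still fails because the disks cannot sustain the write load.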

Models

• Load/Users

– Response Time/TTFB

• System Performance

– Peak Usage

– Min Usage

Starter Questions

• What is the working set?

– How does that equate to memory

– How much disk access will that require

• How efficient are the queries?

• What is the rate of data change?

• How big are the highs and lows?

Questions?

Solution Architect, MongoDB

Sam Weaver

Thank You

#mongodb
