Hardware Provisioning
TRANSCRIPT
Solution Architect, MongoDB
@ctindel
Chad Tindel
#MongoDBWorld
Hardware Provisioning
MongoDB is so easy for programmers….
Even a baby can write an application!
MongoDB is so easy to manage with MMS…
Even a baby can manage a cluster!
Hardware Selection for MongoDB is….
Not so easy!
A Cautionary Tale
The methodology (in theory)
Requirements – Step One
• It is impossible to properly size a MongoDB
cluster without first documenting your business
requirements
• Availability: what is your uptime requirement?
• Throughput
• Responsiveness
– what is acceptable latency?
– is higher latency during peak times acceptable?
Requirements – Step Two
• Understand the resources available to you
– Storage
– Memory
– Network
– CPU
• Many customers limited to the options available in
AWS or presented by their own Enterprise
Virtualization team
Continuing Requirements – Step Three
• Once you deploy initially, it is common for requirements to change
– More users added to the application
• Causes more queries and a larger working set
– New functionality changes query patterns
• Adding new indexes enlarges the working set
– What started as a read-intensive application can take on more and more write-heavy workloads
• More write-locking increases reader queue depth
• You must monitor and collect metrics and update your
hardware selection as necessary (scale up? add RAM? add
more shards?)
Run a Proof of Concept
• Forces you to:
– Do schema / index design
– Understand query patterns
– Get a handle on Working Set size
• Start small on a single node
– See how much performance you can get from one box
• Add replication, then add sharding
– Understand how these affect performance in your use case
• POC can be done on a smaller scale to infer what will be
needed for production
POC – Requirements to Gather
• Data Sizes
– Total Number of Documents
– Average Document Size
– Size of Data on Disk
– Size of Indexes on Disk
– Expected growth
– What is your document model?
• Ingestion
– Insertions / Updates / Deletes per second, peak &
average
– Bulk inserts / updates? If so, how large and how often?
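The data-size inputs above fold naturally into a first-pass sizing sketch. The document count, document size, index ratio, and growth rate below are hypothetical placeholders, not figures from the talk:

```python
def storage_estimate(doc_count, avg_doc_bytes,
                     index_ratio=0.15, yearly_growth=0.30, years=2):
    """Rough on-disk estimate: raw data, index overhead, growth headroom.

    index_ratio and yearly_growth are assumptions you replace with the
    numbers your own POC measures.
    """
    data = doc_count * avg_doc_bytes
    indexes = data * index_ratio                      # assumed index overhead
    projected = (data + indexes) * (1 + yearly_growth) ** years
    return data, indexes, projected

# Hypothetical workload: 10M documents averaging 2 KB each
data, indexes, projected = storage_estimate(10_000_000, 2_048)
print(f"data {data/1e9:.1f} GB, indexes {indexes/1e9:.1f} GB, "
      f"projected {projected/1e9:.1f} GB")
# -> data 20.5 GB, indexes 3.1 GB, projected 39.8 GB
```

The point is not the exact numbers but having the formula written down, so that when a requirement changes you can re-derive the footprint instead of guessing.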
POC – Requirements to Gather
• Query Patterns and Performance Expectations
– Read Response SLA
– Write Response SLA
– Range queries or single document queries?
– Sort conditions
– Is more recent data queried more frequently?
• Data Policies
– How long will you keep the data for?
– Replication Requirements
– Backup Requirements / Time to Recovery
POC – Requirements to Gather
• Multi-datacenter Requirements
– Number and location of datacenters
– Cross DC latency
– Active / Active or Active / Passive?
– Geographical / Data locality requirements?
• Security Requirements
– Encryption over the wire (SSL)?
– Encryption of data at rest?
Resource Usage
• Storage
– IOPS
– Size
– Data & Loading Patterns
• Memory
– Working Set
• CPU
– Speed
– Cores
• Network
– Latency
– Throughput
Storage Capability
7,200 rpm SATA ~ 75-100 IOPS
15,000 rpm SAS ~ 175-210 IOPS
Amazon SSD EBS ~ 4000 PIOPS / Volume
~ 48,000 PIOPS / Instance
Intel X25-E (SLC) ~ 5,000 IOPS
Fusion IO ~ 135,000 IOPS
Violin Memory 6000 ~ 1,000,000 IOPS
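Given a target IOPS figure from your requirements, the table above translates directly into a device count. A minimal sketch, using midpoints where the table gives a range; the 12,000 IOPS target is a hypothetical example:

```python
import math

# Per-device IOPS, from the rough figures in the table above
# (midpoints used where a range was given).
DEVICE_IOPS = {
    "7200rpm SATA": 90,
    "15000rpm SAS": 190,
    "Amazon SSD EBS volume": 4000,
    "Intel X25-E": 5000,
}

def devices_needed(target_iops, device):
    """Smallest number of devices whose combined IOPS meets the target."""
    return math.ceil(target_iops / DEVICE_IOPS[device])

# Hypothetical workload needing 12,000 random IOPS:
print(devices_needed(12_000, "7200rpm SATA"))          # 134
print(devices_needed(12_000, "Amazon SSD EBS volume")) # 3
```

The same target that needs over a hundred SATA spindles fits on a handful of provisioned-IOPS volumes, which is why the storage tier dominates this decision.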
Storage Measuring
Memory Measuring
• Added in 2.4: the workingSet option on db.serverStatus()
> db.serverStatus( { workingSet: 1 } )
Network
• Latency
– WriteConcern
– ReadPreference
• Throughput
– Update/Write Patterns
– Reads/Queries
• Come to love netperf
CPU Usage
• Non-indexed Queries
• Sorting
• Aggregation
– Aggregation Framework
– Map/Reduce
Case Studies (theory applied)
Case Study #1: A Spanish Bank
• Problem statement: want to store 6 months' worth of
logs
• 18TB of total data (3 TB/month)
• Primarily analyzing the last month’s worth of logs, so
Working Set Size is 1 month’s worth of data (3TB)
plus indexes (1TB) = 4 TB Working Set
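The arithmetic on this slide, written out explicitly (all figures are from the slide itself):

```python
# Spanish-bank sizing math, units in TB.
monthly_data_tb = 3
retention_months = 6
index_tb = 1

total_data_tb = monthly_data_tb * retention_months   # 18 TB on disk
working_set_tb = monthly_data_tb + index_tb          # hot month + indexes
print(total_data_tb, working_set_tb)                 # 18 4
```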
Case Study #1: Hardware Selection
• QA Environment
– Did not want to mirror a full production cluster; just wanted to hold 2 TB of data
– 3 nodes / shard * 4 shards = 12 physical machines
– 2 mongos, 3 config servers (virtual machines)
• Production Environment
– 3 nodes / shard * 36 shards = 108 physical machines
– 128 GB RAM * 36 = 4.6 TB RAM
– 2 mongos, 3 config servers (virtual machines)
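A quick check of the production RAM figure. The assumption (consistent with the slide counting 36, not 108, nodes toward RAM) is that only one data-bearing copy per shard contributes to the effective cache, since replica-set secondaries hold copies of the same data:

```python
# Production RAM check: 36 shards, 128 GB per replica-set member.
shards = 36
ram_per_node_gb = 128

# One data-bearing copy per shard counts toward effective cache;
# secondaries replicate the same data rather than adding to it.
effective_ram_tb = shards * ram_per_node_gb / 1000
print(round(effective_ram_tb, 1))  # 4.6
```

4.6 TB of effective cache against a 4 TB working set is how this cluster keeps the hot month in memory.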
Case Study #2: A Large Online Retailer
• Problem statement: Moving their product catalog
from SQL Server to MongoDB as part of a larger
architectural overhaul to Open Source Software
• 2 main datacenters running active/active
• On Cyber Monday they peaked at 214 requests/sec,
so let’s budget for 400 requests/sec to give some
headroom
Case Study #2: The POC
• A POC yielded the following numbers:
– 4 million product SKUs, average JSON document size
30KB
• Need to service requests for:
– a specific product (by _id)
– Products in a specific category (e.g. "Desks" or "Hard
Drives")
• Returns 72 documents, or 200 if it's a Google bot
crawling
Case Study #2: The Math
• Want to partition (Shard) by category, and have
products that exist in multiple categories duplicated
– The average product appears in 2 categories, so we
actually need to store 8M SKU documents, not 4M
• 8M docs * 30KB/doc = 240GB of data
• 270 GB with indexes
• Working Set is 100% of all data + indexes as this is
a core functionality that must be fast at all times
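The slide's math, step by step (all inputs are from the slides; the ~30 GB index figure is implied by the 240 GB vs 270 GB numbers):

```python
# Case-study #2 arithmetic: duplicated SKUs, raw data size, working set.
skus = 4_000_000
avg_categories = 2          # average product appears in 2 categories
avg_doc_kb = 30

docs = skus * avg_categories              # 8M stored documents
data_gb = docs * avg_doc_kb / 1_000_000   # 240 GB of raw data
index_gb = 30                             # implied index allowance
print(docs, data_gb, data_gb + index_gb)  # 8000000 240.0 270.0
```

Duplicating documents per category doubles storage, but it is what makes category-based sharding work without scatter-gather queries.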
Case Study #2: Our Recommendation
• MongoDB initial recommendation was to deploy a single
Replica Set with enough RAM in each server to hold all the
data (at least 384GB RAM/server)
• 4-node Replica Set (2 nodes in each DC, 1 arbiter in a 3rd DC)
– Allows a node in each DC to go down for maintenance or a system
crash while still servicing the application in that datacenter
• Deploy using secondary reads (NEAREST read preference)
• This avoids the complexity of sharding, setting up mongos,
config servers, worrying about orphaned documents, etc.
[Diagram: Datacenter 1 — Node 1 (Primary), Node 2 (Secondary); Datacenter 2 — Node 3 (Secondary), Node 4 (Secondary); Datacenter 3 — Arbiter]
Case Study #2: Actual Provisioning
• Customer decided to deploy on their corporate
VMWare Cloud
• IT would not give them nodes any bigger than 64
GB RAM
• Decided to deploy 3 shards (4 nodes each + arbiter)
= 192 GB/RAM cluster wide into a staging
environment and add a fourth shard if staging
proves it would be worthwhile
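The trade-off behind that compromise can be made explicit. The 270 GB working-set figure is the one derived in the POC math earlier:

```python
# 64 GB nodes, 3 shards: how much of the working set fits in RAM?
node_ram_gb = 64
shards = 3
working_set_gb = 270   # data + indexes from the POC sizing

cluster_ram_gb = node_ram_gb * shards   # 192 GB cluster-wide
fits = cluster_ram_gb >= working_set_gb
print(cluster_ram_gb, fits)  # 192 False
```

The working set does not fully fit in 192 GB, which is exactly why the staging plan leaves room to add a fourth shard.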
Key Takeaways
• Document your performance requirements up front
• Conduct a Proof of Concept
• Always test with a real workload
• Constantly monitor and adjust based on changing
requirements
Thank You