MongoDB and Amazon Web Services: Storage Options for MongoDB Deployments
DESCRIPTION
When using MongoDB and AWS, you want to design your infrastructure to avoid storage bottlenecks and make the best use of your available storage resources. AWS offers a myriad of storage options, including ephemeral disks, EBS, Provisioned IOPS, and ephemeral SSDs, each offering different performance and persistence characteristics. In this session, we’ll evaluate each of these options in the context of your MongoDB deployment, assessing the benefits and drawbacks of each.
TRANSCRIPT
MongoDB and AWS: Storage Configurations
Senior Solutions Architect, MongoDB Inc.
Sandeep Parikh
#mongodb
Quick Recap
• Deployment and Availability
  – MongoDB Basics
  – Deployment Configurations
  – Instance Types
  – Best Practices
• Slides and Recording:
  – http://www.mongodb.com/presentations/mongodb-and-amazon-web-services-deploying-high-availability
Agenda
• Storage Options
• Simple Recommendations
• Backup and Restore
• Advanced Configurations
• Drawbacks/Tradeoffs
• Next Steps
Storage Options
AWS Storage Options
• Instance-based (ephemeral)
• Elastic Block Store (persistent)
• Simple Storage Service (S3)
• Glacier
MongoDB Storage Elements
• Data
• Journal
• Logs
• Snapshots
• Archived Backups
MongoDB Elements & AWS Storage
Instance
• Data
• Log
• Journal
EBS
• Data
• Log
• Journal
• Snapshots
S3
• Snapshots
• Archived Backups
Glacier
• Archived Backups
Data Lifecycle
Instance Storage
• Ephemeral
  – If your instance is stopped or terminated, ephemeral storage is lost (!)
• Configurations
  – Single or multiple volumes per instance
• Management
  – LVM for RAID or snapshots
EBS
• Persistent
  – Allocated and attached to individual instances like network-attached storage
  – Storage lifecycle independent of instances
• Configuration
  – Single or multiple volumes per instance
• Management
  – LVM or MD for RAID
  – EBS Snapshots (Console or API)
Standard EBS
Standard volumes are designed for applications with moderate I/O requirements. They are also well-suited for use as boot volumes or applications where I/O can be bursty.
• Performance is somewhat variable
• Average of 100 IOPS
• Possible to aggregate via RAID but underlying bursty nature still exists
Provisioned IOPS EBS
Provisioned IOPS volumes offer storage with consistent and low-latency performance, and are designed for applications with I/O-intensive workloads such as databases.
• Consistent volume I/O performance
• Available with 100-4000 IOPS per volume
• Launch with EBS-Optimized instances
  – Adds network bandwidth dedicated to EBS volumes
Measuring IOPS
• Volumes are optimized for 4 KB per operation
• MongoDB document sizes and workload patterns will affect throughput
• Use mongoperf to test disk configuration
  – Threads
  – Data file size
  – Document size
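The three knobs above correspond roughly to fields in the JSON configuration that mongoperf reads on standard input. A minimal sketch of building such a configuration follows; the field names (nThreads, fileSizeMB, recSizeKB, r, w, mmf) are taken from the mongoperf documentation, while the helper name and the specific values are assumptions chosen only for illustration.

```python
import json


def mongoperf_config(threads, file_size_mb, record_size_kb=4, reads=True, writes=True):
    """Build a mongoperf configuration document (hypothetical helper).

    The values are illustrative; pick them to resemble your own workload.
    """
    return {
        "nThreads": threads,          # number of concurrent I/O threads
        "fileSizeMB": file_size_mb,   # size of the test data file
        "recSizeKB": record_size_kb,  # size of each write, a stand-in for document size
        "r": reads,                   # include reads in the test
        "w": writes,                  # include writes in the test
        "mmf": False,                 # False = direct physical I/O, not memory-mapped files
    }


config = mongoperf_config(threads=16, file_size_mb=1000)
config_json = json.dumps(config)
print(config_json)
# Run from a directory on the volume under test, for example:
#   echo '<config_json>' | mongoperf
```

Repeating the run with different thread counts and record sizes gives a rough picture of how a given volume layout behaves before MongoDB itself is involved.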
Simple Recommendations
Multiple EBS Volumes
• Provisioned IOPS EBS
• EBS-optimized
• Separate volumes for
  – Data
  – Journal
  – Log
• Decrease disk contention during high load
Disk Configurations
• Mirror or stripe multiple disks (or both)
  – LVM
  – MDADM
• Different implications for each RAID level
  – Durability
  – Performance
  – Cost
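As one way to make the MDADM option concrete, the sketch below assembles an `mdadm --create` command line for a striped, mirrored, or striped-and-mirrored array. The helper name, the device names (/dev/xvdf and so on), and the array name /dev/md0 are assumptions for illustration; the command is only printed, not executed.

```python
import shlex


def mdadm_create_command(md_device, level, member_devices):
    """Return the argv for creating an MD RAID array (hypothetical helper).

    level 0 stripes, level 1 mirrors, level 10 does both.
    """
    if level not in (0, 1, 10):
        raise ValueError("this sketch only covers RAID 0, 1 and 10")
    if len(member_devices) < 2:
        raise ValueError("an array needs at least two member devices")
    return [
        "mdadm", "--create", md_device,
        "--level={}".format(level),
        "--raid-devices={}".format(len(member_devices)),
    ] + list(member_devices)


# Hypothetical EBS volumes attached to the instance.
cmd = mdadm_create_command("/dev/md0", 10, ["/dev/xvdf", "/dev/xvdg", "/dev/xvdh", "/dev/xvdi"])
print(" ".join(shlex.quote(arg) for arg in cmd))
```

RAID 10 across four volumes pays for four volumes but yields the usable capacity of two, which is the cost side of the durability and performance tradeoff listed above.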
Aggregating IOPS
• Single volumes capable of 4000 IOPS
• Stripe volumes to aggregate IOPS (RAID0, RAID10)
• Note: network bandwidth is the limiting factor
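The arithmetic behind the note on network bandwidth can be sketched as follows. This is a simplified model, assuming a RAID0 stripe, the 4 KB operation size mentioned earlier, and a hypothetical 500 Mbps of EBS bandwidth; the real figure depends on the instance type.

```python
def aggregate_iops(volume_count, iops_per_volume, io_size_kb, ebs_bandwidth_mbps):
    """Estimate IOPS for a RAID0 stripe, capped by EBS network bandwidth.

    Returns (striped_iops, bandwidth_cap_iops, effective_iops).
    Simplified model: ignores RAID overhead and protocol overhead.
    """
    striped_iops = volume_count * iops_per_volume
    bytes_per_second = ebs_bandwidth_mbps * 1000000 / 8
    bandwidth_cap_iops = int(bytes_per_second // (io_size_kb * 1024))
    return striped_iops, bandwidth_cap_iops, min(striped_iops, bandwidth_cap_iops)


# Four 4000-IOPS volumes, 4 KB operations, 500 Mbps link (assumed value).
striped, cap, effective = aggregate_iops(4, 4000, 4, 500)
print(striped, cap, effective)
```

In this example the stripe could deliver 16000 IOPS on paper, but the assumed link carries only about 15258 operations of 4 KB per second, so adding a fifth volume would not help. Larger operations hit the cap sooner.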
MongoDB on AWS Marketplace
MongoDB Configurations
• Follows MongoDB best practices
  – Amazon Linux, MongoDB installed via yum
  – EBS PIOPS volumes per mount (data, log, journal)
  – Configured: ulimits, read ahead, keep alive
Config      Data size  Data IOPS  Log size  Log IOPS  Journal size  Journal IOPS
1000 IOPS   200 GB     1000       10 GB     100       25 GB         250
2000 IOPS   200 GB     2000       15 GB     150       25 GB         250
4000 IOPS   400 GB     4000       20 GB     200       25 GB         250
Backup and Restore
Data Safety
• What’s your backup plan?
• Have you tested restoring?
• Is your data highly available?
• How do you recover from disaster?
Protecting Your Data
• Replica Sets
  – Proper deployments provide HA and DR
• Manual backup/restore
  – Scriptable, tuneable
• MMS Backup
  – Continuous, secure backup
Manual Backup Procedures
EBS
• EBS Snapshots
• LVM Snapshots
Ephemeral
• LVM Snapshots
Note:
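• EBS snapshots can be done “hot”, but for MongoDB it’s safer to call fsyncLock() first so that pending writes are flushed and new writes are blocked while the snapshot is taken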
• LVM snapshots require enough free space on instance to store snapshot
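The lock-then-snapshot sequence from the note above can be sketched as a small context manager. It assumes `client` behaves like a pymongo `MongoClient` and that the server accepts the `fsyncUnlock` command (older servers unlock differently); `take_snapshot` is a hypothetical callable supplied by the caller, so the sketch does not depend on any particular AWS library.

```python
from contextlib import contextmanager


@contextmanager
def fsync_locked(client):
    """Flush pending writes and block new ones while the body runs.

    Assumes `client.admin.command(...)` works as in pymongo.
    """
    client.admin.command("fsync", lock=True)
    try:
        yield
    finally:
        # Always unlock, even if the snapshot call raised.
        client.admin.command("fsyncUnlock")


def snapshot_volumes(client, take_snapshot, volume_ids):
    """Snapshot every volume while mongod is locked; return snapshot ids."""
    with fsync_locked(client):
        return [take_snapshot(volume_id) for volume_id in volume_ids]


# With boto3, take_snapshot could look like this (not executed here):
#   ec2 = boto3.client("ec2")
#   take_snapshot = lambda v: ec2.create_snapshot(VolumeId=v)["SnapshotId"]
```

Holding the lock across all volumes matters when data and journal live on separate volumes, since the snapshots then describe the same point in time. Running this against a Secondary keeps the write block away from the Primary.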
Restore
• Boot new or use existing instance
• Create new volume from EBS snapshot and attach
or
• Copy over LVM snapshot and create/mount LV
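For the EBS path, the restore steps might be scripted roughly as below. The function takes an `ec2` object that is assumed to behave like a boto3 EC2 client (`create_volume`, `get_waiter`, `attach_volume`); the function name and all identifiers passed to it are placeholders, and "io1" is used on the assumption that a Provisioned IOPS volume is wanted.

```python
def restore_volume(ec2, snapshot_id, availability_zone, instance_id, device, iops):
    """Create a volume from an EBS snapshot and attach it to an instance.

    `ec2` is assumed to be a boto3-style EC2 client passed in by the caller.
    """
    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,  # must match the instance's zone
        VolumeType="io1",                    # Provisioned IOPS volume
        Iops=iops,
    )
    volume_id = volume["VolumeId"]
    # Wait until the new volume is ready before attaching it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id


# Example call with placeholder identifiers (not executed here):
#   restore_volume(boto3.client("ec2"), "snap-...", "us-east-1a", "i-...", "/dev/sdf", 1000)
```

After attaching, the volume still has to be mounted at the path mongod expects before the process is started.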
Archiving Backups
LVM
• Copy snapshots to S3 bucket
• Create lifecycle rules to move data from bucket to Glacier
EBS
• Mount volume from snapshot
• Copy volume data to S3 bucket
• Create lifecycle rules to move data from bucket to Glacier
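A lifecycle rule of the kind mentioned above can be expressed as a small configuration document. The sketch below builds one in the shape accepted by the S3 lifecycle API; the rule ID, the "backups/" prefix, and the 30-day and 365-day thresholds are assumptions chosen for illustration.

```python
import json


def glacier_lifecycle(prefix, days_before_glacier, days_before_expiry=None):
    """Build an S3 lifecycle configuration that moves objects to Glacier.

    Hypothetical helper; the thresholds are up to your retention policy.
    """
    rule = {
        "ID": "archive-mongodb-backups",
        "Filter": {"Prefix": prefix},  # only objects under this key prefix
        "Status": "Enabled",
        "Transitions": [{"Days": days_before_glacier, "StorageClass": "GLACIER"}],
    }
    if days_before_expiry is not None:
        rule["Expiration"] = {"Days": days_before_expiry}
    return {"Rules": [rule]}


lifecycle = glacier_lifecycle("backups/", days_before_glacier=30, days_before_expiry=365)
print(json.dumps(lifecycle, indent=2))
# With boto3 this could be applied as (not executed here):
#   s3.put_bucket_lifecycle_configuration(Bucket="<bucket>", LifecycleConfiguration=lifecycle)
```

Restores from Glacier are not immediate, so recent backups that may be needed quickly are better kept in S3 until the transition threshold passes.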
MongoDB Management Service
MMS Backup
• Fully-managed, agent-based, continuous backup
• Custom snapshot scheduling and retention
• Point-in-time recovery and consistent snapshots across sharded clusters
• Performance impact similar to a Secondary
• Encrypted data transfer
• Restores require 2-factor authentication
MMS Backup In-Depth
Advanced Configurations
Standard Ephemeral Storage
• Remember, it’s ephemeral
• Technically feasible
• Lack of persistence is a big negative
• Any benefits can’t outweigh the negatives
Ephemeral SSDs
• Performance ceiling might outweigh typical negatives
• Cost implications: SSD-backed instances are more expensive
• Does your workload truly need flash?
  – Profile early and often to make this determination
• How many drives do you need?
  – Drives instance choice
SSD and MongoDB Configurations
[Diagram: mongod processes and ephemeral SSDs, with RAID]
SSD Deployment Strategies
• SSD deployments
  – Replica Sets and
  – MMS Backup
• High performance
• Highly available
• Continuous backup
[Diagram: replica set with one mongod Primary and two mongod Secondaries, plus the MMS Backup Agent]
SSD Deployment Considerations
• One Secondary could use EBS
• Will need to have an instance with
  – High network bandwidth and
  – Multiple EBS volumes aggregated to approach IOPS parity
• Key is avoiding significant replication lag caused by the drop-off in I/O performance
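One way to watch for the replication lag mentioned above is to compare each Secondary's last applied operation time with the Primary's, using the output of the `replSetGetStatus` command. The sketch below works on a status document shaped like that output (`members`, `name`, `stateStr`, `optimeDate`); the host names and timestamps are made up for illustration.

```python
from datetime import datetime


def replication_lag_seconds(status):
    """Return {secondary name: seconds behind the primary}.

    `status` is a document shaped like the replSetGetStatus output.
    """
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical status: one SSD-backed Secondary and one EBS-backed Secondary.
sample_status = {
    "members": [
        {"name": "ssd-a:27017", "stateStr": "PRIMARY", "optimeDate": datetime(2014, 1, 1, 12, 0, 10)},
        {"name": "ssd-b:27017", "stateStr": "SECONDARY", "optimeDate": datetime(2014, 1, 1, 12, 0, 8)},
        {"name": "ebs-c:27017", "stateStr": "SECONDARY", "optimeDate": datetime(2014, 1, 1, 11, 59, 40)},
    ]
}
lag = replication_lag_seconds(sample_status)
print(lag)
# With pymongo, a live document could come from client.admin.command("replSetGetStatus").
```

If the EBS-backed member's lag grows steadily under load while the SSD-backed member keeps up, the EBS volumes are probably not close enough to IOPS parity.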
Drawbacks & Tradeoffs
Considerations
• Performance
• Consistency
• Safety
• Flexibility
• Scalability
Best Practices
• Prototype > Test > Scale
• IO on AWS is easy to scale
• AWS makes it easy to iterate deployment
  – Start small
  – Profile your workload
  – Remove all other bottlenecks
  – Add instance and IO capacity
Recommended Starting Points
• EBS-Optimized and PIOPS EBS
• m1.large is an effective starting point for profiling an early production deployment
• Use volumes with 250 or 500 IOPS for data to start
  – Adding more IOPS is easy: snapshot and recreate the volume with more capacity
Questions?
Resources
• MMS Monitoring and Backup
  – http://mms.mongodb.com
• MongoDB on AWS best practices:
  – http://bit.ly/deploy-mongodb-ec2
• MongoDB on AWS Marketplace:
  – http://bit.ly/aws-marketplace-mongodb
• MongoDB docs
  – http://docs.mongodb.org
MongoDB World
New York City, June 23-25
#MongoDBWorld
See what’s next in MongoDB, including:
• MongoDB 2.6
• Sharding
• Replication
• Aggregation
http://world.mongodb.com
Save 25% with discount code 25SandeepParikh