Running MongoDB on AWS

Posted on 09-Jul-2015

Category:

Technology

DESCRIPTION

This presentation covers best practices for running MongoDB on AWS. We also discuss how to utilize the automation features of MMS to spin up new clusters in minutes on AWS.

TRANSCRIPT

Running MongoDB on AWS

Sandeep Parikh

Senior Solutions Architect

MongoDB, Inc.

Agenda

• Background

• Deployment

• Automation

• Management

• Integrations

• Resources

MongoDB

• Flexible document data model

• Rich ad-hoc queries and in-place updates

• Real-time aggregation

• Geospatial support

• Text search

• Built-in support for

– Redundancy and High Availability

– Auto-partitioning and scale out

Amazon Web Services

• Complete cloud infrastructure

– Compute

– Storage

– Database

– Analytics

– Processing

– Deployment

– Containers

• Multitude of configuration options

• Pricing flexibility

– On-demand, Spot instance, Reserved instance

Instance Selection

• General Purpose (M3)

• Compute-optimized (C3)

• GPU (GPU compute resources are not needed for MongoDB)

• Memory-optimized (R3)

• Storage-optimized (I2, HS1)

• Micro (bursty, no sustained CPU)

Instance Characteristics

• Distinctions

– CPU, memory, storage, networking

• Networking

– EBS-optimized, enhanced networking, placement groups

• Availability

– Varies by region

Storage Configurations

• S3

– Blob storage

– Static content

• EBS

– Magnetic

– SSD, burst IOPS

– OS root volume

• PIOPS EBS

– SSD-backed, predictable performance

– Cost scales up with size and IOPS

• Instance Store

– SSD-backed

– Blazing, ephemeral

– Included in instance cost

Storage Configurations

• PIOPS EBS or Instance Store are best choices

• Instance Store offers best $/IOP

– Storage is ephemeral

– Must be used with MongoDB Replica Sets

• Can mix/match in a single deployment

– E.g. some Secondary nodes on EBS

– …But you’ll need several EBS volumes to maintain reasonable IOPS parity
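The IOPS-parity point above comes down to simple arithmetic. The sketch below is illustrative only: the function name and the IOPS figures are hypothetical, not measurements from the talk, and it assumes the EBS volumes are combined (for example, striped) so that their IOPS add up.

```python
def ebs_volumes_for_parity(target_iops: int, iops_per_volume: int) -> int:
    """Smallest number of EBS volumes whose combined IOPS reaches target_iops."""
    if target_iops <= 0 or iops_per_volume <= 0:
        raise ValueError("IOPS figures must be positive")
    # Integer ceiling division: round up so the target is met or exceeded.
    return (target_iops + iops_per_volume - 1) // iops_per_volume


# Hypothetical figures: an instance-store primary delivering 35,000 IOPS,
# and EBS volumes provisioned at 4,000 IOPS each.
print(ebs_volumes_for_parity(target_iops=35000, iops_per_volume=4000))  # 9
```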

Instance Configuration

• Use EXT4 or XFS along with appropriate attributes

• Tune block device read-ahead

• Tune TCP keep alive

• Disable NUMA

• Disable zone-reclaim mode

• Increase ulimits for processes and open files
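A checklist like this one is easy to turn into an automated check. The sketch below compares observed host settings against target values. The targets are assumptions for illustration, based on commonly cited MongoDB production guidance, so confirm them for your MongoDB version and storage engine. The setting names and the `find_deviations` helper are hypothetical, and the code only reports differences; it does not change any system setting.

```python
import operator

# (comparison, target) per setting. Targets are assumed values, not taken from the slides.
CHECKS = {
    "readahead_sectors": (operator.le, 32),    # block device read-ahead, in 512-byte sectors
    "tcp_keepalive_time": (operator.le, 120),  # seconds before the first keepalive probe
    "zone_reclaim_mode": (operator.eq, 0),     # 0 means zone reclaim is disabled
    "nofile": (operator.ge, 64000),            # ulimit for open files
    "nproc": (operator.ge, 64000),             # ulimit for processes
}


def find_deviations(observed: dict) -> list:
    """Return a human-readable line for every setting that misses its target."""
    problems = []
    for name, (compare, target) in CHECKS.items():
        value = observed.get(name)
        if value is None:
            problems.append(f"{name}: no value reported")
        elif not compare(value, target):
            problems.append(f"{name}: observed {value}, target {target}")
    return problems


# Hypothetical values as they might appear on a freshly launched instance.
observed = {
    "readahead_sectors": 256,
    "tcp_keepalive_time": 7200,
    "zone_reclaim_mode": 0,
    "nofile": 1024,
    "nproc": 64000,
}
for line in find_deviations(observed):
    print(line)
```

On Linux, the observed values could be collected from `blockdev --getra`, `/proc/sys/net/ipv4/tcp_keepalive_time`, `/proc/sys/vm/zone_reclaim_mode`, and `ulimit`.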

High Availability

[Diagram: a replica set with one MongoDB Primary and two MongoDB Secondaries]
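The three-member layout on this slide (one primary, two secondaries) corresponds to a replica set configuration document. As a minimal sketch, the helper below builds such a document; the set name and hostnames are hypothetical, and the resulting structure is the kind of document passed to `rs.initiate()` in the mongo shell.

```python
def replica_set_config(name: str, hosts: list) -> dict:
    """Build a replica set configuration document with one member per host."""
    if len(hosts) != len(set(hosts)):
        raise ValueError("each member needs a distinct host")
    return {
        "_id": name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


# Hypothetical hostnames. Which member becomes primary is decided by election.
config = replica_set_config(
    "rs0",
    ["mongo1.example.internal:27017",
     "mongo2.example.internal:27017",
     "mongo3.example.internal:27017"],
)
print(config)
```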

High Availability Across Zones

[Diagram: a replica set with one MongoDB Primary and two MongoDB Secondaries, distributed across Zone 1 and Zone 2]

High Availability Across Regions

[Diagram: a replica set with one MongoDB Primary and two MongoDB Secondaries, distributed across Region 1 and Region 2]
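When members are split across zones or regions, it helps to check which location could still elect a primary if it were cut off from the others, since an election requires a majority of the voting members. The sketch below does that arithmetic. The placement of two members in Region 1 and one in Region 2 is an assumption for illustration; the slide does not state which member sits where.

```python
def locations_holding_majority(voters_by_location: dict) -> dict:
    """For each location, report whether its voters alone form a majority of the set."""
    total = sum(voters_by_location.values())
    return {
        location: count > total // 2  # strict majority of all voting members
        for location, count in voters_by_location.items()
    }


# Assumed placement: two voting members in Region 1, one in Region 2.
print(locations_holding_majority({"Region 1": 2, "Region 2": 1}))
# {'Region 1': True, 'Region 2': False}
```

Under this assumed placement, losing Region 2 leaves a writable primary in Region 1, while losing Region 1 leaves Region 2 without a majority until the other members return or the set is reconfigured.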

Sharding

[Diagram: three shards, each a replica set with one MongoDB Primary and two MongoDB Secondaries]
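Each shard in this picture is itself a replica set, and a replica set is registered with the cluster through the `addShard` admin command using a `setName/host1,host2,...` seed string. The sketch below builds those command documents. The set names and hostnames are hypothetical.

```python
def add_shard_command(replica_set: str, hosts: list) -> dict:
    """Build the addShard command document for a replica-set shard."""
    if not hosts:
        raise ValueError("at least one host is required")
    return {"addShard": f"{replica_set}/{','.join(hosts)}"}


# Three hypothetical shards, each a three-member replica set.
shards = {
    f"shard{n}": [f"shard{n}-{m}.example.internal:27017" for m in "abc"]
    for n in range(3)
}
commands = [add_shard_command(name, hosts) for name, hosts in shards.items()]
for command in commands:
    print(command)
```

Each document would be run against the `admin` database through a mongos.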

Sharding Across Zones

[Diagram: three shards, each a replica set with one MongoDB Primary and two MongoDB Secondaries, distributed across Zone 1 and Zone 2]

Sharding Across Regions

[Diagram: three shards, each a replica set with one MongoDB Primary and two MongoDB Secondaries, distributed across Region 1 and Region 2]

Management Concerns

• Upgrades

• Maintenance

• Scaling

• Monitoring

• Backups

Automating MongoDB with MMS

MongoDB Management Service

• MMS is a web-based tool that supports you from the very beginning of your MongoDB deployment lifecycle

• Use MMS to build and maintain your deployment and to manage its lifecycle (monitoring and backup)

MMS Changes

• Previously, MMS was used only for monitoring and backup

• But MMS was “late to the party” – mistakes or misconfigurations had already been applied to the initial deployment

• Monitoring was helpful, but it did not set users on the right path from the start

• Upgrade/maintenance tasks were non-trivial and very involved

Automation

• Provision instances in AWS

• Deploy any version of MongoDB

• Add replicas or shards

• Update configuration at any time

• Push a button to upgrade MongoDB

Monitoring

• Charting

– MongoDB-specific metrics and measurements

– View complete cluster topology and metrics for each component

– Create custom dashboards for key metrics and nodes

• Alerting

– Create alerts for just about any metric value change

– Target some or all hosts

– Customizable notifications including SMS, HipChat, PagerDuty

• Proactive Support

– Our engineers monitor your deployment and make suggestions

– Offered to Subscription Customers

Backup

• Customizable snapshot policy

• Point-in-time recovery for replica sets

• Consistent sharded cluster snapshots

• Low overhead, securely transferred

• Continuous, incremental backups

Backup

                                      | Mongodump | File system | MMS Backup

Initial complexity                    | Medium    | High        | Low

Confidence in backups                 | Medium    | Medium      | High

Point-in-time recovery of replica set | Sort of ☺ | No          | Yes

System overhead                       | High      | Can be low  | Low

Scalable                              | No        | With work   | Yes

Consistent snapshot of sharded system | Difficult | Difficult   | Yes
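The "Sort of" entry for mongodump most likely refers to its `--oplog` option, which records oplog entries written while the dump runs so that a later `mongorestore --oplogReplay` restores the data as of the moment the dump finished. That reading is an assumption; the slide does not spell it out. The sketch below only assembles the command line; the host string and output directory are hypothetical.

```python
def mongodump_command(host: str, out_dir: str, with_oplog: bool = True) -> list:
    """Assemble a mongodump argument list for a full-instance dump."""
    argv = ["mongodump", "--host", host, "--out", out_dir]
    if with_oplog:
        # --oplog applies to full-instance dumps of a replica set member only.
        argv.append("--oplog")
    return argv


# Hypothetical replica set seed string and backup path.
argv = mongodump_command(
    "rs0/mongo1.example.internal:27017,mongo2.example.internal:27017",
    "/backups/2015-07-09",
)
print(" ".join(argv))
```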

Integrations

• Compute, Storage, Persistent IPs, DNS

• Hadoop, Data Warehouse, Stream Processing, App Deployment

• Orchestration, Database, App Services, Caching

Elastic MapReduce

• Background

– Quickly deploy and run Hadoop in AWS

– Tuned distributions to run on top of EC2

– Provision deployments with any number of nodes

– Supports spot and reserved pricing to minimize cost

• MongoDB

– MongoDB Connector for Hadoop

– https://github.com/mongodb/mongo-hadoop

– Bi-directional access

– MapReduce, Hive, Pig, Streaming, Spark

– MongoDB deployments or BSON backup files
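The connector's bi-directional access is configured mainly through two job properties, `mongo.input.uri` and `mongo.output.uri`, each naming a `database.collection` in a MongoDB URI. As a minimal sketch, the helper below builds that property map. The host, database, and collection names are hypothetical, and a real job would set further properties depending on whether it runs as MapReduce, Hive, Pig, Streaming, or Spark.

```python
def connector_properties(host: str, database: str,
                         input_collection: str, output_collection: str,
                         port: int = 27017) -> dict:
    """Build the input/output URI properties for a MongoDB Connector for Hadoop job."""
    base = f"mongodb://{host}:{port}/{database}"
    return {
        "mongo.input.uri": f"{base}.{input_collection}",    # where the job reads from
        "mongo.output.uri": f"{base}.{output_collection}",  # where results are written
    }


# Hypothetical names for illustration.
props = connector_properties("mongos.example.internal", "analytics", "events", "daily_rollup")
for key, value in props.items():
    print(f"{key}={value}")
```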

CloudWatch

• Monitoring for AWS resources

• Supports custom metrics

• Use AWS CLI to pipe MongoDB metrics

aws cloudwatch put-metric-data --metric-name ResidentMemory --namespace MongoDB --timestamp 2014-01-01T00:00:00Z --value 32 --unit Gigabytes
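To publish metrics on a schedule, the same command can be assembled programmatically. The sketch below builds the argument list for `aws cloudwatch put-metric-data` from a metric name and value. The helper name is hypothetical, and the value would in practice come from MongoDB itself (for example from `serverStatus`), which this sketch does not query.

```python
import shlex


def put_metric_command(metric_name: str, value, unit: str,
                       namespace: str = "MongoDB", timestamp: str = None) -> list:
    """Build the AWS CLI argument list for publishing one custom metric."""
    argv = [
        "aws", "cloudwatch", "put-metric-data",
        "--metric-name", metric_name,
        "--namespace", namespace,
        "--value", str(value),
        "--unit", unit,
    ]
    if timestamp is not None:
        # Optional: CloudWatch uses the time of receipt when no timestamp is given.
        argv += ["--timestamp", timestamp]
    return argv


argv = put_metric_command("ResidentMemory", 32, "Gigabytes",
                          timestamp="2014-01-01T00:00:00Z")
print(shlex.join(argv))
```

With the AWS CLI installed and credentials configured, the list could be passed to `subprocess.run(argv, check=True)`.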

Redshift

• Fully managed petabyte scale data warehouse as a service

• MongoDB not natively supported as an input data source

• Use Data Pipeline and EMR to move data

http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html

Elastic Beanstalk

• Deploy and manage applications

• Handles provisioning, scaling, load balancing

• Built on EC2, S3, SNS, Auto Scaling

• Customize and configure software that your app needs

– Install packages, create files

– Execute commands

– Control system services
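In Elastic Beanstalk these customizations live in `.ebextensions/*.config` files, which accept YAML or JSON and use the top-level keys `packages`, `files`, `commands`, and `services`. The sketch below generates such a file as JSON. The package name, file path, command, and service name are hypothetical placeholders for whatever the application actually needs (for example, a local mongos).

```python
import json


def ebextension(packages: list, files: dict, commands: list, services: list) -> dict:
    """Build an .ebextensions configuration covering the four customization types."""
    return {
        # Install packages via yum; an empty list means "any version".
        "packages": {"yum": {name: [] for name in packages}},
        # Create files with fixed ownership and permissions.
        "files": {
            path: {"mode": "000644", "owner": "root", "group": "root", "content": content}
            for path, content in files.items()
        },
        # Commands run in alphabetical order by key, so prefix them with a counter.
        "commands": {
            f"{i:02d}_{name}": {"command": command}
            for i, (name, command) in enumerate(commands, start=1)
        },
        # Make sure services are enabled at boot and running.
        "services": {
            "sysvinit": {name: {"enabled": True, "ensureRunning": True} for name in services}
        },
    }


config = ebextension(
    packages=["example-package"],
    files={"/etc/example.conf": "key = value\n"},
    commands=[("create_log_dir", "mkdir -p /var/log/example")],
    services=["example-service"],
)
print(json.dumps(config, indent=2))
```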

[Diagram labels: Elastic Load Balancer; Auto Scaling Group; Security Group; three App Servers; three mongos processes; MongoDB]

Route53

• Highly available and scalable DNS service

• Hostnames can be assigned to

– EC2 instances, ELB instances, S3 buckets

• DNS load balancing with weighted round robin

• Supports hostnames for non-AWS infrastructure

• Use hostnames for all MongoDB components

• With replica sets, hostnames can ease machine replacement

• With sharded clusters, hostnames can simplify config server maintenance

• Or use Automation!
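Using hostnames throughout means a replaced machine only needs its DNS record repointed; application connection strings stay the same. As a minimal sketch, the helper below builds a replica set connection URI from DNS names. The hostnames and set name are hypothetical.

```python
def replica_set_uri(hosts: list, replica_set: str, database: str = "") -> str:
    """Build a MongoDB connection URI that lists replica set members by hostname."""
    if not hosts:
        raise ValueError("at least one host is required")
    return f"mongodb://{','.join(hosts)}/{database}?replicaSet={replica_set}"


# Hypothetical Route53 records, one per replica set member.
uri = replica_set_uri(
    ["mongo1.example.internal:27017",
     "mongo2.example.internal:27017",
     "mongo3.example.internal:27017"],
    replica_set="rs0",
)
print(uri)
```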

Questions?

• MongoDB

– http://www.mongodb.org

• MongoDB Documentation

– http://docs.mongodb.org

• MongoDB Management Service

– http://mms.mongodb.com
