Understanding AWS Database Options (DAT201) | AWS re:Invent 2013


Uploaded by amazon-web-services on 26-Jan-2015


DESCRIPTION

With AWS you can choose the right database technology and software for the job. Given the myriad of choices, from relational databases to non-relational stores, this session provides details and examples of some of the choices available to you. This session also provides details about real-world deployments from customers using Amazon RDS, Amazon ElastiCache, Amazon DynamoDB, and Amazon Redshift.

TRANSCRIPT

Page 1: Understanding AWS Database Options (DAT201) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

DAT201- Understanding AWS Database Options

Sundar Raghavan, Amazon RDS
Zac Sprackett, Vice President of Operations, SugarCRM
Michael Thomas, Principal Software Engineer, Scopely
November 13, 2013

Page 2

Today’s discussion

• AWS database options and decision factors
• Best practice tips and techniques
• SugarCRM
• Scopely
• Q & A

Page 3

Starting with the Customer

• How many of you use databases on AWS?
• How many of you use Amazon RDS, Amazon DynamoDB, Amazon Redshift, or Amazon ElastiCache?
• How many of you have a well-defined DR strategy for your databases?
• How many of you are building geospatial and context-sensitive applications?
• We suggest that you attend Werner’s keynote!

Page 4

9 AWS Regions including 25 Availability Zones, and growing; 46 worldwide points of presence

• US East (Northern Virginia): more than 10 data centers in US East alone
• US West x 2 (N. California and Oregon)
• US GovCloud (US ITAR region, Oregon)
• LATAM (São Paulo)
• Europe West (Dublin)
• Asia Pacific Region (Singapore)
• Asia Pacific Region (Tokyo)
• Australia Region (Sydney)

Introducing: Cross Region Support
• RDS Snapshot Copy
• All engines
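As a minimal sketch of automating the cross-region snapshot copy, the helper below builds the keyword arguments for `copy_db_snapshot` in the AWS SDK for Python (boto3). For a cross-region copy, the request is sent from the destination region and the source snapshot is named by its full ARN. The account ID, regions, and snapshot names are hypothetical placeholders, and the sketch only builds the request; it does not call AWS.

```python
def snapshot_arn(region: str, account_id: str, snapshot_id: str) -> str:
    """Build the ARN that identifies an RDS DB snapshot in a given region."""
    return f"arn:aws:rds:{region}:{account_id}:snapshot:{snapshot_id}"


def cross_region_copy_request(source_region: str, account_id: str,
                              snapshot_id: str, target_snapshot_id: str) -> dict:
    """Keyword arguments for rds.copy_db_snapshot(), issued in the destination region.

    A cross-region copy must name the source snapshot by its ARN;
    a bare snapshot identifier only works for same-region copies.
    """
    return {
        "SourceDBSnapshotIdentifier": snapshot_arn(source_region, account_id, snapshot_id),
        "TargetDBSnapshotIdentifier": target_snapshot_id,
    }


# Hypothetical usage (needs boto3 and AWS credentials, so it is not run here):
#   import boto3
#   rds = boto3.client("rds", region_name="us-west-2")   # destination region
#   rds.copy_db_snapshot(**cross_region_copy_request(
#       "us-east-1", "123456789012", "mydb-2013-11-13", "mydb-2013-11-13-dr"))
```

Scheduling this call after each automated snapshot is one way to get the kind of repeatable DR process the Zoopla quote describes.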

Page 5

Zoopla

“We are very happy with the RDS cross region snapshot copy feature, as it gives us the ability to copy our data from one AWS region to another AWS region with minimal effort. Prior to this feature, it used to take 3 days and a number of manual steps to copy our snapshots. Now we have an automated process that helps us to achieve disaster recovery capabilities in just a few steps.”
Joel Callaway, IT Operations Manager, Zoopla Property Group Ltd, UK

Page 6

Your Mission is Clear

1. Zero to App in ____ Minutes

2. Zero to Millions of users in ____ Days

3. Zero to “Hero” in ____ Months

Page 7

Focus on your App

Page 8

Your Stack

• Load balancer
• Application tier
• Database tier

Page 9

Your Stack of Worries

• Load balancer: security, scale, availability…
• Application tier: security, innovation, scale, performance, availability…
• Database tier: security, innovation, scale, transactions, performance, durability, availability, skills…

Page 10

Spectrum of Database Options

[Chart: options arranged from SQL to NoSQL, from low cost to high cost, and from do-it-yourself to fully managed; one region is labeled “Not available on AWS.”]

Page 11

Spectrum of Options

[Chart axes: SQL to NoSQL; do-it-yourself to fully managed]

Page 12

Spectrum of Options: SQL

• Do-it-yourself: MySQL, Oracle, SQL Server, MariaDB, Vertica, ParAccel, …
• Fully managed: MySQL, Oracle, SQL Server, Amazon Redshift

Page 13

Spectrum of Options: NoSQL

• Do-it-yourself: MongoDB, Cassandra, Redis, Memcached
• Fully managed: DynamoDB, ElastiCache (Memcached), ElastiCache (Redis), SimpleDB

Page 14

Thinking About the Questions

• Should I use SQL or NoSQL?
• Should I use MySQL or PostgreSQL?
• Should I use Redis, Memcached, or ElastiCache?
• Should I use MongoDB, Cassandra, or DynamoDB?

Page 15

Actually, Thinking About the Right Questions

• What are my scale and latency needs?
• What are my transactional and consistency needs?
• What are my read/write, storage, and IOPS needs?
• What are my time-to-market and server control needs?

Page 16

Factors to Consider: SQL vs. NoSQL

• Application: SQL, an app with complex business logic? NoSQL, a web app with lots of users?
• Transactions: SQL, complex transactions, joins, updates? NoSQL, a simple data model, updates, queries?
• Scale: SQL, developer managed. NoSQL, automatic, on-demand scaling.
• Performance: SQL, developer architected. NoSQL, consistent, high performance at scale.
• Availability: SQL, architected for failover. NoSQL, seamless and transparent.
• Core skills: SQL, SQL + Java/Ruby/Python/PHP. NoSQL, NoSQL + Java/Ruby/Python/PHP.

Best of both worlds: it is possible to use SQL and NoSQL models in one app.

Page 17

Factors to Consider: Self-Managed vs. Managed Service

Self-managed
• Full control over the instance, database, and OS parameters
• Upgrades, backups, and failover are yours to manage
• All aspects of security are managed by you
• Complex replication topologies and data management

Managed service
• Offload the infrastructure and software management
• Automate the database life cycle with APIs
• Focus on database access and app security
• Limited control over replication topologies

Page 18

Pace of Innovation – a Bonus

RDS team launched 23+ features
• SQL Server TDE, version upgrade
• Oracle TDE, Statspack, fine-grained access, 3 TB / 30K IOPS
• Cross-region snapshot copy, parallel replica, chained replica
• Multi-AZ SLA, log access, VPC groups, …

NoSQL team launched 10+ features
• Redis engine support
• Amazon DynamoDB fine-grained access control
• Amazon DynamoDB Local, geospatial indexing library
• Transaction library, local secondary indexes, parallel scan

Redshift team launched 20+ features
• Encryption with HSM support
• Audit logging, SNS notification, snapshot sharing
• COPY from Amazon EMR/HDFS/SSH
• Faster resize, improved concurrency, distributed tables, …

Page 19

Amazon RDS is a managed SQL database service.

• Simple to deploy and scale
• Reliable and cost effective
• Choice of database engines
• Without any operational burden

Page 20

Optimizing for Developer Productivity

Offload the “administration”; focus on the “innovation”.

Tasks on the slide: schema design, frequent server upgrades, storage upgrades, backup and recovery, software upgrades, patching, hardware crashes, query construction, query optimization, configuration, migration.

Page 21

Optimizing for Developer Productivity

Read replicas: follow the MySQL manual, or use the Amazon RDS console.

• Multiple databases per instance
• Use MySQL tools & drivers
• Quickly set up read replicas
• High availability: Multi-AZ option (99.95% SLA)
• Ability to promote read replicas, rename as master
• Diagnostics
• Native MySQL replication
• SSL for encryption over the wire
• Monitor metrics
• Shell, super user, or direct file system access (think security!)
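One common way to put read replicas to work is read/write splitting in the application: writes go to the master and reads are spread across replicas. The sketch below is not an RDS feature; it assumes hypothetical connection handles and a deliberately naive rule that treats only statements starting with `SELECT` as reads.

```python
import itertools


class ReadWriteRouter:
    """Route statements to the master or, round-robin, to read replicas.

    `master` and `replicas` are hypothetical connection handles; real code
    would hold DB-API connections to the RDS master and replica endpoints.
    """

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas configured, reads fall back to the master.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str):
        # Naive heuristic: only plain SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.master
```

Two caveats apply to any scheme like this: replication is asynchronous, so a read issued right after a write may not see it yet, and statements such as `SELECT ... FOR UPDATE` must go to the master even though they start with `SELECT`.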

Page 22

ElastiCache is a managed caching service.

• Easy to set up and operate cache clusters
• Scale cache clusters with push-button ease
• Ultra-fast response time for read scaling
• Supports Memcached and Redis engines
• Without any operational burden

Page 23

ElastiCache is a Performance Booster

Clients → Elastic Load Balancing → EC2 app instances
• ElastiCache (Redis, with a read replica): serves most read queries at in-memory performance; the app reads from the cache and sends cache updates to it
• RDS MySQL DB instance with PIOPS (master): handles read/write queries at SSD performance
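The flow in this diagram is usually called cache-aside: read from the cache first, fall back to the database on a miss, then populate the cache; on writes, update the database and invalidate the cached entry. A minimal sketch, with a plain dictionary standing in for ElastiCache and two callables standing in for the RDS master (both stand-ins are assumptions, not real clients):

```python
from typing import Any, Callable, Dict, Optional


class CacheAside:
    """Cache-aside reads with invalidate-on-write, using stand-in clients."""

    def __init__(self, cache: Dict[str, Any],
                 load_from_db: Callable[[str], Optional[Any]],
                 write_to_db: Callable[[str, Any], None]):
        self.cache = cache                # stand-in for an ElastiCache client
        self.load_from_db = load_from_db  # stand-in for a SELECT on the master
        self.write_to_db = write_to_db    # stand-in for an INSERT/UPDATE
        self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        if key in self.cache:             # hit: served from memory
            return self.cache[key]
        self.misses += 1
        value = self.load_from_db(key)    # miss: read from the database
        if value is not None:
            self.cache[key] = value       # populate for the next reader
        return value

    def put(self, key: str, value: Any) -> None:
        self.write_to_db(key, value)      # the database stays the source of truth
        self.cache.pop(key, None)         # invalidate so the next read reloads
```

A production cache would also set a TTL on each entry, so that anything missed by invalidation eventually expires.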

Page 24

Amazon DynamoDB is a managed NoSQL database service.

• Store and retrieve any amount of data
• Scale throughput to millions of I/Os
• Single-digit millisecond latencies
• Without any operational burden

Page 25

Optimizing for Developer Productivity

• Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
• “Select”, “insert”, “update” items: PutItem, GetItem, UpdateItem, DeleteItem
• Bulk select or update (max 1 MB): BatchGetItem, BatchWriteItem
• Query specific items or scan the full table: Query, Scan
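Query and Scan return results a page at a time: when a response is truncated (for example by the 1 MB limit per call), it includes a `LastEvaluatedKey`, which the next request passes back as `ExclusiveStartKey`. The loop below sketches that protocol against a hypothetical `fetch_page` callable so it runs without AWS; with boto3, the callable would wrap `client.query(...)` or `client.scan(...)`, and the key would hold the item's primary-key attributes rather than the toy index used here.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = Dict[str, Any]  # shaped like a Query/Scan response: "Items", maybe "LastEvaluatedKey"


def iterate_all(fetch_page: Callable[[Optional[dict]], Page]) -> Iterator[dict]:
    """Yield every item, passing LastEvaluatedKey back until the last page."""
    start_key: Optional[dict] = None          # sent as ExclusiveStartKey
    while True:
        page = fetch_page(start_key)
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:                 # no key: this was the final page
            return


def fake_table(items: List[dict], page_size: int) -> Callable[[Optional[dict]], Page]:
    """Hypothetical in-memory stand-in that pages items like DynamoDB would."""
    def fetch(start_key: Optional[dict]) -> Page:
        start = 0 if start_key is None else start_key["index"] + 1
        page: Page = {"Items": items[start:start + page_size]}
        if start + page_size < len(items):
            page["LastEvaluatedKey"] = {"index": start + page_size - 1}
        return page
    return fetch
```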

Page 26

Amazon Redshift is a managed data warehouse service.

• Petabyte-scale columnar database
• Fast response times (~10x faster than typical relational stores)
• Under $1,000 per TB per year
• Without any operational burden
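To illustrate what “columnar” means: a row store keeps each record’s fields together, while a column store keeps each column’s values together, so an aggregate over one column only has to read that column. The toy sketch below just pivots rows into columns; the table and its values are hypothetical, and it says nothing about how Redshift is implemented internally.

```python
from typing import Dict, List

# Hypothetical row-oriented records.
rows = [
    {"game_id": "g1", "player": "alice", "score": 120},
    {"game_id": "g2", "player": "bob", "score": 300},
    {"game_id": "g3", "player": "carol", "score": 200},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records into one list of values per column."""
    columns: Dict[str, list] = {name: [] for name in records[0]} if records else {}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


columns = to_columns(rows)
# SUM(score) touches only the "score" list; game_id and player are never read.
total_score = sum(columns["score"])
```

Storing each column contiguously also tends to compress well, because neighboring values have the same type.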

Page 27

So, what are the tips and techniques for successful deployments?

Page 28

Thousands of Successful Deployments: Two Highlights

• SugarCRM (CRM software): Zac Sprackett
• Scopely (gaming platform): Mike Thomas

Page 29

Crafting Loyal Customers with SugarCRM

Every Customer. Every User. Every Time.

S. Zachariah Sprackett, VP of Operations, SugarCRM

November 13, 2013

Page 30

SugarCRM
• Redefining customer relationship management
• Unique product bundling
  – On-premise and hosted offerings
• Manifest destiny
  – Source code access and a SQL database per customer
• Scale
  – From one-seat customers to multi-thousand-seat customers
• Globally distributed customer base

Page 31

Deployment Models: Traditional SaaS vs. SugarCRM

Page 32

Application Stack

• Shadow
• Apache
• PHP
• MySQL
• Elasticsearch
• HTML5 & JavaScript
• Linux
• Email archiving
• Background jobs

Page 33

Cloud Stacks

Each cloud stack, within a cloud provider, consists of:
• EC2 web servers, EC2 job servers, and EC2 Elasticsearch servers
• An RDS DB instance with an RDS read replica
• ElastiCache
• Amazon S3 and Amazon Glacier
• Amazon SES

Page 34

Cloud Providers

Route 53 → managed Elastic IP → two EC2 HAProxy instances → cloud stack

Page 35

Management Console

Globally Distributed Cloud Providers

Page 36

Delivering On Time and On Budget

• Amazon lets you easily spin up testing environments
  – Testing only works if you make use of it. Don’t make assumptions.
  – Monitor everything.
• A change in cost model can surprise finance
  – Planned capital expenditures versus after-the-fact operational expenditures
  – Use reserved instances
  – Third-party tools such as Cloudability can alert you to issues early
• Manage access keys effectively to control cost
  – Learn to love AWS Identity and Access Management (IAM)
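As one hedged example of using IAM to control cost, the policy below denies launching EC2 instances of any type outside an allowed list, using the `ec2:InstanceType` condition key. The allowed instance types are hypothetical, and the sketch only builds the policy document as a Python dict; it would be serialized with `json.dumps` before being attached to a user or group.

```python
import json
from typing import List


def restrict_instance_types_policy(allowed_types: List[str]) -> dict:
    """IAM policy document that denies RunInstances for non-approved instance types."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnapprovedInstanceTypes",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # Deny whenever the requested type is NOT in the allowed list.
                "Condition": {"StringNotEquals": {"ec2:InstanceType": allowed_types}},
            }
        ],
    }


policy_json = json.dumps(restrict_instance_types_policy(["t1.micro", "m1.small"]), indent=2)
```

An explicit Deny only narrows what users can do; they still need a separate Allow for `ec2:RunInstances` on the approved types.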

Page 37

Things to Watch Out For

• Understand your IO requirements
  – Make effective use of instance-backed, Amazon EBS, and Provisioned IOPS file systems
• Use the heck out of read replicas
• Snapshots are incredibly useful
  – But not available from a read replica
• Don’t use the default parameter group for Amazon RDS
  – Unless you really like restarting databases
• Cold standby is not instant-on
  – Don’t get stuck waiting for deployments in a forced failover scenario
• ElastiCache is not clustered across Availability Zones
• Watch out for the SLA
  – 99.95% for a region, even across two AZs
  – This doesn’t include user error
• You still need DBAs and Ops, but they get to do cooler stuff
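To put the 99.95% figure in perspective, an availability percentage converts directly into a downtime budget. For a 30-day month, (1 − 0.9995) × 30 × 24 × 60 = 21.6 minutes. The same arithmetic as a small helper (the 30-day month is a simplifying assumption):

```python
def allowed_downtime_minutes(availability_pct: float, days: float = 30) -> float:
    """Minutes of downtime permitted by an availability target over `days` days."""
    total_minutes = days * 24 * 60
    return (1 - availability_pct / 100) * total_minutes


# 99.95% over a 30-day month is about 21.6 minutes; over a 365-day year, about 4.4 hours.
monthly = allowed_downtime_minutes(99.95)
yearly_hours = allowed_downtime_minutes(99.95, days=365) / 60
```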

Page 38

We’re Hiring

Email: [email protected]
Free trials: http://www.sugarcrm.com/try-sugar

Page 39

Scopely

Michael Thomas – Principal Software Engineer with Scopely

November 13, 2013

Page 40

ABOUT SCOPELY

Our technical infrastructure allows developers to build games efficiently for both iOS and Android.

Millions of users. Billions of turns.

All titles have reached the Top 5 in the App Store, and the last three have been #1.

Page 41

Challenges
• Build a single platform to support many different kinds of games: asynchronous turn-based, single-player, synchronous, etc.
• Scale up and down as games are tested, launched, grow, and are retired.
• We are not an infrastructure company; we must focus on building features that support game development.

Page 42

Platform Features
• Accounts / authentication
• Gameplay / state persistence
• Chat / messaging
• In-game economy
• Facebook integration
• Gifting
• Single-player state tracking
• Promotion / cross-promotion system
• Statistics
• Tournaments
• Achievements
• Email targeting
• Suggested friends
• In-game news system
• External partner integration
• Invitation attribution
• Push notifications
• Content management
• Generic storage API
• Application / device configuration
• A/B testing

Page 43

Different Features, Different Requirements
• Dynamic scaling (game launches, promotions, tests)
• High write/read ratio (playing turns)
• Transactional consistency (real-money purchases)
• Indexed data (user accounts)
• Complex, real-time data (leaderboards)

Page 44

Scopely Gaming Platform: Operational Data Storage

• Memcached (ElastiCache) for performance, scalability, and cost savings
• Amazon DynamoDB for unbounded data with a heavy write load
• Redis (ElastiCache) for fast, complex caching and message passing
• MySQL (RDS) for bounded, transactional, queryable data
• Amazon S3 for asset and image storage

Page 45

Scopely Gaming Platform: Analytics Data Pipeline

SQS (in-flight events) → EC2 (message loader) → S3 (staged messages) → EMR (transformer) → S3 (processed data) → EC2 (Redshift loader) → Redshift data warehouse

RDS handles process and job tracking for the pipeline.
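The message-loader stage can be sketched as micro-batching: drain events from the queue and stage them to S3 as newline-delimited JSON files for the later pipeline steps to pick up. The batch size, the key naming scheme, and the `put_object(key, body)` stand-in for an S3 upload are all hypothetical; a real loader would use the SQS and S3 APIs and delete queue messages only after the staged file has been written.

```python
import json
from typing import Callable, Iterable, List


def stage_batches(events: Iterable[dict], batch_size: int,
                  put_object: Callable[[str, str], None],
                  key_prefix: str = "staged/") -> List[str]:
    """Group events into newline-delimited JSON files; return the keys written."""
    keys: List[str] = []
    batch: List[dict] = []

    def flush() -> None:
        if not batch:
            return
        key = f"{key_prefix}batch-{len(keys):06d}.json"
        body = "\n".join(json.dumps(event, sort_keys=True) for event in batch)
        put_object(key, body)   # stand-in for an S3 upload
        keys.append(key)
        batch.clear()

    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            flush()
    flush()  # write the final, partially filled batch
    return keys
```

Larger batches mean fewer, bigger staged files, which bulk-load steps generally handle better than many tiny ones; the trade-off is a longer delay before events reach the warehouse.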

Page 46

Schema Mapping DSL

from centipede.schema.table import Table
from centipede.attributes import *

# Each column pairs a type with a function that extracts its value from a raw game message.
# GemsTurnHelper is defined elsewhere (not shown on the slide).
class GemsTurn(Table):
    user_id = Integer, lambda message: message['Data']['GameData']['CurrentPlayerId']
    current_turn = Integer, lambda message: message['Data']['GameData']['CurrentTurn']
    end_date = Timestamp, lambda message: message['Data']['GameData']['EndDate']
    expiration = Timestamp, lambda message: message['Data']['GameData']['Expiration']
    game_id = Guid, lambda message: message['Data']['GameData']['GameId']
    resigning_user_id = Integer, lambda message: message['Data']['GameData']['ResigningPlayerId']
    start_context = Integer, lambda message: message['Data']['GameData']['StartContext']
    start_date = Timestamp, lambda message: message['Data']['GameData']['StartDate']
    status = Integer, lambda message: message['Data']['GameData']['Status']
    tournament_id = Guid, lambda message: message['Data']['GameData']['TournamentId']
    tournament_price_category = Integer, lambda message: message['Data']['GameData']['TournamentPriceCategory']
    tournament_price_paid = Integer, lambda message: message['Data']['GameData']['TournamentPricePaid']
    tutorial_type = Integer, lambda message: message['Data']['GameData']['TutorialType']
    winning_user_id = Integer, lambda message: message['Data']['GameData']['WinningPlayerId']
    # Per-player columns: read from the current player's entry in Players.
    awards = List, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['Awards']
    coins_gathered = List, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['CoinsGathered']
    custom_statistics = VarChar, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['CustomStatistics']
    has_hidden_game = Boolean, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['HasHiddenGame']
    last_nudge_date = Timestamp, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['LastNudgeDate']
    score = Integer, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['Score']
    score_for_award = Integer, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.current_user_index(message)]['ScoreForAward']
    # The opponent's user ID comes from the other player's entry.
    opponent_user_id = Integer, lambda message: message['Data']['GameData']['Players'][GemsTurnHelper.opponent_user_index(message)]['UserId']
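The `centipede` library behind this DSL is not shown, so the following is only a hypothetical reimplementation of the idea: each class attribute pairs a column type with an extractor function, and a base class walks those pairs to turn one raw game message into a flat row. `Table`, `Integer`, and `Guid` here are stand-ins, not centipede’s actual API.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical column-type markers standing in for centipede's attribute classes.
Integer, Guid = "INTEGER", "GUID"

Column = Tuple[str, Callable[[dict], Any]]


class Table:
    """Base class: subclasses declare columns as (type, extractor) tuples."""

    @classmethod
    def columns(cls) -> Dict[str, Column]:
        # Keep only class attributes shaped like (type marker, callable).
        return {
            name: value
            for name, value in vars(cls).items()
            if isinstance(value, tuple) and len(value) == 2 and callable(value[1])
        }

    @classmethod
    def extract(cls, message: dict) -> Dict[str, Any]:
        """Apply every column's extractor to one raw message, producing a flat row."""
        return {name: getter(message) for name, (_, getter) in cls.columns().items()}


class TurnSketch(Table):
    user_id = Integer, lambda m: m["Data"]["GameData"]["CurrentPlayerId"]
    current_turn = Integer, lambda m: m["Data"]["GameData"]["CurrentTurn"]
    game_id = Guid, lambda m: m["Data"]["GameData"]["GameId"]
```

Keeping the type next to the extractor means one class declaration can drive both the flattening step and, in principle, the target table’s DDL.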

Page 47

Use Case: Leaderboards
• “What is my rank in today’s tournament?”
• Hard to cache, since a single player getting a new high score changes everyone’s rank
• A highly optimized schema required 4 m2.2xlarge RDS nodes
• Latency for “what is my rank” could be above 100 ms
• Redis sorted sets provide exactly what we need. Two m2.xlarge instances are more than enough. The rank query now takes single-digit milliseconds.

Redis
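A Redis sorted set keeps members ordered by score, so once `ZADD` records a score, “what is my rank” is a single `ZREVRANK` command (0-based, highest score first). The in-memory class below only imitates those two commands, including Redis’s reverse-lexicographic ordering of tied members, so the idea can be tried without a server. Redis itself uses a skip list; this sketch’s list insert is O(n).

```python
import bisect
from typing import Dict, List, Optional, Tuple


class SortedSetSketch:
    """In-memory imitation of a Redis sorted set's ZADD and ZREVRANK."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}          # member -> current score
        self._entries: List[Tuple[float, str]] = []  # ascending (score, member)

    def zadd(self, member: str, score: float) -> None:
        """Add a member or update its score, keeping entries sorted."""
        old = self._scores.get(member)
        if old is not None:
            del self._entries[bisect.bisect_left(self._entries, (old, member))]
        self._scores[member] = score
        bisect.insort(self._entries, (score, member))

    def zrevrank(self, member: str) -> Optional[int]:
        """0-based rank with the highest score first; None if the member is absent."""
        score = self._scores.get(member)
        if score is None:
            return None
        index = bisect.bisect_left(self._entries, (score, member))
        # Reversing ascending (score, member) order also puts tied members
        # in reverse lexicographic order, matching ZREVRANK.
        return len(self._entries) - 1 - index
```

With the redis-py client, the equivalent calls would be `r.zadd(key, {player: score})` and `r.zrevrank(key, player)`, where `key` could be a hypothetical per-tournament name such as `"leaderboard:2013-11-13"`.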

Page 48

Use Case: Game/Turn State
• Extremely high throughput. Extremely large dataset.
• Semi-structured data: each game models “state” differently.
• Always queried by UserID or GameID.
• We maxed out an Amazon RDS instance; instead of spending time sharding and optimizing Amazon RDS, we moved to Amazon DynamoDB.
• Saves operational and development time: no need to worry about growing games, adding new games, or traffic spikes.

DynamoDB
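Because each game models its state differently but is always looked up by a key, one possible layout is to key the table by `GameId` and store the game-specific state as an opaque JSON string attribute. The sketch builds a low-level `PutItem` request using DynamoDB’s typed attribute values (`{"S": ...}` for strings, `{"N": ...}` for numbers, which are sent as strings). The table name, attribute names, and JSON encoding are assumptions, not Scopely’s schema; lookups by UserID would need a second table or an index keyed that way.

```python
import json
from typing import Any, Dict


def game_state_put_request(game_id: str, user_id: int, turn: int,
                           state: Dict[str, Any], table: str = "GameState") -> dict:
    """Low-level PutItem arguments for one game's semi-structured state."""
    return {
        "TableName": table,
        "Item": {
            "GameId": {"S": game_id},                           # hash key
            "UserId": {"N": str(user_id)},                      # numbers travel as strings
            "Turn": {"N": str(turn)},
            "State": {"S": json.dumps(state, sort_keys=True)},  # per-game shape
        },
    }


def decode_state(item: dict) -> Dict[str, Any]:
    """Inverse of the encoding above, for an Item returned by GetItem."""
    return json.loads(item["State"]["S"])
```

With boto3, `boto3.client("dynamodb").put_item(**request)` would send such a request; new games can change the shape of `State` without any table change.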

Page 49

Use Case: User Accounts
• Need to maintain uniqueness across multiple columns (email, username, etc.)
• Queryable on multiple facets (email, username, external identifier)
• Entire table needs to be scanned regularly (promotions)
• Bounded data size

MySQL (RDS)
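Uniqueness across several columns is where a relational engine does the work for you: declare a `UNIQUE` constraint on each column and the database rejects duplicates atomically, while the indexes behind those constraints also make each facet queryable. The sketch uses Python’s built-in `sqlite3` as a stand-in for MySQL on RDS so it runs anywhere. The table and column names are hypothetical; in MySQL, a duplicate would surface as error 1062 (duplicate entry) rather than `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id          INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE,   -- each facet is unique on its own
        username    TEXT NOT NULL UNIQUE,
        external_id TEXT UNIQUE             -- optional; NULLs do not collide
    )
""")
conn.execute("INSERT INTO users (email, username) VALUES (?, ?)", ("a@example.com", "alice"))


def try_register(email: str, username: str) -> bool:
    """Insert a user; return False if any unique column is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email, username) VALUES (?, ?)",
                         (email, username))
        return True
    except sqlite3.IntegrityError:
        return False
```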

Page 50

Use Case: Global Caching
• Cache everything possible in Memcached, including entities from both Amazon DynamoDB and RDS.
• A single interface providing session caching, Memcached caching, and Amazon DynamoDB access encourages consistent use of caching.

Memcached (ElastiCache)

Page 51

Use Case: Global Caching

Memcached (ElastiCache)

public class CoherentStorage
{
    public Cache L1Cache { get; set; }        // first-level cache (Cache.Request)
    public Cache L2Cache { get; set; }        // second-level cache: Memcached, named per game
    public DynamoClient Dynamo { get; set; }  // Amazon DynamoDB backing store

    private readonly Games _game;

    public CoherentStorage(Games game)
    {
        _game = game;
        L1Cache = Cache.Request;
        L2Cache = Cache.GetMemcached(String.Format("{0}GameState", game));
        Dynamo = DynamoClient.Instance;
    }

    // Method bodies are not shown on the slide.
    public void Save(object instance) { }
    public void Delete(object instance) { }
    public T Get<T>(object id, bool skipCache = false, bool consistentRead = true) { }
}
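The slide leaves the method bodies out, so here is a hypothetical Python sketch of the same shape: an L1 dictionary (treated here as per-request), a shared L2 dictionary standing in for Memcached, and a backing dictionary standing in for DynamoDB. Reads check L1, then L2, then the store, filling the caches on the way back; writes go to the store first and then refresh both cache levels. The names and the write-through policy are assumptions, not Scopely’s implementation.

```python
from typing import Any, Dict, Optional


class CoherentStorageSketch:
    """Two cache levels in front of a backing store (all dict stand-ins)."""

    def __init__(self, l1: Dict[str, Any], l2: Dict[str, Any], store: Dict[str, Any]):
        self.l1 = l1        # per-request cache
        self.l2 = l2        # shared cache, e.g. Memcached
        self.store = store  # durable store, e.g. DynamoDB

    def get(self, key: str, skip_cache: bool = False) -> Optional[Any]:
        if not skip_cache:
            if key in self.l1:
                return self.l1[key]
            if key in self.l2:
                self.l1[key] = self.l2[key]   # promote to L1
                return self.l1[key]
        value = self.store.get(key)
        if value is not None:                 # fill both levels on the way back
            self.l2[key] = value
            self.l1[key] = value
        return value

    def save(self, key: str, value: Any) -> None:
        self.store[key] = value               # durable write first
        self.l2[key] = value                  # then refresh both cache levels
        self.l1[key] = value

    def delete(self, key: str) -> None:
        self.store.pop(key, None)
        self.l2.pop(key, None)
        self.l1.pop(key, None)
```

Routing every read and write through one object like this is what makes the caching consistent across features; concurrent writers can still race between the store write and the cache refresh, so a TTL on L2 entries is a common safety net.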

Page 52

Tips & Traps
• Know your data: use reasonable heuristics for expected data growth.
• Each data storage technology introduces some level of operational and engineering overhead. Choose wisely.
• Get creative with Amazon DynamoDB.
• Prepare for the unexpected with metadata columns in MySQL.
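One reading of “metadata columns” is a catch-all text column holding a JSON document, so a new attribute can ship without an `ALTER TABLE` on a large table. A minimal sketch, with `sqlite3` again standing in for MySQL; the `metadata` column name and the JSON encoding are assumptions, and values stored this way are neither indexed nor type-checked by the database.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, username TEXT NOT NULL, "
             "metadata TEXT NOT NULL DEFAULT '{}')")
conn.execute("INSERT INTO accounts (username) VALUES ('alice')")


def get_meta(account_id: int) -> dict:
    """Decode the JSON metadata column for one account."""
    (raw,) = conn.execute("SELECT metadata FROM accounts WHERE id = ?",
                          (account_id,)).fetchone()
    return json.loads(raw)


def set_meta(account_id: int, key: str, value) -> None:
    """Read-modify-write one key in the JSON metadata column."""
    with conn:  # one transaction for the read and the update
        meta = get_meta(account_id)
        meta[key] = value
        conn.execute("UPDATE accounts SET metadata = ? WHERE id = ?",
                     (json.dumps(meta), account_id))
```

In MySQL with concurrent writers, the read step would need `SELECT ... FOR UPDATE` so that two updates to the same row cannot overwrite each other.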

Page 53

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

DAT201