Page 1: Architecting for Failure in AWS - PuppetConf 2013

ARCHITECTING IN AWS
for resilience & cost at scale

Jos Boumans - @jiboumans
http://rafaykhan619.wix.com/downhouse

Thursday 22 August 13


CANONICAL

http://lukeroberts.deviantart.com/art/Destroy-Ubuntu-93235775

Engineering manager for Ubuntu Server 10.04 & 10.10

http://www.ubuntu.com/business/server/overview


SOME OF OUR CUSTOMERS


LOTS OF TRAFFIC

http://www.americapictures.net/buenos-aires-traffic-city-night-argentina.html


AVERAGE REQUESTS* / SEC

http://mashable.com/2013/03/21/happy-7th-birthday-twitter/
http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm

*Twitter: new tweets; Wikipedia: articles read; Krux: new data points

[Chart: average requests per second for Twitter, Wikipedia and Krux, on a scale of 0 to 15,000]


MONTHLY UNIQUE USERS

[Chart: monthly unique users, on a scale of 0 to 800,000,000]

http://en.wikipedia.org/wiki/Wikipedia http://mashable.com/2013/03/21/happy-7th-birthday-twitter/


THERE ARE DOWNSIDES

http://modernsavage.hubpages.com/hub/10-springfield-shopper-headlines


RESILIENCE & COST AT SCALE


ROOT CAUSE CATEGORIES

[Chart: Amazon Cloud Major Outage, Issues Categories (# of issues): Software 8, Automation 4, Process 14]

http://www.slideshare.net/rahultyagi50999/amazon-cloud-major-outages-analysis

Software bugs & human error


RESILIENCE @ SCALE
Embrace failure: Hardware will fail. Humans will make errors. Nature will produce thunderstorms.

http://blabitcanada.com/category/twitter-2/


DEFINE 'AVAILABLE'
Things will break, so choose your degraded state.

http://libcom.org/library/occupied-wall-street-some-tactical-thoughts-malcolm-harris


BASIC API CALL
3 potential points of failure


FALLBACK PATTERNS
The cost of resilience should be accuracy or latency

http://redis.io/
http://memcached.org/
http://varnish-cache.org/
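One way to read "the cost of resilience should be accuracy": when the primary read path fails, serve the last value you saw instead of an error. A minimal sketch in Python, assuming a hypothetical `fetch_from_source` callable for the primary backend and a plain dict standing in for Redis or memcached:

```python
from typing import Callable, Dict, Optional, Tuple


class FallbackReader:
    """Wrap a primary read path and fall back to the last value seen for a key.

    The dict stands in for an external cache such as Redis or memcached.
    """

    def __init__(self, fetch_from_source: Callable[[str], str]) -> None:
        self._fetch = fetch_from_source          # hypothetical primary backend call
        self._last_known: Dict[str, str] = {}    # fallback copies, may go stale

    def get(self, key: str) -> Tuple[Optional[str], bool]:
        """Return (value, is_stale); value is None if nothing was ever cached."""
        try:
            value = self._fetch(key)
        except (ConnectionError, TimeoutError):
            # Degraded state: trade accuracy (staleness) for availability.
            return self._last_known.get(key), True
        self._last_known[key] = value            # refresh the fallback copy
        return value, False
```

The `is_stale` flag lets the caller surface the degraded state explicitly. Retrying the primary before falling back would spend latency instead of accuracy.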


USER EXPERIENCE
My tweet got posted


RESILIENCE TOOLS
Storage, Network & ACL

http://wordyou.ru/kolonki/my-teper-ne-na-avrore-a-na-titanike.html


MANY SMALL NODES VERSUS A FEW LARGER NODES

The benefits of the many outweigh the benefits of the few

http://www.stealingfaith.com/2012/07/08/throw-off-the-tiny-ropes/
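A back-of-the-envelope way to see the benefit: with N equal-sized nodes, losing one removes 1/N of capacity. A short Python sketch; the fleet sizes are hypothetical and assume the same total capacity split into different node counts:

```python
def capacity_after_failures(nodes: int, failed: int = 1) -> float:
    """Fraction of total capacity left when `failed` of `nodes` equal-sized nodes are lost."""
    if nodes < 1 or not 0 <= failed <= nodes:
        raise ValueError("need at least one node and 0 <= failed <= nodes")
    return (nodes - failed) / nodes


# Hypothetical fleets with the same total capacity, split into different node counts.
for n in (2, 4, 20):
    print(f"{n:>2} nodes, one lost: {capacity_after_failures(n):.0%} of capacity left")
```

This prints 50%, 75% and 95% respectively: the same single failure is an outage for the two-node fleet and a blip for the twenty-node one.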


DATABASES
CAP Theorem applies.

Your choice: sacrifice availability or consistency. Orange is a lie.

[Diagram labels: RDBMS; BigTable based; Master / Slave based; CouchDB; Dynamo based]

http://ferd.ca/beating-the-cap-theorem-checklist.html
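In Dynamo-style stores the CAP choice becomes a tunable per cluster or per request: with N replicas, a read waits for R of them and a write for W. When R + W > N every read quorum overlaps the latest write quorum; smaller quorums keep working with more replicas down but may return stale data. A Python sketch of that arithmetic only; it does not model any particular database:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    n: int  # replicas per key
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write

    def __post_init__(self) -> None:
        if not (1 <= self.r <= self.n and 1 <= self.w <= self.n):
            raise ValueError("r and w must be between 1 and n")

    @property
    def overlapping(self) -> bool:
        """True when every read quorum intersects every write quorum."""
        return self.r + self.w > self.n

    @property
    def read_failures_tolerated(self) -> int:
        return self.n - self.r

    @property
    def write_failures_tolerated(self) -> int:
        return self.n - self.w


for cfg in (QuorumConfig(3, 2, 2), QuorumConfig(3, 1, 1), QuorumConfig(3, 3, 1)):
    print(cfg, "overlap:", cfg.overlapping,
          "| reads survive", cfg.read_failures_tolerated, "down",
          "| writes survive", cfg.write_failures_tolerated, "down")
```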


SIMPLE STORAGE SERVICE
S3: Arguably AWS' best feature

http://www.iwallpaper.us/gold-star-fo-christmas-wallpaper-140/
http://aws.amazon.com/s3/

https://forums.aws.amazon.com/message.jspa?messageID=182919#182919


CACHE WHAT YOU CAN
HTTP Responses, DB Queries, User content

Browsers have caches too!
http://cruncht.com/95/drupal-caching/

http://redis.io/
http://memcached.org/
http://varnish-cache.org/
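Caching DB query results usually follows the cache-aside pattern: look in the cache, and only on a miss or expiry hit the database. A minimal in-process Python sketch with a time-to-live; in production the store would more likely be memcached or Redis. The `clock` parameter exists only to make the example testable, and `run_query` in the usage comment is hypothetical:

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache-aside helper: reuse a value while it is younger than `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (stored_at, value)

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                  # fresh hit: skip the database
        value = compute()                  # miss or expired: recompute
        self._entries[key] = (now, value)
        return value


# Usage (run_query is hypothetical):
# cache = TTLCache(ttl=30)
# rows = cache.get_or_compute("top_posts", lambda: run_query("SELECT ..."))
```

On the browser side the equivalent lever is an HTTP `Cache-Control: max-age=...` response header, which lets clients skip the request entirely.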


CLIENT SIDE STORAGE
Keep a copy of your users' data locally

http://www.w3.org/2001/tag/2010/09/ClientSideStorage.html
http://www.wired.com/gadgetlab/2012/03/badass-gadget-ammo-lunch-box/


USE ELASTIC LOAD BALANCERS
They will save you more than once

http://wallpapers5.com/wallpaper/Balance-Green-Tree-Frog/
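Much of what an ELB saves you from comes from its health checks: an instance that fails a configurable number of consecutive checks is taken out of rotation, and is put back after a configurable number of consecutive passes. A toy Python model of that threshold logic, not the ELB implementation; the thresholds of 2 and 3 are illustrative, and starting new instances in service is a simplification:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HealthTracker:
    """Toy model of threshold-based health checks, similar in spirit to ELB's."""

    unhealthy_threshold: int = 2  # consecutive failures before removal (illustrative)
    healthy_threshold: int = 3    # consecutive passes before re-adding (illustrative)
    _healthy: Dict[str, bool] = field(default_factory=dict, init=False, repr=False)
    _streak: Dict[str, int] = field(default_factory=dict, init=False, repr=False)

    def record(self, instance: str, passed: bool) -> None:
        """Feed one check result; flip state after enough consecutive contrary results."""
        healthy = self._healthy.setdefault(instance, True)  # simplification: start in service
        if passed == healthy:
            self._streak[instance] = 0  # result agrees with current state: reset the streak
            return
        self._streak[instance] = self._streak.get(instance, 0) + 1
        limit = self.healthy_threshold if passed else self.unhealthy_threshold
        if self._streak[instance] >= limit:
            self._healthy[instance] = passed
            self._streak[instance] = 0

    def in_service(self) -> List[str]:
        """Instances currently eligible to receive traffic."""
        return sorted(name for name, ok in self._healthy.items() if ok)
```

Requiring consecutive results in both directions keeps a single slow response from flapping an instance in and out of the pool.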


USE GLOBAL LOAD BALANCING
Fail over to the closest data center on region failure
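Global load balancing is often done at the DNS layer, but the decision itself is simple: send each user to the lowest-latency region that is currently healthy. A Python sketch of just that decision; the region names are real AWS region identifiers, but the latency figures are hypothetical:

```python
from typing import Dict, Optional, Set


def pick_region(latency_ms: Dict[str, float], healthy: Set[str]) -> Optional[str]:
    """Return the lowest-latency healthy region, or None if every region is down."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        return None
    return min(candidates, key=candidates.__getitem__)


# Hypothetical latencies from one client to three regions.
latencies = {"us-east-1": 20.0, "us-west-2": 70.0, "eu-west-1": 90.0}
print(pick_region(latencies, {"us-east-1", "us-west-2", "eu-west-1"}))  # us-east-1
print(pick_region(latencies, {"us-west-2", "eu-west-1"}))               # us-west-2
```

When us-east-1 drops out of the healthy set, the same user is sent to the next closest region rather than to an error page.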


SHOUT OUT: DYNDNS
for Bit.ly, Quora, Twitter, Wikia, Fastly, etc.

http://dyn.com


USE IAM ROLES FOR ACCESS
Humans make mistakes, including your humans
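With an IAM role attached to an instance, the AWS SDKs can pick up temporary credentials from the instance metadata instead of long-lived keys copied into config files, and the role's policy can be limited to exactly what the service needs. A Python sketch that builds such a least-privilege policy document; the bucket name `example-app-assets` is hypothetical:

```python
import json
from typing import Any, Dict


def read_only_bucket_policy(bucket: str) -> Dict[str, Any]:
    """IAM policy document allowing only listing one bucket and reading its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],     # the bucket itself
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],   # objects inside it
            },
        ],
    }


# Hypothetical bucket name; the resulting JSON would be attached to the instance's role.
print(json.dumps(read_only_bucket_policy("example-app-assets"), indent=2))
```

A mistaken delete from a process running under this role fails with an access error instead of removing data.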


COST @ SCALE
Scaling without breaking the bank

http://mgx.com/blogs/wp-content/uploads/2013/07/piggybank.jpg


EMR + SPOT INSTANCES
On-demand rate: $0.165 / hour

http://aws.amazon.com/ec2/spot-instances/
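The saving is easy to estimate. Only the on-demand rate ($0.165/hour) comes from the slide; the spot price, node count and job length below are hypothetical, since spot prices move with demand. Spot capacity can be taken away at short notice, which is why it tends to suit batch work such as EMR jobs that can tolerate losing nodes.

```python
def cluster_cost(rate_per_hour: float, nodes: int, hours: float) -> float:
    """Total cost of running `nodes` instances for `hours` at a flat hourly rate."""
    return rate_per_hour * nodes * hours


ON_DEMAND = 0.165        # $/hour, from the slide
SPOT = 0.03              # $/hour, hypothetical spot price
NODES, HOURS = 20, 4     # hypothetical EMR job

on_demand = cluster_cost(ON_DEMAND, NODES, HOURS)
spot = cluster_cost(SPOT, NODES, HOURS)
print(f"on demand ${on_demand:.2f}, spot ${spot:.2f}, saving {1 - spot / on_demand:.0%}")
```

With these assumed numbers the job costs $13.20 on demand versus $2.40 on spot, a saving of about 82%.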


AMAZON REDSHIFT
Economical Business Intelligence

Scales with data size

http://www.flitemedia.com/music.php
http://aws.amazon.com/redshift
http://www.tableausoftware.com/


AMAZON GLACIER
"Tapes for the Cloud Era"

Writes are vastly cheaper than reads

http://aws.amazon.com/glacier/
http://www.gorp.com/parks-guide/glacier-national-park-outdoor-pp2-guide-cid350021.html


AWS SIMPLE EMAIL SERVICE
Dealing with email is boring and time-consuming

http://aws.amazon.com/ses/
http://bfsdaniels.copycop.com/blog/all-about-printing/hypertargeting-with-direct-mail/


AWS SIMPLE QUEUE SERVICE
Excellent for latency-insensitive, small-volume queues

http://www.toledoblade.com/Retail/2013/01/13/Disney-s-magic-bracelet-new-key-to-its-kingdom.html
http://aws.amazon.com/sqs/
http://colby.id.au/benchmarking-sqs
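SQS is billed per request, and a single `SendMessageBatch` call carries at most 10 messages, so batching small messages cuts the request count. A Python sketch of the chunking step only; the commented send uses boto3's `send_message_batch`, and `sqs` and `queue_url` there are hypothetical:

```python
from typing import Dict, Iterable, Iterator, List

MAX_BATCH = 10  # SendMessageBatch accepts at most 10 entries per call


def batch_entries(bodies: Iterable[str], size: int = MAX_BATCH) -> Iterator[List[Dict[str, str]]]:
    """Group message bodies into SendMessageBatch-shaped entry lists."""
    if not 1 <= size <= MAX_BATCH:
        raise ValueError(f"size must be between 1 and {MAX_BATCH}")
    batch: List[Dict[str, str]] = []
    for body in bodies:
        # Entry Ids only need to be unique within one batch.
        batch.append({"Id": str(len(batch)), "MessageBody": body})
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


# Usage with boto3 (sqs client and queue_url are hypothetical):
# for entries in batch_entries(bodies):
#     sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)
```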


INSTANCE MARKETPLACE
Buy & sell reserved instances

http://commons.wikimedia.org/wiki/File:Javanese_market_place.jpg http://aws.amazon.com/ec2/reserved-instances/marketplace/


AWS DYNAMO DB
Excellent for small keys & high read rates at known & consistent IOPS

http://hlbike.en.ecplaza.net/2.jpg
http://aws.amazon.com/dynamodb/
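Why small items and predictable read rates fit DynamoDB well shows up in its provisioned-capacity arithmetic: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB, and eventually consistent reads cost half. A Python sketch under that pricing model (worth checking against the current DynamoDB documentation); the item sizes and rates are hypothetical:

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one read capacity unit: one strongly consistent read/s of up to 4 KB


def read_capacity_units(item_bytes: int, reads_per_sec: int, strongly_consistent: bool = True) -> int:
    """Provisioned read capacity for a steady rate of reads of same-sized items."""
    units_per_read = max(1, math.ceil(item_bytes / READ_UNIT_BYTES))  # sizes round up to 4 KB
    total = units_per_read * reads_per_sec
    # Eventually consistent reads are charged at half the rate.
    return total if strongly_consistent else math.ceil(total / 2)


print(read_capacity_units(1_000, 500))                             # small items
print(read_capacity_units(10_000, 500))                            # ~10 KB items cost 3x
print(read_capacity_units(1_000, 500, strongly_consistent=False))  # eventually consistent
```

These print 500, 1500 and 250: the same read rate needs three times the capacity once items grow past 8 KB.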


MAXIMIZE IOPS
RAID 0 ephemeral drives

Use m1.xlarge or c1.xlarge, or use SSDs if you need >20k IOPS

http://calculator.s3.amazonaws.com/calc5.html
http://blog.scalyr.com/2012/10/16/a-systematic-look-at-ec2-io/
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#disk-performance
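RAID 0 striping adds up the drives' throughput but removes all redundancy: losing any one drive loses the whole array, and instance-store data does not survive the instance being stopped or terminated anyway. So the data on it has to be replicated elsewhere. A rough Python model, assuming ideal striping and independent drive failures; the per-drive IOPS and failure probability are hypothetical, not measured EC2 figures:

```python
from typing import Tuple


def raid0_estimate(drives: int, iops_per_drive: float, p_drive_fail: float) -> Tuple[float, float]:
    """Return (ideal aggregate IOPS, probability the whole array is lost) for RAID 0."""
    if drives < 1 or not 0.0 <= p_drive_fail <= 1.0:
        raise ValueError("need at least one drive and a probability in [0, 1]")
    aggregate_iops = drives * iops_per_drive             # striping adds up throughput
    p_array_lost = 1.0 - (1.0 - p_drive_fail) ** drives  # any single failure kills it
    return aggregate_iops, p_array_lost


# Hypothetical: 4 ephemeral drives, 1,000 IOPS each, 1% failure chance per period.
iops, risk = raid0_estimate(4, 1_000, 0.01)
print(f"~{iops:,.0f} IOPS, {risk:.1%} chance of losing the array")
```

With these assumptions the array delivers roughly 4,000 IOPS with about a 3.9% chance of being lost over the same period, nearly four times the single-drive risk.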


RED FLAGS
Anti-patterns to watch out for

http://grandprix247.com/2012/09/03/spa-pile-up-renews-focus-on-formula-1-safety-matters/


PROVISIONED IOPS EBS
Ephemeral storage on c1/m1.xlarge or SSD is better
If you must: m*large or c1.xlarge for dedicated NIC

http://www.slideshare.net/AmazonWebServices/ebs-mongo-dbwebinarfinal-nn
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
http://navidoo.ru/interest/Nasha_jizn/17676.html


AWS DYNAMO DB
For high write rates or large/variable keys

http://aws.amazon.com/dynamodb/
http://www.walltowall.co.uk/program/standing-tall-worlds-tallest-people_93.aspx
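The write side explains the red flag: one write capacity unit covers one write per second of an item up to 1 KB, so high write rates and large items multiply provisioned capacity quickly, and with variable sizes you end up provisioning for the large ones. A Python sketch under that pricing model (again, worth checking against current documentation); the sizes and rates are hypothetical:

```python
import math

WRITE_UNIT_BYTES = 1024  # one write capacity unit: one write/s of an item up to 1 KB


def write_capacity_units(item_bytes: int, writes_per_sec: int) -> int:
    """Provisioned write capacity for a steady rate of writes of same-sized items."""
    units_per_write = max(1, math.ceil(item_bytes / WRITE_UNIT_BYTES))  # sizes round up to 1 KB
    return units_per_write * writes_per_sec


print(write_capacity_units(500, 1_000))    # small items
print(write_capacity_units(9_000, 1_000))  # ~9 KB items need 9x the capacity
```

These print 1000 and 9000.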


HIGH IO/DISK/RAM NODES
Use them deliberately

http://elledecoration.co.za/2010/07/gigantic/


AWS CLOUDWATCH
Metric collection, Amazon style

Cost prohibitive & resolution too low

http://www.flickr.com/photos/65683080@N08/6893582132/
http://aws.amazon.com/cloudwatch/


LOWER COST PER METRIC
Use Graphite & StatsD

http://graphite.wikidot.com/
https://github.com/etsy/statsd
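Part of why StatsD keeps the cost per metric low is its wire format: plain-text `name:value|type` lines sent over UDP (port 8125 by default), aggregated by the daemon and flushed to Graphite. A minimal fire-and-forget client sketch in Python using only the standard library; the metric names are hypothetical, and a real project would likely use an existing client library:

```python
import socket
from typing import Optional, Union


def statsd_line(name: str, value: Union[int, float], metric_type: str,
                sample_rate: Optional[float] = None) -> str:
    """Format one StatsD metric: c = counter, ms = timer, g = gauge."""
    if metric_type not in ("c", "ms", "g"):
        raise ValueError(f"unsupported metric type: {metric_type}")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"  # tells the daemon this value was sampled
    return line


def send(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP: a lost packet costs one data point, never a request."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names.
    send(statsd_line("api.requests", 1, "c"))
    send(statsd_line("api.latency", 42, "ms", sample_rate=0.1))
```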


HOSTED ALTERNATIVES
Circonus: All the insights you ever wanted
Stackdriver: Optimized for AWS

http://circonus.com
http://stackdriver.com


AWS CLOUDFORMATION
Templatize your entire stack

Harder to use as complexity increases

http://aws.amazon.com/cloudwatch/
http://fullnfenil7.blogspot.com/2012/05/amazing-cloud-shapes-photos.html#.UhKrZmRgZHg
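A CloudFormation template is a JSON document with a `Resources` section keyed by logical name. One common way to keep a growing template manageable is to generate it from code instead of editing JSON by hand. A minimal Python sketch declaring one S3 bucket; the description and the `AssetsBucket` logical name are hypothetical:

```python
import json
from typing import Any, Dict


def bucket_template(logical_id: str) -> Dict[str, Any]:
    """A minimal CloudFormation template declaring a single S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Hypothetical example stack",
        "Resources": {
            logical_id: {"Type": "AWS::S3::Bucket"},
        },
        "Outputs": {
            # Ref on an S3 bucket resolves to the bucket name.
            "BucketName": {"Value": {"Ref": logical_id}},
        },
    }


print(json.dumps(bucket_template("AssetsBucket"), indent=2))
```

Building templates from functions like this lets repeated pieces (one bucket or security group per service, say) be generated in a loop rather than copied.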


RDS FOR ANALYTICS/REPORTS
Paying OLTP prices for BI usage
Sharding will be a matter of time

http://nerds.airbnb.com/redshift-performance-cost
http://business901.com/blog1/understanding-your-customer-problem/


Q & A

http://vickicaruana.blogspot.com/2011/01/are-you-afraid-to-raise-your-hand.html

@jiboumans
http://slideshare.net/jiboumans
