turtles all the way down: platform ops in public cloud

Post on 25-May-2015

DESCRIPTION

When I joined a startup already in progress as their first ops hire, I got a crash course in cloud operations. Running databases in EC2 without being on bare metal presents its own challenges; we also began using Hadoop and HBase on EMR, with tragicomic results. What monitoring existed was a twisty maze of half-measures, so improving our Mean Time To Lost Sleep required trying new tools and alerting strategies. And scaling performance meant relying on best practices and gut-feeling hunches. This talk will have appeal for those curious about AWS, about using MapReduce in the cloud, and about whether MongoDB is really "web scale". (Spoiler alert: lolol.) Come for the EC2 trivia; stay for the table-flipping. Notes and image credits: http://bridgetkromhout.com/speaking/2014/beyondthecode/notes/

TRANSCRIPT

Turtles All the Way Down

Platform Ops in Public Cloud

Bridget Kromhout

@bridgetkromhout

We are the largest online video distributor of international televised content streaming the world's best movies, documentaries and TV shows on demand with professional subtitles.

Platform ops in public cloud?

Do you mean Platform as a Service?

How is this different from Infrastructure as a Service?

(previous gig) SaaS Life

normal traffic

decision to turn off

decision to turn back on

accidental removal

Platform?

AWS Regions* (containing availability zones)

* for some values of regions: Beijing & Sydney too

Storage

No procurement delays!

All the IOPS!

No waiting!

Yes, cloud storage is better than the bad old days in some ways, but with caveats.

Alphabet Soup: EBS, SSDs, pIOPS

Go with SSDs for your Elastic Block Store.

EBS-optimized instances = faster network

Provisioned IOPS: guaranteed, but prevent bursting

Story Time!

Data stores and sadness (as a service)

wow. such nosql. very webscale.

“a single write operation holds the lock exclusively, and no other read or write operations may share the lock.”

It’s 4am. Do you know what your EMR cluster is doing?

StatsD

monitoring != alerting
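
One way to read this slide: record every sample, but only wake someone up when a condition is sustained. The sketch below is an assumption about what that split could look like, not something taken from the talk; the function name `should_page`, the threshold, and the sample values are all hypothetical.

```python
from typing import Sequence


def should_page(samples: Sequence[float], threshold: float, sustained: int) -> bool:
    """Return True only if the most recent `sustained` samples all exceed `threshold`.

    Every sample is still kept for graphing (monitoring); this function only
    decides whether a human gets paged (alerting).
    """
    if sustained <= 0 or len(samples) < sustained:
        return False
    return all(sample > threshold for sample in samples[-sustained:])


# Hypothetical CPU-percent readings, one per minute.
one_spike = [10, 95, 20, 30]
sustained_load = [50, 95, 96, 97]

print(should_page(one_spike, threshold=90, sustained=3))
print(should_page(sustained_load, threshold=90, sustained=3))
```

A single spike still shows up on the graph, but only the sustained case would trigger a page.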

If it moves, we track it. Sometimes we’ll draw a graph of something that isn’t moving yet, just in case it decides to make a run for it. -- Ian Malpass, Etsy

measure all the things
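
For readers who have not used StatsD, here is a minimal sketch of what "measuring a thing" can look like on the wire. StatsD accepts plain-text UDP datagrams of the form `name:value|type`, conventionally on port 8125. The metric names, host, and helper functions below are illustrative assumptions and do not come from the talk.

```python
import socket


def format_metric(name: str, value: float, metric_type: str) -> bytes:
    """Build one StatsD datagram: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send a datagram over UDP; fire-and-forget, so a missing daemon is not an error."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric names: "c" is a counter, "ms" a timer, "g" a gauge.
    send_metric(format_metric("emr.jobs.completed", 1, "c"))
    send_metric(format_metric("emr.job.duration", 5400, "ms"))
    send_metric(format_metric("mongo.connections.open", 212, "g"))
```

Because the transport is UDP, instrumenting a code path costs very little and cannot block the application, which is what makes tracking everything practical.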

So, back to this platform stuff...

...how exactly do you build and deploy it?

orchestration & config management

Current: Future possibilities:

kitten, not unicorn

“the game has changed”

@littleidea

Questions? (and we’re hiring!)
