turtles all the way down: platform ops in public cloud
DESCRIPTION
When I joined a startup already in progress as their first ops hire, I got a crash course in cloud operations. Running databases in EC2 rather than on bare metal presents its own challenges; we also began using Hadoop and HBase on EMR, with tragicomic results. What monitoring existed was a twisty maze of half-measures, so improving our Mean Time To Lost Sleep required trying new tools and alerting strategies. And scaling performance meant relying on best practices and gut-feeling hunches. This talk will appeal to those curious about AWS, about using MapReduce in the cloud, and about whether MongoDB is really "web scale". (Spoiler alert: lolol.) Come for the EC2 trivia; stay for the table-flipping. Notes and image credits: http://bridgetkromhout.com/speaking/2014/beyondthecode/notes/
TRANSCRIPT
Turtles All the Way Down
Platform Ops in Public Cloud
Bridget Kromhout
@bridgetkromhout
We are the largest online video distributor of international televised content, streaming the world's best movies, documentaries and TV shows on demand with professional subtitles.
Platform ops in public cloud?
Do you mean Platform as a Service?
How is this different from Infrastructure as a Service?
(previous gig) SaaS Life
normal traffic
decision to turn off
decision to turn back on
accidental removal
Platform?
AWS Regions* (containing availability zones)
* for some values of regions: Beijing & Sydney too
Storage
No procurement delays!
All the IOPS!
No waiting!
Yes, cloud storage is better than the bad old days in some ways, but with caveats.
Alphabet Soup: EBS, SSDs, pIOPS
Go with SSDs for your Elastic Block Store.
EBS-optimized instances = faster network
Provisioned IOPS: guaranteed, but prevents bursting
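The three storage choices above can be sketched as request parameters. This is only an illustration, assuming the boto3 parameter names for `create_volume` and `run_instances`; the helper functions `volume_request` and `instance_request` are hypothetical, the sizes and IOPS figures are placeholders, and nothing here reflects the setup actually used in the talk.

```python
def volume_request(zone, size_gib, provisioned_iops=None):
    """Build keyword arguments for an EBS volume request (hypothetical helper).

    With provisioned_iops set, ask for a Provisioned IOPS ("io1") volume;
    otherwise ask for a general-purpose SSD ("gp2") volume, which can burst.
    """
    params = {"AvailabilityZone": zone, "Size": size_gib}
    if provisioned_iops is None:
        params["VolumeType"] = "gp2"
    else:
        params["VolumeType"] = "io1"
        params["Iops"] = provisioned_iops
    return params


def instance_request(image_id, instance_type):
    """Build keyword arguments for launching one EBS-optimized instance."""
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "EbsOptimized": True,  # dedicated bandwidth between instance and EBS
        "MinCount": 1,
        "MaxCount": 1,
    }
```

With boto3 installed and credentials configured, these dictionaries could be passed along as `ec2.create_volume(**volume_request(...))` and `ec2.run_instances(**instance_request(...))`.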
Story Time!
Data stores and sadness (as a service)
wow. such nosql. very webscale.
“a single write operation holds the lock exclusively, and no other read or write operations may share the lock.”
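To make the quoted locking behavior concrete, here is a toy readers-writer lock. It is a sketch of the semantics only, not the database's implementation; the class name `ReadersWriterLock` and its methods are made up for illustration.

```python
import threading


class ReadersWriterLock:
    """Toy lock: many readers may share it, but a writer holds it alone."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently holding the lock
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self, timeout=None):
        """Wait until no writer holds the lock; return False on timeout."""
        with self._cond:
            ok = self._cond.wait_for(lambda: not self._writer, timeout)
            if ok:
                self._readers += 1
            return ok

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self, timeout=None):
        """Wait until there are no readers and no writer; False on timeout."""
        with self._cond:
            ok = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout
            )
            if ok:
                self._writer = True
            return ok

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

While one caller holds the write side, every `acquire_read` and `acquire_write` from anyone else waits (or times out), which is why a write-heavy workload behind a single coarse lock ends up serialized.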
It’s 4am. Do you know what your EMR cluster is doing?
StatsD
monitoring != alerting
If it moves, we track it. Sometimes we’ll draw a graph of something that isn’t moving yet, just in case it decides to make a run for it. -- Ian Malpass, Etsy
measure all the things
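As a rough idea of what "measure all the things" looks like in code, here is a minimal sketch of emitting StatsD metrics over UDP with only the standard library. It assumes a StatsD daemon listening on the conventional port 8125 on localhost; the helper names `format_metric` and `send_metric` are hypothetical, and a real deployment would more likely use an existing StatsD client library.

```python
import socket


def format_metric(name, value, metric_type):
    """Build one StatsD line: 'c' = counter, 'ms' = timer, 'g' = gauge."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget: send one metric line to a StatsD daemon over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()
```

A call such as `send_metric(format_metric("emr.jobs.started", 1, "c"))` bumps a counter (the metric name is invented for the example). Because UDP is connectionless, the application does not block or fail if the daemon is down, which makes it cheap to instrument everything; deciding which of those graphs should wake someone up is the separate alerting question.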
So, back to this platform stuff...
...how exactly do you build and deploy it?
orchestration & config management
Current: Future possibilities:
kitten, not unicorn
“the game has changed”
@littleidea
Questions? (and we’re hiring!)