cloud native cost optimization
DESCRIPTION
Slightly updated version of AWS Re:Invent 2014 talk.TRANSCRIPT
Cloud Native Cost OptimizationAdrian Cockcroft @adrianco
Technology Fellow - Battery Ventures November 2014
@adrianco
What does everyone want?
@adrianco
Less Time Less Cost
@adrianco
Faster !
See talks by @adrianco Speed and Scale - QCon New York Fast Delivery - GOTO Copenhagen
@adrianco
Cheaper !
This brand new talk: How to use Cloud Native architecture to
reduce cost without slowing down
@adrianco
How is Cost Measured?
1
Bottom Up
2
Product
3
Top Down
@adrianco
Bottom Up Costs
Add up the cost to buy and operate every component
1
@adrianco
Product Cost
Cost of delivering and maintaining each product
2
@adrianco
Top Down Costs
Divide total budget by the number of components
3
@adrianco
Top Down vs. Bottom Up
Will never match! !
Hidden Subsidies vs. Hidden Costs
1
3
@adrianco
Agile Team Product Profit
Value minus costs Time to value ROI, NPV, MMF Profit center
2$
@adrianco
What about cloud costs?
@adrianco
Cloud Native Cost Optimization
Optimize for speed first Turn it off! Capacity on demand Consolidate and Reserve Plan for price cuts FOSS tooling
$ $ $
@adrianco
Do The Impossible Immediately
Ne#lix'Examples'
• European'Launch'using'AWS'Ireland'– No'employees'in'Ireland,'no'provisioning'delay,'everything'worked'
– No'need'to'do'detailed'capacity'planning'– OverAprovisioned'on'day'1,'shrunk'to'fit'aDer'a'few'days'– Capacity'grows'as'needed'for'addiGonal'country'launches'
• Brazilian'Proxy'Experiment'– No'employees'in'Brazil,'no'“meeGngs'with'IT”'
– Deployed'instances'into'two'zones'in'AWS'Brazil'
– Experimented'with'network'proxy'opGmizaGon'
– Decided'that'gain'wasn’t'enough,'shut'everything'down'
@adrianco
The Capacity Planning Problem
@adrianco
Best Case Waste
Product(Launch(Agility(2(Rightsized(
Pre2Launch(
Build2out(
Tes9ng(
Launch(
Growth(
Growth(
Demand(
Cloud(
Datacenter(
$(
Cloud capacity used is maybe half average DC capacity
@adrianco
Failure to LaunchProduct(Launch(-(Under-es1mated(
Pre-L
aunc
h
Build-
out
Testi
ng
Laun
ch
Growth
Growth
Mad scramble to add more DC capacity during launch phase outages
@adrianco
Over the Top LossesProduct(Launch(Agility(–(Over6es8mated(
Pre-L
aunc
h
Build-
out
Testi
ng
Laun
ch
Growth
Growth
$
Capacity wasted on failed launch magnifies the losses
@adrianco
Turning off Capacity
Off-peak production Test environments Dev out of hours Dormant Data ScienceWhen%you%turn%off%your%cloud%resources,%
you%actually%stop%paying%for%them%
@adrianco
Turn off Test Environments
Snapshot or freeze Fast restart needed Persistent storage 40 of 168 hrs/wk
@adrianco
Seasonal Savings
1 5 9 13 17 21 25 29 33 37 41 45 49
Web
Ser
vers
Week
Optimize during a year
50% Savings Weekly&CPU&Load&
@adrianco
Autoscale the Costs Away
50%+%Cost%Saving%Scale%up/down%
by%70%+%
Move%to%Load=Based%Scaling%
@adrianco
Daily Duty Cycle
Busine
ss'Throu
ghpu
t'Instan
ces'
Reactive Autoscaling saves around 50%
Predictive Autoscaling saves around 70% See Scryer on Netflix Tech Blog
@adrianco
Underutilized and UnusedAWS$Support$–$Trusted$Advisor$–$
Your$personal$cloud$assistant$
@adrianco
Clean Up the CrudOther&simple&op-miza-on&-ps&
• Don’t&forget&to…&– Disassociate&unused&EIPs&– Delete&unassociated&Amazon&EBS&volumes&
– Delete&older&Amazon&EBS&snapshots&
– Leverage&Amazon&S3&Object&Expira-on&
& Janitor&Monkey&cleans&up&unused&resources&
@adrianco
Total Cost of OrangesWhen%Comparing%TCO…!
Make!sure!that!you!are!including!all!the!cost!factors!into!considera4on!
Place%Power%Pipes%People%Pa6erns%
How much does Openstack or ESX datacenter automation software and support cost per instance?
@adrianco
When Do You Pay?
@adrianco
bill
NowNext
MonthAges Ago
Lease Building
Install AC etc
Rack & Stack
Private Cloud SW
Run My Stuff
Datacenter Up Front Costs
@adrianco
Reservation Reductions
On Demand Light Use Medium Use Heavy Use
$ No Upfront $172 upfront $286 upfront $337 upfront
$0.070/hr $0.050/hr $0.022/hr $0.015/hr
$1840/36mo $1486/36mo $864/36mo $731/36mo
Savings 22% 53% 60%
Prices on Nov 11th, for m3.medium (1 vCPU, 3.75G RAM, SSD) purely to show typical savings
@adrianco
Blended BenefitsMix$and$Match$Reserved$Types$and$On4Demand$Instan
ces(
Days(of(Month(
0
2
4
6
8
10
12
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
Heavy&Utilization&Reserved Instances
Light&RI Light&RILight&RILight&RI
On8Demand
@adrianco
Consolidated Cost CuttingConsolidated+Billing:+Single+payer+for+a+group+of+accounts+
• One$Bill+for+mul7ple+accounts+
• Easy$Tracking$of+account+charges+(e.g.,+download+CSV+of+cost+data)+
• Volume$Discounts+can+be+reached+faster+with+combined+usage+
• Reserved$Instances$are+shared+across+accounts+(including+RDS+Reserved+DBs)+
@adrianco
Production PriorityOver%Reserve(the(Produc0on(Environment(
Produc0on(Env.(Account( 100#Reserved#
QA/Staging(Env.(Account( 0#Reserved##
Perf(Tes0ng(Env.(Account( 0#Reserved##
Development(Env.(Account( 0#Reserved#
Storage(Account( 0#Reserved#
Total#Capacity#
@adrianco
Actual Reservation UsageConsolidated+Billing+Borrows+Unused+Reserva4ons+
Produc4on+Env.+Account+ 68#Used#
QA/Staging+Env.+Account+ 10#Borrowed#
Perf+Tes4ng+Env.+Account+ 6#Borrowed##
Development+Env.+Account+ 12#Borrowed#
Storage+Account+ 4#Borrowed#
Total#Capacity#
@adrianco
Consolidated Reservations
Burst capacity guarantee Higher availability with lower cost Other accounts soak up any extra Monthly billing roll-up Capitalize reservation charges! But: Fixed location and instance type
@adrianco
Re-sell Reservations
Reserved'Instance'Marketplace'
Buy$a$smaller$term$instance$Buy$instance$with$different$OS$or$type$
Buy$a$Reserved$instance$in$different$region$
Sell$your$unused$Reserved$Instance$Sell$unwanted$or$over;bought$capacity$Further$reduce$costs$by$op>mizing$
@adrianco
Use EC2 Spot Instances
Cloud native dynamic autoscaled spot instances !Real world total savings up to 50%
@adrianco
Right Sizing InstancesFit the instance size to the workload
@adrianco
Technology Refresh
Older m1 and m2 families • Slower CPUs • Higher response times • Smaller caches (6MB) • Oldest m1.xlarge
• 15G/8.5ECU/35c 23ECU/$ • Old m2.xlarge
• 17G/6.5ECU/25c 26ECU/$
New m3 family • Faster CPUs • Lower response times • Larger caches (20MB) • Java perf ratio > ECU • New m3.xlarge
• 15G/13ECU/28c 46ECU/$ • 77% better ECU/$ • Deploy fewer instances
@adrianco
Data Science for Free
Follow%the%Customer%(Run%web%servers)%during%the%day%
Follow%the%Money%(Run%Hadoop%clusters)%at%night%
0
2
4
6
8
10
12
14
16
Mon Tue Wed Thur Fri Sat Sun
No#of#Instan
ces#R
unning#
Week
Auto7Scaling7Servers
Hadoop7Servers
No.%of%Reserved%Instances%
@adrianco
Real Reservation ReductionsSoaking(up(unused(reserva0ons(
Unused(reserved(instances(is(published(as(a(metric((Ne9lix(Data(Science(ETL(Workload(• Daily(business(metrics(rollAup(• Starts(aBer(midnight(• EMR(clusters(started(using(hundreds(of(instances(
Ne9lix(Movie(Encoding(Workload(• Long(queue(of(high(and(low(priority(encoding(jobs(• Can(soak(up(1000’s(of(addi0onal(unused(instances(
@adrianco
Six Ways to Cut Costs#1#Business#Agility#by#Rapid#Experimenta8on#=#Profit#
#2#Business>driven#Auto#Scaling#Architectures#=#Savings##
#3#Mix#and#Match#Reserved#Instances#with#On>Demand#=#Savings#
#4#Consolidated#Billing#and#Shared#Reserva8ons#=#Savings#
#5#Always>on#Instance#Type#Op8miza8on#=#Recurring#Savings#
Building#Cost>Aware#Cloud#Architectures#
#6#Follow#the#Customer#(Run#web#servers)#during#the#day#Follow#the#Money#(Run#Hadoop#clusters)#at#night#
@adrianco
Expect Prices to DropThree Years Halving Every 18mo = maybe 40% over-all savings
0
25
50
75
100
1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
Data shown is purely illustrative
@adrianco
Combinations
@adrianco
Lift and Shift Compounding
0
25
50
75
100
Base Price Rightsized Seasonal Daily Scaling Reserved Tech Refresh Price Cuts
253030
707070
100 Traditional application using AWS heavy use reservations
Base price is for capacity bought up-front
@adrianco
Conservative Compounding
0
25
50
75
100
Base Price Rightsized Seasonal Daily Scaling Reserved Tech Refresh Price Cuts
15202535
50
70
100 Cloud native application partially optimized light use reservations
@adrianco
Agressive Compounding
0
25
50
75
100
Base Price Rightsized Seasonal Daily Scaling Reserved Tech Refresh Price Cuts
4681225
50
100 Cloud native application fully optimized autoscaling mixed reservation use costs 4% of base price over three years!
@adrianco
Cloud Native Patterns
● Business logic isolation in stateless micro-services
● Immutable code with instant rollback
● Auto-scaled capacity and deployment updates
● Distributed across availability zones and regions
● De-normalized single function NoSQL data stores
● See over 40 NetflixOSS projects at netflix.github.com
● Get “Technical Indigestion” trying to keep up with techblog.netflix.com
@adrianco
Cost Monitoring and Optimization
@adrianco
Final Thoughts
Turn off idle instances Clean up unused stuff Optimize for pricing model Assume prices will go down Go cloud native to save
@adrianco
Further Reading
See www.battery.com for a list of portfolio investments
● Battery Ventures Blog http://www.battery.com/powered ● Adrian’s Blog http://perfcap.blogspot.com and Twitter @adrianco ● Slideshare http://slideshare.com/adriancockcroft !
● Monitorama Opening Keynote Portland OR - May 7th, 2014 - Video available ● GOTO Chicago Opening Keynote May 20th, 2014 - Video available ● Qcon New York – Speed and Scale - June 11th, 2014 - Video available ● Structure - Cloud Trends - San Francisco - June 19th, 2014 - Video available ● GOTO Copenhagen/Aarhus – Denmark – Sept 25th, 2014 ● DevOps Enterprise Summit - San Francisco - Oct 21-23rd, 2014 - Videos available ● GOTO Berlin - Germany - Nov 6th, 2014 ● AWS Re:Invent - Las Vegas - Cloud Native Cost Optimization - November 14th, 2014 ● Dockercon Europe - Amsterdam - Microservices - December 4th, 2014