aws webinar - measuring your application performance and health

AWS 201

Measuring Your Application Performance and Health

Markku Lepistö - Technology Evangelist @markkulepisto

Housekeeping

• Presentation ~40 mins
• Post Questions Online
• Q&A at the end using the online chat
• Reminder – Fill in the survey!

Why monitor?

Without Instrumentation You Are Flying Blind

Instrumentation Gives You

• Actionable insights into historical, current, and predicted system state
• Data-driven decisions
• Availability
• Performance
• Cost-optimization
• Release speed & quality
• …

What to monitor?

Business KPIs
• Transactions total
• Customer QoS
• Customer QoE
• Revenue
• Cost
• …

Operational KPIs
• Transaction – success & error rate, latency
• Throughput
• Load - system, service, node, component
• Health
• Availability
• …

KPI = Key Performance Indicator, i.e. a metric

What are we actually measuring?

System Inputs, State Changes and Outputs

What causes system changes?
• Inputs (customer traffic)
• Code changes
• Manual operations (Ops → Oops!)
• Automated operations (Complex Adaptive System)
• OS packages & patches
• Dependent services
• Underlying infrastructure

When and where should we measure?

Everywhere - All the Time!

“Big Data is what happened when the cost of storing information became less than the cost of making the decision to throw it away”

George Dyson, Author of “The Digital Universe”

COLLECT | ANALYZE | DISPLAY | ACT

Top to Bottom: Technology Stack

End-to-End: Client – Server / Service

When and Where to Measure & Collect?

When to Measure? Throughout Application Lifecycle

plan | code | build | test | release | deploy | operate

Agile Development | Continuous Integration | Continuous Delivery | DevOps

Source: http://www.collab.net


• Commits
• Lines changed
• Modules changed
• Issues resolved
• Features implemented



• Successful builds
• Failed builds
• Build duration vs HW resources used
• Images (AMI) built



• Integration test success/failure
• Performance test metrics: throughput as a function of virtual HW used
• Stability test metrics: Memory leak? Filesystem trends – fill/cleanup etc.? Degradation of any KPI over time?
• Security test metrics – PEN…



• # of releases
• # of deploys
• # of rollbacks
• Operational KPIs: stability, availability, performance, security, …



• # of bugs reported++
• # of features requested++
• Performance & Cost optimization
• A/B test results

Feedback Loop

Challenge: DevOps & Cloud Increase Rate of Change

Rare Releases – Static Servers “Waterfall”
• Long-lived servers
• Static roles
• Static IP addresses

Frequent Releases – Dynamic Instances “Agile, Lean, DevOps”
• New code, on bursts of new instances
• Instance role changes
• Dynamic, recycled IP addresses

(Charts, one per release model; axes: Time and Change)



When and Where to Measure and Collect?

Where to Measure? End-to-End

Client | Transport Net/Services | Your App/Service | 3rd Party Services

AWS Services


Test Client Agents: QoS, QoE KPIs


• Tcpdump on Client and App Servers
• Wireshark for Transport QoS KPIs

Client/Server QoS with Transport Layer Metrics

(Figure: Client and Server)


• AWS Service Health Dashboard
• AWS CloudTrail
• Amazon CloudWatch

Monitoring AWS - Service Health Dashboard

Monitoring AWS Account Activities - AWS CloudTrail

You are making API calls...
On a growing set of services around the world…
CloudTrail is continuously recording API calls…
And delivering log files to you
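Once the log files arrive, a first analysis step could be counting which API calls were made. The sketch below assumes the standard CloudTrail file layout (a JSON object with a top-level `Records` array, delivered gzip-compressed); the function name and the sample events are hypothetical and only illustrate the idea.

```python
import gzip
import json
from collections import Counter


def count_api_calls(raw: bytes) -> Counter:
    """Count events per (eventSource, eventName) in one CloudTrail log file."""
    # Delivered files are gzip-compressed; accept plain JSON as well.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    records = json.loads(raw).get("Records", [])
    return Counter((r.get("eventSource"), r.get("eventName")) for r in records)


# Hypothetical sample file with three recorded API calls.
sample = json.dumps({
    "Records": [
        {"eventSource": "ec2.amazonaws.com", "eventName": "RunInstances"},
        {"eventSource": "ec2.amazonaws.com", "eventName": "RunInstances"},
        {"eventSource": "s3.amazonaws.com", "eventName": "CreateBucket"},
    ]
}).encode()

calls = count_api_calls(gzip.compress(sample))
for (source, name), n in calls.most_common():
    print(f"{source:24} {name:16} {n}")
```

The same counter can be summed across many files to chart API activity over time.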

Partner CloudTrail Solutions

Monitoring AWS Resources – Amazon CloudWatch

AWS Service Measurements

• Auto Scaling groups
• AWS estimated charges
• Amazon DynamoDB tables
• Amazon EBS volumes
• Amazon EC2 instances
• Amazon ElastiCache caches
• Elastic Load Balancing
• Amazon Elastic MapReduce jobs
• Amazon RDS databases
• Amazon SNS notifications
• Amazon SQS queues
• AWS Storage Gateway
• ++

CloudWatch Alarms

• EC2: Tell me if my instance needs attention
• DynamoDB: Help me balance cost and performance
• Billing: Tell me when my bill is getting too high
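As an illustration of the billing case, the sketch below builds the arguments for a CloudWatch alarm on estimated charges. It assumes the boto3 SDK, that billing alerts are enabled for the account, and a hypothetical SNS topic ARN and alarm name; the 6-hour period and the threshold are example values, not recommendations.

```python
def billing_alarm_params(threshold_usd: float, sns_topic_arn: str) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"estimated-charges-over-{threshold_usd:g}-usd",  # hypothetical name
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # seconds (6 hours), an example value
        "EvaluationPeriods": 1,
        "Threshold": threshold_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_billing_alarm(threshold_usd: float, sns_topic_arn: str) -> None:
    """Create the alarm. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch loads without the SDK

    # Billing metrics are published in the us-east-1 region.
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(**billing_alarm_params(threshold_usd, sns_topic_arn))


# Hypothetical topic ARN, for illustration only.
params = billing_alarm_params(100, "arn:aws:sns:us-east-1:123456789012:billing-alerts")
print(params["AlarmName"], params["Threshold"])
```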

Custom Metrics Example – Instance Memory
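One possible way to publish instance memory as a custom metric is sketched below. It assumes a Linux guest whose `/proc/meminfo` exposes `MemAvailable`, the boto3 SDK, and a hypothetical namespace and metric name (`Custom/System`, `MemoryUtilization`); none of these names come from the slides.

```python
def memory_used_percent(meminfo_text: str) -> float:
    """Compute used memory in percent from /proc/meminfo content."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]  # assumes a kernel that reports this field
    return 100.0 * (total - available) / total


def publish_memory_metric(instance_id: str, namespace: str = "Custom/System") -> None:
    """Send one data point to CloudWatch. Needs boto3 and credentials; not called here."""
    import boto3  # imported lazily so the sketch loads without the SDK

    with open("/proc/meminfo") as f:
        percent = memory_used_percent(f.read())
    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace,
        MetricData=[{
            "MetricName": "MemoryUtilization",  # hypothetical metric name
            "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
            "Value": percent,
            "Unit": "Percent",
        }],
    )


# Hypothetical meminfo excerpt: 8 GB total, 2 GB available.
sample_meminfo = "MemTotal:       8000000 kB\nMemFree:        1000000 kB\nMemAvailable:   2000000 kB\n"
print(memory_used_percent(sample_meminfo))
```

Running `publish_memory_metric` from cron or a small agent at a fixed interval would give a regular time series.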


Measuring External, Dependent Services

• Request/Response success/fail
• Response latency
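A minimal sketch of how success/fail counts and response latency for a dependent service might be captured in application code. The decorator, the `CallStats` class, and the stand-in `call_payment_service` function are all hypothetical; a real client call would replace the stand-in.

```python
import functools
import time
from dataclasses import dataclass, field


@dataclass
class CallStats:
    successes: int = 0
    failures: int = 0
    latencies: list = field(default_factory=list)  # seconds per call

    def error_rate(self) -> float:
        total = self.successes + self.failures
        return self.failures / total if total else 0.0


def measured(stats: CallStats):
    """Decorator that records the outcome and latency of every call."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception:
                stats.failures += 1
                raise
            else:
                stats.successes += 1
                return result
            finally:
                # Latency is recorded for successes and failures alike.
                stats.latencies.append(time.perf_counter() - start)
        return inner
    return wrap


payment_stats = CallStats()


@measured(payment_stats)
def call_payment_service(ok: bool) -> str:
    # Stand-in for a real request to a third-party service.
    if not ok:
        raise ConnectionError("dependency unavailable")
    return "200 OK"


for ok in (True, True, False, True):
    try:
        call_payment_service(ok)
    except ConnectionError:
        pass

print(payment_stats.successes, payment_stats.failures, payment_stats.error_rate())
```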



When and Where to Measure and Collect?

Measure the Entire Stack, Top to Bottom

• User Application
• Application Server
• Web / DB Server
• Language Interpreter / JVM
• Guest Operating System & Services
• EC2 Instance

Application Internal Metrics

STORE | ANALYZE

Leverage AWS Big Data Services

• Glacier
• S3
• EC2
• Redshift
• DynamoDB
• EMR
• Data Pipeline
• Kinesis

METRICS @ETSY

Values over Time at Sampling Rate

Visualization - Graph

Sampling Rate: How often should I measure?

Depends on what you measure - depends on its rate of change (frequency)

Nyquist Frequency

(Figure: Original signal = Red, Measured signal = Blue)

You should measure at least twice as often as your value changes
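The effect of sampling too slowly can be worked through numerically. The sketch below uses the standard aliasing (frequency folding) relation; the function names and the example rates are illustrative assumptions, and any consistent unit (cycles per hour here) works.

```python
def min_sampling_rate(max_signal_rate: float) -> float:
    """Rule of thumb from the slide: sample at least twice as fast as the value changes."""
    return 2.0 * max_signal_rate


def apparent_rate(signal_rate: float, sampling_rate: float) -> float:
    """Rate of change that the samples appear to show (aliasing folds it below sampling_rate / 2)."""
    return abs(signal_rate - sampling_rate * round(signal_rate / sampling_rate))


# Hypothetical metric that cycles 9 times per hour.
print(min_sampling_rate(9))   # samples per hour needed
print(apparent_rate(9, 20))   # sampled fast enough: the true rate is seen
print(apparent_rate(9, 10))   # sampled too slowly: looks like a much slower cycle
```

With 10 samples per hour the 9-cycle signal is indistinguishable from a 1-cycle signal, which is the kind of mismatch between original and measured signal the slide illustrates.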

System Measurements == Signal
We can do Digital Signal Processing

Linear Regression – trendline predicts filesystem running out of inodes (cannot create files)
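A trendline prediction of this kind can be sketched with an ordinary least-squares fit. The sample data, the capacity, and the function names below are hypothetical; a real series would come from periodic `df -i` style measurements.

```python
def fit_line(points):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = cov / var_t
    return slope, mean_y - slope * mean_t


def time_when_full(points, capacity):
    """Time at which the trendline reaches capacity, or None if usage is not growing."""
    slope, intercept = fit_line(points)
    if slope <= 0:
        return None
    return (capacity - intercept) / slope


# Hypothetical (day, inodes used) samples on a filesystem with 1000 inodes.
inode_samples = [(0, 100), (1, 200), (2, 300)]
print(time_when_full(inode_samples, capacity=1000))
```

An alarm on "predicted day of exhaustion minus today" gives earlier warning than a fixed usage threshold.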

System Measurements == Signal
We can do Digital Signal Processing

Linear regression & Fast Fourier Transform for patterns, anomalies and future predictions
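To show how a frequency transform exposes a repeating pattern, the sketch below finds the dominant period in a metric series. For brevity it uses a naive discrete Fourier transform rather than an actual FFT (same result, slower); the load series and function name are made up for the example.

```python
import cmath
import math


def dominant_period(samples):
    """Period (in samples) of the strongest non-constant frequency component."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant offset
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        # Naive DFT coefficient for frequency bin k (an FFT computes the same values faster).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k if best_k else None


# Hypothetical hourly load samples over 4 days with a daily cycle.
load = [50 + 20 * math.sin(2 * math.pi * t / 24) for t in range(96)]
print(dominant_period(load))
```

A known dominant period (daily, weekly) can then serve as the baseline against which anomalies are judged.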

Visualization – Scatter Plot

Visualization – Box Plot: including outliers & ends of distribution
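The numbers behind a box plot can be computed directly. This sketch assumes the common 1.5 × IQR convention for whiskers and outliers (the slide does not say which convention its chart used) and a made-up latency sample.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, whisker ends, and outliers using the 1.5 * IQR convention."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inliers[0],
        "whisker_high": inliers[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical response latencies in milliseconds, with one slow request.
latencies_ms = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = box_plot_stats(latencies_ms)
print(summary)
```

An average alone would hide the 100 ms request; the outlier list keeps it visible.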

Visualization – Normal Curve & Histogram

opsly.com

Manual / Human Actions - OODA Loop

Automated Human Actions: Amazon CloudWatch, Amazon SNS & PagerDuty

Automatic resizing of compute clusters based on measurements, thresholds and actions

Trigger auto-scaling policy

Feature | Details
Control | Define minimum and maximum instance pool sizes and when scaling and cool down occurs.
Integrated to Amazon CloudWatch | Use metrics gathered by CloudWatch to drive scaling.
Instance types | Run Auto Scaling for On-Demand and Spot Instances. Compatible with VPC.

as-create-auto-scaling-group MyGroup --launch-configuration MyConfig --availability-zones us-east-1a --min-size 4 --max-size 200

Amazon CloudWatch

Automated Actions – AWS Auto Scaling
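The interplay of measurements, thresholds, pool limits, and cool down can be illustrated with a small model. This is a simplified sketch and not the Auto Scaling service's actual algorithm; the class, its fields, and the CPU thresholds are assumptions, while the pool limits of 4 and 200 mirror the command shown above.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    min_size: int
    max_size: int
    scale_out_above: float  # e.g. average CPU in percent
    scale_in_below: float
    cooldown_s: float
    step: int = 1

    def desired_size(self, current: int, metric: float, now: float, last_change: float) -> int:
        """Return the pool size to aim for given one metric reading."""
        if now - last_change < self.cooldown_s:
            return current  # still cooling down after the previous change
        if metric > self.scale_out_above:
            return min(self.max_size, current + self.step)
        if metric < self.scale_in_below:
            return max(self.min_size, current - self.step)
        return current


# Hypothetical thresholds: scale out above 70% CPU, in below 30%, 5 minute cool down.
policy = ScalingPolicy(min_size=4, max_size=200, scale_out_above=70, scale_in_below=30, cooldown_s=300)

print(policy.desired_size(current=4, metric=85, now=1000, last_change=0))  # scale out
print(policy.desired_size(current=4, metric=10, now=1000, last_change=0))  # held at minimum
print(policy.desired_size(current=5, metric=85, now=100, last_change=0))   # within cool down
```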

Automated Actions - PID Controller
System Reaches Target State with Calculated Changes and Monitoring Feedback Loop

Proportional, Integral, Derivative
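A minimal sketch of the three terms working together in a feedback loop. The gains, the target of 100, and the toy system (where each control output is simply added to the measured value) are illustrative assumptions, not tuned values for any real service.

```python
class PIDController:
    """Textbook PID: output = Kp*error + Ki*integral(error) + Kd*d(error)/dt."""

    def __init__(self, kp: float, ki: float, kd: float, setpoint: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement: float, dt: float) -> float:
        error = self.setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first reading, since there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Toy feedback loop: the monitored value moves by whatever the controller outputs.
controller = PIDController(kp=0.5, ki=0.1, kd=0.05, setpoint=100.0)
value = 0.0
for _ in range(100):
    value += controller.update(value, dt=1.0) * 1.0

print(round(value, 3))
```

Each iteration is one pass through the monitoring feedback loop: measure, compute the change, apply it, measure again.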

Useful Tools and Services

Thank you

Markku Lepistö - Technology Evangelist @markkulepisto

Your Feedback is Important

Please complete the Survey!
• What’s good, what’s not
• What you want to see at these events
• What you want AWS to deliver for you
