aws webinar - measuring your application performance and health
AWS Webinar - Measuring and monitoring application performance and health
AWS 201
Measuring Your Application Performance and Health
Markku Lepistö - Technology Evangelist @markkulepisto
Housekeeping
• Presentation ~40 mins
• Post questions online
• Q&A at the end using the online chat
• Reminder: fill in the survey!
Why monitor?
Without Instrumentation You Are Flying Blind
Instrumentation Gives You
• Actionable insights into historical, current, and predicted system state
• Data-driven decisions on availability, performance, cost optimization, release speed & quality, …
What to monitor?
Business KPIs: total transactions, customer QoS (quality of service), customer QoE (quality of experience), revenue, cost, …
Operational KPIs:
• Transactions: success & error rate, latency
• Throughput
• Load: system, service, node, component
• Health
• Availability
• …
KPI = Key Performance Indicator, i.e. a metric
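As a minimal sketch of how the operational KPIs above can be derived from raw request data, the following Python assumes a hypothetical `Transaction` record (a success flag and a latency) collected over a fixed time window. It computes success and error rate, median and 99th-percentile latency, and throughput; the record shape and field names are illustrative and not taken from any AWS API.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Transaction:
    """One observed request/response (hypothetical record shape)."""
    succeeded: bool
    latency_ms: float


def operational_kpis(txns: list[Transaction], window_s: float) -> dict[str, float]:
    """Summarize one time window of transactions into operational KPIs."""
    if not txns or window_s <= 0:
        raise ValueError("need transactions and a positive window")
    total = len(txns)
    ok = sum(1 for t in txns if t.succeeded)
    latencies = [t.latency_ms for t in txns]
    if len(latencies) >= 2:
        # n=100 yields the 99 cut points p1..p99: index 49 is p50, index 98 is p99.
        cuts = quantiles(latencies, n=100, method="inclusive")
        p50, p99 = cuts[49], cuts[98]
    else:
        p50 = p99 = latencies[0]
    return {
        "success_rate": ok / total,
        "error_rate": (total - ok) / total,
        "latency_p50_ms": p50,
        "latency_p99_ms": p99,
        "throughput_tps": total / window_s,
    }
```

Percentiles are used rather than the mean because a handful of slow requests can hurt customer QoE while barely moving an average.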
What are we actually measuring?
System inputs, state changes and outputs (the delta between states)
What causes system changes?
• Inputs (customer traffic)
• Code changes
• Manual operations (Ops → Oops!)
• Automated operations (complex adaptive system)
• OS packages & patches
• Dependent services
• Underlying infrastructure
When and where should we measure?
Everywhere - All the Time!
“Big Data is what happened when the cost of storing information became less than the cost of making the decision to throw it away.”
George Dyson, Author of “The Digital Universe”
COLLECT | ANALYZE | DISPLAY | ACT
Top to Bottom: Technology Stack
End-to-End: Client – Server / Service
When and Where to Measure & Collect?
When to Measure? Throughout Application Lifecycle
plan | code | build | test | release | deploy | operate
Agile Development, Continuous Integration, Continuous Delivery, DevOps
Source: http://www.collab.net
• Code: commits, lines changed, modules changed, issues resolved, features implemented
• Build: successful builds, failed builds, build duration vs. HW resources used, images (AMIs) built
• Test:
  • Integration test success/failure
  • Performance test metrics: throughput as a function of virtual HW used
  • Stability test metrics: memory leak? filesystem trends (fill/cleanup etc.)? degradation of any KPI over time?
  • Security test metrics: PEN…
• Release & deploy: # of releases, # of deploys, # of rollbacks, operational KPIs (stability, availability, performance, security, …)
• Feedback loop: # of bugs reported, # of features requested, performance & cost optimization, A/B test results
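The lifecycle counters above become more useful as rates that can be compared release over release. This sketch assumes a hypothetical `Build` record and plain deploy/rollback counts (for example, exported from a CI server); it is illustrative and not tied to any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Build:
    """One CI build result (hypothetical record shape)."""
    succeeded: bool
    duration_s: float


def delivery_kpis(builds: list[Build], deploys: int, rollbacks: int) -> dict[str, float]:
    """Turn raw lifecycle counts into rates that can be trended over time."""
    if not builds or deploys <= 0:
        raise ValueError("need at least one build and one deploy")
    ok = sum(b.succeeded for b in builds)  # True counts as 1
    return {
        "build_success_rate": ok / len(builds),
        "mean_build_duration_s": sum(b.duration_s for b in builds) / len(builds),
        "rollback_rate": rollbacks / deploys,
    }
```

A rising rollback rate or falling build success rate is an early signal worth feeding back into planning, alongside bug and feature-request counts.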
Challenge: DevOps & Cloud Increase the Rate of Change
Rare releases – static servers (“Waterfall”): long-lived servers, static roles, static IP addresses
Frequent releases – dynamic instances (“Agile, Lean, DevOps”): new code on bursts of new instances, instance role changes, dynamic recycled IP addresses
(Charts: change over time under each model)
Where to Measure? End-to-End
Client | Transport Net/Services | Your App/Service | 3rd Party Services | AWS Services
• Client: test client agents for QoS and QoE KPIs
• Transport: tcpdump on client and app servers, Wireshark for transport QoS KPIs
Client/Server QoS with Transport Layer Metrics (diagram: client and server)
• AWS services: AWS Service Health Dashboard, AWS CloudTrail, Amazon CloudWatch
Monitoring AWS - Service Health Dashboard
Monitoring AWS Account Activities - AWS CloudTrail
You are making API calls…
on a growing set of services around the world…
CloudTrail is continuously recording API calls…
and delivering log files to you.
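To get a feel for what CloudTrail delivers, the sketch below summarizes one log file. Each delivered file is gzip-compressed JSON with a top-level `Records` array, and each record carries fields such as `eventSource`, `eventName` and, for failed calls, `errorCode`. It assumes the file has already been downloaded from the S3 bucket CloudTrail writes to; the summary format itself is illustrative.

```python
import gzip
import json
from collections import Counter


def summarize_cloudtrail(log_text: str) -> tuple[Counter, list[str]]:
    """Count API calls per service:action and list the calls that returned an error."""
    records = json.loads(log_text).get("Records", [])
    calls = Counter(f'{r["eventSource"]}:{r["eventName"]}' for r in records)
    # errorCode is only present on records for calls that failed.
    failures = [f'{r["eventName"]} ({r["errorCode"]})' for r in records if "errorCode" in r]
    return calls, failures


def read_log_file(path: str) -> str:
    """Decompress one downloaded CloudTrail log file (.json.gz) to text."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return fh.read()
```

Usage: `calls, failures = summarize_cloudtrail(read_log_file("downloaded.json.gz"))`, where the filename is a placeholder. A sudden spike in `AccessDenied` failures or in unfamiliar `eventName`s is the kind of account activity worth alerting on.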
Partner CloudTrail Solutions
Monitoring AWS Resources – Amazon CloudWatch
AWS Service Measurements
• Auto Scaling groups
• AWS estimated charges
• Amazon DynamoDB tables
• Amazon EBS volumes
• Amazon EC2 instances
• Amazon ElastiCache caches
• Elastic Load Balancing
• Amazon Elastic MapReduce jobs
• Amazon RDS databases
• Amazon SNS notifications
• Amazon SQS queues
• AWS Storage Gateway
CloudWatch Alarms
• EC2: Tell me if my instance needs attention
• DynamoDB: Help me balance cost and performance
• Billing: Tell me when my bill is getting too high
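A CloudWatch alarm compares a metric against a threshold over a number of evaluation periods and moves between the `OK`, `ALARM` and `INSUFFICIENT_DATA` states. The sketch below models only the core "M out of N datapoints breaching" rule, with parameter names echoing the alarm settings `EvaluationPeriods` and `DatapointsToAlarm`. It is a simplification (greater-than comparison only, no missing-data treatment or statistics), not CloudWatch's actual implementation.

```python
from enum import Enum


class State(Enum):
    OK = "OK"
    ALARM = "ALARM"
    INSUFFICIENT_DATA = "INSUFFICIENT_DATA"


def evaluate_alarm(datapoints: list[float], threshold: float,
                   evaluation_periods: int, datapoints_to_alarm: int) -> State:
    """ALARM if at least M of the last N datapoints are above the threshold."""
    if evaluation_periods < 1 or not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
    window = datapoints[-evaluation_periods:]
    if len(window) < evaluation_periods:
        return State.INSUFFICIENT_DATA
    breaching = sum(1 for v in window if v > threshold)
    return State.ALARM if breaching >= datapoints_to_alarm else State.OK
```

For the EC2 case, `evaluate_alarm(cpu_samples, 80.0, 3, 2)` would flag an instance whose CPU exceeded a hypothetical 80% in two of the last three periods; requiring M of N rather than a single breach keeps one noisy datapoint from paging anyone.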
Custom Metrics Example – Instance Memory
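Memory utilization is not among the metrics EC2 reports by default, which makes it the classic custom-metric example. A minimal sketch, assuming a Linux guest: read `/proc/meminfo`, compute used memory from `MemTotal` and `MemAvailable`, and publish it with boto3's CloudWatch `put_metric_data`. The namespace `Custom/System`, the metric name `MemoryUtilization` and the instance ID are hypothetical choices.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: value}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_metric(meminfo_text: str, instance_id: str) -> dict:
    """Build one CloudWatch MetricData entry for memory utilization in percent."""
    info = parse_meminfo(meminfo_text)
    used_pct = 100.0 * (info["MemTotal"] - info["MemAvailable"]) / info["MemTotal"]
    return {
        "MetricName": "MemoryUtilization",  # hypothetical metric name
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Unit": "Percent",
        "Value": round(used_pct, 2),
    }


def publish(metric: dict, namespace: str = "Custom/System") -> None:
    """Send the datapoint to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing logic works without the SDK
    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=[metric])
```

On an instance, something like `publish(memory_metric(open("/proc/meminfo").read(), "i-0123456789abcdef0"))` run every minute would produce a graphable, alarmable memory series.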
Where to Measure? End-to-End: request/response success/fail, response latency
Measuring External, Dependent Services
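Request/response success, failure and latency for a dependent service can be captured right at the call site. This sketch wraps calls in a hypothetical `measured` decorator that counts outcomes and records wall-clock latency per dependency in an in-memory dict; a real system would ship these numbers to CloudWatch or another metrics backend. `charge` stands in for any external API call.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-dependency outcome counters and latency samples (in-memory, for illustration).
stats = defaultdict(lambda: {"ok": 0, "fail": 0, "latency_ms": []})


def measured(dependency: str):
    """Record success/failure and latency of every call to the wrapped function."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
            except Exception:
                stats[dependency]["fail"] += 1
                raise  # never hide the dependency's error from the caller
            else:
                stats[dependency]["ok"] += 1
                return result
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                stats[dependency]["latency_ms"].append(elapsed_ms)
        return wrapper
    return decorate


@measured("payments-api")  # hypothetical dependency name
def charge(amount: float) -> str:
    """Stand-in for a call to a 3rd party service."""
    if amount <= 0:
        raise ValueError("invalid amount")
    return "charged"
```

Measuring on your side of the call matters because the dependency's own dashboards cannot see network time or client-side timeouts.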
Measure the Entire Stack, Top to Bottom
• User Application
• Application Server
• Web / DB Server
• Language Interpreter / JVM
• Guest Operating System & Services
• EC2 Instance
Application Internal Metrics
COLLECT | ANALYZE | DISPLAY | ACT
Leverage AWS Big Data Services
STORE | ANALYZE
Kinesis, S3, Glacier, EC2, DynamoDB, Redshift, EMR, Data Pipeline
COLLECT | ANALYZE | DISPLAY | ACT
METRICS @ ETSY
Visualization - Graph
Values over time at the sampling rate
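Sampling Rate: How often should I measure?
It depends on what you measure, specifically on how quickly that value changes (its frequency).
Nyquist Frequency
(Figure: original signal in red, measured signal in blue)
You should measure at least twice as often as your value changes. Strictly, sample faster than twice the highest frequency present; otherwise fast changes alias into slower, false patterns.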
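The aliasing effect can be checked with a little arithmetic. This sketch computes the frequency you would actually observe when a periodic value is sampled at a given rate (frequencies above half the sampling rate fold back to lower ones) and applies the Nyquist criterion. The numbers are illustrative: a value that cycles every 60 s but is sampled every 50 s appears to cycle every 300 s.

```python
def apparent_frequency(signal_hz: float, sample_hz: float) -> float:
    """Frequency observed when a sinusoid of signal_hz is sampled at sample_hz.

    Any frequency above sample_hz / 2 folds back (aliases) to a lower one.
    """
    nearest_multiple = round(signal_hz / sample_hz) * sample_hz
    return abs(signal_hz - nearest_multiple)


def sampling_is_sufficient(signal_hz: float, sample_hz: float) -> bool:
    """Nyquist criterion: sample strictly faster than twice the signal frequency."""
    return sample_hz > 2 * signal_hz


# A 60 s cycle sampled every 50 s shows up as a phantom 300 s cycle;
# sampled every 10 s it is captured correctly.
phantom_period_s = 1 / apparent_frequency(1 / 60, 1 / 50)
```

This is why a one-minute metric period cannot reveal behavior that oscillates every few seconds: the graph will show a slower pattern that is not really there.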
System Measurements == Signal: We Can Do Digital Signal Processing
Linear regression – a trendline predicts the filesystem running out of inodes (cannot create files)
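The inode forecast amounts to an ordinary least-squares line through (time, inodes used) samples, extrapolated to the filesystem's inode total. A minimal sketch in pure Python; on Unix the samples could come from `os.statvfs(path)`, whose `f_files` and `f_ffree` fields give total and free inodes. The function names are hypothetical, and a straight line is only a reasonable model while growth stays roughly linear.

```python
import os


def inode_sample(path: str = "/") -> tuple[int, int]:
    """Return (inodes_used, inodes_total) for the filesystem holding path (Unix only)."""
    st = os.statvfs(path)
    return st.f_files - st.f_ffree, st.f_files


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if n < 2 or sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def exhaustion_time(hours: list[float], inodes_used: list[float], inodes_total: int):
    """Hour at which the trendline reaches inodes_total, or None if usage is not growing."""
    slope, intercept = fit_line(hours, inodes_used)
    if slope <= 0:
        return None
    return (inodes_total - intercept) / slope
```

Alerting on the predicted exhaustion time, rather than on a fixed "90% used" threshold, gives warning proportional to how fast the filesystem is actually filling.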
Linear regression & Fast Fourier Transform for patterns, anomalies and future predictions
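To find recurring patterns such as a daily traffic cycle, the spectrum of a sampled metric can be searched for its strongest frequency. This sketch uses a plain O(n²) discrete Fourier transform for readability; an FFT (for example `numpy.fft.rfft`) computes the same coefficients in O(n log n). The hourly data in the usage line is synthetic.

```python
import cmath
import math


def dominant_period(samples: list[float]) -> float:
    """Period, in samples, of the strongest cycle in the series (naive DFT)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant (DC) component
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):  # frequencies up to Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    if best_k == 0:
        raise ValueError("signal has no variation")
    return n / best_k


# Four days of synthetic hourly request rates with a 24-hour cycle.
hourly = [100 + 20 * math.sin(2 * math.pi * t / 24) for t in range(96)]
period_hours = dominant_period(hourly)  # 24.0
```

Once the dominant cycle is known, "now" can be compared with the same point in previous cycles, which separates genuine anomalies from normal daily peaks.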
Visualization – Scatter Plot
Visualization – Box Plot
Including outliers & ends of distribution
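A box plot summarizes a distribution with its quartiles, whiskers and individually drawn outliers. This sketch uses Tukey's common convention of fences at 1.5 × IQR beyond the quartiles; the slide does not say which whisker rule was used, so treat that rule as an assumption.

```python
from statistics import quantiles


def box_plot_stats(values: list[float]) -> dict:
    """Quartiles, whisker ends and outliers using Tukey's 1.5 x IQR fences."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers extend to the most extreme values still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }
```

For latencies `[1, 2, ..., 9, 100]` this flags only 100 as an outlier, which is exactly the kind of tail event an average would hide.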
Visualization – Normal Curve & Histogram
opsly.com$
COLLECT | ANALYZE | DISPLAY | ACT
Manual / Human Actions - OODA Loop (Observe, Orient, Decide, Act)
Automated Human Actions: Amazon CloudWatch, Amazon SNS & PagerDuty
Automatic resizing of compute clusters based on measurements, thresholds and actions
Trigger auto-scaling policy
Feature: Details
Control: Define minimum and maximum instance pool sizes and when scaling and cooldown occur.
Integrated with Amazon CloudWatch: Use metrics gathered by CloudWatch to drive scaling.
Instance types: Run Auto Scaling for On-Demand and Spot Instances. Compatible with VPC.
as-create-auto-scaling-group MyGroup --launch-configuration MyConfig --availability-zones us-east-1a --min-size 4 --max-size 200
Automated Actions – AWS Auto Scaling (driven by Amazon CloudWatch)
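The control settings described above (minimum and maximum pool size, when to scale, cooldown) can be illustrated with a simple threshold policy. This is a hypothetical sketch of the decision logic, not how AWS Auto Scaling is implemented: the CPU thresholds and step size are invented, while the 4/200 bounds echo the `--min-size` and `--max-size` values in the command above.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    min_size: int = 4                # echoes --min-size 4
    max_size: int = 200              # echoes --max-size 200
    scale_out_above: float = 70.0    # hypothetical CPU % threshold
    scale_in_below: float = 30.0     # hypothetical CPU % threshold
    step: int = 2                    # hypothetical instances added/removed per action
    cooldown_s: float = 300.0        # hypothetical cooldown between actions
    last_action_at: float = float("-inf")

    def desired_capacity(self, current: int, cpu_pct: float, now_s: float) -> int:
        """New group size, respecting the cooldown and the min/max pool sizes."""
        if now_s - self.last_action_at < self.cooldown_s:
            return current  # let the previous change take effect before acting again
        if cpu_pct > self.scale_out_above:
            target = current + self.step
        elif cpu_pct < self.scale_in_below:
            target = current - self.step
        else:
            return current
        target = max(self.min_size, min(self.max_size, target))
        if target != current:
            self.last_action_at = now_s
        return target
```

The cooldown exists because new instances take time to boot and absorb load; without it, the metric still looks hot right after a scale-out and the policy would keep adding capacity.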
Automated Actions - PID Controller: System Reaches Target State with Calculated Changes and a Monitoring Feedback Loop
Proportional, Integral, Derivative
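A discrete PID controller computes its correction from the current error (proportional), the accumulated error (integral) and the error's rate of change (derivative). The sketch below is the textbook form; the gains are hypothetical and would need tuning against the real system, for example when adjusting capacity to hold a utilization target. It illustrates the concept on the slide and is not presented as how any AWS service scales.

```python
class PID:
    """Discrete PID controller: u = Kp*e + Ki*sum(e*dt) + Kd*de/dt."""

    def __init__(self, kp: float, ki: float, kd: float, setpoint: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None  # no derivative term until we have two samples

    def update(self, measured: float, dt: float) -> float:
        """Return the control output for one monitoring sample taken dt seconds apart."""
        error = self.setpoint - measured
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In a simulated loop where the measured value simply accumulates the controller's output (`y += pid.update(y, dt) * dt`), gains such as Kp = 0.5, Ki = 0.05 drive `y` to the setpoint: the monitoring feedback loop closes on its own target state.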
Useful Tools and Services
Thank you
Your Feedback is Important
Please complete the survey!
• What's good, what's not
• What you want to see at these events
• What you want AWS to deliver for you
Q&A