
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Heitor Lessa, Solutions Architect @ AWS
Stephen Gran, Senior Technical Architect @ Piksel

June 28th

Automate best practices and operational health for your AWS resources with Trusted Advisor and AWS Health

What to expect from this session

• Learn about Trusted Advisor best practices and how to safely automate them in your environment.

• Get familiar with AWS Health and the Personal Health Dashboard (PHD).

• Learn how to automate remediation actions and customize Health alerts.

What’s in your AWS account(s)?

[Diagram: a simple web stack in Availability Zone #1: www.example.com served through Elastic Load Balancing by web/app servers in Auto Scaling Group #1, backed by a database on an EC2 instance]

As your environment expands and changes, entropy starts to increase.

Too much complexity! Time to optimize!

So what is Trusted Advisor (TA)?

AWS Trusted Advisor provides best practices (or checks) in four categories: cost optimization, security, fault tolerance, and performance improvement. Each check result is color-coded:

• Red (action recommended)
• Yellow (investigation recommended)
• Green (no problem detected)
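The same Red/Yellow/Green statuses are exposed programmatically through the AWS Support API, where a check result carries the status `ok`, `warning`, or `error`. A minimal sketch, assuming a Business or Enterprise Support plan (required for the API) and boto3 available at runtime; the client is injectable so the logic can be exercised without AWS, and the function names are our own.

```python
# Map Support API status values to the console's traffic-light colors.
STATUS_TO_COLOR = {
    "error": "Red",       # action recommended
    "warning": "Yellow",  # investigation recommended
    "ok": "Green",        # no problem detected
}


def find_check_id(support, check_name):
    """Return the ID of the Trusted Advisor check with the given display name."""
    for check in support.describe_trusted_advisor_checks(language="en")["checks"]:
        if check["name"] == check_name:
            return check["id"]
    raise KeyError(f"no Trusted Advisor check named {check_name!r}")


def summarize_check(check_name, support=None):
    """Return (overall color, [(color, resource ID), ...]) for one check.

    Suppressed (excluded) resources are skipped.
    """
    if support is None:
        import boto3  # imported lazily so the logic can be tested without AWS
        # The Support API endpoint is in us-east-1.
        support = boto3.client("support", region_name="us-east-1")
    check_id = find_check_id(support, check_name)
    result = support.describe_trusted_advisor_check_result(
        checkId=check_id, language="en"
    )["result"]
    overall = STATUS_TO_COLOR.get(result["status"], "Not available")
    flagged = [
        (STATUS_TO_COLOR.get(item["status"], item["status"]), item["resourceId"])
        for item in result.get("flaggedResources", [])
        if not item.get("isSuppressed", False)
    ]
    return overall, flagged
```

For example, `summarize_check("Low Utilization Amazon EC2 Instances")` would return the check's overall color and the flagged instances that have not been excluded.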

AWS Trusted Advisor: over 50 million recommendations provided to AWS customers have resulted in more than $500 million in cost savings for Trusted Advisor users.

Case Study – Hungama Digital Media

“We estimate an average 33 percent monthly savings on our total AWS spend.” - Amit Vora, CTO for Hungama

How did Trusted Advisor help Hungama? It highlighted the following three areas:

• Underutilized EC2 Instances
• Amazon EC2 Reserved Instances
• Underutilized EBS Volumes

Building Automation

Using Trusted Advisor as a Web Service

[Diagram: AWS Trusted Advisor results feed Amazon CloudWatch Events, which trigger AWS Lambda functions that take actions on AWS resources and send notifications]
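In the event-driven form of this architecture, a CloudWatch Events rule selects the Trusted Advisor events you care about and routes them to Lambda. The sketch below shows a rule pattern for flagged items of the Low Utilization Amazon EC2 Instances check, plus a deliberately simplified local matcher that illustrates how such a pattern selects events. Assumptions: the `aws.trustedadvisor` source, the `Trusted Advisor Check Item Refresh Notification` detail type, and the `WARN`/`ERROR` status values reflect our understanding of the event format. The matcher covers only exact-value lists and nested objects, not the service's full pattern syntax.

```python
import json

# Event pattern for a CloudWatch Events rule (the EventPattern of the rule).
LOW_UTILIZATION_PATTERN = {
    "source": ["aws.trustedadvisor"],
    "detail-type": ["Trusted Advisor Check Item Refresh Notification"],
    "detail": {
        "status": ["WARN", "ERROR"],
        "check-name": ["Low Utilization Amazon EC2 Instances"],
    },
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must be present in the event.

    A dict value is matched recursively; a list value enumerates the allowed
    values (an array in the event matches if any element is allowed).
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in candidates):
                return False
    return True


# PutRule takes the pattern as a JSON string.
EVENT_PATTERN_JSON = json.dumps(LOW_UTILIZATION_PATTERN)
```

Filtering in the rule rather than in the function means Lambda is only invoked for the events it is meant to act on.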

With (not so) great automation come great risks

Production databases/instances could be considered idle, for example:

- During a low-traffic period.
- When a different system resource (e.g. memory) is in use that the idle check does not look at.
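To see why this happens, here is a sketch of the idle criterion. It assumes the thresholds described for the Low Utilization Amazon EC2 Instances check: daily CPU utilization at or below 10% and network I/O at or below 5 MB on 4 or more of the last 14 days. These thresholds may change, so verify them against the current check description. The usage numbers are hypothetical, and memory is tracked only to show that the criterion never looks at it.

```python
from dataclasses import dataclass

# Thresholds assumed from the check description; confirm against current docs.
CPU_THRESHOLD_PCT = 10.0
NETWORK_THRESHOLD_MB = 5.0
MIN_LOW_DAYS = 4
WINDOW_DAYS = 14


@dataclass
class DailyUsage:
    cpu_pct: float     # average daily CPU utilization
    network_mb: float  # daily network I/O
    memory_pct: float  # NOT part of the check: present only to expose the blind spot


def looks_idle(days):
    """Return True if the last WINDOW_DAYS contain at least MIN_LOW_DAYS low days."""
    recent = days[-WINDOW_DAYS:]
    low_days = sum(
        1
        for d in recent
        if d.cpu_pct <= CPU_THRESHOLD_PCT and d.network_mb <= NETWORK_THRESHOLD_MB
    )
    return low_days >= MIN_LOW_DAYS


# Hypothetical production database: busy on weekdays, quiet at weekends, and
# holding 85% of its memory the whole time (e.g. a warm cache).
weekday = DailyUsage(cpu_pct=35.0, network_mb=800.0, memory_pct=85.0)
weekend = DailyUsage(cpu_pct=4.0, network_mb=3.0, memory_pct=85.0)
two_weeks = ([weekday] * 5 + [weekend] * 2) * 2

print(looks_idle(two_weeks))  # True: four quiet weekend days are enough
```

The database is flagged even though its memory is heavily used, which is exactly the risk of acting on the recommendation automatically.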

Show me the money!

Turn idle instances off based on Trusted Advisor and Tags

Examples available on GitHub: https://github.com/aws/Trusted-Advisor-Tools

Trusted Advisor best practices: https://aws.amazon.com/premiumsupport/trustedadvisor/best-practices/
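A minimal sketch of "turn idle instances off based on Trusted Advisor and Tags" as a Lambda handler. It assumes the handler is triggered by a CloudWatch Events rule for flagged items of the Low Utilization Amazon EC2 Instances check. The `check-item-detail` keys (`Instance ID`, `Region/AZ`) reflect our understanding of the Trusted Advisor event format, and the `AutoStop=true` opt-in tag is hypothetical. Requiring an explicit opt-in tag is what keeps a quiet production database from being stopped. This is an illustration, not the code in the Trusted-Advisor-Tools repository.

```python
# Hypothetical opt-in tag: only instances explicitly tagged AutoStop=true are stopped.
OPT_IN_TAG_KEY = "AutoStop"
OPT_IN_TAG_VALUE = "true"


def lambda_handler(event, context, ec2=None):
    """Stop an instance flagged by Trusted Advisor, but only if it has opted in."""
    item = event["detail"]["check-item-detail"]
    instance_id = item["Instance ID"]
    # "Region/AZ" holds an Availability Zone such as "us-east-1a"; dropping the
    # trailing zone letter gives the Region (assumes a standard AZ name).
    region = item["Region/AZ"][:-1]

    if ec2 is None:
        import boto3  # lazy import keeps the handler testable without AWS
        ec2 = boto3.client("ec2", region_name=region)

    tags = ec2.describe_tags(
        Filters=[{"Name": "resource-id", "Values": [instance_id]}]
    )["Tags"]
    tag_values = {tag["Key"]: tag["Value"] for tag in tags}

    if tag_values.get(OPT_IN_TAG_KEY, "").lower() != OPT_IN_TAG_VALUE:
        # No opt-in tag: leave the instance alone (it may be a quiet production box).
        return {"instance": instance_id, "region": region, "action": "skipped"}

    ec2.stop_instances(InstanceIds=[instance_id])
    return {"instance": instance_id, "region": region, "action": "stopped"}
```

An opt-in tag fails safe: a missing or misspelled tag leaves the instance running rather than stopping it.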

AWS Health and Personal Health Dashboard (PHD)

AWS service health, notifications, and automation

AWS Health and Personal Health Dashboard

• Visibility and transparency into your resources
• Custom notifications and automated actions
• Remediation guidance and knowledge articles

Increased transparency and visibility

- The Service Health Dashboard is too generic
- Increased transparency into the underlying infrastructure
- Remediation guidance for faster time-to-resolution
- AWS Health API for easy integration
- Custom notifications with predictable delivery
- Automated actions for auto-remediation

AWS service integrations

Service-level insights into health: all AWS services

Resource and service-level insights into health: Amazon EC2, Amazon EBS, Amazon SES, Amazon VPC, AWS Direct Connect, Elastic Load Balancing, Amazon Elasticsearch Service, Amazon Cognito, Amazon ElastiCache, Amazon RDS, AWS Certificate Manager, AWS CloudTrail

Getting started with the Personal Health Dashboard

- From the AWS Service Health Dashboard
- From the AWS website
- From the alert in the AWS Management Console navigation bar

How does the Personal Health Dashboard work?

[Diagram: the AWS Health service tracks the AWS services and resources you use and exposes events through the Personal Health Dashboard, the API, and push notifications through CloudWatch Events to in-house or third-party monitoring and event management systems]

API:
• describe-events
• describe-event-details
• describe-affected-entities
• …

Push notifications through CloudWatch Events:
• Set rules to extract events of interest
• Set targets for rules (Amazon SNS, Amazon SQS, AWS Lambda, Amazon Kinesis)
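The API calls listed above map onto the boto3 `health` client (`describe_events`, `describe_event_details`, `describe_affected_entities`). A minimal sketch that lists open and upcoming events for some services together with their description and affected resources. It assumes a Business or Enterprise Support plan (required for the AWS Health API) and the global endpoint in us-east-1. Pagination follows `nextToken`, and the client is injectable for testing.

```python
def _paged(call, key, **kwargs):
    """Call an AWS Health API operation repeatedly, following nextToken."""
    while True:
        page = call(**kwargs)
        yield from page.get(key, [])
        token = page.get("nextToken")
        if not token:
            return
        kwargs["nextToken"] = token


def summarize_open_events(services, health=None):
    """Return open/upcoming events for `services` (e.g. ["EC2", "EBS"])."""
    if health is None:
        import boto3  # lazy import so the logic can be tested without AWS
        health = boto3.client("health", region_name="us-east-1")

    event_filter = {"services": services, "eventStatusCodes": ["open", "upcoming"]}
    summary = []
    for event in _paged(health.describe_events, "events", filter=event_filter):
        arn = event["arn"]
        details = health.describe_event_details(eventArns=[arn])["successfulSet"]
        description = (
            details[0]["eventDescription"]["latestDescription"] if details else ""
        )
        entities = [
            entity["entityValue"]
            for entity in _paged(
                health.describe_affected_entities, "entities", filter={"eventArns": [arn]}
            )
        ]
        summary.append({
            "eventTypeCode": event["eventTypeCode"],
            "region": event.get("region"),
            "description": description,
            "entities": entities,
        })
    return summary
```

Polling like this suits reports and in-house dashboards. For reacting to events as they happen, the push path through CloudWatch Events avoids polling altogether.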


Examples

AWS Health Tools (aws/aws-health-tools)

• Automated actions in response to AWS Health events
• Open source, community driven
• Customized alerts in response to AWS Health events

AWS Health Tools - Examples (aws/aws-health-tools)

• Notifications via SMS, SNS, Slack
• Respond to incidents: EC2 storage, ELB scaling, pause CodePipeline stages

Notify Slack via AWS Lambda and Amazon CloudWatch Events

• Post alerts from AWS Health to a Slack channel
• Include brief information about the alert received
• Provide quick access to the PHD console

https://git.io/vQspJ
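A minimal sketch of such a Lambda function. It assumes a CloudWatch Events rule with source `aws.health` and a Slack incoming webhook whose URL is supplied in a hypothetical `SLACK_WEBHOOK_URL` environment variable. The `detail` fields read here (`eventTypeCode`, `service`, `eventTypeCategory`, `eventDescription`, `affectedEntities`) reflect our understanding of the AWS Health event format, and the message layout is our own, not the one in aws-health-tools. Only the standard library is used.

```python
import json
import os
import urllib.request

PHD_CONSOLE_URL = "https://phd.aws.amazon.com/phd/home"


def build_slack_message(event):
    """Turn an AWS Health event into a short Slack message payload."""
    detail = event["detail"]
    descriptions = detail.get("eventDescription", [])
    summary = descriptions[0]["latestDescription"] if descriptions else "(no description)"
    entities = [e["entityValue"] for e in detail.get("affectedEntities", [])]
    lines = [
        f"*AWS Health:* {detail['eventTypeCode']} ({detail['service']}, {event['region']})",
        f"*Category:* {detail.get('eventTypeCategory', 'unknown')}",
        f"*Affected:* {', '.join(entities) or 'none listed'}",
        summary[:500],  # keep the message brief
        f"<{PHD_CONSOLE_URL}|Open the Personal Health Dashboard>",
    ]
    return {"text": "\n".join(lines)}


def _post_json(url, payload):
    """POST a JSON payload; Slack incoming webhooks accept {"text": ...}."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def lambda_handler(event, context, webhook_url=None, post=_post_json):
    """Entry point: format the Health event and send it to Slack."""
    url = webhook_url or os.environ["SLACK_WEBHOOK_URL"]
    return post(url, build_slack_message(event))
```

Keeping formatting and delivery in separate functions makes it easy to reuse the message builder for other targets, such as SNS.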

Stop or terminate EC2 instances with degraded instance store drive performance

• One or more physical storage drives affected
• Instance storage performance degradation
• Stop/terminate EC2 instances based on tags

https://git.io/vQsVE
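A minimal sketch of the tag-driven response. It assumes the event type code `AWS_EC2_INSTANCE_STORE_DRIVE_PERFORMANCE_DEGRADED` (verify it against the events your account receives) and a hypothetical `HealthAction` tag whose value is `stop` or `terminate`. Only EBS-backed instances can be stopped; instances with an instance-store root volume can only be terminated, and instance store data is lost either way. Untagged instances are left for a human to handle.

```python
EVENT_TYPE_CODE = "AWS_EC2_INSTANCE_STORE_DRIVE_PERFORMANCE_DEGRADED"  # assumption
ACTION_TAG_KEY = "HealthAction"  # hypothetical tag: value "stop" or "terminate"


def lambda_handler(event, context, ec2=None):
    """Stop or terminate affected instances according to their HealthAction tag."""
    detail = event["detail"]
    if detail.get("eventTypeCode") != EVENT_TYPE_CODE:
        return {"stopped": [], "terminated": []}
    instance_ids = [e["entityValue"] for e in detail.get("affectedEntities", [])]
    if not instance_ids:
        return {"stopped": [], "terminated": []}

    if ec2 is None:
        import boto3  # lazy import keeps the handler testable without AWS
        ec2 = boto3.client("ec2", region_name=event["region"])

    # Only fetch the action tag for the affected instances.
    tags = ec2.describe_tags(
        Filters=[
            {"Name": "resource-id", "Values": instance_ids},
            {"Name": "key", "Values": [ACTION_TAG_KEY]},
        ]
    )["Tags"]
    action_by_instance = {t["ResourceId"]: t["Value"].strip().lower() for t in tags}

    to_stop = sorted(i for i, a in action_by_instance.items() if a == "stop")
    to_terminate = sorted(i for i, a in action_by_instance.items() if a == "terminate")
    if to_stop:
        ec2.stop_instances(InstanceIds=to_stop)
    if to_terminate:
        ec2.terminate_instances(InstanceIds=to_terminate)
    return {"stopped": to_stop, "terminated": to_terminate}
```

Letting the tag choose the action keeps the decision with the instance owner: disposable workers can be terminated, while EBS-backed instances that should survive are only stopped.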

Disable AWS CodePipeline stage transitions using AWS Lambda and Amazon CloudWatch Events

• Temporarily stop future deployments when an alert arrives
• Prevent subsequent stages from possibly failing
• Require manual intervention to re-enable after investigation

https://git.io/vQsp1

[Screenshot: CodePipeline stage transition disabled]

Piksel

Stephen Gran, Senior Technical Architect

Piksel – SaaS platform for video delivery

Piksel - Challenges

• Maximize uptime and operational resources
• Minimize costs

Piksel - Automating Operational Health

• ELB health checks
• Auto Scaling groups
• CloudWatch metrics

Piksel – Some examples and their metrics

• API server
• Transcode worker
• Container fleet

Piksel - Harder example

• EC2 health checks
• Automatic disk attachment
• CloudWatch Auto Recover
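For the CloudWatch auto-recover piece, a minimal sketch of the alarm. It watches the `StatusCheckFailed_System` metric and uses the built-in `arn:aws:automate:<region>:ec2:recover` action, mirroring the recovery-alarm example in the EC2 documentation. The alarm name and the one-minute, two-period settings are illustrative choices, and only supported instance configurations can be recovered, so check the EC2 documentation for your instance types.

```python
def recovery_alarm_params(instance_id, region):
    """Arguments for CloudWatch PutMetricAlarm that recover an instance on system check failure."""
    return {
        "AlarmName": f"auto-recover-{instance_id}",  # illustrative naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds
        "EvaluationPeriods": 2,  # two consecutive failing minutes
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Built-in EC2 action that moves the instance to healthy hardware.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recovery_alarm(instance_id, region, cloudwatch=None):
    """Create (or update) the recovery alarm and return its name."""
    if cloudwatch is None:
        import boto3  # lazy import keeps the function testable without AWS
        cloudwatch = boto3.client("cloudwatch", region_name=region)
    params = recovery_alarm_params(instance_id, region)
    cloudwatch.put_metric_alarm(**params)
    return params["AlarmName"]
```

Because `put_metric_alarm` overwrites an alarm with the same name, the function can safely be rerun, for example whenever an Auto Scaling group launches a new instance.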

Results

• Lower transcode times
• Higher confidence in the platform
• Lower TCO (staff and AWS spend)

Piksel – Cost Savings

• Turn off unused environments
• Scheduled daily deletion/creation
• Resilience as a bonus
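The slides don't say how Piksel implemented the schedule. One minimal sketch, assuming each environment is an AWS CloudFormation stack: two scheduled CloudWatch Events rules invoke the same Lambda function with a constant input saying whether to delete or recreate the stack. The rule names, cron times (UTC), template URL, and input fields are hypothetical. Granting CloudWatch Events permission to invoke the function is omitted.

```python
import json

# Hypothetical schedules (UTC): tear down at 20:00 and recreate at 07:00 on weekdays,
# so the environment also stays down over the weekend.
SCHEDULES = {
    "delete": "cron(0 20 ? * MON-FRI *)",
    "create": "cron(0 7 ? * MON-FRI *)",
}


def install_schedules(events, function_arn, stack_name, template_url):
    """Create one scheduled rule per action, each targeting the same Lambda function."""
    for action, expression in SCHEDULES.items():
        rule_name = f"{stack_name}-{action}"
        events.put_rule(Name=rule_name, ScheduleExpression=expression, State="ENABLED")
        events.put_targets(
            Rule=rule_name,
            Targets=[{
                "Id": "1",
                "Arn": function_arn,
                # Constant JSON input passed to the function instead of the event.
                "Input": json.dumps({
                    "action": action,
                    "stack_name": stack_name,
                    "template_url": template_url,
                }),
            }],
        )


def lambda_handler(event, context, cfn=None):
    """Delete or (re)create the environment's CloudFormation stack."""
    if cfn is None:
        import boto3  # lazy import keeps the handler testable without AWS
        cfn = boto3.client("cloudformation")
    if event["action"] == "delete":
        cfn.delete_stack(StackName=event["stack_name"])
    elif event["action"] == "create":
        cfn.create_stack(
            StackName=event["stack_name"],
            TemplateURL=event["template_url"],
            Capabilities=["CAPABILITY_IAM"],  # only if the template creates IAM resources
        )
    else:
        raise ValueError(f"unknown action {event['action']!r}")
    return event["action"]
```

Passing the action as constant input keeps a single function for both rules, so the schedule lives entirely in the rules' cron expressions.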

It’s really, really hard to do what AWS does

More engineering capacity

Thank you!!

We are hiring!!

Recap

• Minimize operational overhead
• Improve platform resilience
• Gain engineering excellence

[Diagram: AWS Trusted Advisor, AWS Health, Amazon CloudWatch Events, AWS Lambda; Automate Best Practices; OSS]

Thank you!