advanced topics - session 4 - architecting for high availability

Ianni Vamvadelis, Solution Architect Architecting for high availability

Upload: amazon-web-services

Post on 10-Jul-2015

DESCRIPTION

AWS provides a platform that is ideally suited for building highly available systems, enabling you to build reliable, affordable, fault-tolerant systems that operate with a minimal amount of human interaction. This presentation covers many of the high-availability and fault-tolerance concepts and features of the various services that you can use to build highly reliable and highly available applications in the AWS Cloud: architectures involving multiple Availability Zones, including EC2 best practices and RDS Multi-AZ deployments; loosely coupled and self-healing systems involving SQS and Auto Scaling; networking best practices for high availability, including Elastic IP addresses, load balancing, and DNS; and leveraging services that are inherently built with high availability and fault tolerance in mind, including S3, Elastic Beanstalk and more.

Ianni Vamvadelis, Manager, Solution Architecture, AWS

Daniel Richardson, Director of Engineering, JustEat

TRANSCRIPT

Page 1: Advanced Topics - Session 4 - Architecting for High Availability

Ianni Vamvadelis, Solution Architect

Architecting for high availability

Page 2: Advanced Topics - Session 4 - Architecting for High Availability

What is High Availability (HA)?

• Percentage of time an application operates

• Loss of availability is known as an outage or downtime

– Planned and unplanned

– App is offline, unreachable, or partially available

– App is unresponsive
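
Since availability is defined above as a percentage of time, a target can be turned into a downtime budget. The following is a minimal sketch; the function name and the example targets (99%, 99.9%, 99.99%) are illustrative assumptions and do not come from the slides.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Minutes of downtime permitted over the period at a given availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100.0)


if __name__ == "__main__":
    # Illustrative targets only; the slides do not name any specific figure.
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% over one year allows about {budget:.1f} minutes of downtime")
```

Planned and unplanned outages both draw on the same budget, which is why the slide lists them together.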

Page 3: Advanced Topics - Session 4 - Architecting for High Availability

HA is related to …

• Scalability

– Often slow is indistinguishable from unavailable.

• Fault Tolerance

– Apps continue functioning when components fail

• Disaster Recovery

– Restoring service after a catastrophic event

Page 4: Advanced Topics - Session 4 - Architecting for High Availability

HA and DR

• A continuum

• Business continuity plan

• Not an all-or-nothing proposition

In the face of internal or external events, how do you…

– Keep your applications running 24x7

– Make sure your data is safe

– Get an application recovered after a major disaster

High Availability Disaster Recovery

Page 5: Advanced Topics - Session 4 - Architecting for High Availability

How does AWS help with High Availability?

Page 6: Advanced Topics - Session 4 - Architecting for High Availability

[Map of AWS Regions]

• US-WEST (Oregon)

• EU-WEST (Ireland)

• ASIA PAC (Tokyo)

• ASIA PAC (Singapore)

• US-WEST (N. California)

• SOUTH AMERICA (Sao Paulo)

• US-EAST (Virginia)

• AWS GovCloud (US)

• ASIA PAC (Sydney)

Page 8: Advanced Topics - Session 4 - Architecting for High Availability

Automation

Page 9: Advanced Topics - Session 4 - Architecting for High Availability

AWS SERVICES

Inherently Highly Available and Fault Tolerant Services

• Amazon S3

• Amazon DynamoDB

• Amazon CloudFront

• Amazon Route 53

• Elastic Load Balancing

• Amazon SQS

• Amazon SNS

• Amazon SES

• Amazon SWF

Highly Available with the right architecture

• Amazon EC2

• Amazon EBS

• Amazon RDS

• Amazon VPC

Page 10: Advanced Topics - Session 4 - Architecting for High Availability

AWS Principles for HA

Page 11: Advanced Topics - Session 4 - Architecting for High Availability

1. DESIGN FOR FAILURE

2. MULTIPLE AVAILABILITY ZONES

3. SCALING

4. SELF-HEALING

5. LOOSE COUPLING

Page 12: Advanced Topics - Session 4 - Architecting for High Availability

LET’S BUILD A HIGHLY AVAILABLE SYSTEM

Page 17: Advanced Topics - Session 4 - Architecting for High Availability

#1 DESIGN FOR FAILURE

●○○○○

Page 18: Advanced Topics - Session 4 - Architecting for High Availability

« Everything fails all the time »

Werner Vogels

CTO of Amazon

Page 20: Advanced Topics - Session 4 - Architecting for High Availability

AVOID SINGLE POINTS OF FAILURE

ASSUME EVERYTHING FAILS,

AND WORK BACKWARDS
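
The effect of removing a single point of failure can be estimated with basic probability. This is a rough sketch that assumes component failures are independent, which real systems only approximate; the 99% figures are illustrative and not taken from the slides.

```python
from functools import reduce


def serial_availability(*components: float) -> float:
    """Availability of a chain where every component must be up (each is a single point of failure)."""
    return reduce(lambda acc, a: acc * a, components, 1.0)


def redundant_availability(component: float, copies: int) -> float:
    """Availability of a tier that is up as long as at least one of `copies` identical components is up."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - component) ** copies


if __name__ == "__main__":
    single = 0.99  # illustrative availability of one web server or one database
    print("web + db, one of each:", serial_availability(single, single))
    print("web tier with two servers:", redundant_availability(single, 2))
```

Working backwards from "everything fails" means asking, for each tier, what the first function returns today and how many copies the second function needs to reach the target.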

Page 21: Advanced Topics - Session 4 - Architecting for High Availability

YOUR GOAL

Applications should continue to function

Page 25: Advanced Topics - Session 4 - Architecting for High Availability

AMAZON EBS ELASTIC BLOCK STORE

Page 30: Advanced Topics - Session 4 - Architecting for High Availability

AMAZON ELB ELASTIC LOAD BALANCING

Page 33: Advanced Topics - Session 4 - Architecting for High Availability

HEALTH CHECKS
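
The slides for this part carry no text beyond the heading. As a hedged illustration of how a threshold-style health check can behave: an instance changes state only after several consecutive results disagree with its current state, so a single slow response does not pull it out of rotation. The class and parameter names below are hypothetical and are not the Elastic Load Balancing API.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Tracks one instance's state from a stream of pass/fail check results."""
    healthy_threshold: int = 2      # consecutive passes needed to become healthy again
    unhealthy_threshold: int = 3    # consecutive failures needed to be marked unhealthy
    healthy: bool = True
    _streak: int = 0                # consecutive results that disagree with the current state

    def record(self, check_passed: bool) -> bool:
        """Feed one check result and return the resulting state (True means healthy)."""
        if check_passed == self.healthy:
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.healthy_threshold if check_passed else self.unhealthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy


if __name__ == "__main__":
    tracker = HealthTracker()
    for result in (False, False, False, True, True):
        print("check passed:", result, "-> healthy:", tracker.record(result))
```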

Page 39: Advanced Topics - Session 4 - Architecting for High Availability

#2 MULTIPLE AVAILABILITY ZONES

●●○○○

Page 40: Advanced Topics - Session 4 - Architecting for High Availability

AMAZON RDS MULTI-AZ

Page 46: Advanced Topics - Session 4 - Architecting for High Availability

AMAZON ELB AND MULTIPLE AZs

Page 49: Advanced Topics - Session 4 - Architecting for High Availability

#3 SCALING

●●●○○

Page 52: Advanced Topics - Session 4 - Architecting for High Availability

AUTO SCALING SCALE UP/DOWN EC2 CAPACITY

Page 63: Advanced Topics - Session 4 - Architecting for High Availability

#4 SELF-HEALING

●●●●○

Page 64: Advanced Topics - Session 4 - Architecting for High Availability

HEALTH CHECKS

+ AUTO SCALING

Page 69: Advanced Topics - Session 4 - Architecting for High Availability

HEALTH CHECKS

+ AUTO SCALING

=

SELF-HEALING
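
The equation on this slide can be read as a reconciliation loop: drop whatever the health checks flag, then launch replacements until the desired capacity is met. The sketch below is a toy model under that assumption; `reconcile`, the instance IDs, and the `launch` callback are hypothetical names and do not represent the Auto Scaling API.

```python
from typing import Callable, Dict


def reconcile(fleet: Dict[str, bool], desired: int, launch: Callable[[], str]) -> Dict[str, bool]:
    """Return a new fleet with unhealthy instances removed and replacements launched.

    `fleet` maps instance ID to its latest health-check result.
    `launch` is a stand-in for whatever starts a new instance and returns its ID.
    This toy version only replaces failures; it does not terminate surplus instances.
    """
    survivors = {instance_id: True for instance_id, healthy in fleet.items() if healthy}
    while len(survivors) < desired:
        survivors[launch()] = True
    return survivors


if __name__ == "__main__":
    import itertools

    counter = itertools.count(1)
    fleet = {"i-a": True, "i-b": False, "i-c": True}
    print(reconcile(fleet, desired=3, launch=lambda: f"i-new{next(counter)}"))
```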

Page 70: Advanced Topics - Session 4 - Architecting for High Availability

DEGRADED MODE

Page 71: Advanced Topics - Session 4 - Architecting for High Availability

AMAZON S3 STATIC WEBSITE

+ AMAZON ROUTE 53

WEIGHTED RESOLUTION
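
One way to picture weighted resolution for a degraded mode: DNS answers are drawn in proportion to each record's weight, so shifting weight from the main application to a static site moves traffic to the fallback. The sketch below simulates only that selection step. The hostnames are hypothetical, and the slides do not say how or when the weights get changed.

```python
import random
from typing import Dict


def resolve(weighted_targets: Dict[str, int], rng=random) -> str:
    """Pick one target with probability proportional to its weight; zero-weight targets are never returned."""
    candidates = [target for target, weight in weighted_targets.items() if weight > 0]
    if not candidates:
        raise ValueError("at least one target needs a positive weight")
    weights = [weighted_targets[target] for target in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical record set: the application behind a load balancer, plus a static fallback site.
    normal = {"app-elb.example.com": 100, "static-site.example.com": 0}
    degraded = {"app-elb.example.com": 0, "static-site.example.com": 100}
    print("normal mode   ->", resolve(normal))
    print("degraded mode ->", resolve(degraded))
```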

Page 75: Advanced Topics - Session 4 - Architecting for High Availability

#5 LOOSE COUPLING

●●●●●

Page 76: Advanced Topics - Session 4 - Architecting for High Availability

BUILD LOOSELY COUPLED SYSTEMS

The more loosely they are coupled, the bigger they scale and the more fault tolerant they get…

Page 77: Advanced Topics - Session 4 - Architecting for High Availability

AMAZON SQS SIMPLE QUEUE SERVICE

Page 78: Advanced Topics - Session 4 - Architecting for High Availability

RECEIVE → TRANSCODE → PUBLISH & NOTIFY

Page 86: Advanced Topics - Session 4 - Architecting for High Availability

VISIBILITY TIMEOUT
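
A visibility timeout hides a received message from other consumers for a fixed window. If the worker deletes the message in time, it is gone; if the worker crashes, the message reappears and another worker can pick it up. The following in-memory toy illustrates that behaviour. It is not the SQS API, and the clock is passed in explicitly so the example is deterministic.

```python
from typing import Dict, Optional, Tuple


class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout (all names are hypothetical)."""

    def __init__(self, visibility_timeout: float) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: Dict[int, Tuple[str, float]] = {}  # id -> (body, time it becomes visible)
        self._next_id = 0

    def send(self, body: str) -> int:
        self._next_id += 1
        self._messages[self._next_id] = (body, 0.0)
        return self._next_id

    def receive(self, now: float) -> Optional[Tuple[int, str]]:
        """Return one visible message and hide it until now + visibility_timeout."""
        for message_id, (body, visible_at) in self._messages.items():
            if visible_at <= now:
                self._messages[message_id] = (body, now + self.visibility_timeout)
                return message_id, body
        return None

    def delete(self, message_id: int) -> None:
        """Called by a worker once processing has succeeded."""
        self._messages.pop(message_id, None)


if __name__ == "__main__":
    queue = ToyQueue(visibility_timeout=30)
    queue.send("video-1")
    print(queue.receive(now=0))    # worker A takes the message
    print(queue.receive(now=10))   # hidden while worker A is presumed busy
    print(queue.receive(now=31))   # worker A never deleted it, so it is visible again
```

Because a message can be delivered more than once under this scheme, the processing step (transcoding, in the slides' example) should be safe to repeat.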

Page 94: Advanced Topics - Session 4 - Architecting for High Availability

BUFFERING

Page 101: Advanced Topics - Session 4 - Architecting for High Availability

CLOUDWATCH METRICS FOR AMAZON SQS

+ AUTO SCALING
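
The idea of scaling workers on queue metrics can be sketched as a small sizing rule: read the backlog (SQS publishes it to CloudWatch as ApproximateNumberOfMessagesVisible) and size the worker fleet to match, within fixed bounds. The function name, the per-worker throughput figure, and the bounds below are assumptions for illustration; a real setup would express this through alarms and scaling policies.

```python
import math


def desired_workers(visible_messages: int, messages_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Number of workers needed to cover the backlog, clamped to the allowed range."""
    if messages_per_worker <= 0:
        raise ValueError("messages_per_worker must be positive")
    needed = math.ceil(visible_messages / messages_per_worker)
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    # Illustrative numbers: each worker is assumed to handle a backlog of 100 messages.
    for backlog in (0, 950, 10_000):
        print(backlog, "messages ->", desired_workers(backlog, 100, min_workers=1, max_workers=20), "workers")
```

The queue buffers any burst that exceeds the upper bound, so producers keep working while the consumers catch up.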

Page 109: Advanced Topics - Session 4 - Architecting for High Availability

1. DESIGN FOR FAILURE

2. MULTIPLE AVAILABILITY ZONES

3. SCALING

4. SELF-HEALING

5. LOOSE COUPLING

Page 115: Advanced Topics - Session 4 - Architecting for High Availability

YOUR GOAL

Applications should continue to function

Page 116: Advanced Topics - Session 4 - Architecting for High Availability

IT’S ALL ABOUT CHOICE

BALANCE COST & HIGH AVAILABILITY

Page 117: Advanced Topics - Session 4 - Architecting for High Availability

Summary

Leverage AWS Services

Apply 5 principles for HA

Automate

Test your HA implementation

Page 118: Advanced Topics - Session 4 - Architecting for High Availability

aws.amazon.com/architecture

Page 119: Advanced Topics - Session 4 - Architecting for High Availability

JUST EAT HIGH AVAILABILITY WITH AWS

Page 120: Advanced Topics - Session 4 - Architecting for High Availability

JUST EAT

13 countries

34,000+ restaurants

8m+ members

Over 50m orders

16,000+ restaurants in UK, 8m visits a month

Page 121: Advanced Topics - Session 4 - Architecting for High Availability

PLATFORM

Devices in restaurants

Consumer Website

Public API

Order API Ratings API Search API …

Restaurant Services

SQL Server Networking Monitoring

Customer Care Tools

Emails

Common Infrastructure

Apps and External Services

APIs

Page 122: Advanced Topics - Session 4 - Architecting for High Availability

DESIGN FOR FAILURE

[Architecture diagram. Labels: Devices in restaurants; Device Service; Web Service; JCT Service; two Auto Scaling groups, each spanning eu-west-1a, eu-west-1b and eu-west-1c; Orders queue; Orders data]

Page 123: Advanced Topics - Session 4 - Architecting for High Availability

SCALING – PROACTIVE

Page 124: Advanced Topics - Session 4 - Architecting for High Availability

SCALING – PROACTIVE

Web servers in data center

Page 125: Advanced Topics - Session 4 - Architecting for High Availability

SCALING – PROACTIVE

Web servers in data center

Web EC2 instances

Page 126: Advanced Topics - Session 4 - Architecting for High Availability

SCALING – REACTIVE

Web servers in data center

Web EC2 instances

Page 127: Advanced Topics - Session 4 - Architecting for High Availability

EVERYTHING MULTI AZ – CONSUMER WEBSITE

Auto scaling Group

eu-west-1a eu-west-1b eu-west-1c

Monitor to keep resource usage at a maximum of 66% of capacity in each AZ when everything’s available.

Normal operation: 66% / 66% / 66%. With one AZ lost: 99% / 99%.
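
The 66% figure follows from simple arithmetic: when one of three AZs is lost, its share of the load lands on the remaining two. The sketch below assumes equal capacity per AZ and an even redistribution of traffic; both are simplifications, and the function names are illustrative.

```python
def utilisation_after_az_loss(utilisation: float, az_count: int) -> float:
    """Per-AZ utilisation once one AZ is lost and its load is spread evenly over the rest."""
    if az_count < 2:
        raise ValueError("need at least two AZs to survive the loss of one")
    return utilisation * az_count / (az_count - 1)


def max_safe_utilisation(az_count: int) -> float:
    """Highest per-AZ utilisation that still fits in the remaining AZs after losing one."""
    if az_count < 2:
        raise ValueError("need at least two AZs to survive the loss of one")
    return (az_count - 1) / az_count


if __name__ == "__main__":
    print("3 AZs at 66% ->", round(utilisation_after_az_loss(0.66, 3), 2), "after losing one")
    print("ceiling for 3 AZs:", round(max_safe_utilisation(3), 3))
```

With two AZs the same rule gives a ceiling of 50%, which is one reason to spread a service over three.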

Page 128: Advanced Topics - Session 4 - Architecting for High Availability

EVERYTHING MULTI AZ – INTERNAL APIS

Auto scaling Group

eu-west-1a eu-west-1b eu-west-1c

Alarms tell us that performance has been degraded, but the platform will self-heal as new instances are launched.

Applications assume that internal APIs will fail or run slowly, so they can cope with the loss of an AZ or of instances and will just degrade gracefully.

Normal operation: 80% / 80% / 80%. With one AZ lost: 100% / 100%.
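
A caller that "assumes internal APIs will fail or run slowly" can be sketched as a call with a deadline and a fallback value. This is a generic illustration rather than JUST EAT's implementation; `call_with_fallback`, `slow_ratings`, and the cached value are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(fn, fallback, timeout_s: float):
    """Run fn in a worker thread; return fallback if it raises or misses the deadline.

    The abandoned call keeps running in its thread, so fn should be safe to ignore.
    """
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback   # dependency is too slow: degrade instead of waiting
    except Exception:
        return fallback   # dependency failed outright: degrade instead of erroring


def slow_ratings():
    """Stand-in for an internal API that is responding slowly."""
    time.sleep(0.3)
    return {"rating": 4.5}


if __name__ == "__main__":
    # The page still renders, just without fresh ratings.
    print(call_with_fallback(slow_ratings, fallback={"rating": None}, timeout_s=0.05))
```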

Page 129: Advanced Topics - Session 4 - Architecting for High Availability

EVERYTHING MULTI AZ – SQL SERVER 2012

eu-west-1a eu-west-1b eu-west-1c

Connection strings simply contain both primary and secondary servers – no code changes required.

Primary / Witness / Secondary

Alarms tell us that failover has occurred, but it happens without manual intervention.

Page 130: Advanced Topics - Session 4 - Architecting for High Availability

DANIEL RICHARDSON

DIRECTOR OF ENGINEERING, JUST EAT

[email protected]

www.just-eat.com/jobs

twitter.com/JustEatUK

www.facebook.com/justeat