architectures in aws highly available database · i.e. every weekend, traffic triples so add 2 new...

56
Santa Clara, California | April 23th – 25th, 2018 Mike Benshoof, Technical Account Manager, Percona Highly Available Database Architectures in AWS

Upload: others

Post on 23-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

Santa Clara, California | April 23th – 25th, 2018Mike Benshoof, Technical Account Manager, Percona

Highly Available Database Architectures in AWS

Page 2: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

2

Hello, Percona Live Attendees!

What this talk is meant to be...• High level overview of a highly available

(HA) database solution - What is it and why do we need it?- General concepts

• Examples of HA architectures using different AWS components- EC2, RDS, Aurora, and ProxySQL

• General best practices from a design and application standpoint- High level considerations of issues and

planning for failure

Page 3: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

3

Hello, Percona Live Attendees!

What this talk is meant to be...• High level overview of a highly available

(HA) database solution - What is it and why do we need it?- General concepts

• Examples of HA architectures using different AWS components- EC2, RDS, Aurora, and ProxySQL

• General best practices from a design and application standpoint- High level considerations of issues and

planning for failure

What this talk is not meant to be...• A deep dive into AWS or MySQL

internals- Won’t be any mention of provisioned

IOPS or buffer pool size• A listing of several benchmarks with a

recommendation of which is “best”- Benchmarks can be misleading, your

application is unique• A description of a “silver bullet”

architecture that will fit every use case- There is no single solution

Page 4: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

4

So let’s dig in...

What is a highly available database solution?

An architecture that is designed to continue to function normally in the event of hardware or network failure within the system

Page 5: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

5

So let’s dig in...

In practice, this generally translates to some level of automatic failover that generally results in some level (however brief) of downtime.

Page 6: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

6

What does it look like?

● Application servers sending R/W traffic to primary database

● Failover database in the background - unused

● Some synchronization mechanism between primary and failover

Page 7: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

7

What does it look like?

● Primary database fails!!

Page 8: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

8

What does it look like?

● R/W traffic is re-routed to the failover node

● No application changes are needed, but some level of retry logic is recommended

Page 9: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

9

Some general concepts...

● Virtual Endpoint○ Application connects to an alias and not the physical servers○ This allows the endpoint to handle the routing to backend resources○ Some examples

■ Load balancer (physical or logical)■ DNS■ Floating IP address

Page 10: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

10

Some general concepts...

● Virtual Endpoint○ Application connects to an alias and not the physical servers○ This allows the endpoint to handle the routing to backend resources○ Some examples

■ Load balancer (physical or logical)■ DNS■ Floating IP address

● Synchronization○ Data is kept in sync between primary and failover resources ○ Can be synchronous or asynchronous, but done automatically in real-time○ Some examples

■ MySQL Replication (async)■ Block level replication (sync)■ Clustering solution - i.e. Galera (sync)

Page 11: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

11

Let’s take this to the cloud...

AWS Components at our disposal• Elastic Compute Cloud (EC2)- Self managed MySQL instances, generally built on Linux AMI- Highly customizable / flexible

Page 12: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

12

Let’s take this to the cloud...

AWS Components at our disposal• Elastic Compute Cloud (EC2)- Self managed MySQL instances, generally built on Linux AMI- Highly customizable / flexible

• Relational Database Service (RDS)- Can run MySQL native or Aurora (or other engines such as SQL Server, Postgres, Oracle)- Less flexible, but fully managed (point-and-click snapshots, replicas, etc)

Page 13: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

13

Let’s take this to the cloud...

AWS Components at our disposal• Elastic Compute Cloud (EC2)- Self managed MySQL instances, generally built on Linux AMI- Highly customizable / flexible

• Relational Database Service (RDS)- Can run MySQL native or Aurora (or other engines such as SQL Server, Postgres, Oracle)- Less flexible, but fully managed (point-and-click snapshots, replicas, etc)

• Miscellaneous Building Blocks- Elastic Load Balancer (ELB)- Route 53 (DNS failover strategies)- Elastic IP (virtual IP that can be assigned to EC2 instances)

Page 14: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

14

So Many Choices!

● The options are endless!● Here are the solutions we’ll discuss

○ Percona XtraDB Cluster on EC2○ RDS for MySQL○ Amazon Aurora

Page 15: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

Percona XtraDB Cluster

Page 16: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

16

Percona XtraDB Cluster (PXC)

Percona XtraDB Cluster• Percona Server for MySQL• Galera Cluster (for replication)- Synchronous replication- Transaction based replication

• Transaction is verified locally• Certified as valid on other nodes before local commit

• Can read/write to any node in the cluster- Preferred architecture

• Write to single node, read from any node• Software load balancer for HA

Page 17: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

17

PXC Use Cases

● Need the ability for multi-node writing○ Ideally architected to avoid collisions○ I.e. each nodes writes to dedicated schema/tables

Page 18: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

18

PXC Use Cases

● Need the ability for multi-node writing○ Ideally architected to avoid collisions○ I.e. each nodes writes to dedicated schema/tables

● Require consistent reads○ Application requires additional read replicas○ Application cannot tolerate any replica lag

Page 19: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

19

PXC Use Cases

● Need the ability for multi-node writing○ Ideally architected to avoid collisions○ I.e. each nodes writes to dedicated schema/tables

● Require consistent reads○ Application requires additional read replicas○ Application cannot tolerate any replica lag

● Maximum data durability○ Guarantee transactions are remotely received

Page 20: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

20

PXC Use Cases

● Need the ability for multi-node writing○ Ideally architected to avoid collisions○ I.e. each nodes writes to dedicated schema/tables

● Require consistent reads○ Application requires additional read replicas○ Application cannot tolerate any replica lag

● Maximum data durability○ Guarantee transactions are remotely received

● Require cross-WAN (region) synchronous replication○ Will add latency to writes (business decision)

Page 21: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

21

PXC in AWS

EC2 Based deployment● 3 base Linux AMI instances

Page 22: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

22

PXC in AWS

EC2 Based deployment● 3 base Linux AMI instances● Nodes located in different AZs

○ Mitigates split-brain from AZ failure

Page 23: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

23

PXC in AWS

EC2 Based deployment● 3 base Linux AMI instances● Nodes located in different AZs

○ Mitigates split-brain from AZ failure● Provisioned IOPs or local storage

○ I3 instances with local NVMe■ Note - relies on PXC for redundancy

○ GP2 not suitable for high throughput

Page 24: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

24

PXC in AWS

EC2 Based deployment● 3 base Linux AMI instances● Nodes located in different AZs

○ Mitigates split-brain from AZ failure● Provisioned IOPs or local storage

○ I3 instances with local NVMe■ Note - relies on PXC for redundancy

○ GP2 not suitable for high throughput● Cross region supported, higher write latency

○ Same for multiple VPCs - supported, but with potential latency increase

Page 25: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

25

So how do we route??

Enter ProxySQL…● Layer 7 software load balancer

Page 26: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

26

So how do we route??

Enter ProxySQL…● Layer 7 software load balancer● Monitors backend nodes

○ Handles failed nodes transparently○ Configurable retries

Page 27: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

27

So how do we route??

Enter ProxySQL…● Layer 7 software load balancer● Monitors backend nodes

○ Handles failed nodes transparently○ Configurable retries

● Potential for advanced routing○ Read/write splitting○ Table/schema based routing

Page 28: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

28

So how do we route??

Enter ProxySQL…● Layer 7 software load balancer● Monitors backend nodes

○ Handles failed nodes transparently○ Configurable retries

● Potential for advanced routing○ Read/write splitting○ Table/schema based routing

● Run locally or own layer○ Local preferred for fewer app servers (< 10)○ Use ELB for HA when separate layer

Page 29: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

29

And finally the full stack...

● App servers point to ProxySQL behind ELB

● ProxySQL configured with○ Writes pointed to single PXC node ○ Reads pointed to all three nodes in

the cluster● In the event of primary failure:

○ Write traffic shifted to another PXC node

○ Reads continue to be sent to all healthy nodes

Page 30: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

RDS for MySQL / Amazon Aurora

Page 31: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

31

Relational Database Service (RDS)

● Fully managed RDBMS, built on AWS components○ EC2 instances○ EBS volumes

Page 32: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

32

Relational Database Service (RDS)

● Fully managed RDBMS, built on AWS components○ EC2 instances○ EBS volumes

● Operational features○ Snapshots (restoring from snapshots)○ Point-in-time recovery○ On-demand replicas

Page 33: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

33

Relational Database Service (RDS)

● Fully managed RDBMS, built on AWS components○ EC2 instances○ EBS volumes

● Operational features○ Snapshots (restoring from snapshots)○ Point-in-time recovery○ On-demand replicas

● Availability features○ Multi A/Z with failover (MySQL)○ Automatic replica promotion (Aurora)○ Master DNS endpoint (Virtual endpoint)

Page 34: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

34

RDS Use Cases

● Desire (or need) fully managed DBaaS○ Limited DBA staff○ Developer/Application focused DBA staff

Page 35: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

35

RDS Use Cases

● Desire (or need) fully managed DBaaS○ Limited DBA staff○ Developer/Application focused DBA staff

● Variable (but predictable) traffic patterns○ Add/remove replicas quickly as needed○ I.e. every weekend, traffic triples so add 2 new replicas each Friday evening,

terminate each Monday morning (saves $)

Page 36: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

36

RDS Use Cases

● Desire (or need) fully managed DBaaS○ Limited DBA staff○ Developer/Application focused DBA staff

● Variable (but predictable) traffic patterns○ Add/remove replicas quickly as needed○ I.e. every weekend, traffic triples so add 2 new replicas each Friday evening,

terminate each Monday morning (saves $)● Application tolerates stale reads

○ Can’t guarantee consistent reads, replication lag is possible

Page 37: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

37

RDS Use Cases

● Desire (or need) fully managed DBaaS○ Limited DBA staff○ Developer/Application focused DBA staff

● Variable (but predictable) traffic patterns○ Add/remove replicas quickly as needed○ I.e. every weekend, traffic triples so add 2 new replicas each Friday evening,

terminate each Monday morning (saves $)● Application tolerates stale reads

○ Can’t guarantee consistent reads, replication lag is possible● Non-complex design required

○ Standard read/write 90/10 traffic pattern, single node writes, read pool

Page 38: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

38

Aurora vs RDS for MySQL

Aurora● Shared backend storage

○ Replicated across multiple A/Z○ Workload optimized by role (write

master vs read replica)● Cluster endpoints provided

○ Writer endpoint (DNS failover)○ Reader endpoint (DNS

round-robin)● Hot-spare (replica) promoted to

master is accessible

Page 39: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

39

Aurora vs RDS for MySQL

Aurora● Shared backend storage

○ Replicated across multiple A/Z○ Workload optimized by role (write

master vs read replica)● Cluster endpoints provided

○ Writer endpoint (DNS failover)○ Reader endpoint (DNS

round-robin)● Hot-spare (replica) promoted to

master is accessible

RDS for MySQL● Independent storage

○ Block level replication for standby○ MySQL async for replica

● Writer endpoint managed by DNS

○ DNS Failover to hot-spare in different A/Z

● Hot-spare is EBS block level replication and not accessible

Page 40: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

40

High Availability in RDS / Aurora

● Endpoints are unique DNS entries ○ Provided by RDS○ Can be fronted by Route 53 CNAME as needed

Page 41: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

41

High Availability in RDS / Aurora

● Endpoints are unique DNS entries ○ Provided by RDS○ Can be fronted by Route 53 CNAME as needed

● RDS for MySQL○ Writer DNS is failover only to hot-spare○ Hot-spare isn’t reachable under normal operation

■ I.e. can’t use as a slave for long queries, etc○ On master failure, potential for slight (~1 minute) data loss

Page 42: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

42

High Availability in RDS / Aurora

● Endpoints are unique DNS entries ○ Provided by RDS○ Can be fronted by Route 53 CNAME as needed

● RDS for MySQL○ Writer DNS is failover only to hot-spare○ Hot-spare isn’t reachable under normal operation

■ I.e. can’t use as a slave for long queries, etc○ On master failure, potential for slight (~1 minute) data loss

● Aurora○ Writer DNS is failover to active replica○ Replica can be normal read-only replica in pool○ On master failure, near instant failover with no data loss

Page 43: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

43

RDS for MySQL Architecture

● App servers point to ProxySQL behind ELB

● ProxySQL configured with○ Writes pointed DNS of primary○ Reads pointed to pool of read

replicas● In the event of primary failure,

write traffic shifted to RDS standby, read traffic still sent to replica pool

Page 44: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

44

Aurora Architecture

● App servers point to ProxySQL behind ELB*

● ProxySQL configured with○ Writes pointed to primary○ Reads pointed to pool of read

replicas● In the event of primary failure,

write traffic shifted to replica automatically promoted, read traffic still sent to replica pool

* Note: Requires ProxySQL 2.0+ due to Aurora using innodb_read_only for replicas

Page 45: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

45

So why do we add ProxySQL to the RDS stack when DNS already handles failover??

Page 46: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

46

More than just failover...

ProxySQL adds flexibility to the stack far beyond DNS failover:● Rule based Read/Write splitting

○ Inspect queries, send plain selects to read-only hostgroup○ Simplifies application logic

Page 47: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

47

More than just failover...

ProxySQL adds flexibility to the stack far beyond DNS failover:● Rule based Read/Write splitting

○ Inspect queries, send plain selects to read-only hostgroup○ Simplifies application logic

● Seamlessly handles query retry on failover - i.e.2018-04-12 21:37:16 MySQL_Session.cpp:2816:handler(): [ERROR] Detected a broken connection during query on (11,10.1.3.162,3306) , FD (Conn:50 , MyDS:50) : 2013, Lost connection to MySQL server during query

2018-04-12 21:37:16 MySQL_Session.cpp:2824:handler(): [WARNING] Retrying query.

Page 48: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

48

More than just failover...

ProxySQL adds flexibility to the stack far beyond DNS failover:● Rule based Read/Write splitting

○ Inspect queries, send plain selects to read-only hostgroup○ Simplifies application logic

● Seamlessly handles query retry on failover - i.e.2018-04-12 21:37:16 MySQL_Session.cpp:2816:handler(): [ERROR] Detected a broken connection during query on (11,10.1.3.162,3306) , FD (Conn:50 , MyDS:50) : 2013, Lost connection to MySQL server during query

2018-04-12 21:37:16 MySQL_Session.cpp:2824:handler(): [WARNING] Retrying query.

● Route queries for sharding○ Inspect query for schema/table name, route to specific hostgroup

Page 49: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

49

More than just failover...

● Dynamic rewrites / routing○ At runtime, transparently rewrite or route problem queries○ Leverages the query rules engine

Page 50: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

50

More than just failover...

● Dynamic rewrites / routing○ At runtime, transparently rewrite or route problem queries○ Leverages the query rules engine

● Route queries for sharding○ Inspect query for schema/table name, route to specific hostgroup○ Can be done dynamically if new shards are added

Page 51: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

51

More than just failover...

● Dynamic rewrites / routing○ At runtime, transparently rewrite or route problem queries○ Leverages the query rules engine

● Route queries for sharding○ Inspect query for schema/table name, route to specific hostgroup○ Can be done dynamically if new shards are added

● More intelligent Query Cache○ Selectively cache queries by fingerprint○ Query cache purging outside of mysql in background○ No application changes needed to call external cache (i.e. memcache)

Page 52: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

52

For your reading pleasure...

RDS / Aurora Details:● https://aws.amazon.com/rds/aurora/faqs/● https://aws.amazon.com/rds/mysql/faqs/

PXC● https://www.percona.com/doc/percona-xtradb-cluster/LATEST/howtos/centos_howto.html● https://www.percona.com/doc/percona-xtradb-cluster/LATEST/howtos/proxysql.html

ProxySQL● https://github.com/sysown/proxysql/wiki● https://www.percona.com/blog/2018/04/03/how-to-implement-proxysql-with-aws-aurora/

Page 53: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

53

Summary

● Multiple options for HA in AWS○ PXC on EC2 (consistent reads, data durability, multi-region support)*○ RDS / Aurora (fully managed, general workload/pattern, elastic)

● Need HA at each level○ Application, load balancer (routing), database○ Several AWS components available

● ProxySQL acts as intelligent (layer 7) load balancer○ Dynamic rules/routing○ Query / connection retry logic○ Run locally or behind ELB

Page 54: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

54

Thank You Sponsors!!

Page 55: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

55

Rate My Session

Page 56: Architectures in AWS Highly Available Database · I.e. every weekend, traffic triples so add 2 new replicas each Friday evening, terminate each Monday morning (saves $) Application

Thank You!