How RightScale Architects its Databases (for World-wide Scale, HA and DR scenarios) Josep Blanquer Senior Systems Architect, RightScale

Upload: rightscale

Post on 12-Jul-2015

Page 1: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

How RightScale Architects its Databases (for World-wide Scale, HA and DR scenarios)

Josep Blanquer

Senior Systems Architect, RightScale

Page 2: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Talk with the Experts.

Menu

Intro

Data Taxonomy

Data Storage Design

Scale, HA and DR

Conclusion

Page 3: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Intro: Expectations and scope

What this is and what it is not

• IS a talk about:

• how RightScale has designed and implemented its backing datastores

• …for a few of the most representative internal systems

• …with the rationale behind it

• Is NOT a talk about

• RightScale’s overall architecture

• Nodes or hosts; it’s about Systems

• RightScale’s data modeling

Note: Most of the design is implemented and in production, but some of the most advanced pieces are still in beta or still being worked on.

Page 4: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Intro: Tools and Technologies

• RightScale uses a mix of RDBMS and NoSQL technologies:

• MySQL, Cassandra and S3 (for backups and archiving)

• Transactionality:

• MySQL: strong ACID properties

• Cassandra: no Atomicity, eventually Consistent, some Isolation, Durable

• Availability:

• MySQL: async replication. Master to N-Slaves or Master-Master

• Cassandra: Distributed, master-less, highly-replicated (multi-DC)

• Sharding:

• MySQL: no explicit inter-node tools. (Sharding done by application)

• Cassandra: partitions data internally across nodes.

Page 5: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Taxonomy of RightScale’s Data

Representative systems with different data semantics:

Global Objects

Marketplace Assets

Dashboard Objects

Audits

Tags

Recent Events

Cloud Polling Data

Routing Data

Monitoring/Syslog

Page 6: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Taxonomy of RightScale’s Data

Common across accounts:

Users

Plans

Settings

MultiCloud Marketplace: Published Assets

Sharing Groups

Page 7: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Taxonomy of RightScale’s Data

Private to each account:

Deployments

Imported assets

Alert Specifications

Server Inputs Audit

Tags

User Events

Page 8: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Taxonomy of RightScale’s Data

Private to each account:

Cloud resource states (cache)

Cloud credentials

Page 9: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Taxonomy of RightScale’s Data

Private to each account:

Instance agents location

Core agents location

Agent action registry

Page 10: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Taxonomy of RightScale’s Data

Private to each account:

Collected metric data

Collected syslog data

Page 11: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Taxonomy of RightScale’s Data: Data Placement

Who uses the data?

• Users through the Dash/API

• Instances from the Cloud

[Diagram: the same systems laid out along an axis from Users ("Data close to the Users") to Instances ("Data close to the Cloud")]

Page 12: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Taxonomy of RightScale’s Data: Data scope and containment

Which data do we need?

• Data for all accounts

• Data for a single account

[Diagram: the same systems laid out along an axis from X-acct ("Data shared between accounts") to Account ("Data required within scope of a single account")]

Page 13: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Taxonomy of RightScale’s Data

Who uses the data? Proximity to User vs. Cloud

Which data do we need? Scope of data available

[Diagram: the same systems placed on a grid with axes Users / Instances and X-acct / Account, in three groups:]

• Close to cloud resources, account-shardable* data

• Close to user, account-shardable data

• Close to user, globally accessible data
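
The two questions on this slide (who uses the data, and which data is needed) can be read as a small decision table. The sketch below is only an illustration of that reading: the enum names and the returned labels are invented for this example and are not taken from the talk.

```python
from enum import Enum


class Proximity(Enum):
    USER = "close to user"
    CLOUD = "close to cloud resources"


class Scope(Enum):
    ACCOUNT = "account-shardable"
    GLOBAL = "globally accessible"


def placement(proximity: Proximity, scope: Scope) -> str:
    """Return a hypothetical placement label for one class of data."""
    if proximity is Proximity.USER and scope is Scope.GLOBAL:
        return "global"
    if proximity is Proximity.USER and scope is Scope.ACCOUNT:
        return "account-partitioned, near users"
    if proximity is Proximity.CLOUD and scope is Scope.ACCOUNT:
        return "account-partitioned, near cloud"
    # The slide lists only three groups; the fourth combination is not covered.
    raise ValueError("combination not described on the slide")
```

A caller would classify each system once, for example `placement(Proximity.CLOUD, Scope.ACCOUNT)` for data that must live next to the instances of a single account.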

Page 14: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

[Diagram: placement grid with axes Users / Instances and Account / X-Account]

Page 15: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

[Diagram: placement grid (Users / Instances, Account / X-Account) showing the "global" datastore and a "Custom replication" link]

Global: MySQL

• ACID semantics

• Master-Slave replication

Custom replication

Why custom? More control:

• Multiple sources

• Individual columns

• Apply transformations

• Smart re-sync features
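
The talk does not show how the custom replication is built. As a rough illustration of three of the listed properties (multiple sources, individual columns, transformations), here is a minimal in-memory sketch. The function name, the row format and the sample data are all assumptions made for this example; smart re-sync is not covered.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

Row = Dict[str, Any]
Transform = Callable[[Any], Any]


def replicate(
    sources: Iterable[Iterable[Row]],
    columns: List[str],
    transforms: Optional[Dict[str, Transform]] = None,
) -> List[Row]:
    """Copy selected columns from several sources, transforming values on the way."""
    transforms = transforms or {}
    replicated: List[Row] = []
    for source in sources:              # multiple sources
        for row in source:
            out: Row = {}
            for name in columns:        # individual columns only
                value = row[name]
                if name in transforms:  # apply transformations
                    value = transforms[name](value)
                out[name] = value
            replicated.append(out)
    return replicated


# Hypothetical sample data: two sources holding user rows.
users_a = [{"id": 1, "email": "Ann@Example.com", "password_hash": "x"}]
users_b = [{"id": 2, "email": "bob@example.com", "password_hash": "y"}]
replica_rows = replicate([users_a, users_b], ["id", "email"], {"email": str.lower})
```

Here only `id` and `email` reach the replica, and the email is normalised on the way; the `password_hash` column never leaves its source.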

Page 16: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

[Diagram: placement grid (Users / Instances, Account / X-Account) showing the global, dash, events, tags, audit and S3 datastores]

Dashboard: MySQL

• ACID semantics

• Master to N-Slaves replication

• Slave reads

• Rows tagged by account

Other systems: Cassandra

• Simpler Key-Value access

• Great scalability

• Great replica control

• High write availability

• Time-to-live expiration as cache

• Rows tagged by account

Data archive: S3

• Low read rate

• Globally accessible
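
To make "time-to-live expiration as cache" and "rows tagged by account" concrete, the sketch below models both in plain Python. In Cassandra the expiry is handled by the store itself (CQL accepts a TTL on writes); the class here is a hypothetical stand-in for illustration, not RightScale's code and not a Cassandra client.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TtlCache:
    """In-memory stand-in: rows keyed by (account, key) that expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._rows: Dict[Tuple[int, str], Tuple[float, Any]] = {}

    def put(self, account_id: int, key: str, value: Any, ttl_seconds: float) -> None:
        # Every row carries its owning account in the key and its own expiry time.
        self._rows[(account_id, key)] = (self._clock() + ttl_seconds, value)

    def get(self, account_id: int, key: str) -> Optional[Any]:
        entry = self._rows.get((account_id, key))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired rows behave as if they were never written.
            del self._rows[(account_id, key)]
            return None
        return value
```

Because stale rows simply disappear, the writer never has to issue deletes, which is what makes a TTL-based store usable as a cache.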

Page 17: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

[Diagram: placement grid (Users / Instances, Account / X-Account) with the dash, events, tags and audit datastores shown twice, alongside global and S3]

So we can horizontally scale our dashboard by partitioning objects based on account groups: Clusters

Page 18: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

[Diagram: RightScale Accounts divided into account sets (Account Set 1, Account Set 2, …) mapped to clusters (Cluster 1 … Cluster N), each with its own dash, events, tags, audit and S3]

Features:

• 1 cluster: N accounts

• 1 account: 1 home

• Migratable accounts

Benefits:

• Great horizontal growth

• Better failure isolation

• Independent scale

• Load rebalancing

• Versionable code

• Differentiated service
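
The three cluster features (one cluster holds N accounts, each account has exactly one home, accounts can migrate) can be pictured as a lookup directory. The sketch below is a minimal illustration under that assumption; the class and method names are invented, and it only tracks the mapping, not the data move a real migration would need.

```python
from typing import Dict, Set


class AccountDirectory:
    """Hypothetical account-to-cluster directory."""

    def __init__(self) -> None:
        self._home: Dict[int, str] = {}

    def assign(self, account_id: int, cluster: str) -> None:
        # 1 account: 1 home, so a second assignment is rejected.
        if account_id in self._home:
            raise ValueError(f"account {account_id} already has a home cluster")
        self._home[account_id] = cluster

    def home_of(self, account_id: int) -> str:
        return self._home[account_id]

    def accounts_in(self, cluster: str) -> Set[int]:
        # 1 cluster: N accounts.
        return {a for a, c in self._home.items() if c == cluster}

    def migrate(self, account_id: int, new_cluster: str) -> None:
        # Migratable accounts: only the home pointer changes in this sketch.
        if account_id not in self._home:
            raise KeyError(account_id)
        self._home[account_id] = new_cluster
```

Load rebalancing then amounts to calling `migrate` for selected accounts once their data has been copied to the target cluster.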

Page 19: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

[Diagram: placement grid (Users / Instances, Account / X-Account) showing the global, dash, events, tags, audit and S3 datastores together with routing, gateway and monitor]

Page 20: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

[Diagram: placement grid (Users / Instances, Account / X-Account) with routing, gateway and monitor shown twice, alongside global, dash, events, tags, audit and S3]

And partition our cloud objects based on the cloud the instances of an account run on: Islands

Page 21: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

[Diagram: Cloud 1, Cloud 2, …, Cloud N hosting an account’s instances, with Island 1, Island 2, …, Island N, each running routing, gateway and monitor as "Services co-located with resources"]

Features:

• 1 instance: 1 home island

• 1 Island can serve N clouds

• Core Agents: global data

Benefits:

• Close to cloud resources

• Good failure isolation

• As good as cloud

• Good scale: global replicas across Cassandra DCs

Gateway: MySQL

• Master-Slave replication

• Can port to NoSQL easily

• Mostly a resource cache

• But cloud partitionable

Monitoring: Custom

• Replicated files

• Backup to S3

• Archive to S3

Routing: Cassandra

• Simpler Key-Value access

• Very high availability

• Great scalability

• Great replica control

• Plus cross DC replication*
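
The island features (each instance has one home island, and one island can serve several clouds) suggest a lookup keyed by the cloud an instance runs on. The sketch below illustrates that idea only; the cloud names, island names and endpoint format are hypothetical.

```python
from typing import Dict

# Hypothetical mapping: one island may serve several clouds.
ISLAND_FOR_CLOUD: Dict[str, str] = {
    "cloud-1": "island-1",
    "cloud-2": "island-1",
    "cloud-3": "island-2",
}

ISLAND_SERVICES = ("routing", "gateway", "monitor")


def home_island(instance_cloud: str) -> str:
    """Return the single home island for an instance running in the given cloud."""
    try:
        return ISLAND_FOR_CLOUD[instance_cloud]
    except KeyError:
        raise LookupError(f"no island serves cloud {instance_cloud!r}") from None


def island_endpoints(instance_cloud: str) -> Dict[str, str]:
    """Map each island service to a made-up endpoint name on the home island."""
    island = home_island(instance_cloud)
    return {service: f"{service}.{island}" for service in ISLAND_SERVICES}
```

An instance in `cloud-2` would therefore talk to `routing.island-1`, `gateway.island-1` and `monitor.island-1`, the same island that serves `cloud-1`.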

Page 22: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

[Diagram: Users connect to Cluster 1, Cluster 3, …, Cluster N (dash, events, tags, audit, S3); Instances connect to Island 1, Island 2, …, Island N (routing, gateway, monitor). Labels: "Different Geographies", "Different Clouds"]

What if the cloud where the cluster is deployed… fails?

Page 23: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

[Diagram: the same clusters and islands, with clusters paired as "Sister Clusters", each holding a "Full replica"]

Sister Clusters

Features:

• Each master has an extra remote slave

• Each cluster in a pair is a DC replica of the other’s local ring

At Disaster Recovery time:

• Apps are told to start serving an extra shard

• No need to provision more infrastructure to recover (try to avoid it, since everybody is in the same boat)

• New resources can be allocated over time to help offload existing ones
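
The recovery step "apps are told to start serving an extra shard" can be sketched as a change in which cluster serves which shard. The code below is an illustration under stated assumptions: the pairing table, class and method names are invented, and it models only the routing decision, relying on the sister already holding a full replica as the slide describes.

```python
from typing import Dict, Iterable, Set

# Hypothetical pairing: each cluster names its sister.
SISTER_OF: Dict[str, str] = {"cluster-1": "cluster-2", "cluster-2": "cluster-1"}


class ShardServing:
    """Tracks which cluster currently serves each shard."""

    def __init__(self, clusters: Iterable[str]) -> None:
        # Normally every cluster serves only its own shard.
        self._served: Dict[str, Set[str]] = {c: {c} for c in clusters}

    def fail(self, cluster: str) -> None:
        """Hand the failed cluster's shards to its sister, with no new infrastructure."""
        sister = SISTER_OF[cluster]
        if sister not in self._served:
            raise RuntimeError(f"sister {sister} of {cluster} is also down")
        orphaned = self._served.pop(cluster)
        self._served[sister].update(orphaned)

    def shards_served_by(self, cluster: str) -> Set[str]:
        return set(self._served.get(cluster, set()))

    def serving_cluster(self, shard: str) -> str:
        for cluster, shards in self._served.items():
            if shard in shards:
                return cluster
        raise LookupError(f"no cluster serves shard {shard!r}")
```

After `fail("cluster-1")`, requests for the `cluster-1` shard resolve to `cluster-2`, which now carries two shards until new resources are added to offload it.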

Page 24: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Conclusions

• Shown that RightScale uses multiple database technologies:

  • RDBMS: MySQL for the ACID semantics and ‘queryability’

    • Using a Master to N-Slaves for RO scale, and quick failure recovery

    • And ReadOnly Provisioning, to increase RO availability and scale remote systems

  • NoSQL: Cassandra for Availability and Scalability

    • For higher Read/Write availability within a cluster

    • For fully replicated regions across the globe (for Read/Write!)

• Shown how RightScale uses them with different techniques:

  • It partitions resource data into Islands based on cloud proximity

    • Can achieve in-cloud polling, and keep monitoring/syslog data storage next to instances

    • Can provide routing availability, colocated with instances for any world region

  • It partitions core data into Clusters based on account groups

    • To scale the core horizontally and independently, and achieve account isolation/differentiation

    • Enhances fault isolation: assigning accounts to Clusters deployed away from their cloud resources

  • It maintains cluster pairs (sister sites)

    • To recover from full cloud region failures

    • It doesn’t require massive amounts of new resources to recover

Page 25: How RightScale Architects Its Own Databases for Worldwide Scale, HA, and DR Scenarios

Questions?