RightScale Webinar: How RightScale Architects Its Databases (for Worldwide Scale, HA and DR Scenarios)

Posted on 20-Aug-2015






  1. How RightScale Architects Its Databases (for Worldwide Scale, HA and DR Scenarios). January 30, 2013. Watch the recording of this webinar. #rightscale
  2. Your Panel Today. Presenting: Rafael H. Saavedra, VP Engineering, RightScale; Josep Blanquer, Chief Architect, RightScale. Q&A: Jared Marcell, Account Manager, RightScale; David Manriquez, Account Manager, RightScale. Please use the Questions window to ask questions any time!
  3. Menu: Intro; Data Taxonomy; Data Storage Design; Scale, HA and DR; Conclusion.
  4. Intro: Expectations and scope (what this is and what it is not).
     - It IS a talk about how RightScale has designed and implemented its backing datastores for a few of the most representative internal systems, and the rationale behind those choices.
     - It is NOT a talk about RightScale's overall architecture, about nodes or hosts (it is about systems), or about RightScale's data modeling.
     - Note: most of the design is implemented and in production, but some of the most advanced parts are still in beta or still being worked on.
  5. Intro: Tools and Technologies. RightScale uses a mix of RDBMS and NoSQL technologies: MySQL, Cassandra, and S3 (for backups and archiving).
     - Transactionality: MySQL has strong ACID properties. Cassandra has no atomicity, is eventually consistent, offers some isolation, and is durable.
     - Availability: MySQL uses asynchronous replication, either Master-SlaveN or Master-Master. Cassandra is distributed, master-less, and highly replicated (multi-DC).
     - Sharding: MySQL has no explicit inter-node tools, so sharding is done by the application. Cassandra partitions data internally across nodes.
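To make the sharding contrast concrete: with MySQL the application must decide which node holds a row, while Cassandra decides by itself, hashing each partition key onto a token ring. The Python below is a simplified illustration of that ring idea, not Cassandra's actual implementation (no virtual nodes, no data-center awareness); the class name and node names are hypothetical.

```python
import bisect
import hashlib


class TokenRing:
    """Simplified Cassandra-style token ring (illustration only).

    Each node sits on the ring at a token derived from its name. A key is
    owned by the first node whose token is >= the key's token (wrapping
    around), and replicated to the next (rf - 1) nodes clockwise, similar
    in spirit to Cassandra's SimpleStrategy.
    """

    def __init__(self, nodes, replication_factor=3):
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [token for token, _ in self.ring]
        # Cannot keep more replicas than there are nodes.
        self.rf = min(replication_factor, len(self.ring))

    @staticmethod
    def _token(value):
        # 128-bit token from MD5, as Cassandra's older RandomPartitioner did;
        # modern Cassandra defaults to Murmur3Partitioner instead.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def replicas(self, key):
        # Index of the owning node; wrap to 0 past the highest token.
        start = bisect.bisect_left(self.tokens, self._token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("account:42"))  # three distinct nodes chosen by the ring
```

Because placement is a pure function of the key and the ring, any node can answer "who holds this key?", and adding a node only moves the keys in the token range it takes over. With MySQL, that lookup logic has to live in the application instead.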
  6. Glossary: examples we will use.
     - Marketplace Assets (RightScripts, ServerTemplates): configuration data objects that are user-generated, private or shared.
     - Tags: resource data that drives automation and reporting.
     - Events: data used to communicate recent events and news feeds to users.
     - Cloud Polling and Gateway: data that records actions and states of external API-linked services.
     - Routing: data used to locate and transport messages across instances and/or our services.
     - Monitoring: infrastructure monitoring data recorded and presented on behalf of users.
  7. Taxonomy of RightScale's Data. Representative systems with different data semantics: Global Objects, Marketplace Assets, Dashboard Objects, Audits, Tags, Recent Events, Cloud Polling Data, Routing Data, Monitoring/Syslog.
  8. Taxonomy of RightScale's Data: Global Objects and Marketplace Assets.
     - Common across accounts: Users, Account Plans, Settings.
     - MultiCloud Marketplace: Published Assets, Sharing Groups.
  9. Taxonomy of RightScale's Data: Dashboard Objects, Audits, Tags and Recent Events. Private to each account: Deployments, Imported assets, Alert Specifications, Server Inputs, Audit, Tags, User Events.
  10. Taxonomy of RightScale's Data: Cloud Polling Data. Private to each account: cloud resource states (cache), cloud credentials.
  11. Taxonomy of RightScale's Data: Routing Data. Private to each account: instance agents location. Also: core agents location, agent action registry.
  12. Taxonomy of RightScale's Data: Monitoring/Syslog. Private to each account: collected metric data, collected syslog data.
  13. Taxonomy of RightScale's Data: which data do we need? (data scope and containment)
     - X-acct (data for all accounts, or data shared between accounts): Global Objects, Marketplace Assets.
     - Account (data required within the scope of a single account): Dashboard Objects, Audits, Tags, Recent Events, Cloud Polling Data, Routing Data, Monitoring/Syslog.
  14. Taxonomy of RightScale's Data: who uses the data? (data placement)
     - Users, through the Dash/API (data close to the users): Global Objects, Marketplace Assets, Dashboard Objects, Audits, Tags, Recent Events.
     - Instances, from the cloud (data close to the cloud): Cloud Polling Data, Routing Data, Monitoring/Syslog.
  15. Taxonomy of RightScale's Data: who uses the data (proximity to user vs. cloud) and which data do we need (scope of data available)?
     - Global Objects, Marketplace Assets (X-acct, Users): close to the user; globally accessible data.
     - Dashboard Objects, Audits, Tags, Recent Events (Account, Users): close to the user; account-shardable data.
     - Cloud Polling Data, Routing Data, Monitoring/Syslog (Account, Instances): close to cloud resources; account-shardable* data.
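One way to read this slide is as a two-question decision rule: what scope does the data have, and who consumes it? The Python below is a minimal sketch under that reading. The scope and consumer of each data class come from the taxonomy slides; the function name `placement` and the target labels (`global`, `cluster`, `island`, borrowed from the later slides) are illustrative assumptions, not RightScale code.

```python
from enum import Enum


class Scope(Enum):
    X_ACCOUNT = "x-acct"   # data for all accounts or shared between accounts
    ACCOUNT = "account"    # data required within a single account's scope


class Consumer(Enum):
    USERS = "users"          # reached through the Dashboard/API
    INSTANCES = "instances"  # reached from instances in the cloud


# (scope, consumer) per data class, as laid out on the taxonomy slides.
DATA_CLASSES = {
    "global_objects": (Scope.X_ACCOUNT, Consumer.USERS),
    "marketplace_assets": (Scope.X_ACCOUNT, Consumer.USERS),
    "dashboard_objects": (Scope.ACCOUNT, Consumer.USERS),
    "audits": (Scope.ACCOUNT, Consumer.USERS),
    "tags": (Scope.ACCOUNT, Consumer.USERS),
    "recent_events": (Scope.ACCOUNT, Consumer.USERS),
    "cloud_polling": (Scope.ACCOUNT, Consumer.INSTANCES),
    "routing": (Scope.ACCOUNT, Consumer.INSTANCES),
    "monitoring_syslog": (Scope.ACCOUNT, Consumer.INSTANCES),
}


def placement(data_class: str) -> str:
    """Hypothetical rule: where a data class should live."""
    scope, consumer = DATA_CLASSES[data_class]
    if scope is Scope.X_ACCOUNT:
        return "global"   # globally accessible, close to users
    if consumer is Consumer.USERS:
        return "cluster"  # account-shardable, close to users
    return "island"       # account-shardable*, close to cloud resources


for name in ("global_objects", "audits", "routing"):
    print(name, "->", placement(name))
```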
  16. Data placement map: X-Account vs. Account scope; Users vs. Instances.
  17. X-Account (global data, used by Users). Global: MySQL, for its ACID semantics, with Master-Slave replication. Why custom replication?
     - More control: multiple sources, individual columns.
     - Apply transformations.
     - Smart re-sync features.
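The slide lists what custom replication buys over stock MySQL replication but not how it is built. A minimal sketch, assuming a stream of row-change events has already been captured from each source database; `ReplicationRule`, `replicate` and `resync` are hypothetical names, and the re-sync shown is a naive full comparison rather than anything RightScale described.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


@dataclass
class ReplicationRule:
    """Hypothetical rule: replicate selected columns of one table from one
    source database, optionally transforming values on the way."""
    source: str
    table: str
    columns: List[str]
    transforms: Dict[str, Callable[[object], object]] = field(default_factory=dict)

    def apply(self, row: Row) -> Row:
        out = {}
        for col in self.columns:  # copy only the chosen columns
            fn = self.transforms.get(col)
            out[col] = fn(row[col]) if fn else row[col]
        return out


def replicate(changes: Iterable[Tuple[str, str, Row]],
              rules: List[ReplicationRule]) -> List[Row]:
    """Apply the matching rule to each (source, table, row) change event;
    changes with no matching rule are not replicated."""
    by_origin = {(r.source, r.table): r for r in rules}
    out = []
    for source, table, row in changes:
        rule = by_origin.get((source, table))
        if rule is not None:
            out.append(rule.apply(row))
    return out


def resync(primary: Dict[int, Row], replica: Dict[int, Row]) -> List[int]:
    """Re-copy only rows that are missing or differ on the replica
    (rows that exist only on the replica are left alone)."""
    stale = [key for key, row in primary.items() if replica.get(key) != row]
    for key in stale:
        replica[key] = dict(primary[key])
    return stale
```

Keying rules by (source, table) is what lets one replica be fed from several databases while dropping columns (such as credentials) that the global copy should never hold.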
  18. X-Account / Account: storage for account-scoped data (diagram components: global, dash, audit, tags, events, S3).
     - Dashboard: MySQL. ACID semantics; Master-SlaveN replication; slave reads; rows tagged by account.
     - Other systems: Cassandra. Simpler key-value access; great scalability; great replica control; high write availability; time-to-live expiration as a cache; rows tagged by account.
     - Data archive: S3. Low read rate; globally accessible.
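Master-SlaveN replication with slave reads implies a read/write split in the application. A minimal sketch, assuming one master and a list of slave host names (all hypothetical): writes go to the master, ordinary reads rotate across slaves, and reads that must see the caller's own latest write go back to the master, because asynchronous replication lets slaves lag.

```python
import itertools
from typing import List


class MasterSlaveRouter:
    """Hypothetical router for one MySQL master with N read slaves."""

    def __init__(self, master: str, slaves: List[str]):
        self.master = master
        self.slaves = list(slaves)
        # Round-robin over slaves; None when there are no slaves at all.
        self._next_slave = itertools.cycle(self.slaves) if self.slaves else None

    def for_write(self) -> str:
        # All writes go to the single master.
        return self.master

    def for_read(self, require_fresh: bool = False) -> str:
        # Slaves may lag behind the master, so reads that must observe the
        # latest write (or any read when no slave exists) use the master.
        if require_fresh or self._next_slave is None:
            return self.master
        return next(self._next_slave)


router = MasterSlaveRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.for_write(), router.for_read(), router.for_read())
```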
  19. So we can horizontally scale our dashboard by partitioning objects based on account groups: Clusters.
  20. Clusters: each cluster (Cluster 1 through Cluster N) has its own dash, audit, events and tags stores plus S3, and serves a set of RightScale accounts (Account Set 1, Account Set 2, ...).
     - Features: 1 cluster serves N accounts; 1 account has 1 home; accounts are migratable.
     - Benefits: great horizontal growth; better failure isolation; independent scale; load rebalancing; versionable code; differentiated service.
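The features "1 cluster: N accounts", "1 account: 1 home" and "migratable accounts" amount to a directory that maps each account to its home cluster. A minimal sketch in Python; the class name, cluster names and the least-loaded placement policy are assumptions, since the slides do not say how accounts are assigned.

```python
from collections import Counter
from typing import Dict, Iterable


class ClusterDirectory:
    """Hypothetical directory: each account has exactly one home cluster,
    a cluster hosts many accounts, and accounts can be migrated."""

    def __init__(self, clusters: Iterable[str]):
        self.clusters = list(clusters)
        self.home: Dict[int, str] = {}

    def assign(self, account_id: int) -> str:
        # New accounts go to the cluster with the fewest accounts
        # (an assumed policy); existing accounts keep their home.
        if account_id not in self.home:
            load = Counter(self.home.values())
            self.home[account_id] = min(self.clusters, key=lambda c: load[c])
        return self.home[account_id]

    def home_of(self, account_id: int) -> str:
        return self.home[account_id]

    def migrate(self, account_id: int, target: str) -> None:
        if target not in self.clusters:
            raise ValueError(f"unknown cluster {target}")
        # A real migration would copy the account's rows first; this only
        # flips the home pointer that request routing consults.
        self.home[account_id] = target


directory = ClusterDirectory(["cluster-1", "cluster-2"])
print(directory.assign(101), directory.assign(102))
```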
  21. X-Account / Account: alongside the user-facing systems (global, dash, audit, events, tags, S3), the instance-facing systems are polling, monitor and routing.
  22. And partition our cloud objects based on the cloud the instances of an account run on: Islands (each with its own polling, monitor and routing).
  23. Islands: Island 1 through Island N serve Cloud 1 through Cloud N, and each island runs polling, monitor and routing for the instances in its clouds.
     - Polling clouds: MySQL; Master-Slave replication; backup to S3; mostly a resource cache, but cloud-partitionable; can be ported to NoSQL easily.
     - Monitoring: custom; replicated files; archive to S3.
     - Routing: Cassandra; simpler key-value access; very high availability; great scalability; great replica control; plus cross-DC replication*; good scale: global replicas across Cassandra DCs.
     - Features: 1 instance has 1 home island; 1 island can serve N clouds; core agents: global data.
     - Benefits: close to cloud resources; services co-located with resources; good failure isolation; as good as the cloud.
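"1 instance: 1 home island" and "1 island can serve N clouds" suggest a cloud-to-island lookup consulted when an instance first registers. A minimal sketch; `IslandMap` and the island and cloud names are hypothetical.

```python
from typing import Dict, List


class IslandMap:
    """Hypothetical island placement: an island serves one or more clouds,
    and each instance's home island is chosen by the cloud it runs in."""

    def __init__(self, islands: Dict[str, List[str]]):
        # Invert island -> clouds into cloud -> island; a cloud may belong
        # to only one island.
        self.island_for_cloud: Dict[str, str] = {}
        for island, clouds in islands.items():
            for cloud in clouds:
                if cloud in self.island_for_cloud:
                    raise ValueError(f"cloud {cloud} mapped to two islands")
                self.island_for_cloud[cloud] = island
        self.home: Dict[str, str] = {}

    def register_instance(self, instance_id: str, cloud: str) -> str:
        # The first registration fixes the instance's single home island.
        return self.home.setdefault(instance_id, self.island_for_cloud[cloud])


islands = IslandMap({"island-1": ["cloud-a", "cloud-b"], "island-2": ["cloud-c"]})
print(islands.register_instance("i-123", "cloud-b"))
```

Polling, monitoring and routing for `i-123` would then be served from that island, which is what keeps those services co-located with the instance's cloud.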
  24. What if the cloud where the cluster is deployed fails? Clusters (dash, audit, events, tags, S3) are deployed in different geographies, and islands (polling, monitor, routing) in different clouds.
  25. Sister Clusters: each cluster in a pair holds a full replica of the other.
     - Features: each master has an extra remote slave; each cluster in a pair is a DC replica of the other's data.
     - At disaster recovery time: apps are told to start serving an extra shard; there is no need to provision more infrastructure to recover (something to avoid during a regional outage, since everybody is in the same boat); new resources can be allocated over time to help offload existing ones.
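The recovery step "apps are told to start serving an extra shard" can be modeled as moving a failed cluster's shard onto its sister, which already holds a full replica. A minimal sketch of that bookkeeping only (no data copying or replication is modeled); `SisterClusters` and the cluster names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


class SisterClusters:
    """Hypothetical model of sister sites: clusters are paired, and each
    one already holds a DC replica of its sister's shard."""

    def __init__(self, pairs: Iterable[Tuple[str, str]]):
        self.sister: Dict[str, str] = {}
        for a, b in pairs:
            self.sister[a], self.sister[b] = b, a
        # Initially every cluster serves only its own shard.
        self.serving: Dict[str, Set[str]] = {c: {c} for c in self.sister}
        self.down: Set[str] = set()

    def fail(self, cluster: str) -> str:
        survivor = self.sister[cluster]
        if survivor in self.down:
            raise RuntimeError("both sisters are down")
        self.down.add(cluster)
        # No new infrastructure: the survivor starts serving the extra shard.
        self.serving[survivor] |= self.serving.pop(cluster)
        return survivor

    def cluster_serving(self, shard: str) -> str:
        for cluster, shards in self.serving.items():
            if shard in shards:
                return cluster
        raise KeyError(shard)


sites = SisterClusters([("cluster-1", "cluster-2"), ("cluster-3", "cluster-4")])
sites.fail("cluster-1")
print(sites.cluster_serving("cluster-1"))
```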
  26. Conclusions
     - We showed that RightScale uses multiple database technologies:
       - RDBMS (MySQL) for its ACID semantics and queryability, using a master with N slaves for read-only scale and quick failure recovery, and read-only provisioning to increase read-only availability and scale remote systems.
       - NoSQL (Cassandra) for availability and scalability: higher read/write availability within a cluster, and fully replicated regions across the globe (for reads and writes!).
     - We showed how RightScale combines them using different techniques:
       - It partitions resource data into Islands based on cloud proximity. This enables in-cloud polling, keeps monitoring/syslog data stored next to the instances, and provides routing availability co-located with instances in any world region.
       - It partitions core data into Clusters based on account groups, to scale the core horizontally and independently and to achieve account isolation and differentiation. Fault isolation improves further when accounts are assigned to Clusters deployed away from their cloud resources.
       - It maintains cluster pairs (sister sites) to recover from full cloud region failures, without requiring massive amounts of new resources to recover.
  27. Next Steps: Contact RightScale at (866) 720-0208 or sales@rightscale.com. 1. Learn: Building Scalable Applications in the Cloud W