Download - Cloud Databases in Research and Practice
Felix Gessert(29.04.2014)
Slides: baqend.com/nosql.pdf
About me
PhD student (database group university of hamburg)
About me
PhD student (database group university of hamburg)
Research Project for PhD
About me
PhD student (database group university of hamburg)
Research Project for PhD
Cloud Database Startup
Outline
• Categories of Cloud Databases
• Properties
What are Cloud Databases?
Cloud Databases in the wild
Research Perspectives
Wrap-up andliterature
2003: GFS (Google)
2006: BigTable (Google)
2007: Dynamo (Amazon)
2007: S3 (Amazon)
2008: SimpleDB (Amazon)
2010: SQL Azure
2011: Parse
2012: DynamoDB (Amazon)
2013: Redshift (Amazon)
Context: What kinds of Cloud databases are there?
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-ServiceCloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB BigQuery
EMR GCS
S3
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-ServiceCloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
ManagedRDBMSs
BigQuery
EMR GCS
S3
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-ServiceCloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
ManagedRDBMSs
Cloud-Deploymentof DBMSs
BigQuery
EMR GCS
S3
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-ServiceCloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
ManagedNoSQL DBs
ManagedRDBMSs
Cloud-Deploymentof DBMSs
BigQuery
EMR GCS
S3
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-ServiceCloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
ManagedNoSQL DBs
ManagedRDBMSs
Cloud-Deploymentof DBMSs
Cloud-onlyDBaaS-Systems
BigQuery
EMR GCS
S3
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-ServiceCloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
ManagedNoSQL DBs
ManagedRDBMSs
Cloud-Deploymentof DBMSs
Cloud-onlyDBaaS-Systems
BigQuery
EMR
Analytics-as-a-Service
GCS
S3
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-ServiceCloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
ManagedNoSQL DBs
ManagedRDBMSs
Cloud-Deploymentof DBMSs
Cloud-onlyDBaaS-Systems
BigQuery
EMR
Analytics-as-a-Service
GCS
S3
ObjectStores
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-ServiceCloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
ManagedNoSQL DBs
ManagedRDBMSs
Storage APIs
Cloud-Deploymentof DBMSs
Cloud-onlyDBaaS-Systems
BigQuery
EMR
Analytics-as-a-Service
GCS
S3
ObjectStores
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-ServiceCloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
ManagedNoSQL DBs
ManagedRDBMSs
Backend-as-a-Service
Storage APIs
Cloud-Deploymentof DBMSs
Cloud-onlyDBaaS-Systems
BigQuery
EMR
Analytics-as-a-Service
GCS
S3
ObjectStores
Cloud-deployeddatabase
Data-Analytics-as-a-Service
Database-as-a-Service
SQL
ManagedRDBMS
ManagedDWH
NoSQL
ManagedNoSQL DB
Backend-as-a-Service
Proprietary or
Polyglot
Files Object Store
Cloud Databases
Cloud-deployeddatabase
Data-Analytics-as-a-Service
Database-as-a-Service
SQL
ManagedRDBMS
ManagedDWH
NoSQL
ManagedNoSQL DB
Backend-as-a-Service
Proprietary or
Polyglot
Files Object Store
Cloud Databases
Standard Interface
Cloud-deployeddatabase
Data-Analytics-as-a-Service
Database-as-a-Service
SQL
ManagedRDBMS
ManagedDWH
NoSQL
ManagedNoSQL DB
Backend-as-a-Service
Proprietary or
Polyglot
Files Object Store
Cloud Databases
Standard Interface
Vendor-specific
Interface
Application Architecture <-> Cloud Database Category
Architectures
Applications
Data Warehouse
Operative Database
Reporting Data MiningAnalytics
Data
Manag
emen
tData
Analy
tics
Application Architecture <-> Cloud Database Category
Architectures
Applications
Data Warehouse
Operative Database
Reporting Data MiningAnalytics
Data
Manag
emen
tData
Analy
tics
DBaaS
Application Architecture <-> Cloud Database Category
Architectures
Applications
Data Warehouse
Operative Database
Reporting Data MiningAnalytics
Data
Manag
emen
tData
Analy
tics
DBaaSProbaby not.
Cloud-Deployed Database
IaaS-Cloud
Cloud-Deployed Database
IaaS-Cloud
Cloud-deploy yourfavourite database system
Cloud-Deployed Database
IaaS-Cloud
Cloud-deploy yourfavourite database system
Does not solve:Provisioning, Backups, Security, Scaling, Elasticity, Performance Tuning, Failover, Replication, ...
Managed RDBMS/DWH/NoSQL DB
IaaS-Cloud
DBaaS-Provider
Managed RDBMS/DWH/NoSQL DB
IaaS-Cloud
RDBMS DWH NoSQL DB
DBaaS-Provider
Provisioning, Backups, Security, Scaling, Elasticity, Performance Tuning, Failover, Replication, ...
Managed RDBMS/DWH/NoSQL DB
IaaS-Cloud
RDBMS DWH NoSQL DB
DBaaS-Provider
Managed RDBMS/DWH/NoSQL DB
IaaS-Cloud
RDBMS DWH NoSQL DB
DBaaS-Provider
SQL Azure
Cloud SQL
RD
BM
S
Managed RDBMS/DWH/NoSQL DB
IaaS-Cloud
RDBMS DWH NoSQL DB
DBaaS-Provider
SQL Azure
Cloud SQL
RD
BM
SN
oSQ
LD
B
Managed RDBMS/DWH/NoSQL DB
IaaS-Cloud
RDBMS DWH NoSQL DB
DBaaS-Provider
Amazon Redshift
SQL Azure
Cloud SQL
RD
BM
SN
oSQ
LD
BD
WH
Proprietary DB/Object Store
Cloud
Black-Box Databaseor file system
Managed byCloud Provider
Provid
er‘sA
PI
Proprietary DB/Object Store
Cloud
Black-Box Databaseor file system
Managed byCloud Provider
Provid
er‘sA
PI
Amazon
SimpleDB
Google Cloud
DatastoreAzure Tables
Database.com
BigTable, Megastore, Spanner, F1, Dynamo,
PNuts, Relational Cloud, …
Prop
rietaryD
B
Proprietary DB/Object Store
Cloud
Black-Box Databaseor file system
Managed byCloud Provider
Provid
er‘sA
PI
Amazon
SimpleDB
Google Cloud
Storage
Azure Blob
Storage
Google Cloud
DatastoreAzure Tables
Openstack
Swift
Database.com
BigTable, Megastore, Spanner, F1, Dynamo,
PNuts, Relational Cloud, …
Prop
rietaryD
BO
bject
Store
Backend-as-a-Service, Polyglot PersistenceService
IaaS-Cloud
Backend API
Service-Layer
Data API
Backend-as-a-Service, Polyglot PersistenceService
IaaS-Cloud
Backend API
Service-Layer
Data API
Authentication, Users, Validation,etc.
Maps to (different) databases
Backend-as-a-Service, Polyglot PersistenceService
IaaS-Cloud
Backend API
Service-Layer
Data API
Realtim
e B
aaS
Backend-as-a-Service, Polyglot PersistenceService
IaaS-Cloud
Backend API
Service-Layer
Data API
Realtim
e B
aaS
BaaS
AppCelerator
Cloud
Analytics-as-a-Service
Cloud
Analytics Cluster
Provisioning, Data Ingest
Analytics-as-a-Service
Cloud
Analytics Cluster
Provisioning, Data Ingest
Azure
HDInsight
Amazon Elastic
MapReduce
Had
oo
p
Analytics-as-a-Service
Cloud
Analytics Cluster
Provisioning, Data Ingest
Azure
HDInsight
BigQuery
Prediction API
Amazon Elastic
MapReduce
Had
oo
pC
usto
m
DBaaS: Common Aspects
#1 Metric: Total Cost
Daniela Florescu and Donald Kossmann “Rethinking cost and performance of database systems”, SIGMOD Rec. 2009.
DBaaS: Common Aspects
#1 Metric: Total Cost
Maximum utilization ofavailable hardware:
Multi-Tenancy
Daniela Florescu and Donald Kossmann “Rethinking cost and performance of database systems”, SIGMOD Rec. 2009.
DBaaS: Common Aspects
Multi-Tenancy - four common approaches:
T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011
Private OS Private Process/DB Private Schema Shared Schema
DBaaS: Common Aspects
Multi-Tenancy - four common approaches:
T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011
Private OS
VM
Hardware Resources
Database Process
Database
Schema
Private Process/DB Private Schema Shared Schema
e.g. Amazon RDS
DBaaS: Common Aspects
Multi-Tenancy - four common approaches:
T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011
Private OS
VM
Hardware Resources
Database Process
Database
Schema
Private Process/DB Private Schema
VM
Hardware Resources
Database Process
Database
Schema
Shared Schema
e.g. Amazon RDS e.g. MongoHQ
DBaaS: Common Aspects
Multi-Tenancy - four common approaches:
T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011
Private OS
VM
Hardware Resources
Database Process
Database
Schema
Private Process/DB Private Schema
VM
Hardware Resources
Database Process
Database
Schema
VM
Hardware Resources
Database Process
Database
Schema
Shared Schema
e.g. Amazon RDS e.g. MongoHQ e.g. Google DataStore
DBaaS: Common Aspects
Multi-Tenancy - four common approaches:
T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011
Private OS
VM
Hardware Resources
Database Process
Database
Schema
Private Process/DB Private Schema
VM
Hardware Resources
Database Process
Database
Schema
VM
Hardware Resources
Database Process
Database
Schema
Shared Schema
VM
Hardware Resources
Database Process
Database
Schema
Virtual Schema
e.g. Amazon RDS e.g. MongoHQ e.g. Google DataStore Most SaaS Apps
DBaaS: Common Aspects
Multi-Tenancy - four common approaches:
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
Private OS
Private Process/DB
Private Schema
Shared Schema
App.indep.
IsolationRessource
Util.Maintenance,Provisioning
Billing Models:
DBaaS: Common Aspects
Usage
Account
Billing Models:
DBaaS: Common Aspects
Usage
Account
Pay-per-useParameters: Network, Bandwidth, Storage, CPU, Requests, etc.Payment: Pre-Paid, Post-PaidVariants: On-Demand, Auction, Reserved
e.g. DynamoDB
Billing Models:
DBaaS: Common Aspects
Usage
Account
End ofmonth
Plan-basedParameters: Allocated Plan (e.g. 2 instances + X GB storage)
e.g. MongoHQ
Billing Models:
DBaaS: Common Aspects
Usage
Account
End ofmonth
Plan-basedParameters: Allocated Plan (e.g. 2 instances + X GB storage)
Free Tier: free plan or free initial account credit
e.g. MongoHQ
DBaaS: Common Aspects
Database-a-
a-Service
Authentication
Authorization
API
DBaaS: Common Aspects
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
DBaaS: Common Aspects
Internal Schemes External IdentityProvider
Federated Identity (SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
DBaaS: Common Aspects
Internal Schemes External IdentityProvider
Federated Identity (SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
DBaaS: Common Aspects
Internal Schemes External IdentityProvider
Federated Identity (SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
Token
DBaaS: Common Aspects
Internal Schemes External IdentityProvider
Federated Identity (SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
Token
Authenticated Request
DBaaS: Common Aspects
Internal Schemes External IdentityProvider
Federated Identity (SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
User-based Access Control
Role-based Access Control
Policies
e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
Token
Authenticated Request
DBaaS: Common Aspects
Internal Schemes External IdentityProvider
Federated Identity (SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
User-based Access Control
Role-based Access Control
Policies
e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
Token
Authenticated Request
Response
DBaaS: Common Aspects
Internal Schemes External IdentityProvider
Federated Identity (SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
User-based Access Control
Role-based Access Control
Policies
e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
Token
Authenticated Request
Response
DBaaS: Common Aspects
Internal Schemes External IdentityProvider
Federated Identity (SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
User-based Access Control
Role-based Access Control
Policies
e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
Token
Authenticated Request
Response
Federated ACLs
M. Decat, B. Lagaisse, et al. “Toward efficient and confidentiality-aware federation of access control policies “, DOA Trusted Cloud 2013
• Imagine ACL: „Patient data can only beaccessed by treating physician“
• Idea: decompose policy and evaluate parts neardata owner
Service Level Agreements
DBaaS: Common Aspects
SLA
Service Level Agreements
DBaaS: Common Aspects
SLA
Legal Part1. Fees2. Penalties
Technical Part1. SLO2. SLO3. SLO
Service Level Agreements
DBaaS: Common Aspects
SLA
Legal Part1. Fees2. Penalties
Technical Part1. SLO2. SLO3. SLO
Service Level Objectives:• Availability• Durability• Consistency/Staleness• Query Response Time
SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
Maximize:
SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
QOS for NoSQL DBs
Y. Zhu et al. “Scheduling with Freshness and Performance Guarantees for Web Applications in the Cloud“, CRPIT
Old: Workload management in RDBMs (DB2 andOracle)
New:Use well-known scheduling algorithms for queriesin replicated DBs
Resource Provisionig
Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques forElastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Resource Provisionig
Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques forElastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Expected Load
Resource Provisionig
Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques forElastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Expected Load
Provisioned Resources:• #No of Shard- or Replica
servers• Computing, Storage,
Network Capacities
Resource Provisionig
Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques forElastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Actual Load
Resource Provisionig
Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques forElastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Actual Load
Overprovisioning:• SLAs met• Excess Capacities
Resource Provisionig
Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques forElastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Actual Load
Overprovisioning:• SLAs met• Excess Capacities
Underprovisioning:• SLAs violated• Usage maximized
Resource Provisionig
Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques forElastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Actual Load
Overprovisioning:• SLAs met• Excess Capacities
Underprovisioning:• SLAs violated• Usage maximized
SmartSLA
P. Xiong: “Intelligent management of virtualized resources for database systems in cloud environment”, ICDE 2011
Solution: machine learning (regression + boosting) for prediction choose allocation that minimizesSLA penalties
Resourceallocation
Databaseperformance
LearnMapping
Functional Requirements
Scan-Querys
Conditional Updates
Transactions
Query by Example
Joins
Analytics
Elasticity
Consistency
Read-Latency
Write-Latency
Write-Throughput
Scalability of Data Volume
Read Scalability
Read-Availability
Write-Availability
Non-Functional Requirements
Durability
Write Scalability
DBaaS: General Considerations
Functional Requirements
Scan-Querys
Conditional Updates
Transactions
Query by Example
Joins
Analytics
Elasticity
Consistency
Read-Latency
Write-Latency
Write-Throughput
Scalability of Data Volume
Read Scalability
Read-Availability
Write-Availability
Non-Functional Requirements
Durability
Write Scalability
DBaaS: General Considerations
Functional Requirements
Scan-Querys
Conditional Updates
Transactions
Query by Example
Joins
Analytics
Elasticity
Consistency
Read-Latency
Write-Latency
Write-Throughput
Scalability of Data Volume
Read Scalability
Read-Availability
Write-Availability
Non-Functional Requirements
Durability
Write Scalability
DBaaS: General Considerations
Functional Requirements
Scan-Querys
Conditional Updates
Transactions
Query by Example
Joins
Analytics
Elasticity
Consistency
Read-Latency
Write-Latency
Write-Throughput
Scalability of Data Volume
Read Scalability
Read-Availability
Write-Availability
Non-Functional Requirements
Durability
Write Scalability
DBaaS: General Considerations
aaS
Functional Requirements
Scan-Querys
Conditional Updates
Transactions
Query by Example
Joins
Analytics
Elasticity
Consistency
Read-Latency
Write-Latency
Write-Throughput
Scalability of Data Volume
Read Scalability
Read-Availability
Write-Availability
Non-Functional Requirements
Durability
Write Scalability
DBaaS: General Considerations
aaS
Questions to ask: • Which requirements are met by the DB?• Which are met by the provider? SLAs
Outline
Examples of different clouddatabase systems:• Cloud-deployed• Managed DBMS
• SQL• NoSQL
• Proprietary• BaaS
What are Cloud Databases?
Cloud Databases in the wild
Research Perspectives
Wrap-up andliterature
Idea: Run (mostly) unmodified DB on IaaS
Cloud-Deployed DB
Method I: DIY
Method II: Deployment Tools
Method III: Marketplaces
Idea: Run (mostly) unmodified DB on IaaS
Cloud-Deployed DB
Method I: DIY
Method II: Deployment Tools
Method III: Marketplaces
1. Provision VM(s)
Idea: Run (mostly) unmodified DB on IaaS
Cloud-Deployed DB
Method I: DIY
Method II: Deployment Tools
Method III: Marketplaces
1. Provision VM(s) 2. Install DBMS (manual, script, Chef, Puppet)
Idea: Run (mostly) unmodified DB on IaaS
Cloud-Deployed DB
Method I: DIY
Method II: Deployment Tools
Method III: Marketplaces
> whirr launch-cluster --confighbase.properties
Login, cluster-size etc. Amazon EC2
1. Provision VM(s) 2. Install DBMS (manual, script, Chef, Puppet)
Idea: Run (mostly) unmodified DB on IaaS
Cloud-Deployed DB
Method I: DIY
Method II: Deployment Tools
Method III: Marketplaces
> whirr launch-cluster --confighbase.properties
Login, cluster-size etc. Amazon EC2
1. Provision VM(s) 2. Install DBMS (manual, script, Chef, Puppet)
Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
Bad: • No Clusters• Not managed (automatic Updates, Snapshots, etc.)• Private OS Multi-Tenancy Bad Resource Usage
Good:
• Easy to get started
Amazon Elastic MapReduce EMR
Model:
Analytics-aaS
Pricing:
Infrastructure
API:
Hadoop
Amazon Elastic
MapReduce
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
Amazon Elastic MapReduce EMR
Model:
Analytics-aaS
Pricing:
Infrastructure
API:
Hadoop
Amazon Elastic
MapReduce
Provisions
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
Amazon Elastic MapReduce EMR
Model:
Analytics-aaS
Pricing:
Infrastructure
API:
Hadoop
Amazon Elastic
MapReduce
Job Tracker
Task Tracker + HDFS Data Node
Task Tracker
Provisions
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
Amazon Elastic MapReduce EMR
Model:
Analytics-aaS
Pricing:
Infrastructure
API:
Hadoop
Amazon Elastic
MapReduce
Data Source and Sink
Job Tracker
Task Tracker + HDFS Data Node
Task Tracker
Provisions
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
Amazon Elastic MapReduce EMR
Model:
Analytics-aaS
Pricing:
Infrastructure
API:
Hadoop
Amazon Elastic
MapReduce
Submits Hadoop Jobs as:• JAR• Streaming• Cascading• Pig• Hive• Impala
Data Source and Sink
Job Tracker
Task Tracker + HDFS Data Node
Task Tracker
Provisions
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
Amazon Elastic MapReduce EMR
Model:
Analytics-aaS
Pricing:
Infrastructure
API:
Hadoop
Amazon Elastic
MapReduce
Submits Hadoop Jobs as:• JAR• Streaming• Cascading• Pig• Hive• Impala
Data Source and Sink
Job Tracker
Task Tracker + HDFS Data Node
Task Tracker
Provisions
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
• No data locality with S3• AWS Import/Export: send your HDD• HBase Integration• Compatible with Spot and Reserved Instances• Similar: Azure HDInsight
Idea: Web-scale analysis of nested data
Google BigQuery BigQuery
Model:
Analytics-aaS
Pricing:
Storage + GBs
Processed
API:
REST
BigQuery
Idea: Web-scale analysis of nested data
Google BigQuery BigQuery
Model:
Analytics-aaS
Pricing:
Storage + GBs
Processed
API:
REST
BigQuery
Idea: Web-scale analysis of nested data
Google BigQuery BigQuery
Model:
Analytics-aaS
Pricing:
Storage + GBs
Processed
API:
REST
BigQuery
Dremel
Melnik et al. “Dremel: Interactive analysis of web-scale datasets”, VLDB 2010
Idea:Multi-Level execution tree on nested columnar data format(≥100 nodes)
Idea: Web-scale analysis of nested data
Google BigQuery BigQuery
Model:
Analytics-aaS
Pricing:
Storage + GBs
Processed
API:
REST
BigQuery
Dremel
Melnik et al. “Dremel: Interactive analysis of web-scale datasets”, VLDB 2010
Idea:Multi-Level execution tree on nested columnar data format(≥100 nodes)
• SLA: 99.9% uptime / month• Fundamentally different from relational DWHs
and MapReduce• Design copied by Apache Drill, Impala, Shark
Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
• Synchronous Replication• Automatic Failover
Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
• Synchronous Replication• Automatic Failover
99,95% uptime SLA
Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
• Synchronous Replication• Automatic Failover
99,95% uptime SLA
Provisioned IOPS: access to EBS volumes network-optimized (up to 4000 IOPS)
Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
EC2 instances: Up to 32 Cores, 244 GB RAM, 10 GbE
Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
EC2 instances: Up to 32 Cores, 244 GB RAM, 10 GbE
Minor Version Upgrades are performed without downtime
Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
Backups are automated and scheduled
Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
Backups are automated and scheduled
• Support for (asynchronous) Read Replicas• Administration: Web-based or SDKs• Only RDBMSs• “Analytic Brother“ of RDS: RedShift (PDWH)
Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Cloud SQL Server
P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011
• Multi-Tenant MSSQL• Paxos-like commit protocol for
consistent replication
Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Keyless Table Group: regular databaseKeyed Table Group: partitioned by row key
Cloud SQL Server
P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011
• Multi-Tenant MSSQL• Paxos-like commit protocol for
consistent replication
Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Keyless Table Group: regular databaseKeyed Table Group: partitioned by row key
Consistency unit (ACID boundary)
Cloud SQL Server
P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011
• Multi-Tenant MSSQL• Paxos-like commit protocol for
consistent replication
Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Keyless Table Group: regular databaseKeyed Table Group: partitioned by row key
Consistency unit (ACID boundary)
Automatic Partitioning for Keyed Table Groups
Cloud SQL Server
P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011
• Multi-Tenant MSSQL• Paxos-like commit protocol for
consistent replication
Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Keyless Table Group: regular databaseKeyed Table Group: partitioned by row key
Consistency unit (ACID boundary)
Automatic Partitioning for Keyed Table Groups
Cloud SQL Server
P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011
• Multi-Tenant MSSQL• Paxos-like commit protocol for
consistent replication
• SLA: 99.9% uptime / month• Usually Cheaper than RDS (Multi-Tenancy)• Smaller Databases (max. 150 GB)• Rich MSSQL server tooling• Keyed Table Group internal feature only
MySQL for Google App Engine PaaS
Support for: patching, replication, backup
SLA: 99,95 % uptime / month
Other RDBMS servicesGoogle
Cloud SQL Google Cloud SQL
Pricing:
Database size
Underlying DB:
MySQL
MySQL for Google App Engine PaaS
Support for: patching, replication, backup
SLA: 99,95 % uptime / month
Postgres for Heroku PaaS
Hosted on EC2
No SLAs
Other RDBMS servicesGoogle
Cloud SQL Google Cloud SQL
Pricing:
Database size
Underlying DB:
MySQL
Heroku Postgres
Pricing:
Plan based
Underlying DB:
Postgres
MySQL for Google App Engine PaaS
Support for: patching, replication, backup
SLA: 99,95 % uptime / month
Postgres for Heroku PaaS
Hosted on EC2
No SLAs
MySQL for OpenStack (Icehouse)
Under development (HP driven)
VM ⇔ Tenant
Other RDBMS servicesGoogle
Cloud SQL Google Cloud SQL
Pricing:
Database size
Underlying DB:
MySQL
Heroku Postgres
Pricing:
Plan based
Underlying DB:
Postgres
Trove
Pricing:
Own Hardware
Underlying DB:
MySQL
Trove
MySQL for Google App Engine PaaS
Support for: patching, replication, backup
SLA: 99,95 % uptime / month
Postgres for Heroku PaaS
Hosted on EC2
No SLAs
MySQL for OpenStack (Icehouse)
Under development (HP driven)
VM ⇔ Tenant
Other RDBMS servicesGoogle
Cloud SQL Google Cloud SQL
Pricing:
Database size
Underlying DB:
MySQL
Heroku Postgres
Pricing:
Plan based
Underlying DB:
Postgres
Trove
Pricing:
Own Hardware
Underlying DB:
MySQL
Trove
Evaluation of Cloud RDBMSs
D. Kossmann,T. Kraska: An evaluation of alternative architectures for transaction processing in the cloud”, Sigmod 2010
TPC-W Benchmark (Online Shop), 2010, Emulated Browsers / RPS:
HBase Wide-Column
CP Over Row Key
~700 1/4 Apache
(EMR)
MongoDB Doc-ument
CP yes >100<500
4/4 GPL
Riak Key-Value
AP ~60 3/4 Apache
(Softlayer)
Cassandra Wide-Column
AP WithComp. Index
>300<1000
2/4 Apache
Redis Key-Value
CA Through Lists, etc.
manual N/A 4/4 BSD
Managed NoSQL services
Model CAP ScansSec.
IndicesLargestCluster
Lic.Lear-ning DBaaS
HBase Wide-Column
CP Over Row Key
~700 1/4 Apache
(EMR)
MongoDB Doc-ument
CP yes >100<500
4/4 GPL
Riak Key-Value
AP ~60 3/4 Apache
(Softlayer)
Cassandra Wide-Column
AP WithComp. Index
>300<1000
2/4 Apache
Redis Key-Value
CA Through Lists, etc.
manual N/A 4/4 BSD
Managed NoSQL services
Model CAP ScansSec.
IndicesLargestCluster
Lic.Lear-ning DBaaS
And there are many more:• CouchDB (e.g. Cloudant)• CouchBase (e.g. KuroBase Beta)• ElasticSearch(e.g. Bonsai)• Solr (e.g. WebSolr)• …
MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
EC2-based MongoDB-as-a-Serivce
MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
EC2-based MongoDB-as-a-Serivce
MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
EC2-based MongoDB-as-a-Serivce
MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
EC2-based MongoDB-as-a-Serivce
MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
EC2-based MongoDB-as-a-Serivce
MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
EC2-based MongoDB-as-a-Serivce
Private Process Multi-TenancyScale-Up Strategy: RAM, IOPs and CPU increased at runtimeMaximum Size: 1TB
Free Tier
MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
EC2-based MongoDB-as-a-Serivce
Private Process Multi-TenancyScale-Up Strategy: RAM, IOPs and CPU increased at runtimeMaximum Size: 1TB
Free Tier
VM-Deployment (EC2):M1.large on EC2: $128.10on EC2 with 1-yr-Res.: $30With MongoHQ: $637
MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
EC2-based MongoDB-as-a-Serivce
#1 thing that should never happen:
MongoHQ
Problem: Scalability
Client
Client
configconfigconfig
mongos
Replica Set
Master
Slave
Slave
mongos
MongoHQ
Problem: Scalability
Client
Client
configconfigconfig
mongos
Replica Set
Master
Slave
Slave
mongos
What if Writes / second or data volume become bottleneck?
MongoHQ
Problem: Scalability
Client
Client
configconfigconfig
mongos
Replica Set
Replica Set
Master
Slave
Slave
Master
Slave
Slave
mongos
Sharding (Scale Out)• Dynamic Scaling: Tenant adds
Replica Set• Elastic Scaling: Provider
adds/removes Replica SetMongoHQ: Only manual shardingon “contact us” basis
MongoHQ
Problem: Scalability
Client
Client
configconfigconfig
mongos
Replica Set
Replica Set
Master
Slave
Slave
Master
Slave
Slave
mongos
Sharding (Scale Out)• Dynamic Scaling: Tenant adds
Replica Set• Elastic Scaling: Provider
adds/removes Replica SetMongoHQ: Only manual shardingon “contact us” basis
• Bad: no SLAs, no horizontal scaling
• Good: Solid Dashboard, Backups, Replication, integration with Heroku
• Competitors: MongoLab, ObjectRocket, MongoSoup
ElastiCache ElastiCache
Model:
Managed NoSQL
Pricing:
Infrastructure
Underlying DB:
Memcache, Redis
API:
DB, REST
(management)
„RDS for Memcache and Redis“
ElastiCache ElastiCache
Model:
Managed NoSQL
Pricing:
Infrastructure
Underlying DB:
Memcache, Redis
API:
DB, REST
(management)
„RDS for Memcache and Redis“
Memcache or RedisMemcache can be run as a cluster (client-side sharding)
Limited choice of instance types
ElastiCache ElastiCache
Model:
Managed NoSQL
Pricing:
Infrastructure
Underlying DB:
Memcache, Redis
API:
DB, REST
(management)
„RDS for Memcache and Redis“Announced last Saturday: Snapshots
ElastiCache ElastiCache
Model:
Managed NoSQL
Pricing:
Infrastructure
Underlying DB:
Memcache, Redis
API:
DB, REST
(management)
„RDS for Memcache and Redis“
ElastiCache ElastiCache
Model:
Managed NoSQL
Pricing:
Infrastructure
Underlying DB:
Memcache, Redis
API:
DB, REST
(management)
„RDS for Memcache and Redis“
$ elasticache-modify-cache-parameter-group xy
AWS CLI tools and SDKsoffer a superset of Dashboard functions
Many Hosted NoSQLDbaaS Providers represented
Heroku DBaaS Addons
Many Hosted NoSQLDbaaS Providers represented
And Search
Heroku DBaaS Addons
Heroku Redis2Go example Redis2Go
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
Redis
API:
Redis
Create Heroku App:
Heroku Redis2Go example Redis2Go
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
Redis
API:
Redis
Create Heroku App:
Add Redis2Go Addon:
Heroku Redis2Go example Redis2Go
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
Redis
API:
Redis
Create Heroku App:
Add Redis2Go Addon:
Use Connection URL (environment variable):
Heroku Redis2Go example Redis2Go
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
Redis
API:
Redis
Create Heroku App:
Add Redis2Go Addon:
Use Connection URL (environment variable):
Deploy:
Heroku Redis2Go example Redis2Go
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
Redis
API:
Redis
Create Heroku App:
Add Redis2Go Addon:
Use Connection URL (environment variable):
Deploy:• Very simple• Only suited for small to medium
applications (no SLAs, limited control)
SimpleDB Table-Store
CP Yes (asqueries)
Auto-matic
SQL-like(no joins, groups, …)
REST + SDKs
Dynamo-DB
Table-Store
CP By rangekey / index
Local Sec.Global Sec.
Key+Cond. On Range Key(s)
REST + SDKs
Automaticover Prim. Key
AzureTables
Table-Store
CP By rangekey
Key+Cond. On Range Key
REST + SDKs
Automaticover Part. Key
99.9% uptime
AE/CloudDataStore
Entity-Group
CP Yes (asqueries)
Auto-matic
Conjunct.of Eq. Predicates
REST/SDK, JDO,JPA
Automaticover EntityGroups
S3, Az. Blob, GCS
Blob-Store
AP REST + SDKs
Automaticover key
99.9% uptime(S3)
Proprietary Database services
Model CAP ScansSec.
IndicesQueries API SLA
Scale-out
SimpleDB Table-Store
CP Yes (asqueries)
Auto-matic
SQL-like(no joins, groups, …)
REST + SDKs
Dynamo-DB
Table-Store
CP By rangekey / index
Local Sec.Global Sec.
Key+Cond. On Range Key(s)
REST + SDKs
Automaticover Prim. Key
AzureTables
Table-Store
CP By rangekey
Key+Cond. On Range Key
REST + SDKs
Automaticover Part. Key
99.9% uptime
AE/CloudDataStore
Entity-Group
CP Yes (asqueries)
Auto-matic
Conjunct.of Eq. Predicates
REST/SDK, JDO,JPA
Automaticover EntityGroups
S3, Az. Blob, GCS
Blob-Store
AP REST + SDKs
Automaticover key
99.9% uptime(S3)
Proprietary Database services
Model CAP ScansSec.
IndicesQueries API SLA
Scale-out
There are many more object stores (HP, Rackspace, etc.)…but no comparable Table Stores
Azure Storage
Load-Balancing-System
Application
HTTP/HTTPS
Worker-Role
Web-Role
IIS-Webserver
Virtual Machines
Call
OtherServices
Not Allowed
Virtual Machines
Azure Storage
Load-Balancing-System
Application
HTTP/HTTPS
Worker-Role
Web-Role
IIS-Webserver
Virtual Machines
Call
OtherServices
Not Allowed
Virtual Machines
Tables
Blobs
Queues
Table Service example: Azure Tables
Partition Key
Row Key (sortiert)
Timestamp(autom.)
Property1 Propertyn
intro.pdf v1.1 14/6/2013 … …
intro.pdf v1.2 15/6/2013 …
präs.pptx v0.0 11/6/2013 …
Partition
Partition
RES
T A
PI
Table Service example: Azure Tables
Partition Key
Row Key (sortiert)
Timestamp(autom.)
Property1 Propertyn
intro.pdf v1.1 14/6/2013 … …
intro.pdf v1.2 15/6/2013 …
präs.pptx v0.0 11/6/2013 …
Partition
Partition
RES
T A
PI
SparseHash-distributed toparition servers
No Index: Lookup only (!) by full table scanAtomic "Entity-Group Batch Transaction" possible
Similar to Amazon SimpleDB and DynamoDB
Table Service example: Azure Tables
Partition Key
Row Key (sortiert)
Timestamp(autom.)
Property1 Propertyn
intro.pdf v1.1 14/6/2013 … …
intro.pdf v1.2 15/6/2013 …
präs.pptx v0.0 11/6/2013 …
Partition
Partition
RES
T A
PI
• Indexes all attributes• Rich(er) queries• Many Limits (size, RPS, etc.)
• Provisioned Throughput• On SSDs („single digit latency“)• Optional Indexes
Azure Table Storage Azure Tables
Model:
Propriertary
Pricing:
Requests + Storage
+ Network
Underlying DB:
Custom System
API:
REST
Challenges:
Single partition and range key (modelling) Very “basic“ queries
Azure Tables
Azure Table Storage Azure Tables
Model:
Propriertary
Pricing:
Requests + Storage
+ Network
Underlying DB:
Custom System
API:
REST
Challenges:
Single partition and range key (modelling) Very “basic“ queries
Azure Tables
Azure Table Storage Azure Tables
Model:
Propriertary
Pricing:
Requests + Storage
+ Network
Underlying DB:
Custom System
API:
REST
Challenges:
Single partition and range key (modelling) Very “basic“ queries
Good:Automatic DistributionReplicated 3x locally + 1x async. geo-replica
Azure Tables
Azure Table Storage Azure Tables
Model:
Propriertary
Pricing:
Requests + Storage
+ Network
Underlying DB:
Custom System
API:
REST
Challenges:
Single partition and range key (modelling) Very “basic“ queries
Good:Automatic DistributionReplicated 3x locally + 1x async. geo-replica
Very good:SLA (99.9% uptime)Internal architecture published
Azure Tables
Azure Table Storage Azure Tables
Model:
Propriertary
Pricing:
Requests + Storage
+ Network
Underlying DB:
Custom System
API:
REST
Challenges:
Single partition and range key (modelling) Very “basic“ queries
Good:Automatic DistributionReplicated 3x locally + 1x async. geo-replica
Very good:SLA (99.9% uptime)Internal architecture published
Azure Tables
Windows Azure Storage
B. Calder, et al. "Windows Azure Storage: a highly available cloud storage service with strong consistency." , SOSP 2011
Idea:• Layered storage infrastructure for Blobs and Tables• Use research results (GFS, BigTable, Paxos, LSM, Erasure Coding)
Azure Table Storage Azure Tables
Model:
Propriertary
Pricing:
Requests + Storage
+ Network
Underlying DB:
Custom System
API:
REST
Challenges:
Single partition and range key (modelling) Very “basic“ queries
Good:Automatic DistributionReplicated 3x locally + 1x async. geo-replica
Very good:SLA (99.9% uptime)Internal architecture published
Azure Tables
Windows Azure Storage
B. Calder, et al. "Windows Azure Storage: a highly available cloud storage service with strong consistency." , SOSP 2011
Idea:• Layered storage infrastructure for Blobs and Tables• Use research results (GFS, BigTable, Paxos, LSM, Erasure Coding)
Think: GFS/HDFSThink: BigTable/HBase
Think: Chubby/ZooKeeper
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
Successor to SimpleDB
Limitations: • Slow (~50-100 RPS)• 10 GB per Domain• Query result max. 2500 records or
1 MB• Max. 1K-sized attributes
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
Successor to SimpleDB
dom:com.cnn content : "<html>…"
Primary Key Attribute (scalar or set)
page:index
Range Key
Item:
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
Successor to SimpleDB
dom:com.cnn content : "<html>…"
Primary Key Attribute (scalar or set)
Querying Options:
• GetItem: Key Lookup
• Query: Primary Key + Condition on Range Key
• Scan: Full Table Scan with filter
• EMR: Hive queries (for analytics)
page:index
Range Key
Item:
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
Successor to SimpleDB
Consistency: Strongly (2x price) or Eventually Consistent Reads
Atomic (Conditional) Updates per Item
Indexing Options:◦ Local Sec. Index: consistent additional Range Key
◦ Global Sec. Index: eventually consistent index-table (Primary Key)
dom:com.cnn content : "<html>…"
Primary Key Attribute (scalar or set)
page:index
Range Key
Item:
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
Successor to SimpleDB
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
Successor to SimpleDB
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
Successor to SimpleDB
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
Successor to SimpleDB
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
Successor to SimpleDB
Unit of Billing
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
Successor to SimpleDB
Unit of BillingGood:• Low Latency (SSD)• Data partitioning and AZ-replicationBad:• Scaling not elastic (Capacity Units)• No SLAs, no internals published (≠ Dynamo!)• No built-in backups ( AWS data pipeline)• Vendor Lock-in
AE/Cloud DataStore DataStore
Model:
Propriertary
Pricing:
CPU + Storage +
Network
Underlying DB:
MegaStore
API:
SDK, JPA, JDO
Google Cloud
Datastore
Structured Storage System for App Engine
Based on:◦ Megastore BigTable Colossus
AE/Cloud DataStore DataStore
Model:
Propriertary
Pricing:
CPU + Storage +
Network
Underlying DB:
MegaStore
API:
SDK, JPA, JDO
Google Cloud
Datastore
Structured Storage System for App Engine
Based on:◦ Megastore BigTable Colossus
AE/Cloud DataStore DataStore
Model:
Propriertary
Pricing:
CPU + Storage +
Network
Underlying DB:
MegaStore
API:
SDK, JPA, JDO
Google Cloud
Datastore
Structured Storage System for App Engine
Based on:◦ Megastore BigTable Colossus
AE/Cloud DataStore DataStore
Model:
Propriertary
Pricing:
CPU + Storage +
Network
Underlying DB:
MegaStore
API:
SDK, JPA, JDO
Google Cloud
Datastore
Structured Storage System for App Engine
Based on:◦ Megastore BigTable Colossus
Schemafree Entity Group (EG) data model:
User
IDName
Photo
IDUserURL
Root Table Child Table
1
n
AE/Cloud DataStore DataStore
Model:
Propriertary
Pricing:
CPU + Storage +
Network
Underlying DB:
MegaStore
API:
SDK, JPA, JDO
Google Cloud
Datastore
Structured Storage System for App Engine
Based on:◦ Megastore BigTable Colossus
Schemafree Entity Group (EG) data model:
User
IDName
Photo
IDUserURL
Root Table Child Table
1
n
EG: User + n Photos• Unit of ACID transactions/
consistency• Fields autoindexed
(eventually consistent)
AE/Cloud DataStore DataStore
Model:
Propriertary
Pricing:
CPU + Storage +
Network
Underlying DB:
MegaStore
API:
SDK, JPA, JDO
Google Cloud
Datastore
Structured Storage System for App Engine
Based on:◦ Megastore BigTable Colossus
Schemafree Entity Group (EG) data model:
User
IDName
Photo
IDUserURL
Root Table Child Table
1
n
EG: User + n Photos• Unit of ACID transactions/
consistency• Fields autoindexed
(eventually consistent)
SELECT * FROM photos
WHERE ANCESTOR IS :34 AND name = „sunset“
ORDER BY date ASC
LIMIT 10
OFFSET 10
AE/Cloud DataStore
Internally:
Entity Groups define partitions
SynchronousPaxos-based
replication
ACID per EG. Maximum of1 Write/s to an EG.
Eventual Consistencyacross groups
Stored in BigTable
AE/Cloud DataStore
Internally:
Entity Groups define partitions
SynchronousPaxos-based
replication
ACID per EG. Maximum of1 Write/s to an EG.
Eventual Consistencyacross groups
Stored in BigTable
AE/Cloud DataStore
Internally:
MegaStore
J. Baker, et al. "Megastore: Providing Scalable, Highly Available Storage for Interactive Services." CIDR 2011.
• Paxos-based replication andtransactions
• 100 Google applicationsProblems: Slow Writes, PredefinedEntity Groups
Entity Groups define partitions
SynchronousPaxos-based
replication
ACID per EG. Maximum of1 Write/s to an EG.
Eventual Consistencyacross groups
Stored in BigTable
AE/Cloud DataStore
Internally:
MegaStore
J. Baker, et al. "Megastore: Providing Scalable, Highly Available Storage for Interactive Services." CIDR 2011.
• Paxos-based replication andtransactions
• 100 Google applicationsProblems: Slow Writes, PredefinedEntity Groups
Entity Groups define partitions
SynchronousPaxos-based
replication
ACID per EG. Maximum of1 Write/s to an EG.
Eventual Consistencyacross groups
Stored in BigTable
Spanner
J. Corbett et al. "Spanner: Google’s globally distributed database." TOCS 2013
Idea:• Autosharded Entity Groups• Not based on BigTableImplementation:• TrueTime API (GPS + atomic
clocks) commit timestampsof 2PL-SI transactions
• Paxos-replication per Shard
AE/Cloud DataStore
Internally:
MegaStore
J. Baker, et al. "Megastore: Providing Scalable, Highly Available Storage for Interactive Services." CIDR 2011.
• Paxos-based replication andtransactions
• 100 Google applicationsProblems: Slow Writes, PredefinedEntity Groups
Entity Groups define partitions
SynchronousPaxos-based
replication
ACID per EG. Maximum of1 Write/s to an EG.
Eventual Consistencyacross groups
Stored in BigTable
Spanner
J. Corbett et al. "Spanner: Google’s globally distributed database." TOCS 2013
Idea:• Autosharded Entity Groups• Not based on BigTableImplementation:• TrueTime API (GPS + atomic
clocks) commit timestampsof 2PL-SI transactions
• Paxos-replication per Shard
F1
J. Shute, et al. "F1: A distributed SQL database that scales.“, VLDB 2013
Idea:• Full SQL relational database built on
Spanner, powers AdWordsImplementation:• 5-way replication• Data Model: relational + hierarchy
(customercampaignAdGroup)• Transactions: Snapshot-Read-Only,
Pessimistic (Spanner), optimistic• Distributed SQL Engine
AE/Cloud DataStore
Internally:
MegaStore
J. Baker, et al. "Megastore: Providing Scalable, Highly Available Storage for Interactive Services." CIDR 2011.
• Paxos-based replication andtransactions
• 100 Google applicationsProblems: Slow Writes, PredefinedEntity Groups
Entity Groups define partitions
SynchronousPaxos-based
replication
ACID per EG. Maximum of1 Write/s to an EG.
Eventual Consistencyacross groups
Stored in BigTable
Spanner
J. Corbett et al. "Spanner: Google’s globally distributed database." TOCS 2013
Idea:• Autosharded Entity Groups• Not based on BigTableImplementation:• TrueTime API (GPS + atomic
clocks) commit timestampsof 2PL-SI transactions
• Paxos-replication per Shard
F1
J. Shute, et al. "F1: A distributed SQL database that scales.“, VLDB 2013
Idea:• Full SQL relational database built on
Spanner, powers AdWordsImplementation:• 5-way replication• Data Model: relational + hierarchy
(customercampaignAdGroup)• Transactions: Snapshot-Read-Only,
Pessimistic (Spanner), optimistic• Distributed SQL Engine
Good:• Transactions (though limited)• Good scalability of data volumeBad:• Entity Groups hard to define• Bad Scale-Out for write and reads (Kossmann et
al.) no advantage over RDBMS for small datavolumes
A Spanner/F1 based DBaaS?
Idea: Blobs with RESTful CRUD Interface
Distribution: 𝐵𝑢𝑐𝑘𝑒𝑡, 𝐾𝑒𝑦 → 𝑆𝑒𝑟𝑣𝑒𝑟 + 𝑅𝑒𝑝𝑙𝑖𝑐𝑎𝑠
CDN Integration (Azure CDN, Cloudfront)
S3: > 2 Trillion (2 ⋅ 1012) objects
Azure Blobs, Amazon S3, Google Cloud Storage
Container
Blobcontains
Block 1 (up to 4 MB)
Block 2
Block n
REST-Request
Azure Blobs:
Block Blob: 4MB blocks large filesPage Blob: 512B blocks random IO
Automatic Versioning
Pricing: per request (10.000 ~ 1c) and storage(1GB/month ~ 3c) and network (1GB ~ 10c)
Azure Blobs, Amazon S3, Google Cloud Storage
S3 BlobDELETE /puppy.jpg HTTP/1.1
Host: mybucket.s3.amazonaws.comAuthorization: AWS AKIAIO...
AWS Ireland DC
Replicas
Reduced Redundancy: only 1 replicaGlacier: tape-disk archivalAmazon S3:
Automatic Versioning
Pricing: per request (10.000 ~ 1c) and storage(1GB/month ~ 3c) and network (1GB ~ 10c)
Azure Blobs, Amazon S3, Google Cloud Storage
S3 BlobDELETE /puppy.jpg HTTP/1.1
Host: mybucket.s3.amazonaws.comAuthorization: AWS AKIAIO...
AWS Ireland DC
Replicas
Reduced Redundancy: only 1 replicaGlacier: tape-disk archivalAmazon S3:
S3 Consistency
D. Bermbach,, S. Tai. "Eventual consistency: How soon is eventual?”, MW4SOC 11
Findings:• Inconsistency window varies
from 2-11 seconds• Monotonic Read Consistency
is often violated
Automatic Versioning
Pricing: per request (10.000 ~ 1c) and storage(1GB/month ~ 3c) and network (1GB ~ 10c)
Azure Blobs, Amazon S3, Google Cloud Storage
S3 BlobDELETE /puppy.jpg HTTP/1.1
Host: mybucket.s3.amazonaws.comAuthorization: AWS AKIAIO...
AWS Ireland DC
Replicas
Reduced Redundancy: only 1 replicaGlacier: tape-disk archivalAmazon S3:
S3 Consistency
D. Bermbach,, S. Tai. "Eventual consistency: How soon is eventual?”, MW4SOC 11
Findings:• Inconsistency window varies
from 2-11 seconds• Monotonic Read Consistency
is often violated
Building a database on S3
M. Brantner, et al. "Building a database on S3." Sigmod 2008
Idea:• Use S3 as the persistent storage of a
database
Implementation:• Buffer Pool and Log Manager on S3• No transaction or query support
Founded June, 2011
Acquired by Facebook April, 2013
Pricing: ◦ Free
◦ Pro (199$)
◦ Enterprise
Parse - MBaaS Parse
Model:
Backend-aas
Pricing:
Plan-based
Underlying DB:
Mainly MongoDB
API:
SDKs, REST
Founded June, 2011
Acquired by Facebook April, 2013
Pricing: ◦ Free
◦ Pro (199$)
◦ Enterprise
Parse - MBaaS Parse
Model:
Backend-aas
Pricing:
Plan-based
Underlying DB:
Mainly MongoDB
API:
SDKs, REST
Parse Core
Founded June, 2011
Acquired by Facebook April, 2013
Pricing: ◦ Free
◦ Pro (199$)
◦ Enterprise
Parse - MBaaS Parse
Model:
Backend-aas
Pricing:
Plan-based
Underlying DB:
Mainly MongoDB
API:
SDKs, REST
Parse Core Parse Analytics
Founded June, 2011
Acquired by Facebook April, 2013
Pricing: ◦ Free
◦ Pro (199$)
◦ Enterprise
Parse - MBaaS Parse
Model:
Backend-aas
Pricing:
Plan-based
Underlying DB:
Mainly MongoDB
API:
SDKs, REST
Parse Core Parse Analytics Parse Push
Parse - MBaaS
AuthenticationUser + PasswordOAuth: Facebook, Twitter
Parse - MBaaS
AuthenticationUser + PasswordOAuth: Facebook, Twitter
Parse - MBaaS
Query Cinemas(new Parse.Query('Cinemas'))
.withinKilometers(...)
.fetch()
AuthenticationUser + PasswordOAuth: Facebook, Twitter
Parse - MBaaS
Query Cinemas(new Parse.Query('Cinemas'))
.withinKilometers(...)
.fetch()
Query Movies(new Parse.Query('Movies'))
.greaterThan('startAt‘, now)
.notEqualTo('cinemas', cId)
.fetch()
Meteor Meteor
Model:
Backend-aaS
Pricing:
No yet revealed
Underlying DB:
MongoDB
API:
WebSockets
Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets
Web Browser Node.js + MongoDB(Single Server)
WebSocket
Meteor Meteor
Model:
Backend-aaS
Pricing:
No yet revealed
Underlying DB:
MongoDB
API:
WebSockets
Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets
<div class="player {{selected}}"><span class="name">{{name}}</span><span class="score">{{score}}</span>
</div>
Web Browser Node.js + MongoDB(Single Server)
WebSocket
Meteor Meteor
Model:
Backend-aaS
Pricing:
No yet revealed
Underlying DB:
MongoDB
API:
WebSockets
Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets
Players = new Meteor.
Collection("players");
if (Meteor.isServer) {
//…
<div class="player {{selected}}"><span class="name">{{name}}</span><span class="score">{{score}}</span>
</div>
Web Browser Node.js + MongoDB(Single Server)
WebSocket
Meteor Meteor
Model:
Backend-aaS
Pricing:
No yet revealed
Underlying DB:
MongoDB
API:
WebSockets
Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets
Players = new Meteor.
Collection("players");
if (Meteor.isServer) {
//…
if (Meteor.isClient) {
Template.leaderboard.players = function () {
return Players.find({},
{sort: {score: -1, name: 1}});
};
<div class="player {{selected}}"><span class="name">{{name}}</span><span class="score">{{score}}</span>
</div>
Web Browser Node.js + MongoDB(Single Server)
WebSocket
Meteor Meteor
Model:
Backend-aaS
Pricing:
No yet revealed
Underlying DB:
MongoDB
API:
WebSockets
Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets
Players = new Meteor.
Collection("players");
if (Meteor.isServer) {
//…
if (Meteor.isClient) {
Template.leaderboard.players = function () {
return Players.find({},
{sort: {score: -1, name: 1}});
};
<div class="player {{selected}}"><span class="name">{{name}}</span><span class="score">{{score}}</span>
</div>
Web Browser Node.js + MongoDB(Single Server)
WebSocket
$ meteor deploy abc.meteor.com
Meteor Meteor
Model:
Backend-aaS
Pricing:
No yet revealed
Underlying DB:
MongoDB
API:
WebSockets
Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets
Players = new Meteor.
Collection("players");
if (Meteor.isServer) {
//…
if (Meteor.isClient) {
Template.leaderboard.players = function () {
return Players.find({},
{sort: {score: -1, name: 1}});
};
<div class="player {{selected}}"><span class="name">{{name}}</span><span class="score">{{score}}</span>
</div>
Web Browser Node.js + MongoDB(Single Server)
WebSocket
$ meteor deploy abc.meteor.com
Very productive for very small projects
Fundamentally limited scalability:• Server tails Mongo‘s oplog• And holds the fetched data state of
every client
Outline
• Hot Topics• Orestes: a scalable, low-
latency architecture• Baqend: putting it into
practice
What are Cloud Databases?
Cloud Databases in the wild
Research Perspectives
Wrap-up andliterature
Example: CryptDB
Idea: Only decrypt as much as neccessary
Encrypted Databases: Research
RDBMS
SQL-Proxy
Encrypts and decrypts, rewrites queries
Example: CryptDB
Idea: Only decrypt as much as neccessary
Encrypted Databases: Research
RDBMS
SQL-Proxy
Encrypts and decrypts, rewrites queries
Example: CryptDB
Idea: Only decrypt as much as neccessary
Encrypted Databases: Research
RDBMS
SQL-Proxy
Encrypts and decrypts, rewrites queries
Relational Cloud
C. Curino, et al. "Relational cloud: A database-as-a-service for the cloud.“, CIDR 2011
DBaaS Architecture:• Encrypted with CryptDB• Multi-Tenancy through live
migration• Workload-aware partitioning
(graph-based)
Example: CryptDB
Idea: Only decrypt as much as neccessary
Encrypted Databases: Research
RDBMS
SQL-Proxy
Encrypts and decrypts, rewrites queries
Relational Cloud
C. Curino, et al. "Relational cloud: A database-as-a-service for the cloud.“, CIDR 2011
DBaaS Architecture:• Encrypted with CryptDB• Multi-Tenancy through live
migration• Workload-aware partitioning
(graph-based)
• Early approach• Not adopted in practice, yet
Dream solution:Full Homorphic Encryption
Transactions/Consistency: Research
Dynamo Eventual None 1 RT -
Yahoo PNuts Timeline per key Single Key 1 RT possible
COPS Causality Multi-Record 1 RT possible
MySQL (async) Serializable Static Partition 1 RT possible
Megastore Serializable Static Partition 2 RT -
Spanner/F1 Snapshot Isolation Partition 2 RT -
MDCC Read-Commited Multi-Record 1 RT -
Consistency Transactional UnitCommit Latency
Data Loss?
Transactions/Consistency: Research
Dynamo Eventual None 1 RT -
Yahoo PNuts Timeline per key Single Key 1 RT possible
COPS Causality Multi-Record 1 RT possible
MySQL (async) Serializable Static Partition 1 RT possible
Megastore Serializable Static Partition 2 RT -
Spanner/F1 Snapshot Isolation Partition 2 RT -
MDCC Read-Commited Multi-Record 1 RT -
Consistency Transactional UnitCommit Latency
Data Loss?
Multi-Data Center Consistency
T. Kraska et al. "MDCC: Multi-data center consistency." EuroSys, 2013.
Idea:• Multi-Data center commit protocol with
single round-trip
Implementation:• Optimistic Commit Protocol• Fast, Generalized Multi-Paxos
Result: almost as fast as Dynamo-style
Transactions/Consistency: Research
Dynamo Eventual None 1 RT -
Yahoo PNuts Timeline per key Single Key 1 RT possible
COPS Causality Multi-Record 1 RT possible
MySQL (async) Serializable Static Partition 1 RT possible
Megastore Serializable Static Partition 2 RT -
Spanner/F1 Snapshot Isolation Partition 2 RT -
MDCC Read-Commited Multi-Record 1 RT -
Consistency Transactional UnitCommit Latency
Data Loss?
Multi-Data Center Consistency
T. Kraska et al. "MDCC: Multi-data center consistency." EuroSys, 2013.
Idea:• Multi-Data center commit protocol with
single round-trip
Implementation:• Optimistic Commit Protocol• Fast, Generalized Multi-Paxos
Result: almost as fast as Dynamo-style
Currently no NoSQL DB implementsconsistent Multi-DC replication
YCSB (Yahoo Cloud Serving Benchmark)
Benchmarking: Research
Data Store
YCSB (Yahoo Cloud Serving Benchmark)
Benchmarking: Research
Client
Wo
rkload
Gen
erator
Plu
ggable
DB
interface
Data Store
Threads
Stats
YCSB (Yahoo Cloud Serving Benchmark)
Benchmarking: Research
Client
Wo
rkload
Gen
erator
Plu
ggable
DB
interface
Workload:
1. Operation Mix
2. Record Size
3. Popularity Distribution
Runtime Parameters:
DB host name,
threads, etc.
Data Store
Threads
Stats
YCSB (Yahoo Cloud Serving Benchmark)
Benchmarking: Research
Client
Wo
rkload
Gen
erator
Plu
ggable
DB
interface
Workload:
1. Operation Mix
2. Record Size
3. Popularity Distribution
Runtime Parameters:
DB host name,
threads, etc.
Read()Insert()Update()Delete()Scan()
Data Store
Threads
Stats
DB protocol
YCSB (Yahoo Cloud Serving Benchmark)
Benchmarking: Research
Client
Wo
rkload
Gen
erator
Plu
ggable
DB
interface
Workload:
1. Operation Mix
2. Record Size
3. Popularity Distribution
Runtime Parameters:
DB host name,
threads, etc.
Read()Insert()Update()Delete()Scan()
Data Store
Threads
Stats
DB protocol
Workload Operation Mix Distribution Example
A – Update Heavy Read: 50%Update: 50%
Zipfian Session Store
B – Read Heavy Read: 95%Update: 5%
Zipfian Photo Tagging
C – Read Only Read: 100% Zipfian User Profile Cache
D – Read Latest Read: 95%Insert: 5%
Latest User Status Updates
E – Short Ranges Scan: 95%Insert: 5%
Zipfian/Uniform
Threaded Conversations
Example Result
(Read Heavy):
Benchmarking: Research
Example Result
(Read Heavy):
Benchmarking: Research
Weaknesses:• Single client can be a
bottleneck• No consistency &
availability measurement
Example Result
(Read Heavy):
Benchmarking: ResearchYCSB++
S. Patil, M. Polte, et al.„Ycsb++: benchmarking and performance debugging advanced features in scalable table stores“, SOCC 2011
• Clients coordinate throughZookeeper
• Simple Read-After-Write Checks• Evaluation: Hbase & Accumulo
Weaknesses:• Single client can be a
bottleneck• No consistency &
availability measurement
Example Result
(Read Heavy):
Benchmarking: ResearchYCSB++
S. Patil, M. Polte, et al.„Ycsb++: benchmarking and performance debugging advanced features in scalable table stores“, SOCC 2011
• Clients coordinate throughZookeeper
• Simple Read-After-Write Checks• Evaluation: Hbase & Accumulo
Weaknesses:• Single client can be a
bottleneck• No consistency &
availability measurement
• No Transaction Support
YCSB+T
A. Dey et al. “YCSB+T: Benchmarking Web-Scale Transactional Databases”, CloudDB 2014
• New workload: TransactionalBank Account
• Simple anomaly detection forLost Updates
• No comparison of systems
Example Result
(Read Heavy):
Benchmarking: ResearchYCSB++
S. Patil, M. Polte, et al.„Ycsb++: benchmarking and performance debugging advanced features in scalable table stores“, SOCC 2011
• Clients coordinate throughZookeeper
• Simple Read-After-Write Checks• Evaluation: Hbase & Accumulo
Weaknesses:• Single client can be a
bottleneck• No consistency &
availability measurement
• No Transaction Support
YCSB+T
A. Dey et al. “YCSB+T: Benchmarking Web-Scale Transactional Databases”, CloudDB 2014
• New workload: TransactionalBank Account
• Simple anomaly detection forLost Updates
• No comparison of systems
No specific applicationCloudStone, CARE, TPC
extensions?
Benchmarking: state of the art
Benchmarking: state of the art
Tests latency and throughputof IaaS-Providers, CDNs and
Object Stores
Benchmarking: state of the art
Tests latency and throughputof IaaS-Providers, CDNs and
Object Stores
Does not test: Cloud Databases
VisionYCSB Harmony
Idea:
VisionYCSB Harmony
YCSBClient
YCSBClient YCSB
Client
YCSBClient
Idea:1. Periodically
launch clients
VisionYCSB Harmony
YCSBClient
YCSBClient YCSB
Client
YCSBClient
Idea:1. Periodically
launch clients2. Benchmark
Cloud DB3. Publish results
VisionYCSB Harmony
System Availablity Reads/s Writes/s Avg. Latency 95th perc. Latency
Plot
DynamoDB 99.9% 23411 34534 3.2 ms 9 ms
RDS 99.8% 2342 2455 30 ms 80 ms
Azure Table 99.5% 22343 23442 12 ms 20 ms
GoogleDataStore
99.5% 3000 2000 30 ms 300 ms
Database research at theUniversity of Hamburg
Motivation
Classic 3-Tier-ArchitectureThree-Tier Architecture
Motivation
ClientApplicationDatabase
Web Server
Web Server
Classic 3-Tier-ArchitectureThree-Tier Architecture
Motivation
ClientApplicationDatabase
Web Server
Web Server
Classic 3-Tier-ArchitectureThree-Tier Architecture
Motivation
ClientApplicationDatabase
Web Server
Web Server
High Latency
Classic 3-Tier-ArchitectureThree-Tier Architecture
Motivation
ClientApplicationDatabase
Web Server
Web Server
Average (2014):90 HTTP Requests per
page load
High Latency
Classic 3-Tier-ArchitectureThree-Tier Architecture
Motivation
ClientApplicationDatabase
Web Server
Web Server
Average (2014):90 HTTP Requests per
page load
High Latency
Classic 3-Tier-ArchitectureThree-Tier ArchitectureWith every 100ms of additional page load time, revenue decreases by 1%.
Study by Amazon
Motivation
ClientApplicationDatabase
Web Server
Web Server
Average (2014):90 HTTP Requests per
page load
High Latency
Classic 3-Tier-ArchitectureThree-Tier ArchitectureWith every 100ms of additional page load time, revenue decreases by 1%.
Study by Amazon
When increasing load time of search results by 500ms, traffic decreases by 20%.
Study by Google
Motivation
ClientApplicationDatabase
Web Server
Web Server
Average (2014):90 HTTP Requests per
page load
High Latency
Classic 3-Tier-ArchitectureThree-Tier ArchitectureWith every 100ms of additional page load time, revenue decreases by 1%.
Study by Amazon
When increasing load time of search results by 500ms, traffic decreases by 20%.
Study by Google
SPAs
Rich ClientApplicationDatabase
Web Server
Web Server
Single-Page Applications
High Latency
(Rich Client, Smart Client)
Data (e.g. JSON)
(AngularJS, Backbone,
Ember.JS, etc.)
ORESTES
Rich ClientOrestesDB-Cluster
RESTServer
Low LatencyCloud
RESTServer
ORESTES
Rich ClientOrestesDB-Cluster
RESTServer
Web-Caches Low Latency
Cloud
RESTServer
CacheableDatabase Objects
ORESTES
Rich ClientOrestesDB-Cluster
RESTServer
Web-Caches Low Latency
ScalableNoSQL DBs
Cloud
RESTServer
CacheableDatabase Objects
ORESTES
Rich ClientOrestesDB-Cluster
RESTServer
Web-Caches Low Latency
ScalableNoSQL DBs
Cloud
Exposes the DB via REST API and handles:• Scaling• Cache Consistency• Transactions• Schema
RESTServer
CacheableDatabase Objects
Middleware:◦ Scalable REST API for aggregate-oriented
persistence, queries, schema management and transactions.
Caching:◦ Consistent web caching of database objects for
read scalability and latency reduction.
Transactions:◦ Optimistic cache-aware transaction model.
ORESTES: Components
Java/JDO
persist
find
createQueryJavaScript/JPA Port
REST/HTTP API
others
Application
Server
Browser or
Mobile Device
ApplicationLayer
PersistenceAPI
DataStore
Java/JDO
persist
find
createQueryJavaScript/JPA Port
REST/HTTP API
others
Application
Server
Browser or
Mobile Device
ApplicationLayer
PersistenceAPI
DataStore
HTTP Server
Trans-
actionsQuerys
Object
Persist.Schema
Key-
Value
Doc-
uments
DBaaS & BaaS Layer
Transaction Validation
HTTP Server
Config-
urationPartial
Updates
Access Control
Multi-Tenancy Schema Management
Workload Management
Cache Coherence
Autoscaling
Database-independentConcerns
Database -specific Wrappers
SLAs
HTTP Server
Java/JDO
persist
find
createQueryJavaScript/JPA Port
REST/HTTP API
others
Application
Server
Browser or
Mobile Device
ApplicationLayer
PersistenceAPI
DataStore
ISP
Forward-Proxy Caches
ISP Caches
Reverse-Proxy Caches and
Load Balancers
CDN Caches
Content DeliveryNetworks
Purge
Scale
HTTP Server
Trans-
actionsQuerys
Object
Persist.Schema
Key-
Value
Doc-
uments
DBaaS & BaaS Layer
Transaction Validation
HTTP Server
Config-
urationPartial
Updates
Access Control
Multi-Tenancy Schema Management
Workload Management
Cache Coherence
Autoscaling
Database-independentConcerns
Database -specific Wrappers
SLAs
HTTP Server
Java/JDO
persist
find
createQueryJavaScript/JPA Port
REST/HTTP API
others
Application
Server
Browser or
Mobile Device
ApplicationLayer
PersistenceAPI
DataStore
ISP
Forward-Proxy Caches
ISP Caches
Reverse-Proxy Caches and
Load Balancers
CDN Caches
Content DeliveryNetworks
Purge
Scale
HTTP Server
Trans-
actionsQuerys
Object
Persist.Schema
Key-
Value
Doc-
uments
DBaaS & BaaS Layer
Transaction Validation
HTTP Server
Config-
urationPartial
Updates
Access Control
Multi-Tenancy Schema Management
Workload Management
Cache Coherence
Autoscaling
Database-independentConcerns
Database -specific Wrappers
SLAs
HTTP Server
Redis (Replicated)
10201040
10101010Counting
Bloom Filter
add
delete
Node.JS (local to Server)
Stored Procedures
Custom Validation
Java/JDO
persist
find
createQueryJavaScript/JPA Port
REST/HTTP API
others
Application
Server
Browser or
Mobile Device
ApplicationLayer
PersistenceAPI
DataStore
ISP
Forward-Proxy Caches
ISP Caches
Reverse-Proxy Caches and
Load Balancers
CDN Caches
Content DeliveryNetworks
Purge
Scale
HTTP Server
Trans-
actionsQuerys
Object
Persist.Schema
Key-
Value
Doc-
uments
DBaaS & BaaS Layer
Transaction Validation
HTTP Server
Config-
urationPartial
Updates
Access Control
Multi-Tenancy Schema Management
Workload Management
Cache Coherence
Autoscaling
Database-independentConcerns
Database -specific Wrappers
SLAs
HTTP Server
Redis (Replicated)
10201040
10101010Counting
Bloom Filter
add
delete
Node.JS (local to Server)
Stored Procedures
Custom Validation
GET /db/{bucket}/{class}/{id}
200 OKCache-Control: public, max-age=6000ETag: "3"JSON Object
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
CachesISP
CachesCDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
MissMiss
MissMiss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
CachesISP
CachesCDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
MissMiss
MissMiss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
em.find(id) JavaScript
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
CachesISP
CachesCDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
MissMiss
MissMiss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
GET /db/posts/{id} HTTP
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
CachesISP
CachesCDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
MissMiss
MissMiss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
Cache-Hit: Return ObjectCache-Miss: Forward Request
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
CachesISP
CachesCDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
MissMiss
MissMiss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
Fetch object from DB and returnit with caching information
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
CachesISP
CachesCDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
MissMiss
MissMiss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
Scalability and Cache-Hits
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
CachesISP
CachesCDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
MissMiss
MissMiss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
Scalability and Cache-Hits
Latency Benefit
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
How to prevent stalereads (inconsistency)?
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
purge(obj)
hashB(oid)hashA(oid)
13
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
purge(obj)
131 1 110Flat(Counting Bloomfilter)
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
purge(obj)
131 1 110
hashB(oid)hashA(oid)
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
purge(obj)
131 1 110
hashB(oid)hashA(oid)
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
purge(obj)
131 1 110
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
hashB(oid)hashA(oid)
1 1 110
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
hashB(oid)hashA(oid)
1 1 110
𝑓 ≈ 1 − 𝑒−𝑘𝑛𝑚
𝑘
𝑘 = ln 2 ⋅ (𝑛
𝑚)
False-Positive
Rate:
Hash-
Functions:
With 10.000 Updates per 10 minutes and 1% error rate: 12 KByte
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
hashB(oid)hashA(oid)
1 1 110
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
Begin Transaction
Bloom Filter1
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
Begin Transaction
Bloom Filter1
Reads
Writes2
Writes
(Hidden)
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
Begin Transaction
Bloom Filter1
Reads
Writes2
Commit: read- & write-set versions
Committed OR aborted + stale objects3
Writes
(Hidden)
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
Begin Transaction
Bloom Filter1
Reads
Writes2
Commit: read- & write-set versions
Committed OR aborted + stale objects3
Writes
(Hidden)
validation 4
E.g. Redis or ZooKeeper
prevent conflicting
validations
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
Begin Transaction
Bloom Filter1
Reads
Writes2
Commit: read- & write-set versions
Committed OR aborted + stale objects3
Writes
(Hidden)
validation 4
E.g. Redis or ZooKeeper
5Writes (Public)
Read all
prevent conflicting
validations
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
Begin Transaction
Bloom Filter1
Reads
Writes2
Commit: read- & write-set versions
Committed OR aborted + stale objects3
Writes
(Hidden)
validation 4
E.g. Redis or ZooKeeper
5Writes (Public)
Read all
prevent conflicting
validations
Caching → Shorter transactionduration → less aborts
PolyglotPersistence
application
Orestes servers
REST/HTTPprotocol
Redis MongoDB db4o
meta data contains SLA
parse SLA & route data
manage materialisation
resolve mapping
Polyglot Persistence Mediator
PolyglotPersistence
application
Orestes servers
REST/HTTPprotocol
Redis MongoDB db4o
meta data contains SLA
parse SLA & route data
manage materialisation
resolve mapping
Polyglot Persistence Mediator
PolyglotPersistence
application
Orestes servers
REST/HTTPprotocol
Redis MongoDB db4o
meta data contains SLA
parse SLA & route data
manage materialisation
resolve mapping
Polyglot Persistence Mediator
PolyglotPersistence
application
Orestes servers
REST/HTTPprotocol
Redis MongoDB db4o
meta data contains SLA
parse SLA & route data
manage materialisation
resolve mapping
Polyglot Persistence Mediator
Results:Article-Objects with Impression Count
Article
IDTitle…
Imp.
Imp.ID
MongoDB Redis Sorted Set
Speedup with PPM:• 50-1000%• 66% performance of Varnish
Cloud Evaluation of ORESTES
Client Machine
50 ...
Web
CacheOrestes
Server
Versant
DB
Amazon EC2 Ireland EC2 USA165 ms
Client Machine
Client Machine
Cloud Evaluation of ORESTES
Client Machine
50 ...
Web
CacheOrestes
Server
Versant
DB
Amazon EC2 Ireland EC2 USA165 ms
Client Machine
Client Machine
30 000 Objekte
500 Anfragen/
Client
30 000 Objects500 Req./Clients10/1 Read/Write
Cloud Evaluation of ORESTES
Client Machine
50 ...
Web
CacheOrestes
Server
Versant
DB
Amazon EC2 Ireland EC2 USA165 ms
Client Machine
Client Machine
Stateless Scale-out REST middleware for DBaaS
Generic Schema, Authentication, Multi-Tenancy, etc.
SCOT (Scalable Optimistic Cache-Aware Transactions):◦ Optimistic Database-indepdent ACID transactions
BFB (Bloom Filter Bounded Staleness):◦ Allows static caching with tunable consistency guarantee
PPM (Polyglot Persistence Mediator):◦ SLAs appropriate database
Orestes: Summary
From Research toPractice
Orestes as a startup
Baqend
Orestes as a startup
Baqend
Internet
Seoxy
REST-API Transactions Schema Management Cache Consistency
Auto-Scaling Multi-Tenancy Security and Access Control Provisioning
Orestes as a startup
Baqend
Orestes as a startup
Baqend
Orestes as a startup
Baqend
Orestes as a startup
Baqend
Orestes as a startup
Baqend
Baqend in Action
Baqend in ActionGET /app.html
Baqend in ActionGET /app.html
db.find(Menu, 'main').done(...);
db.find(Page, 'hero').done(...);
db.query(Page, 'top3').done(...);
Baqend in ActionGET /app.html
db.find(Menu, 'main').done(...);
db.find(Page, 'hero').done(...);
db.query(Page, 'top3').done(...);
GET /img/pic005.jpgGET /img/pic017.jpgGET /img/pic022.jpg
More: Baqend.com
Wrap-up & book recommendations
How to choose a cloud database:
Wrap-up
Managed RDBMS
Managed DWH
Managed NoSQL DB
Backend-as-a-Service
Proprietary Serivce
Object Store
How to choose a cloud database:
Wrap-up
Define your functionalrequirements
Define your non-functionalrequirements
Managed RDBMS
Managed DWH
Managed NoSQL DB
Backend-as-a-Service
Proprietary Serivce
Object Store
How to choose a cloud database:
Wrap-up
Define your functionalrequirements
Define your non-functionalrequirements
Managed RDBMS
Managed DWH
Managed NoSQL DB
Backend-as-a-Service
Proprietary Serivce
Object Store
1. Underyling DB2. Docs & books3. Your own tests
Evaluate by:
How to choose a cloud database:
Wrap-up
Define your functionalrequirements
Define your non-functionalrequirements
Managed RDBMS
Managed DWH
Managed NoSQL DB
Backend-as-a-Service
Proprietary Serivce
Object Store
1. Underyling DB2. Docs & books3. Your own tests
4. SLAs5. Docs & books6. Experience Reports
Evaluate by:
How to choose a cloud database:
Wrap-up
Define your functionalrequirements
Define your non-functionalrequirements
Managed RDBMS
Managed DWH
Managed NoSQL DB
Backend-as-a-Service
Proprietary Serivce
Object Store
1. Underyling DB2. Docs & books3. Your own tests
4. SLAs5. Docs & books6. Experience Reports
Evaluate by:
Try it
Book recommendations
Book recommendations
• (Non-scientific) literature is rare• Some books cover specific DBaaS and
cloud platforms
Blogs
http://nosql.mypopescu.com/
http://www.dzone.com/mz/nosql
http://www.dbms2.com/
http://hackingdistributed.com/
http://highscalability.com/
http://www.nosqlweekly.com/
VLDB (Very Large Databases)
SIGMOD (Special Interest Group on Management of Data)
ICDE (International Conference on Data Engineering)
CIDR (Conference on Innovative Data Systems Research)
SOCC (Symposium on Cloud Computing)
OSDI/SOSP (Operating Systems Design and
Implementation/ Symposium on Operating System Principles)
EuroSys
Top Scientific ConferencesD
atabase
Research
Distrib
uted
System
s R
esearch
VLDB (Very Large Databases)
SIGMOD (Special Interest Group on Management of Data)
ICDE (International Conference on Data Engineering)
CIDR (Conference on Innovative Data Systems Research)
SOCC (Symposium on Cloud Computing)
OSDI/SOSP (Operating Systems Design and
Implementation/ Symposium on Operating System Principles)
EuroSys
Top Scientific ConferencesD
atabase
Research
Distrib
uted
System
s R
esearch
This year probably in Washington D.C.Learn more: scdm2013.com
Thank you. Queries?
Contact:
http://baqend.comhttp://orestes.infohttp://scdm2013.com