Cloud Databases in Research and Practice

Felix Gessert (29.04.2014) Slides: baqend.com/nosql.pdf

Upload: felix-gessert

Post on 26-Jan-2015


DESCRIPTION

The combination of database systems and cloud computing is extremely attractive: unlimited storage capacities, elastic scalability and as-a-Service models seem to be within reach. This talk gives an in-depth survey of the cloud database solutions that have emerged in recent years and provides a classification and comparison. This includes real-world systems (e.g. Azure Tables, DynamoDB and Parse) as well as research approaches (e.g. Relational Cloud and ElasTraS). In practice, however, there are some unsolved problems. Network latency, scalable transactions, SLAs, multi-tenancy, abstract data modelling, elastic scalability and polyglot persistence pose daunting challenges in many scenarios. Therefore, we conclude with "Orestes", a research approach based on well-known techniques such as web caching, Bloom filters and optimistic concurrency control that demonstrates how existing cloud databases can be enhanced to suit specific applications.

TRANSCRIPT

Page 1: Cloud Databases in Research and Practice

Felix Gessert (29.04.2014)

Slides: baqend.com/nosql.pdf

Page 4: Cloud Databases in Research and Practice

About me

PhD student (Database Group, University of Hamburg)

Research Project for PhD

Cloud Database Startup

Page 5: Cloud Databases in Research and Practice

Outline

What are Cloud Databases?

• Categories of Cloud Databases

• Properties

Cloud Databases in the wild

Research Perspectives

Wrap-up and literature

Page 6: Cloud Databases in Research and Practice

2003: GFS (Google)

2006: BigTable (Google)

2006: S3 (Amazon)

2007: Dynamo (Amazon)

2008: SimpleDB (Amazon)

2010: SQL Azure

2011: Parse

2012: DynamoDB (Amazon)

2013: Redshift (Amazon)

Page 7: Cloud Databases in Research and Practice

Context: What kinds of Cloud databases are there?

Page 16: Cloud Databases in Research and Practice

DBaaS

Service models: Infrastructure-as-a-Service, Platform-as-a-Service, …, Database-as-a-Service

Systems shown: Cloud SQL, Amazon RDS, SQL Azure, Cloudant, MongoHQ, Parse, Orestes, Google F1, DynamoDB, BigQuery, EMR, GCS, S3

Categories: Managed RDBMSs, Managed NoSQL DBs, Backend-as-a-Service, Storage APIs, Cloud-Deployment of DBMSs, Cloud-only DBaaS Systems, Analytics-as-a-Service, Object Stores

Page 19: Cloud Databases in Research and Practice

Cloud Databases

Cloud-deployed database

Data-Analytics-as-a-Service

Database-as-a-Service

SQL

Managed RDBMS

Managed DWH

NoSQL

Managed NoSQL DB

Backend-as-a-Service

Proprietary or Polyglot

Files

Object Store

Standard Interface

Vendor-specific Interface

Page 22: Cloud Databases in Research and Practice

Application Architecture <-> Cloud Database Category

Architectures

Applications

Data Warehouse

Operative Database

Reporting, Data Mining, Analytics

Data Management

Data Analytics

DBaaS

Probably not.

Page 25: Cloud Databases in Research and Practice

Cloud-Deployed Database

IaaS-Cloud

Cloud-deploy your favourite database system

Does not solve: Provisioning, Backups, Security, Scaling, Elasticity, Performance Tuning, Failover, Replication, ...

Page 27: Cloud Databases in Research and Practice

Managed RDBMS/DWH/NoSQL DB

IaaS-Cloud

RDBMS DWH NoSQL DB

DBaaS-Provider

Provisioning, Backups, Security, Scaling, Elasticity, Performance Tuning, Failover, Replication, ...

Page 31: Cloud Databases in Research and Practice

Managed RDBMS/DWH/NoSQL DB

IaaS-Cloud

RDBMS DWH NoSQL DB

DBaaS-Provider

RDBMS: SQL Azure, Google Cloud SQL

NoSQL DB

DWH: Amazon Redshift

Page 34: Cloud Databases in Research and Practice

Proprietary DB/Object Store

Cloud

Black-box database or file system

Managed by cloud provider

Provider's API

Proprietary DB: Amazon SimpleDB, Google Cloud Datastore, Azure Tables, Database.com

BigTable, Megastore, Spanner, F1, Dynamo, PNUTS, Relational Cloud, …

Object Store: Google Cloud Storage, Azure Blob Storage, OpenStack Swift

Page 36: Cloud Databases in Research and Practice

Backend-as-a-Service, Polyglot Persistence Service

IaaS-Cloud

Backend API

Service-Layer

Data API

Authentication, Users, Validation, etc.

Maps to (different) databases

Page 38: Cloud Databases in Research and Practice

Backend-as-a-Service, Polyglot Persistence Service

IaaS-Cloud

Backend API

Service-Layer

Data API

Realtime BaaS

BaaS

Appcelerator Cloud

Page 41: Cloud Databases in Research and Practice

Analytics-as-a-Service

Cloud

Analytics Cluster

Provisioning, Data Ingest

Hadoop: Azure HDInsight, Amazon Elastic MapReduce

Custom: Google BigQuery, Google Prediction API

Page 43: Cloud Databases in Research and Practice

DBaaS: Common Aspects

#1 Metric: Total Cost

Maximum utilization of available hardware: Multi-Tenancy

Daniela Florescu and Donald Kossmann, “Rethinking cost and performance of database systems”, SIGMOD Rec. 2009.

Page 48: Cloud Databases in Research and Practice

DBaaS: Common Aspects

Multi-Tenancy - four common approaches:

T. Kiefer, W. Lehner, “Private table database virtualization for DBaaS”, UCC, 2011

Layers shown for each approach: VM, Hardware Resources, Database Process, Database, Schema (the Shared Schema approach adds a Virtual Schema)

• Private OS: e.g. Amazon RDS

• Private Process/DB: e.g. MongoHQ

• Private Schema: e.g. Google DataStore

• Shared Schema: most SaaS apps
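The difference between the Private Schema and Shared Schema approaches can be sketched with an in-memory SQLite database. This is a minimal illustration and not taken from the talk: the `orders` table, the tenant names and the per-tenant table naming are assumptions, and SQLite has no real schemas, so one table per tenant stands in for a private schema.

```python
import sqlite3


def shared_schema_demo():
    """Shared schema: all tenants share one table; each row carries a tenant_id."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (tenant_id TEXT, item TEXT)")
    db.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("acme", "bolt"), ("acme", "nut"), ("globex", "gear")],
    )
    # Isolation relies on every query filtering by tenant_id.
    rows = db.execute(
        "SELECT item FROM orders WHERE tenant_id = ? ORDER BY item", ("acme",)
    ).fetchall()
    return [r[0] for r in rows]


def private_schema_demo():
    """Private schema (approximated): one table per tenant in a shared database."""
    db = sqlite3.connect(":memory:")
    for tenant in ("acme", "globex"):
        # Table names cannot be bound as parameters; the tenants are a fixed list here.
        db.execute(f"CREATE TABLE orders_{tenant} (item TEXT)")
    db.execute("INSERT INTO orders_acme VALUES ('bolt')")
    db.execute("INSERT INTO orders_globex VALUES ('gear')")
    rows = db.execute("SELECT item FROM orders_globex").fetchall()
    return [r[0] for r in rows]
```

In the shared variant a forgotten `WHERE tenant_id = ?` leaks data across tenants, which is one reason isolation is weaker there, while resource utilization is typically better because all tenants share the same tables.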

Page 49: Cloud Databases in Research and Practice

DBaaS: Common Aspects

Multi-Tenancy - four common approaches:

W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013

Approaches compared: Private OS, Private Process/DB, Private Schema, Shared Schema

Criteria: App. independence, Isolation, Resource utilization, Maintenance/Provisioning

Page 51: Cloud Databases in Research and Practice

Billing Models:

DBaaS: Common Aspects

Usage

Account

Pay-per-use

Parameters: Network, Bandwidth, Storage, CPU, Requests, etc.

Payment: Pre-Paid, Post-Paid

Variants: On-Demand, Auction, Reserved

e.g. DynamoDB

Page 53: Cloud Databases in Research and Practice

Billing Models:

DBaaS: Common Aspects

Usage

Account

End of month

Plan-based

Parameters: Allocated Plan (e.g. 2 instances + X GB storage)

Free Tier: free plan or free initial account credit

e.g. MongoHQ
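The two billing models can be compared with a few lines of arithmetic. The following is only a sketch: the function names, the choice of metered parameters and all prices are hypothetical and do not reflect any provider's actual tariff.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    storage_gb: float
    requests: int


def pay_per_use_cost(usage, price_per_gb, price_per_million_requests):
    """Pay-per-use: the bill follows metered parameters (here storage and requests)."""
    return (usage.storage_gb * price_per_gb
            + usage.requests / 1_000_000 * price_per_million_requests)


def plan_based_cost(instances, price_per_instance, storage_gb, price_per_gb):
    """Plan-based: the bill follows the allocated plan, independent of actual usage."""
    return instances * price_per_instance + storage_gb * price_per_gb
```

With made-up prices, `pay_per_use_cost(Usage(10, 2_000_000), 0.25, 1.5)` gives 5.5, while `plan_based_cost(2, 20.0, 50, 0.25)` gives 52.5 whether or not the capacity is used.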

Page 63: Cloud Databases in Research and Practice

DBaaS: Common Aspects

Authentication:

• Internal Schemes: e.g. Amazon IAM

• External Identity Provider: e.g. OpenID

• Federated Identity (SSO): e.g. SAML

Used extensively

Authorization:

• User-based Access Control: e.g. Amazon S3 ACLs

• Role-based Access Control: e.g. Amazon IAM

• Policies: e.g. XACML

Database-as-a-Service: Authentication, Authorization, API

Request flow: Authenticate, Token, Authenticated Request, Response

Federated ACLs

M. Decat, B. Lagaisse, et al., “Toward efficient and confidentiality-aware federation of access control policies”, DOA Trusted Cloud 2013

• Imagine ACL: “Patient data can only be accessed by treating physician”

• Idea: decompose policy and evaluate parts near data owner
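The flow on this slide (authenticate, receive a token, send an authenticated request, get a response) combined with a role-based access check can be sketched as follows. This is a toy illustration under stated assumptions: the secret, the user store, the role table and the token format are all hypothetical, and a real service would add token expiry and hashed passwords.

```python
import hashlib
import hmac

SECRET = b"demo-secret"           # hypothetical signing key held by the service
USERS = {"alice": "wonderland"}   # hypothetical internal credential store
ROLES = {"alice": {"reader"}}     # role assignments (role-based access control)
PERMISSIONS = {"reader": {"read"}, "writer": {"read", "write"}}


def _sign(user):
    return hmac.new(SECRET, user.encode(), hashlib.sha256).hexdigest()


def authenticate(user, password):
    """Authenticate: check the credentials and hand out a signed token."""
    stored = USERS.get(user)
    if stored is None or not hmac.compare_digest(stored, password):
        raise PermissionError("authentication failed")
    return f"{user}:{_sign(user)}"


def handle_request(token, action):
    """Authenticated request: verify the token, check authorization, respond."""
    user, _, signature = token.partition(":")
    if not hmac.compare_digest(signature, _sign(user)):
        raise PermissionError("invalid token")
    allowed = set()
    for role in ROLES.get(user, set()):
        allowed |= PERMISSIONS[role]
    if action not in allowed:
        raise PermissionError(f"{user} may not {action}")
    return f"{action} ok for {user}"
```

A client would call `authenticate("alice", "wonderland")` once and then pass the returned token with each request; `handle_request(token, "write")` is rejected here because the `reader` role only grants `read`.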

Page 66: Cloud Databases in Research and Practice

Service Level Agreements

DBaaS: Common Aspects

SLA

Legal Part

1. Fees

2. Penalties

Technical Part

1. SLO

2. SLO

3. SLO

Service Level Objectives:

• Availability

• Durability

• Consistency/Staleness

• Query Response Time
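An availability SLO translates directly into a downtime budget. The sketch below assumes a 30-day billing period by default; the function names are chosen for illustration only.

```python
def allowed_downtime_minutes(availability_percent, days=30):
    """Downtime budget implied by an availability SLO over one billing period."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


def slo_met(observed_downtime_minutes, availability_percent, days=30):
    """True if the observed downtime stays within the budget."""
    return observed_downtime_minutes <= allowed_downtime_minutes(availability_percent, days)
```

For a 30-day month, 99.9% availability allows roughly 43.2 minutes of downtime and 99.95% roughly 21.6 minutes, so a 30-minute outage meets the first objective but violates the second.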

Page 71: Cloud Databases in Research and Practice

SLAs – achieved through Workload Management

DBaaS: Common Aspects

W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013

Maximize:

Page 73: Cloud Databases in Research and Practice

SLAs – achieved through Workload Management

DBaaS: Common Aspects

W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013

QoS for NoSQL DBs

Y. Zhu et al., “Scheduling with Freshness and Performance Guarantees for Web Applications in the Cloud”, CRPIT

Old: Workload management in RDBMSs (DB2 and Oracle)

New: Use well-known scheduling algorithms for queries in replicated DBs

Page 76: Cloud Databases in Research and Practice

Resource Provisioning

Goal: Resources ⇔ SLAs

DBaaS: Common Aspects

T. Lorido-Botran, J. Miguel-Alonso et al., “Auto-scaling Techniques for Elastic Applications in Cloud Environments”, Technical Report, 2013

Figure: Resources over Time

Expected Load

Provisioned Resources:

• Number of shard or replica servers

• Computing, storage, network capacities

Page 80: Cloud Databases in Research and Practice

Resource Provisioning

Goal: Resources ⇔ SLAs

DBaaS: Common Aspects

T. Lorido-Botran, J. Miguel-Alonso et al., “Auto-scaling Techniques for Elastic Applications in Cloud Environments”, Technical Report, 2013

Figure: Resources over Time

Actual Load

Overprovisioning:

• SLAs met

• Excess capacities

Underprovisioning:

• SLAs violated

• Usage maximized

SmartSLA

P. Xiong, “Intelligent management of virtualized resources for database systems in cloud environment”, ICDE 2011

Solution: machine learning (regression + boosting) for prediction; choose the allocation that minimizes SLA penalties

Learn mapping: Resource allocation → Database performance
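The idea of learning a mapping from resource allocation to database performance and then picking the allocation with the lowest expected cost can be sketched as below. This is not the SmartSLA implementation: it swaps in plain least-squares regression without boosting, assumes latency behaves like `a + b / allocation`, and uses a hypothetical flat penalty per SLO violation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (stand-in for the learned model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def choose_allocation(samples, candidates, slo_ms, penalty, price_per_unit):
    """Pick the candidate with the lowest resource cost plus predicted SLA penalty."""
    # Learn latency = a + b / allocation from observed (allocation, latency_ms) pairs.
    a, b = fit_line([1 / alloc for alloc, _ in samples],
                    [latency for _, latency in samples])

    def total_cost(alloc):
        predicted_latency = a + b / alloc
        sla_penalty = penalty if predicted_latency > slo_ms else 0.0
        return alloc * price_per_unit + sla_penalty

    return min(candidates, key=total_cost)
```

With a high penalty the cheapest allocation that still meets the SLO wins; with a low penalty, underprovisioning and paying the penalty can become the cheaper choice, which mirrors the over- versus underprovisioning trade-off on the slide.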

Page 85: Cloud Databases in Research and Practice

DBaaS: General Considerations

Functional Requirements:

• Scan Queries

• Conditional Updates

• Transactions

• Query by Example

• Joins

• Analytics

Non-Functional Requirements:

• Elasticity

• Consistency

• Read Latency

• Write Latency

• Write Throughput

• Scalability of Data Volume

• Read Scalability

• Read Availability

• Write Availability

• Durability

• Write Scalability

Questions to ask:

• Which requirements are met by the DB?

• Which are met by the provider? SLAs

Page 86: Cloud Databases in Research and Practice

Outline

What are Cloud Databases?

Cloud Databases in the wild

Examples of different cloud database systems:

• Cloud-deployed

• Managed DBMS (SQL, NoSQL)

• Proprietary

• BaaS

Research Perspectives

Wrap-up and literature

Page 91: Cloud Databases in Research and Practice

Idea: Run (mostly) unmodified DB on IaaS

Cloud-Deployed DB

Method I: DIY

1. Provision VM(s)

2. Install DBMS (manual, script, Chef, Puppet)

Method II: Deployment Tools

> whirr launch-cluster --config hbase.properties

Login, cluster size, etc. Amazon EC2

Method III: Marketplaces

Page 99: Cloud Databases in Research and Practice

Idea: Run preconfigured DB on IaaS

AWS Marketplace

Model: Cloud-Deployed

Pricing: Instance + Volume + License

Underlying DB: Choosable

API: DB-specific

Bad:

• No clusters

• Not managed (automatic updates, snapshots, etc.)

• Private OS multi-tenancy: bad resource usage

Good:

• Easy to get started

Page 105: Cloud Databases in Research and Practice

Amazon Elastic MapReduce (EMR)

Model: Analytics-aaS

Pricing: Infrastructure

API: Hadoop

Submits Hadoop jobs as:

• JAR

• Streaming

• Cascading

• Pig

• Hive

• Impala

Data Source and Sink

Provisions: Job Tracker, Task Tracker + HDFS Data Node, Task Tracker

W. Lehner, U. Sattler, “Web-scale Data Management for the Cloud”, Springer, 2013

• No data locality with S3

• AWS Import/Export: send your HDD

• HBase integration

• Compatible with Spot and Reserved Instances

• Similar: Azure HDInsight
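Of the job types listed on this slide, Streaming is the easiest to illustrate: mapper and reducer are ordinary programs that exchange tab-separated key/value lines. The word-count sketch below models that contract with generator functions; the function names are illustrative, and wiring them to stdin/stdout and submitting them to a cluster is left out.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper writes to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts per word; the framework delivers records sorted by key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"
```

Locally, the shuffle phase can be imitated with `sorted`: `list(reducer(sorted(mapper(["a b a", "B"]))))` yields one line per distinct word with its count.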

Page 109: Cloud Databases in Research and Practice

Idea: Web-scale analysis of nested data

Google BigQuery

Model: Analytics-aaS

Pricing: Storage + GBs Processed

API: REST

Dremel

Melnik et al., “Dremel: Interactive analysis of web-scale datasets”, VLDB 2010

Idea: Multi-level execution tree on nested columnar data format (≥100 nodes)

• SLA: 99.9% uptime / month

• Fundamentally different from relational DWHs and MapReduce

• Design copied by Apache Drill, Impala, Shark
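The notion of a nested columnar format can be illustrated by striping nested records into one value list per field path, so that a query touching a single field only has to read that column. This is a deliberately simplified sketch: it omits the repetition and definition levels that Dremel stores to reconstruct record boundaries, and the record layout used in the example is made up.

```python
def stripe(records):
    """Split nested dict records into one value list per dotted field path."""
    columns = {}

    def walk(prefix, value):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}.{key}" if prefix else key, child)
        elif isinstance(value, list):
            for child in value:
                walk(prefix, child)
        else:
            columns.setdefault(prefix, []).append(value)

    for record in records:
        walk("", record)
    return columns
```

For two records with a `doc` id and nested `links.forward` lists, `stripe` returns a `doc` column and a `links.forward` column; an aggregate over `links.forward` then never touches the `doc` values.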

Page 114: Cloud Databases in Research and Practice

Amazon RDS

Relational Database Service

Model: Managed RDBMS
Pricing: Instance + Volume + License
Underlying DB: MySQL, Postgres, MSSQL, Oracle
API: DB-specific

• Synchronous Replication
• Automatic Failover

99.95% uptime SLA

Provisioned IOPS: network-optimized access to EBS volumes (up to 4000 IOPS)
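To put the uptime figures into perspective, the following sketch converts an SLA percentage into the downtime it still permits. It assumes a 30-day month and a simple "percentage of total minutes" reading; the real SLA terms of each provider define the measurement in their own way, so this is only back-of-the-envelope arithmetic.

```python
def allowed_downtime_minutes(uptime_percent, days=30):
    """Minutes of downtime per period that still satisfy the given uptime percentage."""
    minutes_in_period = days * 24 * 60
    return (100.0 - uptime_percent) / 100.0 * minutes_in_period


# Compare the two SLA levels that appear on the slides.
for sla in (99.9, 99.95):
    print(f"{sla}% uptime -> {allowed_downtime_minutes(sla):.1f} minutes per 30 days")
```

Under these assumptions, 99.95% leaves roughly half the downtime budget of 99.9%.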

Page 117: Cloud Databases in Research and Practice

Amazon RDS

EC2 instances: up to 32 cores, 244 GB RAM, 10 GbE

Minor version upgrades are performed without downtime

Page 120: Cloud Databases in Research and Practice

Amazon RDS

Backups are automated and scheduled

• Support for (asynchronous) Read Replicas
• Administration: Web-based or SDKs
• Only RDBMSs
• “Analytic Brother” of RDS: Redshift (PDWH)

Page 130: Cloud Databases in Research and Practice

Microsoft SQL Azure

Similar to RDS

Model: Managed RDBMS
Pricing: Database size
Underlying DB: MSSQL Server
API: T-SQL/TDS

Keyless Table Group: regular database
Keyed Table Group: partitioned by row key

Consistency unit (ACID boundary)

Automatic Partitioning for Keyed Table Groups

Cloud SQL Server

P. Bernstein et al. “Adapting Microsoft SQL Server for cloud computing”, ICDE 2011

• Multi-Tenant MSSQL
• Paxos-like commit protocol for consistent replication

• SLA: 99.9% uptime / month
• Usually cheaper than RDS (Multi-Tenancy)
• Smaller databases (max. 150 GB)
• Rich MSSQL Server tooling
• Keyed Table Group: internal feature only

Page 134: Cloud Databases in Research and Practice

Other RDBMS services

Google Cloud SQL
• MySQL for Google App Engine PaaS
• Support for: patching, replication, backup
• SLA: 99.95% uptime / month
• Pricing: Database size
• Underlying DB: MySQL

Heroku Postgres
• Postgres for Heroku PaaS
• Hosted on EC2
• No SLAs
• Pricing: Plan-based
• Underlying DB: Postgres

Trove
• MySQL for OpenStack (Icehouse)
• Under development (HP driven)
• VM ⇔ Tenant
• Pricing: Own hardware
• Underlying DB: MySQL

Evaluation of Cloud RDBMSs

D. Kossmann, T. Kraska: “An evaluation of alternative architectures for transaction processing in the cloud”, SIGMOD 2010

[Figure: TPC-W Benchmark (Online Shop), 2010, Emulated Browsers / RPS]

Page 136: Cloud Databases in Research and Practice

Managed NoSQL services

HBase
Model: Wide-Column
CAP: CP
Scans: Over Row Key
Largest Cluster: ~700
Learning: 1/4
Lic.: Apache
DBaaS: (EMR)

MongoDB
Model: Document
CAP: CP
Scans: yes
Largest Cluster: >100, <500
Learning: 4/4
Lic.: GPL

Riak
Model: Key-Value
CAP: AP
Largest Cluster: ~60
Learning: 3/4
Lic.: Apache
DBaaS: (Softlayer)

Cassandra
Model: Wide-Column
CAP: AP
Scans: With Comp. Index
Largest Cluster: >300, <1000
Learning: 2/4
Lic.: Apache

Redis
Model: Key-Value
CAP: CA
Scans: Through Lists, etc.
Sec. Indices: manual
Largest Cluster: N/A
Learning: 4/4
Lic.: BSD

And there are many more:
• CouchDB (e.g. Cloudant)
• CouchBase (e.g. KuroBase Beta)
• ElasticSearch (e.g. Bonsai)
• Solr (e.g. WebSolr)
• …

Page 143: Cloud Databases in Research and Practice

MongoHQ

EC2-based MongoDB-as-a-Service

Model: Managed NoSQL
Pricing: Plan-based
Underlying DB: MongoDB
API: Mongo, REST (beta)

• Private Process Multi-Tenancy
• Scale-Up Strategy: RAM, IOPS and CPU increased at runtime
• Maximum Size: 1 TB
• Free Tier

VM-Deployment (EC2):
• M1.large on EC2: $128.10
• on EC2 with 1-yr-Res.: $30
• With MongoHQ: $637

Page 144: Cloud Databases in Research and Practice

MongoHQ

#1 thing that should never happen:

Page 146: Cloud Databases in Research and Practice

MongoHQ

Problem: Scalability

[Diagram: two Clients, two mongos routers, three config servers, one Replica Set (Master, Slave, Slave)]

What if writes per second or data volume become the bottleneck?

Page 148: Cloud Databases in Research and Practice

MongoHQ

Problem: Scalability

[Diagram: two Clients, two mongos routers, three config servers, two Replica Sets (each Master, Slave, Slave)]

Sharding (Scale Out)
• Dynamic Scaling: Tenant adds Replica Set
• Elastic Scaling: Provider adds/removes Replica Set

MongoHQ: Only manual sharding on “contact us” basis

• Bad: no SLAs, no horizontal scaling

• Good: Solid Dashboard, Backups, Replication, integration with Heroku

• Competitors: MongoLab, ObjectRocket, MongoSoup
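The scale-out step from one Replica Set to two can be pictured with a toy router. The sketch below assumes range-based partitioning on a string shard key and uses a hypothetical `Router` class as a stand-in for what a mongos process does with the metadata held by the config servers; it is not MongoDB code.

```python
import bisect


class Router:
    """Toy stand-in for a mongos router: maps shard-key ranges to replica sets."""

    def __init__(self, split_points, replica_sets):
        # split_points must be sorted; N split points define N + 1 key ranges.
        assert len(replica_sets) == len(split_points) + 1
        self.split_points = split_points
        self.replica_sets = replica_sets

    def route(self, shard_key):
        # bisect_right selects the range whose lower bound is <= shard_key.
        return self.replica_sets[bisect.bisect_right(self.split_points, shard_key)]


# Before sharding: every key ends up on the single replica set.
single = Router([], ["rs0"])

# After adding a second replica set: keys from "m" onwards move to rs1.
sharded = Router(["m"], ["rs0", "rs1"])

for key in ("alice", "zoe"):
    print(key, single.route(key), "->", sharded.route(key))
```

Adding a Replica Set only changes the routing table; clients keep talking to the router and need no knowledge of where a key lives.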

Page 150: Cloud Databases in Research and Practice

ElastiCache

“RDS for Memcache and Redis”

Model: Managed NoSQL
Pricing: Infrastructure
Underlying DB: Memcache, Redis
API: DB, REST (management)

• Memcache or Redis
• Memcache can be run as a cluster (client-side sharding)
• Limited choice of instance types
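Client-side sharding means that the cache client, not the server, decides which node holds a key. One common way to do this is a consistent-hash ring; the sketch below is a minimal version with made-up node names and a hypothetical `HashRing` class, not the algorithm of any particular Memcache client.

```python
import bisect
import hashlib


def _hash(value):
    # Stable hash; Python's built-in hash() is randomized per process.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random positions.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


three_nodes = HashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211"])
four_nodes = HashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211", "cache-d:11211"])

moved = sum(
    three_nodes.node_for(f"user:{i}") != four_nodes.node_for(f"user:{i}")
    for i in range(1000)
)
print(f"{moved} of 1000 keys change node when a fourth node joins")
```

With a ring, a key either stays where it was or moves to the newly added node, whereas plain `hash(key) % node_count` would reshuffle most keys whenever the cluster size changes.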

Page 151: Cloud Databases in Research and Practice

ElastiCache

Announced last Saturday: Snapshots

Page 153: Cloud Databases in Research and Practice

ElastiCache

$ elasticache-modify-cache-parameter-group xy

AWS CLI tools and SDKs offer a superset of Dashboard functions

Page 155: Cloud Databases in Research and Practice

Heroku DBaaS Addons

Many hosted NoSQL DBaaS providers represented

And Search

Page 160: Cloud Databases in Research and Practice

Heroku Redis2Go example

Model: Managed NoSQL
Pricing: Plan-based
Underlying DB: Redis
API: Redis

Create Heroku App:

Add Redis2Go Addon:

Use Connection URL (environment variable):

Deploy:

• Very simple
• Only suited for small to medium applications (no SLAs, limited control)
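The "use connection URL" step can be sketched as follows. The example assumes the add-on exposes its URL in an environment variable named `REDISTOGO_URL` and that the URL has the usual `redis://user:password@host:port/` shape; both are assumptions for illustration, and the resulting settings would then be handed to whichever Redis client library the application uses.

```python
import os
from urllib.parse import urlparse


def redis_settings(env=os.environ, var="REDISTOGO_URL"):
    """Split the connection URL from the environment into its parts.

    Raises KeyError if the variable is not set (e.g. the add-on is missing).
    """
    url = urlparse(env[var])
    return {
        "host": url.hostname,
        "port": url.port or 6379,  # fall back to the default Redis port
        "password": url.password,
    }


# Example with a made-up URL instead of the real environment.
example_env = {"REDISTOGO_URL": "redis://redistogo:secret@example.redistogo.com:9402/"}
print(redis_settings(example_env))
```

Reading the URL from the environment keeps credentials out of the code base, which is why the deployed application and a local development setup can share the same source.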

Page 162: Cloud Databases in Research and Practice

Proprietary Database services

SimpleDB
Model: Table-Store
CAP: CP
Scans: Yes (as queries)
Sec. Indices: Automatic
Queries: SQL-like (no joins, groups, …)
API: REST + SDKs

DynamoDB
Model: Table-Store
CAP: CP
Scans: By range key / index
Sec. Indices: Local Sec., Global Sec.
Queries: Key + Cond. on Range Key(s)
API: REST + SDKs
Scale-out: Automatic over Prim. Key

Azure Tables
Model: Table-Store
CAP: CP
Scans: By range key
Queries: Key + Cond. on Range Key
API: REST + SDKs
Scale-out: Automatic over Part. Key
SLA: 99.9% uptime

AE/Cloud DataStore
Model: Entity-Group
CAP: CP
Scans: Yes (as queries)
Sec. Indices: Automatic
Queries: Conjunct. of Eq. Predicates
API: REST/SDK, JDO, JPA
Scale-out: Automatic over Entity Groups

S3, Az. Blob, GCS
Model: Blob-Store
CAP: AP
API: REST + SDKs
Scale-out: Automatic over key
SLA: 99.9% uptime (S3)

There are many more object stores (HP, Rackspace, etc.) … but no comparable Table Stores

Page 164: Cloud Databases in Research and Practice

Azure Storage

[Diagram labels: Load-Balancing System, Application, HTTP/HTTPS, Web Role (IIS web server), Worker Role, Virtual Machines, Call, Other Services, Not Allowed; storage services: Tables, Blobs, Queues]

Page 166: Cloud Databases in Research and Practice

Table Service example: Azure Tables

Partition Key | Row Key (sorted) | Timestamp (autom.) | Property1 | … | Propertyn
intro.pdf | v1.1 | 14/6/2013 | … | …
intro.pdf | v1.2 | 15/6/2013 | …
präs.pptx | v0.0 | 11/6/2013 | …

Rows with the same Partition Key form one Partition; access is via the REST API.

• Sparse
• Hash-distributed to partition servers
• No index: lookup on other properties only (!) by full table scan
• Atomic "Entity-Group Batch Transaction" possible
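The access paths of such a table can be sketched in a few lines. The `Table` class and the `author` and `reviewed` properties below are hypothetical and only model the data layout from the example (partition key, sorted row key, sparse properties); this is not the Azure Tables API.

```python
class Table:
    """Toy model of a partitioned table with sparse entities."""

    def __init__(self):
        # partition key -> {row key -> entity}; entities are free-form dicts.
        self.partitions = {}

    def insert(self, partition_key, row_key, **properties):
        self.partitions.setdefault(partition_key, {})[row_key] = properties

    def get(self, partition_key, row_key):
        # Point lookup: touches exactly one partition.
        return self.partitions[partition_key][row_key]

    def row_range(self, partition_key, low, high):
        # Range query over the sorted row keys of a single partition.
        rows = self.partitions.get(partition_key, {})
        return [(rk, rows[rk]) for rk in sorted(rows) if low <= rk <= high]

    def scan(self, predicate):
        # No secondary index: filtering on a property visits every entity.
        return [
            (pk, rk, entity)
            for pk, rows in self.partitions.items()
            for rk, entity in rows.items()
            if predicate(entity)
        ]


docs = Table()
docs.insert("intro.pdf", "v1.1", author="alice")
docs.insert("intro.pdf", "v1.2", author="bob", reviewed=True)
docs.insert("präs.pptx", "v0.0", author="alice")

print(docs.get("intro.pdf", "v1.2"))
print(docs.row_range("intro.pdf", "v1.0", "v1.1"))
print(docs.scan(lambda entity: entity.get("author") == "alice"))
```

The first two calls stay within one partition, while the last one has to look at all partitions, which is the modelling challenge behind choosing partition and row keys.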

Page 167: Cloud Databases in Research and Practice

Table Service example: Azure Tables

Similar to Amazon SimpleDB and DynamoDB

SimpleDB:
• Indexes all attributes
• Rich(er) queries
• Many limits (size, RPS, etc.)

DynamoDB:
• Provisioned Throughput
• On SSDs (“single-digit latency”)
• Optional indexes

Page 173: Cloud Databases in Research and Practice

Azure Table Storage

Model: Proprietary
Pricing: Requests + Storage + Network
Underlying DB: Custom System
API: REST

Challenges:
• Single partition and range key (modelling)
• Very “basic” queries

Good:
• Automatic distribution
• Replicated 3x locally + 1x async. geo-replica

Very good:
• SLA (99.9% uptime)
• Internal architecture published

Windows Azure Storage

B. Calder et al. "Windows Azure Storage: a highly available cloud storage service with strong consistency", SOSP 2011

Idea:
• Layered storage infrastructure for Blobs and Tables
• Use research results (GFS, BigTable, Paxos, LSM, Erasure Coding)

Think: GFS/HDFS
Think: BigTable/HBase
Think: Chubby/ZooKeeper

Page 174: Cloud Databases in Research and Practice

DynamoDB

Model: Proprietary
Pricing: Provisioned Throughput + Network
Underlying DB: Custom System
API: REST

Successor to SimpleDB

SimpleDB limitations:
• Slow (~50-100 RPS)
• 10 GB per Domain
• Query result max. 2500 records or 1 MB
• Max. 1K-sized attributes

Page 176: Cloud Databases in Research and Practice

DynamoDB

Item: Primary Key (dom:com.cnn) | Range Key (page:index) | Attribute, scalar or set (content : "<html>…")

Querying Options:

• GetItem: Key Lookup

• Query: Primary Key + Condition on Range Key

• Scan: Full Table Scan with filter

• EMR: Hive queries (for analytics)

Page 177: Cloud Databases in Research and Practice

DynamoDB

Consistency: Strongly (2x price) or Eventually Consistent Reads

Atomic (Conditional) Updates per Item

Indexing Options:
◦ Local Sec. Index: consistent additional Range Key
◦ Global Sec. Index: eventually consistent index-table (Primary Key)

Item: Primary Key (dom:com.cnn) | Range Key (page:index) | Attribute, scalar or set (content : "<html>…")
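An atomic conditional update per item behaves like a compare-and-set: the write is applied only if the stored item still matches what the writer expects. The sketch below illustrates that idea with a hypothetical `Item` class and a `version` attribute chosen for the example; it runs in a single thread and does not use the DynamoDB API, where the check and the write happen atomically on the server.

```python
class ConditionalCheckFailed(Exception):
    """Raised when the expected attribute values do not match the stored item."""


class Item:
    def __init__(self, **attributes):
        self.attributes = dict(attributes)

    def conditional_update(self, expected, updates):
        # Verify every expected attribute first; change nothing on a mismatch.
        for name, value in expected.items():
            if self.attributes.get(name) != value:
                raise ConditionalCheckFailed(name)
        self.attributes.update(updates)


page = Item(content="<html>v1</html>", version=1)

# First writer read version 1 and succeeds.
page.conditional_update({"version": 1}, {"content": "<html>v2</html>", "version": 2})

try:
    # A second writer that still believes version == 1 is rejected.
    page.conditional_update({"version": 1}, {"content": "<html>stale</html>", "version": 2})
    stale_write_applied = True
except ConditionalCheckFailed:
    stale_write_applied = False

print(page.attributes, stale_write_applied)
```

This is the building block for optimistic concurrency control on a single item: a rejected writer re-reads the item and retries.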

Page 183: Cloud Databases in Research and Practice

DynamoDB

Unit of Billing

Good:
• Low latency (SSD)
• Data partitioning and AZ-replication

Bad:
• Scaling not elastic (Capacity Units)
• No SLAs, no internals published (≠ Dynamo!)
• No built-in backups (→ AWS Data Pipeline)
• Vendor lock-in

Page 189: Cloud Databases in Research and Practice

AE/Cloud DataStore (Google Cloud Datastore)

Model: Proprietary
Pricing: CPU + Storage + Network
Underlying DB: MegaStore
API: SDK, JPA, JDO

Structured Storage System for App Engine

Based on:
◦ Megastore → BigTable → Colossus

Schema-free Entity Group (EG) data model:

Root Table: User (ID, Name)
Child Table: Photo (ID, User, URL)
Relationship: 1 User : n Photos

EG: User + n Photos
• Unit of ACID transactions/consistency
• Fields auto-indexed (eventually consistent)

SELECT * FROM photos

WHERE ANCESTOR IS :34 AND name = 'sunset'

ORDER BY date ASC

LIMIT 10

OFFSET 10
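What an ancestor query selects can be shown with an in-memory sketch. It assumes that every entity carries a key path such as `(("User", 34), ("Photo", 1))`, so that an entity group is simply a shared path prefix; the function `ancestor_query` and the sample data are hypothetical and not part of the Datastore SDK.

```python
def ancestor_query(entities, kind, ancestor, filters, order_by, limit, offset=0):
    """Select entities of `kind` below `ancestor` that satisfy all equality filters."""
    matches = []
    for key_path, properties in entities:
        in_group = key_path[:len(ancestor)] == ancestor  # same entity group
        is_kind = key_path[-1][0] == kind
        passes = all(properties.get(n) == v for n, v in filters.items())
        if in_group and is_kind and passes:
            matches.append(properties)
    matches.sort(key=lambda props: props[order_by])
    return matches[offset:offset + limit]


# Two entity groups: user 34 with three photos, user 35 with one photo.
entities = [
    ((("User", 34),), {"name": "alice"}),
    ((("User", 34), ("Photo", 1)), {"name": "sunset", "date": "2014-03-02"}),
    ((("User", 34), ("Photo", 2)), {"name": "sunset", "date": "2014-01-15"}),
    ((("User", 34), ("Photo", 3)), {"name": "beach", "date": "2014-02-01"}),
    ((("User", 35), ("Photo", 4)), {"name": "sunset", "date": "2013-12-24"}),
]

result = ancestor_query(
    entities, "Photo", (("User", 34),), {"name": "sunset"}, order_by="date", limit=10
)
print(result)
```

Because the query is restricted to one ancestor, it only touches a single entity group, which is the unit for which the slides state ACID guarantees.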

Page 195: Cloud Databases in Research and Practice

AE/Cloud DataStore

Internally:
• Entity Groups define partitions
• Synchronous Paxos-based replication
• ACID per EG. Maximum of 1 write/s to an EG.
• Eventual consistency across groups
• Stored in BigTable

MegaStore

J. Baker et al. "Megastore: Providing Scalable, Highly Available Storage for Interactive Services", CIDR 2011

• Paxos-based replication and transactions
• 100 Google applications
Problems: Slow writes, predefined Entity Groups

Spanner

J. Corbett et al. "Spanner: Google’s globally distributed database", TOCS 2013

Idea:
• Auto-sharded Entity Groups
• Not based on BigTable
Implementation:
• TrueTime API (GPS + atomic clocks) → commit timestamps of 2PL-SI transactions
• Paxos replication per shard

F1

J. Shute et al. "F1: A distributed SQL database that scales", VLDB 2013

Idea:
• Full SQL relational database built on Spanner, powers AdWords
Implementation:
• 5-way replication
• Data model: relational + hierarchy (customer → campaign → AdGroup)
• Transactions: Snapshot-Read-Only, Pessimistic (Spanner), Optimistic
• Distributed SQL engine

Good:
• Transactions (though limited)
• Good scalability of data volume
Bad:
• Entity Groups hard to define
• Bad scale-out for writes and reads (Kossmann et al.) → no advantage over RDBMS for small data volumes

A Spanner/F1 based DBaaS?

Page 196: Cloud Databases in Research and Practice

Azure Blobs, Amazon S3, Google Cloud Storage

Idea: Blobs with RESTful CRUD Interface

Distribution: (Bucket, Key) → Server + Replicas

CDN Integration (Azure CDN, CloudFront)

S3: > 2 trillion (2 ⋅ 10^12) objects

Azure Blobs:

[Diagram: a REST request addresses a Container; the Container contains a Blob; the Blob consists of Block 1 (up to 4 MB), Block 2, …, Block n]

Block Blob: 4 MB blocks, for large files
Page Blob: 512 B blocks, for random IO

Page 199: Cloud Databases in Research and Practice

Azure Blobs, Amazon S3, Google Cloud Storage

Automatic Versioning

Pricing: per request (10,000 ~ 1c), storage (1 GB/month ~ 3c) and network (1 GB ~ 10c)

Amazon S3:
• Replicas (example: AWS Ireland DC)
• Reduced Redundancy: only 1 replica
• Glacier: tape-disk archival

S3 Blob:
DELETE /puppy.jpg HTTP/1.1
Host: mybucket.s3.amazonaws.com
Authorization: AWS AKIAIO...

S3 Consistency

D. Bermbach, S. Tai. "Eventual consistency: How soon is eventual?", MW4SOC 11

Findings:
• Inconsistency window varies from 2-11 seconds
• Monotonic Read Consistency is often violated
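
The two findings can be made concrete with a small sketch of how such properties might be checked on a read trace. This is not the measurement method of the cited paper; the trace format (version numbers and millisecond timestamps) and both function names are assumptions for illustration.

```javascript
// Monotonic reads: once a client has seen version v of a key, it should
// never see an older version again. Input: versions in read order.
function findMonotonicReadViolations(observedVersions) {
  const violations = [];
  let highestSeen = -Infinity;
  observedVersions.forEach((version, index) => {
    if (version < highestSeen) {
      violations.push({ index, version, highestSeen });
    } else {
      highestSeen = version;
    }
  });
  return violations;
}

// Inconsistency window (simplified): time from a write until the last read
// that still returned an older version.
function inconsistencyWindowMs(writeTimeMs, reads, writtenVersion) {
  const stale = reads.filter(
    (read) => read.timeMs >= writeTimeMs && read.version < writtenVersion
  );
  if (stale.length === 0) return 0;
  return Math.max(...stale.map((read) => read.timeMs)) - writeTimeMs;
}

console.log(findMonotonicReadViolations([1, 2, 1, 3, 2]).length);
```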

Building a database on S3

M. Brantner, et al. "Building a database on S3." SIGMOD 2008

Idea:
• Use S3 as the persistent storage of a database

Implementation:
• Buffer Pool and Log Manager on S3
• No transaction or query support

Page 203: Cloud Databases in Research and Practice

Parse - MBaaS

Model: Backend-aaS
Pricing: Plan-based
Underlying DB: Mainly MongoDB
API: SDKs, REST

Founded June 2011

Acquired by Facebook April 2013

Pricing:
◦ Free
◦ Pro (199$)
◦ Enterprise

Parse Core, Parse Analytics, Parse Push

Page 207: Cloud Databases in Research and Practice

Parse - MBaaS

Authentication:
• User + Password
• OAuth: Facebook, Twitter

Query Cinemas:
(new Parse.Query('Cinemas'))
  .withinKilometers(...)
  .fetch()

Query Movies:
(new Parse.Query('Movies'))
  .greaterThan('startAt', now)
  .notEqualTo('cinemas', cId)
  .fetch()

Page 213: Cloud Databases in Research and Practice

Meteor

Model: Backend-aaS
Pricing: Not yet revealed
Underlying DB: MongoDB
API: WebSockets

Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets

[Figure: Web Browser connected via WebSocket to Node.js + MongoDB (Single Server)]

Players = new Meteor.Collection("players");

if (Meteor.isServer) {
  //…
}

if (Meteor.isClient) {
  Template.leaderboard.players = function () {
    return Players.find({}, {sort: {score: -1, name: 1}});
  };
}

<div class="player {{selected}}">
  <span class="name">{{name}}</span>
  <span class="score">{{score}}</span>
</div>

$ meteor deploy abc.meteor.com

Very productive for very small projects

Fundamentally limited scalability:
• Server tails Mongo's oplog
• And holds the fetched data state of every client
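
Why this limits scalability can be seen in a toy model. The sketch below is not Meteor's implementation or API; `LiveQueryServer` and its methods are hypothetical names. It only shows the two costs named above: every oplog entry is checked against every client, and the server keeps one copy of each matching document per client.

```javascript
// Hypothetical model of a server that tails an oplog and pushes changes
// to subscribed clients.
class LiveQueryServer {
  constructor() {
    this.clients = new Map(); // clientId -> { matches, documents }
  }

  subscribe(clientId, matches) {
    this.clients.set(clientId, { matches, documents: new Map() });
  }

  // Called once per oplog entry; every subscription is checked.
  applyOplogEntry(entry) {
    const notified = [];
    for (const [clientId, state] of this.clients) {
      const matchesNow = entry.op !== "delete" && state.matches(entry.doc);
      if (matchesNow) {
        state.documents.set(entry.id, entry.doc);
        notified.push(clientId);
      } else if (state.documents.delete(entry.id)) {
        notified.push(clientId);
      }
    }
    return notified;
  }

  // Total number of document copies held in server memory.
  heldDocuments() {
    let total = 0;
    for (const state of this.clients.values()) total += state.documents.size;
    return total;
  }
}

const server = new LiveQueryServer();
["c1", "c2", "c3"].forEach((id) => server.subscribe(id, (doc) => doc.score > 0));
server.applyOplogEntry({ op: "insert", id: "p1", doc: { name: "Ada", score: 5 } });
console.log(server.heldDocuments()); // grows with clients × matching documents
```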

Page 214: Cloud Databases in Research and Practice

Outline

What are Cloud Databases?

Cloud Databases in the wild

Research Perspectives
• Hot Topics
• Orestes: a scalable, low-latency architecture
• Baqend: putting it into practice

Wrap-up and literature

Page 219: Cloud Databases in Research and Practice

Encrypted Databases: Research

Example: CryptDB

Idea: Only decrypt as much as necessary

[Figure: an SQL-Proxy in front of the RDBMS encrypts and decrypts, rewrites queries]

Relational Cloud

C. Curino, et al. "Relational cloud: A database-as-a-service for the cloud.", CIDR 2011

DBaaS Architecture:
• Encrypted with CryptDB
• Multi-Tenancy through live migration
• Workload-aware partitioning (graph-based)

Assessment:
• Early approach
• Not adopted in practice, yet

Dream solution: Fully Homomorphic Encryption

Page 222: Cloud Databases in Research and Practice

Transactions/Consistency: Research

System | Consistency | Transactional Unit | Commit Latency | Data Loss?
Dynamo | Eventual | None | 1 RT | -
Yahoo PNUTS | Timeline per key | Single Key | 1 RT | possible
COPS | Causality | Multi-Record | 1 RT | possible
MySQL (async) | Serializable | Static Partition | 1 RT | possible
Megastore | Serializable | Static Partition | 2 RT | -
Spanner/F1 | Snapshot Isolation | Partition | 2 RT | -
MDCC | Read Committed | Multi-Record | 1 RT | -

Multi-Data Center Consistency

T. Kraska et al. "MDCC: Multi-data center consistency." EuroSys, 2013.

Idea:
• Multi-Data center commit protocol with single round-trip

Implementation:
• Optimistic Commit Protocol
• Fast, Generalized Multi-Paxos

Result: almost as fast as Dynamo-style

Currently no NoSQL DB implements consistent Multi-DC replication

Page 227: Cloud Databases in Research and Practice

Benchmarking: Research

YCSB (Yahoo Cloud Serving Benchmark)

[Figure: YCSB architecture. The client consists of a workload generator, threads, stats and a pluggable DB interface; it issues Read(), Insert(), Update(), Delete() and Scan() to the data store over the DB protocol.]

Workload:
1. Operation Mix
2. Record Size
3. Popularity Distribution

Runtime Parameters: DB host name, threads, etc.

Workload | Operation Mix | Distribution | Example
A – Update Heavy | Read: 50%, Update: 50% | Zipfian | Session Store
B – Read Heavy | Read: 95%, Update: 5% | Zipfian | Photo Tagging
C – Read Only | Read: 100% | Zipfian | User Profile Cache
D – Read Latest | Read: 95%, Insert: 5% | Latest | User Status Updates
E – Short Ranges | Scan: 95%, Insert: 5% | Zipfian/Uniform | Threaded Conversations
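
How an operation mix and a popularity distribution combine into a workload can be sketched as follows. This is not YCSB's code: the generator below covers only the Zipfian workloads A to C listed above, and the Zipf exponent 0.99, the seeded linear congruential generator and all function names are assumptions chosen for the example.

```javascript
// Operation mixes for workloads A-C as listed above.
const WORKLOADS = {
  A: { read: 0.5, update: 0.5 },
  B: { read: 0.95, update: 0.05 },
  C: { read: 1.0 },
};

// Small seeded linear congruential generator so runs are repeatable.
function makeRandom(seed) {
  let state = seed >>> 0;
  return function random() {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Zipfian sampler: record at rank r is chosen with weight 1 / r^exponent.
function makeZipfSampler(itemCount, exponent, random) {
  const cumulative = [];
  let total = 0;
  for (let rank = 1; rank <= itemCount; rank++) {
    total += 1 / Math.pow(rank, exponent);
    cumulative.push(total);
  }
  return function sample() {
    const target = random() * total;
    let lo = 0;
    let hi = itemCount - 1;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (cumulative[mid] < target) lo = mid + 1;
      else hi = mid;
    }
    return lo; // 0 is the most popular record
  };
}

function pickOperation(mix, random) {
  let r = random();
  for (const [operation, share] of Object.entries(mix)) {
    if (r < share) return operation;
    r -= share;
  }
  return Object.keys(mix)[0]; // guard against floating-point rounding
}

function generateOperations(workloadName, recordCount, operationCount, seed) {
  const random = makeRandom(seed);
  const nextKey = makeZipfSampler(recordCount, 0.99, random);
  const operations = [];
  for (let i = 0; i < operationCount; i++) {
    const type = pickOperation(WORKLOADS[workloadName], random);
    operations.push({ type, key: "user" + nextKey() });
  }
  return operations;
}

console.log(generateOperations("B", 100, 5, 42));
```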

Page 232: Cloud Databases in Research and Practice

Benchmarking: Research

Example Result (Read Heavy): [Figure: benchmark result plot]

Weaknesses:
• Single client can be a bottleneck
• No consistency & availability measurement
• No Transaction Support

YCSB++

S. Patil, M. Polte, et al. "YCSB++: benchmarking and performance debugging advanced features in scalable table stores", SOCC 2011

• Clients coordinate through ZooKeeper
• Simple Read-After-Write Checks
• Evaluation: HBase & Accumulo

YCSB+T

A. Dey et al. "YCSB+T: Benchmarking Web-Scale Transactional Databases", CloudDB 2014

• New workload: Transactional Bank Account
• Simple anomaly detection for Lost Updates
• No comparison of systems

No specific application: CloudStone, CARE, TPC extensions?

Page 235: Cloud Databases in Research and Practice

Benchmarking: state of the art

Tests latency and throughput of IaaS-Providers, CDNs and Object Stores

Does not test: Cloud Databases

Page 238: Cloud Databases in Research and Practice

Vision: YCSB Harmony

[Figure: several YCSB clients]

Idea:
1. Periodically launch clients
2. Benchmark Cloud DB
3. Publish results

Page 239: Cloud Databases in Research and Practice

Vision: YCSB Harmony

System | Availability | Reads/s | Writes/s | Avg. Latency | 95th perc. Latency
DynamoDB | 99.9% | 23411 | 34534 | 3.2 ms | 9 ms
RDS | 99.8% | 2342 | 2455 | 30 ms | 80 ms
Azure Table | 99.5% | 22343 | 23442 | 12 ms | 20 ms
Google DataStore | 99.5% | 3000 | 2000 | 30 ms | 300 ms
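
The two latency columns can be derived from raw samples as in the following sketch. The function name `summarizeLatencies` and the nearest-rank percentile definition are assumptions; the slides do not say how the published values would be computed.

```javascript
// Average and 95th percentile of latency samples in milliseconds.
// Percentile uses the nearest-rank method: the smallest sample such that
// at least 95% of all samples are less than or equal to it.
function summarizeLatencies(latenciesMs) {
  if (latenciesMs.length === 0) throw new Error("no samples");
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, value) => acc + value, 0);
  const rank = Math.ceil((95 * sorted.length) / 100);
  return { averageMs: sum / sorted.length, p95Ms: sorted[rank - 1] };
}

console.log(summarizeLatencies([3, 2, 4, 3, 9, 2, 3, 40, 3, 2]));
```

Reporting both values matters because a few slow requests barely move the average but dominate the tail.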

Page 240: Cloud Databases in Research and Practice

Database research at the University of Hamburg

Page 248: Cloud Databases in Research and Practice

Motivation

Classic 3-Tier-Architecture (Three-Tier Architecture)

[Figure: client, application (web servers) and database; the path is marked "High Latency"]

Average (2014): 90 HTTP requests per page load

With every 100 ms of additional page load time, revenue decreases by 1%. (Study by Amazon)

When increasing load time of search results by 500 ms, traffic decreases by 20%. (Study by Google)

Page 249: Cloud Databases in Research and Practice

SPAs

Single-Page Applications (Rich Client, Smart Client)

[Figure: rich client (AngularJS, Backbone, Ember.JS, etc.) exchanging data (e.g. JSON) with the application (web servers) and database; the path is marked "High Latency"]

Page 253: Cloud Databases in Research and Practice

ORESTES

[Figure: rich client, web caches, Orestes REST servers and a DB cluster in the cloud. Labels: Low Latency, Cacheable Database Objects, Scalable NoSQL DBs]

REST Server: exposes the DB via REST API and handles:
• Scaling
• Cache Consistency
• Transactions
• Schema

Page 254: Cloud Databases in Research and Practice

ORESTES: Components

Middleware:
◦ Scalable REST API for aggregate-oriented persistence, queries, schema management and transactions.

Caching:
◦ Consistent web caching of database objects for read scalability and latency reduction.

Transactions:
◦ Optimistic cache-aware transaction model.

Page 259: Cloud Databases in Research and Practice

[Figure: Orestes architecture, from top to bottom]

Application Layer: Application Server, Browser or Mobile Device

Persistence API: Java/JDO, JavaScript/JPA Port (persist, find, createQuery), REST/HTTP API, others

Web caches (Purge, Scale): Forward-Proxy Caches, ISP Caches, CDN Caches (Content Delivery Networks), Reverse-Proxy Caches and Load Balancers

DBaaS & BaaS Layer (HTTP Servers):
• Database-independent Concerns: Transaction Validation, Access Control, Multi-Tenancy, Schema Management, Workload Management, Cache Coherence, Autoscaling, SLAs
• Database-specific Wrappers
• Modules: Transactions, Queries, Object Persistence, Schema, Configuration, Partial Updates
• Redis (Replicated): Counting Bloom Filter (add, delete)
• Node.JS (local to Server): Stored Procedures, Custom Validation

Data Store: Key-Value, Documents

GET /db/{bucket}/{class}/{id}

200 OK
Cache-Control: public, max-age=6000
ETag: "3"
JSON Object
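
What the `Cache-Control` and `ETag` headers in this response imply for a web cache can be modelled in a few lines. The sketch is a simplified in-memory model of standard HTTP expiration and revalidation, not Orestes code; `WebCache`, `makeOrigin` and the URL `/db/news/Article/42` are hypothetical.

```javascript
// Origin: answers 304 if the client's ETag still matches the stored version.
function makeOrigin(store) {
  return function origin(url, ifNoneMatch) {
    const object = store[url];
    const etag = '"' + object.version + '"';
    if (ifNoneMatch === etag) return { status: 304, maxAge: 6000 };
    return { status: 200, body: object.json, etag, maxAge: 6000 };
  };
}

// One web cache in front of the origin (time is passed in as seconds).
class WebCache {
  constructor(origin) {
    this.origin = origin;
    this.entries = new Map();
  }

  get(url, nowSeconds) {
    const entry = this.entries.get(url);
    if (entry && nowSeconds < entry.expiresAt) {
      return { body: entry.body, servedBy: "cache" }; // fresh: no origin contact
    }
    // Missing or expired: fetch, revalidating with the known ETag if any.
    const response = this.origin(url, entry ? entry.etag : undefined);
    if (response.status === 304) {
      entry.expiresAt = nowSeconds + response.maxAge;
      return { body: entry.body, servedBy: "revalidated" };
    }
    this.entries.set(url, {
      body: response.body,
      etag: response.etag,
      expiresAt: nowSeconds + response.maxAge,
    });
    return { body: response.body, servedBy: "origin" };
  }
}

const store = { "/db/news/Article/42": { version: 3, json: '{"title":"a"}' } };
const cache = new WebCache(makeOrigin(store));
console.log(cache.get("/db/news/Article/42", 0).servedBy);
console.log(cache.get("/db/news/Article/42", 10).servedBy);
```

Within `max-age` the cache answers without asking the origin, so an update made in that window stays invisible to readers of this cache.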

Page 260: Cloud Databases in Research and Practice

Caching

[Figure: request path from the client through the cache hierarchy to Orestes: Client (Browser) Cache, Proxy Caches, ISP Caches, CDN Caches, Reverse-Proxy Caches, Orestes. Each cache either hits or passes the miss on. Axes: P(Cache-Hit) from 0% to 100%; latency 0 ms, 1 ms, 10 ms, 20 ms, 50-500 ms, 50-500 ms]

1. JavaScript: em.find(id)
2. HTTP: GET /db/posts/{id}
3. Cache-Hit: Return Object. Cache-Miss: Forward Request
4. Fetch object from DB and return it with caching information

Scalability and Cache-Hits

Latency Benefit

Page 268: Cloud Databases in Research and Practice

BFB: Bloom Filter Bounded Staleness

[Figure: two apps accessing data through a cache]

How to prevent stale reads (inconsistency)?

Page 273: Cloud Databases in Research and Practice

BFB: Bloom Filter Bounded Staleness

[Figure: apps, cache and a counting Bloom filter; labels: purge(obj), Flat(Counting Bloomfilter)]

Page 278: Cloud Databases in Research and Practice

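BFB: Bloom Filter Bounded Staleness

[Figure: apps, cache and Bloom filter; object IDs are hashed with hashA(oid) and hashB(oid)]

False-Positive Rate: $f \approx \left(1 - e^{-kn/m}\right)^{k}$

Hash-Functions: $k = \ln 2 \cdot \frac{m}{n}$

Here $n$ is the number of inserted elements, $m$ the number of bits and $k$ the number of hash functions.

With 10,000 updates per 10 minutes and 1% error rate: 12 KByte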
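
The 12 KByte figure follows from the standard Bloom filter sizing formula $m = -n \ln f / (\ln 2)^2$. The sketch below recomputes it and adds a minimal counting Bloom filter with add, delete and a flattened view. It illustrates the data structure only; the hash functions, seeds and class name are assumptions, not the Orestes implementation.

```javascript
// Size a Bloom filter for n expected items and false-positive rate f.
function bloomParameters(expectedItems, falsePositiveRate) {
  const bits = Math.ceil(
    (-expectedItems * Math.log(falsePositiveRate)) / (Math.LN2 * Math.LN2)
  );
  const hashes = Math.round((Math.LN2 * bits) / expectedItems);
  return { bits, hashes, kilobytes: bits / 8 / 1000 };
}

// Seeded FNV-1a style string hash (assumed choice).
function hashString(text, seed) {
  let hash = seed >>> 0;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

class CountingBloomFilter {
  constructor(size, hashCount) {
    this.counters = new Uint16Array(size);
    this.hashCount = hashCount;
  }

  // Double hashing: position_i = (hashA + i * hashB) mod size.
  positions(id) {
    const hashA = hashString(id, 0x811c9dc5);
    const hashB = hashString(id, 0x9747b28c);
    const result = [];
    for (let i = 0; i < this.hashCount; i++) {
      result.push((hashA + i * hashB) % this.counters.length);
    }
    return result;
  }

  add(id) {
    for (const p of this.positions(id)) this.counters[p]++;
  }

  // Only call for ids that were added before, otherwise counters drift.
  delete(id) {
    for (const p of this.positions(id)) {
      if (this.counters[p] > 0) this.counters[p]--;
    }
  }

  contains(id) {
    return this.positions(id).every((p) => this.counters[p] > 0);
  }

  // Flat view: one bit per counter, as a plain Bloom filter.
  flatten() {
    return Array.from(this.counters, (c) => (c > 0 ? 1 : 0));
  }
}

const sizing = bloomParameters(10000, 0.01);
console.log(sizing.bits, sizing.hashes, sizing.kilobytes.toFixed(2));
```

Counters are needed because a plain bit array cannot forget an entry: clearing a bit on delete could also erase other entries that hash to the same position.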

Page 286: Cloud Databases in Research and Practice

SCOT: Scalable Cache-Aware Optimistic Transactions

[Figure: client, caches, REST servers, coordinator and DB]

1. Begin Transaction: Bloom Filter
2. Reads, Writes (Hidden)
3. Commit: read- & write-set versions. Response: Committed OR aborted + stale objects
4. Validation. Coordinator (e.g. Redis or ZooKeeper): prevent conflicting validations
5. Writes (Public), Read all

Caching → Shorter transaction duration → fewer aborts
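
The commit step can be illustrated with a generic optimistic validation sketch: the client sends the versions it saw, and the server commits only if none of them changed in the meantime. This is textbook version-based validation under simplifying assumptions (a single in-memory `Map` as database, no coordinator, hypothetical names), not the SCOT protocol itself.

```javascript
function currentVersion(database, id) {
  const record = database.get(id);
  return record ? record.version : 0;
}

// transaction.versions: object id -> version the client read (or based a write on)
// transaction.writes:   object id -> new value, hidden until commit
function commit(database, transaction) {
  const staleObjects = Object.keys(transaction.versions).filter(
    (id) => currentVersion(database, id) !== transaction.versions[id]
  );
  if (staleObjects.length > 0) {
    return { committed: false, staleObjects }; // client can refresh these and retry
  }
  for (const id of Object.keys(transaction.writes)) {
    database.set(id, {
      value: transaction.writes[id],
      version: currentVersion(database, id) + 1,
    });
  }
  return { committed: true, staleObjects: [] };
}

const database = new Map([
  ["a", { value: "a1", version: 1 }],
  ["b", { value: "b1", version: 1 }],
]);
console.log(commit(database, { versions: { a: 1, b: 1 }, writes: { a: "a2" } }));
console.log(commit(database, { versions: { a: 1 }, writes: { b: "b2" } }));
```

The second transaction aborts because it read `a` at version 1, which the first transaction has since overwritten.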

Page 290: Cloud Databases in Research and Practice

Polyglot Persistence

[Figure: the application talks to the Orestes servers via the REST/HTTP protocol; the Polyglot Persistence Mediator sits in front of Redis, MongoDB and db4o]

Polyglot Persistence Mediator:
• meta data contains SLA
• parse SLA & route data
• manage materialisation
• resolve mapping

Results: Article-Objects with Impression Count
• Article (ID, Title, …): MongoDB
• Impressions (Imp., ID): Redis Sorted Set

Speedup with PPM:
• 50-1000%
• 66% performance of Varnish
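
The "parse SLA & route data" step can be sketched as a lookup from per-field annotations to store capabilities. The annotation format, the capability names and the functions `routeField` and `splitObject` are invented for this example; the slides do not specify how the mediator represents SLAs.

```javascript
// Hypothetical capability list per store.
const stores = {
  mongodb: ["document", "query"],
  redis: ["sorted-counter", "key-value"],
};

// Hypothetical schema metadata: a field may carry an SLA annotation.
const articleSchema = {
  title: {},
  text: {},
  impressions: { sla: { requires: "sorted-counter" } },
};

// Pick the first store that offers the required capability.
function routeField(field, schema, availableStores, defaultStore) {
  const annotation = schema[field] && schema[field].sla;
  if (!annotation) return defaultStore;
  const match = Object.keys(availableStores).find((name) =>
    availableStores[name].includes(annotation.requires)
  );
  if (!match) throw new Error("no store satisfies the SLA of " + field);
  return match;
}

// Split one object into the parts that each store materialises.
function splitObject(object, schema, availableStores, defaultStore) {
  const perStore = {};
  for (const [field, value] of Object.entries(object)) {
    const store = routeField(field, schema, availableStores, defaultStore);
    perStore[store] = perStore[store] || {};
    perStore[store][field] = value;
  }
  return perStore;
}

console.log(
  splitObject({ title: "Hello", text: "...", impressions: 5 }, articleSchema, stores, "mongodb")
);
```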

Page 292: Cloud Databases in Research and Practice

Cloud Evaluation of ORESTES

[Figure: evaluation setup with client machines (labelled "50 ..."), a web cache, the Orestes server and a Versant DB across Amazon EC2 Ireland and EC2 USA (165 ms)]

Setup:
• 30 000 objects
• 500 requests per client
• 10/1 read/write ratio

Page 294: Cloud Databases in Research and Practice

Orestes: Summary

Stateless Scale-out REST middleware for DBaaS

Generic Schema, Authentication, Multi-Tenancy, etc.

SCOT (Scalable Cache-Aware Optimistic Transactions):
◦ Optimistic database-independent ACID transactions

BFB (Bloom Filter Bounded Staleness):
◦ Allows static caching with tunable consistency guarantee

PPM (Polyglot Persistence Mediator):
◦ SLAs → appropriate database

Page 295: Cloud Databases in Research and Practice

From Research to Practice

Page 297: Cloud Databases in Research and Practice

Orestes as a startup

Baqend

Internet

Seoxy

• REST-API
• Transactions
• Schema Management
• Cache Consistency
• Auto-Scaling
• Multi-Tenancy
• Security and Access Control
• Provisioning

Page 306: Cloud Databases in Research and Practice

Baqend in Action

GET /app.html

db.find(Menu, 'main').done(...);
db.find(Page, 'hero').done(...);
db.query(Page, 'top3').done(...);

GET /img/pic005.jpg
GET /img/pic017.jpg
GET /img/pic022.jpg

Page 307: Cloud Databases in Research and Practice

More: Baqend.com

Page 308: Cloud Databases in Research and Practice

Wrap-up & book recommendations

Page 309: Cloud Databases in Research and Practice

How to choose a cloud database:

Wrap-up

Managed RDBMS

Managed DWH

Managed NoSQL DB

Backend-as-a-Service

Proprietary Service

Object Store

Page 310: Cloud Databases in Research and Practice

How to choose a cloud database:

Wrap-up

Define your functional requirements

Define your non-functional requirements

Managed RDBMS

Managed DWH

Managed NoSQL DB

Backend-as-a-Service

Proprietary Service

Object Store

Page 311: Cloud Databases in Research and Practice

How to choose a cloud database:

Wrap-up

Define your functional requirements

Define your non-functional requirements

Managed RDBMS

Managed DWH

Managed NoSQL DB

Backend-as-a-Service

Proprietary Service

Object Store

Evaluate by:

1. Underlying DB

2. Docs & books

3. Your own tests

Page 312: Cloud Databases in Research and Practice

How to choose a cloud database:

Wrap-up

Define your functional requirements

Define your non-functional requirements

Managed RDBMS

Managed DWH

Managed NoSQL DB

Backend-as-a-Service

Proprietary Service

Object Store

Evaluate by:

1. Underlying DB

2. Docs & books

3. Your own tests

4. SLAs

5. Docs & books

6. Experience reports

Page 313: Cloud Databases in Research and Practice

How to choose a cloud database:

Wrap-up

Define your functional requirements

Define your non-functional requirements

Managed RDBMS

Managed DWH

Managed NoSQL DB

Backend-as-a-Service

Proprietary Service

Object Store

Evaluate by:

1. Underlying DB

2. Docs & books

3. Your own tests

4. SLAs

5. Docs & books

6. Experience reports

Try it

Page 314: Cloud Databases in Research and Practice

Book recommendations

Page 315: Cloud Databases in Research and Practice

Book recommendations

• (Non-scientific) literature is rare

• Some books cover specific DBaaS and cloud platforms

Page 316: Cloud Databases in Research and Practice

Blogs

http://nosql.mypopescu.com/

http://www.dzone.com/mz/nosql

http://www.dbms2.com/

http://hackingdistributed.com/

http://highscalability.com/

http://www.nosqlweekly.com/

Page 317: Cloud Databases in Research and Practice

Top Scientific Conferences

Database Research and Distributed Systems Research:

VLDB (Very Large Data Bases)

SIGMOD (Special Interest Group on Management of Data)

ICDE (International Conference on Data Engineering)

CIDR (Conference on Innovative Data Systems Research)

SoCC (Symposium on Cloud Computing)

OSDI/SOSP (Operating Systems Design and Implementation / Symposium on Operating Systems Principles)

EuroSys

Page 318: Cloud Databases in Research and Practice

Top Scientific Conferences

Database Research and Distributed Systems Research:

VLDB (Very Large Data Bases)

SIGMOD (Special Interest Group on Management of Data)

ICDE (International Conference on Data Engineering)

CIDR (Conference on Innovative Data Systems Research)

SoCC (Symposium on Cloud Computing)

OSDI/SOSP (Operating Systems Design and Implementation / Symposium on Operating Systems Principles)

EuroSys

This year probably in Washington D.C.

Learn more: scdm2013.com

Page 319: Cloud Databases in Research and Practice

Thank you. Queries?

Contact:

[email protected]

http://baqend.com

http://orestes.info

http://scdm2013.com