Information Systems Group The Mimicking Octopus Towards a one-size-fits-all Architecture for Database Systems Alekh Jindal Supervisor: Prof. Dr. Jens Dittrich VLDB PhD Workshop September 13, 2010




September 13, 2010 Towards a one-size-fits-all Database Architecture - Alekh Jindal

Database Landscape

• Specialized systems: OLTP, OLAP, Streaming System, Archival System, Search Engine

• Airline Company (company information system): Reporting, Cheap Fares, Ticket Booking, Booking Archives, Flight Search

• Several applications, evolving applications

• ETL-style data pipelines, eventual integration

• Licensing cost, DBA cost, maintenance cost, engineering cost, integration cost

• Hard-coded optimizations, hard-coded data layouts

Problem Statement

• Single database system

• Automatic adaption

• Improved performance

• Lower cost

• Better maintainability

OctopusDB Overview

• One-size-fits-all architecture

• Abstract storage concept: Storage Views (SV)

• Single optimization problem: SV Selection

• Holistic SV optimizer

System Architecture

• No hard-coded store

• All operations recorded as logical log entries in a primary log on stable storage using WAL
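The idea of recording every operation as a logical log entry on stable storage can be sketched as follows. This is a minimal illustration, not OctopusDB code: the `PrimaryLog` class, the JSON-lines file format, and the entry fields (`op`, `bag`, `key`, `value`) are assumptions made for the example.

```python
import json
import os
import tempfile


class PrimaryLog:
    """Append-only log of logical operations (hypothetical structure)."""

    def __init__(self, path):
        self.path = path

    def append(self, op, bag, key, value):
        """Write one logical entry and force it to disk before returning."""
        entry = {"op": op, "bag": bag, "key": key, "value": value}
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())  # the entry is durable before the call returns
        return entry

    def scan(self):
        """Return all entries in arrival order."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]


def demo():
    """Append three operations to a temporary log and read them back."""
    with tempfile.TemporaryDirectory() as d:
        log = PrimaryLog(os.path.join(d, "primary.log"))
        log.append("insert", "customers", 1, ["tom", 25])
        log.append("insert", "tickets", 301, ["paris", "rome", "E"])
        log.append("update", "tickets", 301, ["paris", "rome", "B"])
        return log.scan()
```

Calling `demo()` returns the three entries in the order they were appended; every other representation of the data would be derived from such a log.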

[Figure: OctopusDB architecture. Components: API, Query Catalog, Storage View Catalog, Holistic SV Optimizer, Transaction Manager, Recovery Manager, Purging & Checkpointing, Primary Log Store (Log SV), Storage View Store, Result]

Storage Views

• Arbitrary physical representations of data

• Different layouts under a single umbrella

• Primary: Log SV, Row SV, Column SV, Index SV

• Secondary: Partial Index SV, Bag-partitioned SV, Key-consolidated SV, Vertically/Horizontally Partitioned SV

• ... any hybrid combination of the above

Use-case Scenario*

• Flight booking system

• Tables: Tickets, Customers

• Tickets: several attributes, frequently updated

• Customers: fewer attributes

• Queries:
  SELECT C.* FROM Tickets T, Customers C
  WHERE T.customer_id=C.id AND T.a1=x1 AND T.a2=x2 ... AND T.an=xn

* Inspired by Unterbrunner et al., PVLDB 2009.

Flight Booking System

[Figure: the query is evaluated directly on the primary Log SV, which interleaves entries of the Customers and Tickets bags]

Plan: π_{customers.*}(σ_{a1=x1 ∧ ... ∧ an=xn}(Tickets ⋈_{tickets.customer_id=customers.id} Customers)) → Result

Log SV:
customers, 01, <tom, 25, [email protected], ...>
customers, 02, <marc, 23, [email protected], ...>
tickets, 301, <paris, rome, E,...>
tickets, 302, <moscow, berlin, B,...>
tickets, 303, <tokyo, beijing, E,...>
customers, 03, <felix, 20, [email protected], ...>
customers, 03, <felix, 20, [email protected], ...>
tickets, 303, <tokyo, beijing, B,..>
.....

Bag-partitioning

[Figure: the primary Log SV is split by bag into two secondary Log SVs, and the query joins the two]

customers log = σ_{bag=customers}(Log SV)
tickets log = σ_{bag=tickets}(Log SV)

Plan: π_{customers.*}(σ_{a1=x1 ∧ ... ∧ an=xn}(tickets log ⋈_{tickets.customer_id=customers.id} customers log)) → Result

customers log:
customers, 01, <tom, 25, [email protected], ...>
customers, 02, <marc, 23, [email protected], ...>
customers, 03, <felix, 20, [email protected], ...>
customers, 03, <felix, 20, [email protected], ...>
.....

tickets log:
tickets, 301, <paris, rome, E,...>
tickets, 302, <moscow, berlin, B,...>
tickets, 303, <tokyo, beijing, E,...>
tickets, 303, <tokyo, beijing, B,..>
.....
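Bag-partitioning amounts to one selection per bag over the log. The sketch below assumes log entries are `(bag, key, value)` triples shaped like the slide's example; the function name and the in-memory representation are illustrative, not taken from OctopusDB.

```python
from collections import defaultdict

# Interleaved log entries as (bag, key, value), mirroring the slide's example.
LOG = [
    ("customers", 1, ("tom", 25)),
    ("customers", 2, ("marc", 23)),
    ("tickets", 301, ("paris", "rome", "E")),
    ("tickets", 302, ("moscow", "berlin", "B")),
    ("tickets", 303, ("tokyo", "beijing", "E")),
    ("customers", 3, ("felix", 20)),
    ("customers", 3, ("felix", 20)),
    ("tickets", 303, ("tokyo", "beijing", "B")),
]


def bag_partition(log):
    """Split one log into per-bag logs, keeping arrival order within each bag."""
    parts = defaultdict(list)
    for bag, key, value in log:
        parts[bag].append((bag, key, value))
    return dict(parts)


partitions = bag_partition(LOG)
```

Each partition is still a log: both versions of ticket 303 remain in the tickets partition, so a query only has to scan the bag it needs but still sees superseded entries.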

Key-consolidation

[Figure: each bag-partitioned Log SV is consolidated so that only the most recent entry per key remains; the query plan on top is unchanged]

customers = Γ_{bag,key} γ_{recent}(σ_{bag=customers}(Log SV))
tickets = Γ_{bag,key} γ_{recent}(σ_{bag=tickets}(Log SV))

customers:
customers, 01, <tom, 25, [email protected], ...>
customers, 02, <marc, 23, [email protected], ...>
customers, 03, <felix, 20, [email protected], ...>
.....

tickets:
tickets, 301, <paris, rome, E,...>
tickets, 302, <moscow, berlin, B,...>
tickets, 303, <tokyo, beijing, B,..>
.....
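Key-consolidation can be read as "group by (bag, key), keep the most recent entry". A minimal sketch under that reading, assuming that a later position in the log means more recent; the function name is hypothetical:

```python
# Tickets partition of the log as (bag, key, value); ticket 303 appears twice.
TICKETS_LOG = [
    ("tickets", 301, ("paris", "rome", "E")),
    ("tickets", 302, ("moscow", "berlin", "B")),
    ("tickets", 303, ("tokyo", "beijing", "E")),
    ("tickets", 303, ("tokyo", "beijing", "B")),
]


def key_consolidate(log):
    """Keep only the latest entry per (bag, key); later log entries win."""
    latest = {}
    for bag, key, value in log:
        latest[(bag, key)] = (bag, key, value)  # overwrite any older version
    return list(latest.values())


consolidated = key_consolidate(TICKETS_LOG)
```

After consolidation only the class-B version of ticket 303 survives, which matches the consolidated tickets contents shown on the slide.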

Storage View Transformation

[Figure: the key-consolidated customers and tickets SVs are transformed from Log SVs into a Col SV and a Row SV; the query plan on top is unchanged]

Hot-Cold Storage Views

[Figure: the tickets SV is split by recency into ticketsHot and ticketsCold. SV types shown: Log SV, Row SV, Col SV (twice)]

ticketsHot = σ_{time>=now-7days}(tickets)
ticketsCold = σ_{time<now-7days}(tickets)

ticketsHot:
tickets, 303, <tokyo, beijing, B,..>
.....

ticketsCold:
tickets, 301, <paris, rome, E,...>
tickets, 302, <moscow, berlin, B,...>
.....
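The hot/cold split is a pair of complementary selections on a timestamp. The sketch below assumes each row carries a `time` field; the timestamps and the function name are invented for illustration, since the slides do not show them.

```python
from datetime import datetime, timedelta


def split_hot_cold(rows, now, window=timedelta(days=7)):
    """Partition rows by recency: hot if time >= now - window, cold otherwise."""
    cutoff = now - window
    hot = [r for r in rows if r["time"] >= cutoff]
    cold = [r for r in rows if r["time"] < cutoff]
    return hot, cold


NOW = datetime(2010, 9, 13)
# Hypothetical last-modified times for the three tickets of the example.
TICKETS = [
    {"key": 301, "time": NOW - timedelta(days=30)},
    {"key": 302, "time": NOW - timedelta(days=10)},
    {"key": 303, "time": NOW - timedelta(days=1)},
]
hot, cold = split_hot_cold(TICKETS, NOW)
```

Because the two predicates are complementary, every row lands in exactly one of the two SVs, so each part can be given a layout that suits its access pattern.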

Index Storage Views

[Figure: Index SVs are added on top of the existing SVs: customersIndex = π_{id,rid}(customers) and ticketsHotIndex = π_{price,rid}(ticketsHot). The Log SV, Row SV, and Col SVs of the previous slides remain in place]

Isn't this the same as Materialized Views? NO!

• A Materialized View knows what to materialize

• A Storage View also knows how to materialize

• A Materialized View still needs a Storage View

Storage View Selection

[Figure: the optimizer sits above all SVs of the previous slides (Log SV, Row SV, Col SVs, ticketsCold, and the Index SVs ticketsHotIndex and customersIndex); each of them can produce a Result for the query]

• Pick the right Storage Views to create, update, query, and drop

• Single optimization problem: "Storage View Selection"

Holistic Storage View Optimizer

• Storage totally dynamic: any subset of data in any storage structure

• Storage View selection

• Storage View update maintenance

• Pick physical execution plan

• Combine results spanning several Storage Views

Research Challenges

• Single umbrella for different storage layouts
  - storage layer abstraction
  - still layout-specific specialization

• Automatic adaptive bifurcation
  - monolithic system
  - right online algorithms

• Simplicity vs. optimization
  - only as complex as required
  - mimic several specialized systems

Related Work

• Materialized Views [Chirkova et al., VLDBJ 2002]
  - as pointed out earlier, different from Storage Views

• Dynamic materialized views [Zhou et al., ICDE 2007]
  - horizontal dynamism; the storage view is still open

• View matching, query containment [A. Y. Halevy, VLDBJ 2001]
  - again operates at a higher level

• Cracked databases [Idreos et al., CIDR 2007]
  - logical partitioning of data, only horizontal

• Rodent store [Cudre-Mauroux et al., CIDR 2009]
  - still assumes a store

• GMAP [Tsatalos et al., VLDB 1994]
  - does not adapt the stores

Optimizer Cost Model

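Symbol | Meaning | Model
$C^{\text{log}}_{\text{scan}}(N)$ | Log SV scan cost | $\left\lceil \frac{\sum_{i=1}^{N} \text{colsize}(\text{log}_i)}{m} \right\rceil \cdot C_{\text{random}} + \left\lceil \frac{\sum_{i=1}^{N} \text{colsize}(\text{log}_i)}{\text{pageSize}} \right\rceil / BW$
$C^{\text{row}}_{\text{scan}}(N)$ | Row SV scan cost | $\left\lceil \frac{N \cdot \sum_{A_i \in A} \text{colsize}(A_i)}{m} \right\rceil \cdot C_{\text{random}} + \left\lceil \frac{N \cdot \sum_{A_i \in A} \text{colsize}(A_i)}{\text{pageSize}} \right\rceil / BW$
$C^{\text{col}}_{\text{scan}}(N, S)$ | Col SV scan cost | $\sum_{A_i \in S} \left( \left\lceil \frac{N \cdot \sum_{A_j \in S} \text{colsize}(A_j)}{m} \right\rceil \cdot C_{\text{random}} + \left\lceil \frac{N \cdot \text{colsize}(A_i)}{\text{pageSize}} \right\rceil / BW \right)$
$C^{\text{index}}_{\text{lookup}}(N)$ | Index lookup cost | $C_{\text{random}} \cdot \left\lceil \log_F \left( N \cdot (\text{colsize}(\text{key}) + \text{pointerSize}) / \text{pageSize} \right) \right\rceil$
$C^{\text{row cl. index}}_{\text{scan}}(N, sel)$ | Clustered Indexed Row SV scan cost | $C^{\text{index}}_{\text{lookup}}(N) + C^{\text{row}}_{\text{scan}}(\lceil sel \cdot N \rceil)$
$C^{\text{col cl. index}}_{\text{scan}}(N, S, sel)$ | Clustered Indexed Col SV scan cost | $C^{\text{index}}_{\text{lookup}}(N) + C^{\text{col}}_{\text{scan}}(\lceil sel \cdot N \rceil, S)$
$C^{\text{row uncl. index}}_{\text{scan}}(N, sel)$ | Unclustered Indexed Row SV scan cost | $C^{\text{index}}_{\text{lookup}}(N) + \lceil sel \cdot N \rceil \cdot (C_{\text{random}} + \text{pageSize}/BW)$
$C^{\text{col uncl. index}}_{\text{scan}}(N, S, sel)$ | Unclustered Indexed Col SV scan cost | $C^{\text{index}}_{\text{lookup}}(N) + \lceil sel \cdot N \rceil \cdot \lvert S \rvert \cdot (C_{\text{random}} + \text{pageSize}/BW)$

Table 1: SV Query Cost model

Symbol | Meaning | Model
$C^{\text{log}}_{\text{update}}(N_u)$ | Log SV update cost | $C^{\text{log}}_{\text{scan}}(N_u)$
$C^{\text{row}}_{\text{update}}(N, N_u)$ | Row SV update cost | $\min\left( C_{\text{random}} + \left\lceil \frac{N}{N_c} \right\rceil \cdot C^{\text{row}}_{\text{scan}}(2 \cdot N_c),\ \left\lceil \frac{N}{N_c} \right\rceil \cdot C^{\text{row}}_{\text{scan}}(N_c) + N_u \cdot (C_{\text{random}} + \text{pageSize}/BW) \right)$
$C^{\text{col}}_{\text{update}}(N, N_u, S)$ | Col SV update cost | $\min\left( C_{\text{random}} + \left\lceil \frac{N}{N_c} \right\rceil \cdot C^{\text{col}}_{\text{scan}}(2 \cdot N_c),\ \left\lceil \frac{N}{N_c} \right\rceil \cdot C^{\text{col}}_{\text{scan}}(N_c) + N_u \cdot \lvert S \rvert \cdot (C_{\text{random}} + \text{pageSize}/BW) \right)$
$C^{\text{index}}_{\text{split}}(d)$ | Index split cost | $\left( \sum_{i=1}^{d} (p_{\text{split}})^i \right) \cdot C_{\text{random}}$
$C^{\text{row cl. index}}_{\text{update}}(N, N_u, d)$ | Cl. Index Row SV update cost | $C^{\text{index}}_{\text{lookup}}(N) + 2 \cdot C^{\text{row}}_{\text{scan}}(N_u) + C^{\text{index}}_{\text{split}}(d)$
$C^{\text{col cl. index}}_{\text{update}}(N, N_u, S, d)$ | Cl. Index Col SV update cost | $C^{\text{index}}_{\text{lookup}}(N) + 2 \cdot C^{\text{col}}_{\text{scan}}(N_u, S) + C^{\text{index}}_{\text{split}}(d)$
$C^{\text{row uncl. index}}_{\text{update}}(N, N_u, d)$ | Uncl. Index Row SV update cost | $C^{\text{index}}_{\text{lookup}}(N) + N_u \cdot (C_{\text{random}} + \text{pageSize}/BW) + C^{\text{index}}_{\text{split}}(d)$
$C^{\text{col uncl. index}}_{\text{update}}(N, N_u, S, d)$ | Uncl. Index Col SV update cost | $C^{\text{index}}_{\text{lookup}}(N) + N_u \cdot \lvert S \rvert \cdot (C_{\text{random}} + \text{pageSize}/BW) + C^{\text{index}}_{\text{split}}(d)$

Table 2: SV Update Cost model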

transformation below. We consider Log, Row, Col, and Index SV. Table 3 describes the symbols used in our cost models.

Query Cost. Table 1 shows the query cost models for Log, Row, Col, and Index SVs. We express each of the cost functions as a summation of random and sequential I/O costs. We consider the scan operation to be I/O-bound and hence neglect CPU costs. Notice that the scan operations for Row and Col SVs are buffered reads, i.e. OctopusDB reads as many tuples from an SV as can fit in the memory assigned to it. We need buffered reading for Col SV because we need to join individual attributes to reconstruct the tuple; for Row SV we also consider the additional random I/O costs when reading multiple relations competing for the same hard disk, e.g. for join processing.
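The scan and lookup formulas of Table 1 can be evaluated directly. The following is a minimal sketch rather than OctopusDB code: the `Disk` record, the function names, and any numeric values passed in are illustrative assumptions; only the formulas themselves follow Table 1 and the fan-out definition of Table 3.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    c_random: float  # cost of one random access, in seconds
    bw: float        # sequential bandwidth, in pages per second
    page_size: int   # bytes per page
    m: int           # main memory available to the scan, in bytes


def row_scan(n, colsizes, disk):
    """Row SV scan cost: buffered sequential read of n complete rows."""
    total = n * sum(colsizes)
    refills = math.ceil(total / disk.m)  # one random access per buffer refill
    return refills * disk.c_random + math.ceil(total / disk.page_size) / disk.bw


def col_scan(n, selected, disk):
    """Col SV scan cost: one cost term per selected column."""
    refills = math.ceil(n * sum(selected) / disk.m)
    return sum(
        refills * disk.c_random + math.ceil(n * size / disk.page_size) / disk.bw
        for size in selected
    )


def fan_out(key_size, pointer_size, disk):
    """Index fan-out F as defined in Table 3."""
    return 1 + (disk.page_size - 2 * pointer_size) // (2 * key_size)


def index_lookup(n, key_size, pointer_size, disk):
    """Index lookup cost: one random access per index level."""
    pages = n * (key_size + pointer_size) / disk.page_size
    levels = math.ceil(math.log(pages, fan_out(key_size, pointer_size, disk)))
    return disk.c_random * levels
```

With a hypothetical disk such as `Disk(c_random=0.005, bw=10_000, page_size=4096, m=64 * 1024 * 1024)` and a table of one million rows with ten 8-byte attributes, scanning two columns of a Col SV comes out cheaper than scanning the whole Row SV, which is the trade-off the optimizer is meant to exploit.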

Symbol | Meaning | Unit
$N$ | number of rows in table |
$N_c$ | number of rows in a table chunk |
$BW$ | sequential bandwidth of hard disk | Pages/sec
$C_{random}$ | costs for a random access | sec
$pageSize$ | size of a page | Byte
$pointerSize$ | size of a pointer in an index | Byte
$F$ | fan-out of index node $= 1 + \left\lfloor \frac{pageSize - 2 \cdot pointerSize}{2 \cdot colsize(key)} \right\rfloor$ |
$d$ | depth of an index tree $= \left\lceil \log_F \left( \left\lceil \frac{N \cdot (colsize(key) + pointerSize)}{pageSize} \right\rceil \right) \right\rceil$ |
$sel$ | query selectivity |
$m$ | available main memory | Byte
$A$ | set of all attributes in table |
$key$ | indexed attribute |
$colsize(A_i)$ | size of a value of attribute $i$ | Byte
$S$ | set of selected attributes |
$p_{split}$ | probability of a leaf split |

Table 3: Symbols used in cost models
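To make the fan-out and depth definitions in Table 3 concrete, the following Python sketch evaluates them for one configuration. It assumes the fan-out formula reads $F = 1 + \lfloor (pageSize - 2 \cdot pointerSize) / (2 \cdot colsize(key)) \rfloor$; the function names and the example sizes (4 KB pages, 8-byte keys and pointers) are illustrative assumptions, not values from the paper.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def fan_out(page_size: int, pointer_size: int, key_size: int) -> int:
    """F = 1 + floor((pageSize - 2*pointerSize) / (2*colsize(key)))."""
    return 1 + (page_size - 2 * pointer_size) // (2 * key_size)


def index_depth(n_rows: int, key_size: int, pointer_size: int,
                page_size: int, f: int) -> int:
    """d = ceil(log_F(ceil(N * (colsize(key) + pointerSize) / pageSize))).

    Uses integer arithmetic: the smallest d such that F**d is at least
    the number of leaf pages, which avoids floating-point log rounding.
    """
    if f < 2:
        raise ValueError("fan-out must be at least 2")
    leaf_pages = ceil_div(n_rows * (key_size + pointer_size), page_size)
    depth, reach = 0, 1
    while reach < leaf_pages:
        reach *= f
        depth += 1
    return depth


# Hypothetical configuration: 4 KB pages, 8-byte keys, 8-byte pointers.
F = fan_out(page_size=4096, pointer_size=8, key_size=8)
d = index_depth(n_rows=100_000, key_size=8, pointer_size=8,
                page_size=4096, f=F)
print(F, d)
```

Under these assumed sizes a 100,000-row table needs 391 leaf pages, so a fan-out of 256 yields a tree of depth 2.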

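SV Transformation | Cost
Log SV → Row SV | $C^{log}_{scan}(N) + C^{row}_{scan}(N)$
Log SV → Col SV | $C^{log}_{scan}(N) + C^{col}_{scan}(N, A)$
Row SV ↔ Col SV | $C^{row}_{scan}(N) + C^{col}_{scan}(N, A)$
Row SV → Index SV | $C^{row}_{scan}(N) + \left( \frac{F^{d+1} - 1}{F - 1} \right) \cdot C_{random}$
Col SV → Index SV | $C^{col}_{scan}(N, \{key, rowID\}) + \left( \frac{F^{d+1} - 1}{F - 1} \right) \cdot C_{random}$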

Table 4: SV Transformation Cost model

Update Cost. All updates to OctopusDB are done in the Primary Log and OctopusDB later propagates them recursively to the subsequent SVs using any appropriate maintenance algorithm. Therefore, update costs are a crucial factor when determining which SVs to keep. Table 2 shows our update cost model for Log, Row, Col, and Index SVs. For Row and Col SVs, we assume that either the tuples are scanned and updated in chunks of $N_c$; or each update triggers a random I/O. We take the minimum cost among these two options as the update cost (as done by a cost-based optimizer). For updates in Index SVs, we also consider the costs to split leaves or nodes in the index structure ($C^{index}_{split}$). We model the probability of having a node/leaf split at a level as exponentially proportional to the depth of the level in the index tree, i.e. $(p_{split})^i$ at level $i$.
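The two update strategies for a Row SV and the index split cost can be sketched in Python as follows. This is an illustration of the Table 2 formulas under stated assumptions, not the OctopusDB implementation: the function names and all parameter values are hypothetical, and because Table 3 gives $BW$ in pages/sec, the sketch reads the $pageSize/BW$ term as the time to transfer one page ($1/BW$).

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def row_scan_cost(n, row_width, m, page_size, bw, c_random):
    """Row SV scan cost (Table 1): buffered seeks plus sequential transfer."""
    total_bytes = n * row_width
    return (ceil_div(total_bytes, m) * c_random
            + ceil_div(total_bytes, page_size) / bw)


def row_update_cost(n, n_u, n_c, row_width, m, page_size, bw, c_random):
    """Row SV update cost (Table 2): minimum of the two options."""
    chunks = ceil_div(n, n_c)
    # Option 1: C_random + ceil(N/N_c) * C_row_scan(2 * N_c)
    chunked = c_random + chunks * row_scan_cost(
        2 * n_c, row_width, m, page_size, bw, c_random)
    # Option 2: ceil(N/N_c) * C_row_scan(N_c) + N_u * (C_random + page time)
    # Assumption: pageSize/BW is interpreted as 1/BW (BW in pages/sec).
    per_update = chunks * row_scan_cost(
        n_c, row_width, m, page_size, bw, c_random) + n_u * (c_random + 1.0 / bw)
    return min(chunked, per_update)


def index_split_cost(d, p_split, c_random):
    """Index split cost (Table 2): (sum_{i=1..d} p_split**i) * C_random."""
    return sum(p_split ** i for i in range(1, d + 1)) * c_random


# Hypothetical setup: 100,000 rows of 160 bytes, chunks of 10,000 rows,
# 1 MiB buffer, 4 KB pages, 10,000 pages/sec, 5 ms per random access.
cfg = dict(n=100_000, n_c=10_000, row_width=160, m=1 << 20,
           page_size=4096, bw=10_000, c_random=0.005)
few = row_update_cost(n_u=10, **cfg)      # per-update random I/O is cheaper
many = row_update_cost(n_u=1_000, **cfg)  # chunked rewrite is cheaper
print(few, many, index_split_cost(d=2, p_split=0.1, c_random=0.005))
```

With these assumed numbers the per-update option wins for a handful of updates, while the chunked option caps the cost once the number of updates grows.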

For each call to registerSV or registerQuery, OctopusDB stores the reference to the output SV or Query in the SV or Query Catalog, respectively. Again, the holistic SV optimizer is responsible for propagating updates from the primary log to all SVs recursively. There are several ways, e.g. lazy updates, to do such SV maintenance. We believe that existing work on materialized views could be adapted in OctopusDB for SV maintenance. However, OctopusDB poses several new challenges, e.g. how to compute the optimal number of stores. This also has to consider the amount of overlap among stores, i.e. to avoid extensive update costs for redundant data representations.

Transformation Cost. Finally, we also model the costs to transform one type of SV to another in Table 4. We consider transformation as a query scan on the input SV followed by an update scan on the output SV. For Index SV, only the index attributes and rowID need to be read; the index tree needs to be built on those attributes only. The holistic SV optimizer can use the transformation cost model when deciding whether to transform one SV into another, e.g. a Row SV into a Col SV. Transformation cost is the price that OctopusDB has to pay, while the benefit could be reduced scan costs, i.e. the difference between the iteration costs of the old and new SVs, or reduced update cost. As SVs are fully optional, OctopusDB may balance the two cost factors based on a given workload.
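The trade-off between transformation cost and reduced scan cost can be illustrated with a small Python sketch. The break-even rule below (number of queries needed to amortize a transformation) is one possible way to balance the two factors and is an assumption of this sketch, not a rule stated in the source; the function names and the cost figures in the example are hypothetical.

```python
import math


def index_build_cost(fan_out: int, depth: int, c_random: float) -> float:
    """Index-building term from Table 4: ((F**(d+1) - 1) / (F - 1)) * C_random.

    The fraction equals 1 + F + ... + F**d, so integer division is exact.
    """
    nodes = (fan_out ** (depth + 1) - 1) // (fan_out - 1)
    return nodes * c_random


def transformation_cost(input_scan_cost: float, output_cost: float) -> float:
    """Query scan on the input SV followed by an update scan on the output SV."""
    return input_scan_cost + output_cost


def queries_to_amortize(transform_cost, old_scan_cost, new_scan_cost):
    """Number of queries after which the transformation has paid off.

    Returns None when the new SV does not scan faster than the old one.
    """
    saving = old_scan_cost - new_scan_cost
    if saving <= 0:
        return None
    return math.ceil(transform_cost / saving)


# Hypothetical costs in seconds: a Row SV scan, writing the full Col SV,
# and a Col SV scan that touches only a few attributes.
row_scan, col_write_all, col_scan_few = 0.4707, 1.992, 0.1584
cost = transformation_cost(row_scan, col_write_all)
print(queries_to_amortize(cost, row_scan, col_scan_few))
print(index_build_cost(fan_out=256, depth=2, c_random=0.005))
```

An optimizer following this sketch would transform the Row SV into a Col SV only if it expects at least that many queries of this shape before the workload changes.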

The three cost models discussed above form the backbone of the holistic SV optimizer. Based on these cost models, the holistic SV optimizer can create, maintain, scan, transform, or delete any SV.


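Symbol | Meaning | Model
$C^{log}_{scan}(N)$ | Log SV scan cost | $\left\lceil \frac{\sum_{i=1}^{N} colsize(log_i)}{m} \right\rceil \cdot C_{random} + \left\lceil \frac{\sum_{i=1}^{N} colsize(log_i)}{pageSize} \right\rceil / BW$
$C^{row}_{scan}(N)$ | Row SV scan cost | $\left\lceil \frac{N \cdot \sum_{A_i \in A} colsize(A_i)}{m} \right\rceil \cdot C_{random} + \left\lceil \frac{N \cdot \sum_{A_i \in A} colsize(A_i)}{pageSize} \right\rceil / BW$
$C^{col}_{scan}(N, S)$ | Col SV scan cost | $\sum_{A_i \in S} \left( \left\lceil \frac{N \cdot \sum_{A_j \in S} colsize(A_j)}{m} \right\rceil \cdot C_{random} + \left\lceil \frac{N \cdot colsize(A_i)}{pageSize} \right\rceil / BW \right)$
$C^{index}_{lookup}(N)$ | Index lookup cost | $C_{random} \cdot \lceil \log_F (N \cdot (colsize(key) + pointerSize) / pageSize) \rceil$
$C^{row\ cl.\ index}_{scan}(N, sel)$ | Clustered Indexed Row SV scan cost | $C^{index}_{lookup}(N) + C^{row}_{scan}(\lceil sel \cdot N \rceil)$
$C^{col\ cl.\ index}_{scan}(N, S, sel)$ | Clustered Indexed Col SV scan cost | $C^{index}_{lookup}(N) + C^{col}_{scan}(\lceil sel \cdot N \rceil, S)$
$C^{row\ uncl.\ index}_{scan}(N, sel)$ | Unclustered Indexed Row SV scan cost | $C^{index}_{lookup} + \lceil sel \cdot N \rceil \cdot (C_{random} + pageSize/BW)$
$C^{col\ uncl.\ index}_{scan}(N, S, sel)$ | Unclustered Indexed Col SV scan cost | $C^{index}_{lookup} + \lceil sel \cdot N \rceil \cdot \lvert S \rvert \cdot (C_{random} + pageSize/BW)$

Table 1: SV Query Cost model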
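As a rough illustration of the Row and Col SV scan formulas in Table 1, the Python sketch below evaluates both for a table with 20 attributes. The function names, the uniform 8-byte attribute size, and the hardware parameters are assumptions made for the example; only the formulas themselves come from the table.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def row_scan_cost(n, col_sizes, m, page_size, bw, c_random):
    """Row SV scan: every attribute of every tuple is read."""
    total_bytes = n * sum(col_sizes.values())
    return (ceil_div(total_bytes, m) * c_random
            + ceil_div(total_bytes, page_size) / bw)


def col_scan_cost(n, col_sizes, selected, m, page_size, bw, c_random):
    """Col SV scan: only the selected attributes S are read.

    Each column pays one seek per buffer refill (the buffer is shared by
    all selected columns) plus the sequential transfer of its own pages.
    """
    selected_width = sum(col_sizes[a] for a in selected)
    refills = ceil_div(n * selected_width, m)
    return sum(refills * c_random
               + ceil_div(n * col_sizes[a], page_size) / bw
               for a in selected)


# Hypothetical table: 100,000 tuples, 20 attributes of 8 bytes each.
sizes = {f"a{i}": 8 for i in range(1, 21)}
hw = dict(m=1 << 20, page_size=4096, bw=10_000, c_random=0.005)

row = row_scan_cost(100_000, sizes, **hw)
col_few = col_scan_cost(100_000, sizes, ["a1", "a2", "a3", "a4"], **hw)
col_all = col_scan_cost(100_000, sizes, list(sizes), **hw)
print(row, col_few, col_all)
```

Under these assumptions the Col SV is cheaper when a query references 4 of 20 attributes, while the Row SV is cheaper when all 20 are referenced, because every column then pays its own seeks.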



Query Cost Model

Update Cost Model

Transform Cost Model

Further Directions

Page 33: The Mimicking Octopus - appspot.comjindal-web.appspot.com/slides/VLDB2010_workshop.pdf · Cheap Fares Ticket Booking Booking Archives Flight Search. September 13, 2010 Towards a one-size-fits-all

Comparing Different Stores

19 Further Directions

[Figure: bar chart of workload time [seconds] (y-axis from 0 to 1), split into Query Costs and Update Costs, for Row Store, Column Store, Indexed Row Store, Indexed Column Store, Fractured Mirrors, Indexed Fractured Mirrors, and OctopusDB.]

 | Tickets | Customers
Tuples | 100,000 | 20,000
Selectivity | 0.9 | 0.1
Attributes Referenced | 4/20 | 20/20

Page 34: The Mimicking Octopus - appspot.comjindal-web.appspot.com/slides/VLDB2010_workshop.pdf · Cheap Fares Ticket Booking Booking Archives Flight Search. September 13, 2010 Towards a one-size-fits-all

Next Steps

1. Automatically picking the right layout: row, column, partitioned, cracked, more?

2. Storage View compression: adaptive compression

3. Storage View maintenance: maintaining heterogeneous SVs

4. OctopusDB benchmarking and evaluation: one-size-fits-all benchmark

20 Further Directions

Page 35: The Mimicking Octopus - appspot.comjindal-web.appspot.com/slides/VLDB2010_workshop.pdf · Cheap Fares Ticket Booking Booking Archives Flight Search. September 13, 2010 Towards a one-size-fits-all

Summary

[Figure: OctopusDB architecture. Components shown: Primary Log Store, Storage View Store, Log SV, Storage View Catalog, Query Catalog, API, Purging & Checkpointing, Recovery Manager, Holistic SV Optimizer, Transaction Manager, Result.]

Example query:

SELECT C.* FROM Tickets T, Customers C
WHERE T.customer_id = C.id
AND T.a1 = x1 .... AND T.an = xn

[Figure: storage view plan for the example query, from the Log SV to the Result (operator symbols lost in extraction). Labels shown: Customers; Tickets; Row SV; Col SV; Index SV; bag=customers; bag=tickets; bag,key; recent; time>=now-7days; time<now-7days; tickets.customer_id; customer.id; customers.*; a1=x1 ... an=xn; ticketsHotIndex; customersIndex; id, rid; price, rid; customers; ticketsCold.]

Thanks!


Database Landscape

2 Motivation

OLTP

OLAP

Streaming System

Archival System

Search Engine

Airline Company

Several Applications

Evolving Applications

ETL style data pipelines

Eventual Integration

Licensing Cost

DBA Cost

Maintenance Cost

Engineering Cost

Integration Cost

Hard-coded optimizations

Hard-coded data layouts

Reporting

Cheap Fares

Ticket Booking

Booking Archives

Flight Search

Friday, September 10, 2010