automated schema design for nosql databases

25
Automated Schema Design for NoSQL Databases Michael J. Mior Supervised by Kenneth Salem

Upload: michael-mior

Post on 21-Apr-2017

153 views

Category:

Technology


4 download

TRANSCRIPT

Page 1: Automated Schema Design for NoSQL Databases

Automated Schema Design for NoSQL DatabasesMichael J. Mior

Supervised by Kenneth Salem

Page 2: Automated Schema Design for NoSQL Databases

Schema optimization

For a given workload, construct the optimal physical schema to answer queries in the

workload for a given storage budget

Page 3: Automated Schema Design for NoSQL Databases

What is NoSQL?

● Not SQL?● NoACID?

● Not only SQL

Page 4: Automated Schema Design for NoSQL Databases

Types of NoSQL Databases

● Document stores● Key-value stores● Graph databases● …● Wide column stores

Page 5: Automated Schema Design for NoSQL Databases

Cassandra data model

Page 6: Automated Schema Design for NoSQL Databases

Cassandra data model

Queries specify

A set of row keysand either

A list of column keysor

A prefix of the column keys with a range

Page 7: Automated Schema Design for NoSQL Databases

Cassandra schema design

Physical schema

POIsWithHotels

POI17Hotel3

“Holiday Inn”…

No single obvious mapping from the conceptual schema

Conceptual schema

Hotel

PointOfInterest

Near

Hotel8

“Motel 6”

Page 8: Automated Schema Design for NoSQL Databases

Cassandra schema designPOIsWithHotels

POI17Hotel3

“Holiday Inn”…

…HotelsWithPOIs

Hotel3POI17

“Liberty Bell”…

● Multiple choices of logical and physical schema are possible for a given conceptual schema

● Different choices are more suitable for different queries

● Harder to add additional structures in the future

Page 9: Automated Schema Design for NoSQL Databases

Cassandra schema designPOIs

POI17Name

“Liberty Bell”…

…Hotels

Hotel3Name

“Holiday Inn”…

HotelToPOI

Hotel3POI17

(null)…

…POIToHotel

POI17Hotel3

(null)…

Page 10: Automated Schema Design for NoSQL Databases

Cassandra schema design

POIToHotel

POI17Hotel3

“Holiday Inn”…

HotelToPOI

Hotel3POI17

“Liberty Bell”…

HotelToPOI

HotelIDPOIID

(null)…

…POIToHotel

POIIDHotelID

(null)…

Page 11: Automated Schema Design for NoSQL Databases

Challenges

● Storage costs

● View maintenance

● Transactions

● Data distribution

Page 12: Automated Schema Design for NoSQL Databases

Relational schema design

● Commercial database vendors have existing tools○ Microsoft AutoAdmin for SQL Server○ DB2 Design Advisor○ Oracle SQL Tuning Advisor

● These tools don’t solve all the problems for NoSQL databases ● Usually an existing schema is assumed, and tools suggest secondary

indices and materialized views

Page 13: Automated Schema Design for NoSQL Databases

Relational schema design

Conceptual schema Logical schema Physical schema

HotelCREATE TABLE Hotel(…);CREATE TABLE POI(…);CREATE TABLE HotelToPOI(…, FOREIGN KEY (HotelID) REFERENCES Hotel(HotelID) FOREIGN KEY (POOID) REFERENCES POI(POIID));

CREATE MATERIALIZED VIEW POIsWithHotels AS SELECT POI.*, Hotel.* FROM POI, HotelToPOI, Hotel WHERE HotelToPOI.HotelID = Hotel.HotelID AND HotelToPOI.POIID = POI.POIID;

Mapping from conceptual to logical schema is straightforward

Near

PointOfInterest

Page 14: Automated Schema Design for NoSQL Databases

Cassandra schema design

Physical schema

POIsWithHotels

POI17Hotel3

“Holiday Inn”…

No single obvious mapping from the conceptual schema

Conceptual schema

Hotel

Near

Hotel8

“Motel 6”

PointOfInterest

Page 15: Automated Schema Design for NoSQL Databases

Abstract example

Page 16: Automated Schema Design for NoSQL Databases

Query paths

Get all points of interests near hotels a guest has stayed at

SELECT Name FROM POI WHERE POI.HotelToPOI

.Hotel.Room.Reservation.Guest.GuestID = ?

Guest Reservation

Room

Hotel

Point ofInterest

Page 17: Automated Schema Design for NoSQL Databases

Index paths

● Queries can “skip” entities on a path using an index

● Indices to the final entity are equivalent to materialized views

● Schema design is now the selection of these indices

Guest Reservation

Room

Hotel

Point ofInterest

Page 18: Automated Schema Design for NoSQL Databases

Problem definitionInput:● ER diagram with entity counts and cardinalities● Weighted queries over the diagram● Storage constraint● Cost model for target DB

Output:● Suggested index configuration● Plans for executing each query

Page 19: Automated Schema Design for NoSQL Databases

Proposed solution

1. Enumerate possible useful indices for all queries

2. Determine the benefit of a given index for each query

3. Select the optimal indices for a given space constraint

Page 20: Automated Schema Design for NoSQL Databases

Future Work

● Retargeting to other NoSQL DBs

● Automatic implementation of query plans

● Handling workloads which include updates

● Richer query language

● Exploitation of DBMS-specific features

Page 21: Automated Schema Design for NoSQL Databases

Summary● Schema design in NoSQL databases

presents unique challenges

● A workload-driven approach is necessary

● Conceptual schemata and queries are viable abstractions for NoSQL schema design tools

Page 22: Automated Schema Design for NoSQL Databases

Query language

SELECT [attributes] FROM [entity]WHERE [path](=|<|<=|>|>=) [value] AND … ORDER BY [path] LIMIT [count]

Conjunctive queries with simple predicates and ordering

Higher level queries can be composed by the application

Page 23: Automated Schema Design for NoSQL Databases

Solution

IndexEnumerator

Query Planner

Index Selection

Index benefitanalysis

Query Plans

Selected IndicesWorkload

Input Output

Page 24: Automated Schema Design for NoSQL Databases

Implementation

● Focus on Cassandra as target system

● Simple cost model for read-only in-memory workloads

● DBMS-independent index enumerator and query planner

● Index selection ILP implemented with GLPK

Page 25: Automated Schema Design for NoSQL Databases

Evaluation - TBD

● Re-implement existing workloads in the target DBMS (e.g. RUBiS, RUBBoS)

● Develop new benchmarks ● Comparison with a human DBA

● Random ER diagrams/queries with realistic properties