Semi Formal Model for Document Oriented Databases
Daniel Coupal, Universia.com

Upload: daniel-coupal

Post on 09-Jun-2015

DESCRIPTION

Presentation from NoSQL Now! 2013

TRANSCRIPT

Page 1: Semi Formal Model for Document Oriented Databases

Semi Formal Model for Document Oriented Databases
Daniel Coupal
Universia.com

Page 2: Semi Formal Model for Document Oriented Databases

Agenda

1. Why Have a Model?

2. Modeling Steps

3. Capturing the Model

4. Tools

Page 3: Semi Formal Model for Document Oriented Databases

Why Have a Model?

• Documentation, common language

• Repeatable process

• Abstraction from database implementations

• Support for tools

• A document DB is supposed to be “schemaless”!

• No! Having a schema is a good thing. Having to declare everything is the problem.

Page 4: Semi Formal Model for Document Oriented Databases

What if you have many apps?

Info about the schema is in the code of Application A.

Application B wants to read the data in the DB.

Where is the description of what it can read, write, ...?

Page 6: Semi Formal Model for Document Oriented Databases

Why do we choose NoSQL?

• Rewards

  • Huge amount of data

  • Cheap hardware

  • Blazing fast

• Compromises

  • No joins, no transactions, less integrity

  • Less mature technology

  • Fewer tools

Tradeoff between Performance and Data Integrity

Page 7: Semi Formal Model for Document Oriented Databases

NoSQL Little Secrets

• No experience maintaining databases and apps over the years, which is the most expensive activity in software development.

• Not all of the same vendors will be there in a few years.

• What if your DB is not maintained anymore?

• What if there is a better DB available?

Page 8: Semi Formal Model for Document Oriented Databases

NoSQL State of the Art

• Designing by Example

• Used in most tutorials

• Works well on small examples, like blogs

• Databases with more tables need a better way to capture the design

Page 9: Semi Formal Model for Document Oriented Databases

NoSQL State of the Art

{
  "_id" : ObjectId("508d27069cc1ae293b36928d"),
  "title" : "This is the title",
  "body" : "This is the body text.",
  "tags" : [ "chocolate", "spleen", "piano", "spatula" ],
  "created_date" : ISODate("2012-10-28T12:41:39.110Z"),
  "author_id" : ObjectId("508d280e9cc1ae293b36928e"),
  "category_id" : ObjectId("508d29709cc1ae293b369295"),
  "comments" : [
    {
      "subject" : "This is comment 1",
      "body" : "This is the body of comment 1.",
      "author_id" : ObjectId("508d345f9cc1ae293b369296"),
      "created_date" : ISODate("2012-10-28T13:34:23.929Z")
    },
    {
      "subject" : "This is comment 2",
      "body" : "This is the body of comment 2.",
      "author_id" : ObjectId("508d34739cc1ae293b369297"),
      "created_date" : ISODate("2012-10-28T13:34:43.192Z")
    }
  ]
}

Page 10: Semi Formal Model for Document Oriented Databases

Complex ER Diagram

Page 11: Semi Formal Model for Document Oriented Databases

Northwind ER Diagram

Page 12: Semi Formal Model for Document Oriented Databases

Northwind Doc Diagram

11 tables in those 5 collections.

No need for:
 - CustomerCustomerDemographics
 - EmployeeTerritories
because they are N-to-N relationships and don’t contain any data.

[Diagram labels: Products, Suppliers, Orders, Employees, Customers, Customer Demographics, Shippers, OrderDetails, Region, Categories, Territories]

Page 14: Semi Formal Model for Document Oriented Databases

That was a bad example...

• Why?

• With a document database, you don’t model data as your first step!

• Data is modeled based on the usage

• SQL’s model-first approach leads to bad performance for every app. NoSQL does the opposite.

Page 15: Semi Formal Model for Document Oriented Databases

Modeling Steps

            SQL                        NoSQL

Goal        general usage              current usage

Answer to   what answers do I have?    what questions do I have?

Step 1      model data                 write queries

Step 2      write application          add indexes

Step 3      write queries              model data

Step 4      add indexes                write application

Page 16: Semi Formal Model for Document Oriented Databases

Step 1: Write Queries

• Basic fields to retrieve

• Frequency of the query, requested speed

• Criticality of the query for the system

• Design notes

➡ Sort the queries by importance

Page 17: Semi Formal Model for Document Oriented Databases

Step 2: Add Indexes

• Which indexes do you need for the queries to go fast?

• Attributes of your indexes

Page 18: Semi Formal Model for Document Oriented Databases

Step 3: Model Data

• List the collections

• How many documents per collection?

➡ NoSQL is all about size and performance, no?

• Attributes on the collections (capped, ...)

• List the fields, their types, constraints

➡ Only for the important fields

Page 19: Semi Formal Model for Document Oriented Databases

Step 4: Write Application

• Integration code/driver/queries/database

• Balance between using the product functionality and isolating the layer that deals with the database.

• Interesting new tools to normalize to a common query language: JSONiq, BigSQL, ...

Page 20: Semi Formal Model for Document Oriented Databases

Capturing the Model

• JSON is a cool format!

• Your document database is a cool storage facility!

• Language for the model: JSON Schema

  • Supports things like: types, cardinality, references, acceptable values, ...

Page 21: Semi Formal Model for Document Oriented Databases

JSON Schema

Sample JSON document:

{
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York"
  },
  "phoneNumber": [
    {
      "type": "home",
      "number": "212 555-1234"
    }
  ]
}

Corresponding JSON Schema:

{
  "type": "object",
  "properties": {
    "address": {
      "type": "object",
      "properties": {
        "city": { "type": "string" },
        "streetAddress": { "type": "string" }
      }
    },
    "phoneNumber": {
      "type": "array",
      "items": {
        "properties": {
          "number": { "type": "string" },
          "type": { "type": "string" }
        }
      }
    }
  }
}
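To make the relationship between the sample document and its schema concrete, here is a minimal sketch of a checker in Python. It is not a JSON Schema implementation: it only understands the "type", "properties" and "items" keywords used on this slide, and the names `TYPE_MAP` and `validate` are made up for the illustration.

```python
# Minimal sketch: checks only the "type", "properties" and "items" keywords.
# This is an illustrative subset, not the full JSON Schema specification.

TYPE_MAP = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "boolean": bool,
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; an empty list means no problem was found."""
    expected = schema.get("type")
    if expected in TYPE_MAP:
        ok = isinstance(instance, TYPE_MAP[expected])
        # bool is a subclass of int in Python, so reject it explicitly for "number".
        if expected == "number" and isinstance(instance, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}, got {type(instance).__name__}"]

    errors = []
    if isinstance(instance, dict):
        # Only fields that are both declared and present are checked.
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors += validate(instance[name], subschema, f"{path}.{name}")
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors += validate(item, schema["items"], f"{path}[{i}]")
    return errors


schema = {
    "type": "object",
    "properties": {
        "address": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "streetAddress": {"type": "string"},
            },
        },
        "phoneNumber": {
            "type": "array",
            "items": {
                "properties": {
                    "number": {"type": "string"},
                    "type": {"type": "string"},
                }
            },
        },
    },
}

document = {
    "address": {"streetAddress": "21 2nd Street", "city": "New York"},
    "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
}

print(validate(document, schema))
```

A document whose "city" is a number instead of a string would come back with a single error pointing at `$.address.city`. In practice an existing JSON Schema library would likely be a better choice than hand-written checks like these.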

Page 22: Semi Formal Model for Document Oriented Databases

Model: Query

• Use:

  • the native DB notation

  • or use SQL (everyone can read SQL)

• Avoid joins!!!

• Example:

  • Product by ProductID, ProductName, SupplierID

  • Order by OrderID, CustomerID, ContactName

  • Customer by CustomerID, ContactName, OrderID

Page 23: Semi Formal Model for Document Oriented Databases

Example

{
  "id" : "REQ002",
  "name" : "Get product by name",
  "n" : "20000/day",
  "t" : "2 ms",
  "notes" : [
    "User asking about a product availability by product name"
  ],
  "sqlquery" : "select * from product where product.ProductName = 'abcde'",
  "mongoquery" : {
    "ProductName" : "abcde"
  }
}

Page 24: Semi Formal Model for Document Oriented Databases

Model: Index

• Again, use the native DB notation

• Example:

  • Product.ProductID, .ProductName, .SupplierID

  • Order.OrderID, .CustomerID, .ContactName

  • Customer.CustomerID, .ContactName, .OrderID

• Why is it useful when it looks so trivial?

  • Once it is written down, a tool can validate it or create estimates

Page 25: Semi Formal Model for Document Oriented Databases

Example

{
  "id" : "REQ002",
  "name" : "Get product by name",
  "n" : "20000/day",
  "t" : "2 ms",
  "notes" : [
    "User asking about a product availability by product name"
  ],
  "sqlquery" : "select * from product where product.ProductName = 'abcde'",
  "mongoquery" : {
    "ProductName" : "abcde"
  },
  "index" : {
    "collection" : "Products",
    "field" : "ProductName"
  }
}
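Once queries are written down in this form, a small tool can work with them. The Python sketch below ranks query records by how often they run and lists the indexes they call for. Several things here are assumptions made for the illustration: the second record (REQ001) is invented, frequency alone stands in for "importance", and the "n" field is assumed to always look like "<count>/<unit>".

```python
# Sketch of a tool that reads query records like the one above.
# REQ001 is a hypothetical record added only to have something to sort.
queries = [
    {
        "id": "REQ001",
        "name": "Get order by id",
        "n": "500/hour",
        "t": "5 ms",
        "index": {"collection": "Orders", "field": "OrderID"},
    },
    {
        "id": "REQ002",
        "name": "Get product by name",
        "n": "20000/day",
        "t": "2 ms",
        "index": {"collection": "Products", "field": "ProductName"},
    },
]

# Assumed unit vocabulary for the "n" field; only "day" appears in the slides.
PER_DAY = {"day": 1, "hour": 24, "minute": 24 * 60, "second": 24 * 60 * 60}


def calls_per_day(n):
    """Convert a frequency string such as '20000/day' to calls per day."""
    count, unit = n.split("/")
    return int(count) * PER_DAY[unit]


def by_importance(records):
    """Most frequent queries first (frequency is used as a proxy for importance)."""
    return sorted(records, key=lambda q: calls_per_day(q["n"]), reverse=True)


def required_indexes(records):
    """List distinct (collection, field) pairs, most important query first."""
    seen = []
    for q in by_importance(records):
        idx = q.get("index")
        if idx:
            pair = (idx["collection"], idx["field"])
            if pair not in seen:
                seen.append(pair)
    return seen


for q in by_importance(queries):
    print(q["id"], calls_per_day(q["n"]), "calls/day")
print(required_indexes(queries))
```

A fuller version would probably also weigh the criticality and the required response time ("t") captured in Step 1.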

Page 26: Semi Formal Model for Document Oriented Databases

Model: Data

• Collection

• One JSON-Schema document per collection

• Fields for collection and database

• Optionally, add a version number

Page 27: Semi Formal Model for Document Oriented Databases

Example for ‘Orders’

{
  "database" : "northwind",
  "collection" : "Orders",
  "version" : 1,
  "type" : "object",
  "$schema" : "http://json-schema.org/draft-03/schema",
  "id" : "http://jsonschema.net",
  "properties" : {
    "CustomerID" : {
      "type" : "string",
      "id" : "http://jsonschema.net/CustomerID"
    },
    "Details" : {
      "type" : "array",
      "id" : "http://jsonschema.net/Details",
      "items" : {
        "type" : "object",
        "id" : "http://jsonschema.net/Details/0",
        "required" : [ "ProductID", "Quantity" ],
        "properties" : {
          "ProductID" : {
            "type" : "number",
            "id" : "http://jsonschema.net/Details/0/ProductID"
          },
          "Quantity" : {
            "type" : "number"
          },
          ...

Page 28: Semi Formal Model for Document Oriented Databases

Simpler...

{
  "database" : "northwind",
  "collection" : "Orders",
  "version" : 1,
  "type" : "object",
  "properties" : {
    "CustomerID" : { "type" : "string" },
    "Details" : {
      "type" : "array",
      "items" : {
        "type" : "object",
        "properties" : {
          "ProductID" : { "type" : "number" },
          "Quantity" : { "type" : "number" },
          ...

Page 29: Semi Formal Model for Document Oriented Databases

Model: Versioning

• Each modified version of a collection is a new document

• db.<database>.find({"version": 2})

➡ shows all collections for version ‘2’ of the schema for the DB.
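As a rough in-memory equivalent of that lookup, the Python sketch below keeps the schema documents in a plain list and selects them by version. The sample records and the helper names `collections_at_version` and `latest_schema` are hypothetical, and the "properties" part of each schema document is left out for brevity.

```python
# Each schema document carries the "database", "collection" and "version"
# fields described on the "Model: Data" slide. Sample data is made up.
schema_docs = [
    {"database": "northwind", "collection": "Orders", "version": 1},
    {"database": "northwind", "collection": "Products", "version": 1},
    {"database": "northwind", "collection": "Orders", "version": 2},
    {"database": "northwind", "collection": "Products", "version": 2},
]


def collections_at_version(docs, version):
    """Names of the collections described at a given schema version."""
    return sorted(d["collection"] for d in docs if d["version"] == version)


def latest_schema(docs, collection):
    """Schema document with the highest version for one collection, or None."""
    candidates = [d for d in docs if d["collection"] == collection]
    return max(candidates, key=lambda d: d["version"]) if candidates else None


print(collections_at_version(schema_docs, 2))
print(latest_schema(schema_docs, "Orders"))
```

Because every modified version is stored as a new document, older versions stay available for comparison or rollback.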

Page 30: Semi Formal Model for Document Oriented Databases

Partial Schema

• Example: you just want to validate the ‘version’ field, which appears both as a ‘string’ and as a ‘number’

JSON Schema:

{
  "type": "object",
  "properties": {
    "version": {
      "type": "string"
    }
  }
}

JSON:

{ "version": 1.0, ... },
{ "version": "1.0.1", ... }
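The effect of a partial schema can be sketched in a few lines of Python: only the fields the schema names are checked, and everything else in the document is ignored. The "title" field below is a made-up stand-in for the elided fields, and `check_listed_fields` is a hypothetical helper, not part of any library.

```python
partial_schema = {
    "type": "object",
    "properties": {"version": {"type": "string"}},
}

# Subset of JSON Schema type names mapped to Python types.
PY_TYPES = {"string": str, "number": (int, float), "object": dict, "array": list}


def check_listed_fields(doc, schema):
    """Return the names of declared fields whose value has the wrong type."""
    bad = []
    for name, rule in schema.get("properties", {}).items():
        if name in doc and not isinstance(doc[name], PY_TYPES[rule["type"]]):
            bad.append(name)
    return bad


docs = [
    {"version": 1.0, "title": "stored as a number"},
    {"version": "1.0.1", "title": "stored as a string"},
]

for doc in docs:
    print(doc["version"], check_listed_fields(doc, partial_schema))
```

The first document is flagged on "version" and the second passes, while "title" is never inspected because the schema does not mention it.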

Page 31: Semi Formal Model for Document Oriented Databases

Tools

• Get some JSON Schema from JSON:

• http://www.jsonschema.net/

• Validate your schema

• http://jsonschemalint.com/

• https://github.com/dcoupal/godbtools.git

• Validate/edit JSON

• http://jsonlint.com/ or RoboMongo

• Import SQL into NoSQL

• Pentaho, Talend

Page 32: Semi Formal Model for Document Oriented Databases

Tools considerations

• NoSQL often relies on data being in RAM. Scanning all your data can make your in-memory dataset “cold” instead of “hot”.

• Running incremental validations works better. Ensure you have timestamps on insertions and updates.
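A minimal Python sketch of the incremental idea follows: only documents touched since the previous run are validated. It assumes a hypothetical `updated_at` field that is set on both insert and update, and it works on an in-memory list. Against a real database the same filter would be expressed as a query on an indexed timestamp field.

```python
from datetime import datetime, timezone


def changed_since(docs, last_run):
    """Documents inserted or updated after the previous validation run."""
    return [d for d in docs if d["updated_at"] > last_run]


def incremental_validate(docs, last_run, is_valid):
    """Validate only recent documents; return (number checked, invalid ids)."""
    recent = changed_since(docs, last_run)
    invalid = [d["_id"] for d in recent if not is_valid(d)]
    return len(recent), invalid


def version_is_string(doc):
    """Example rule: the 'version' field must be stored as a string."""
    return isinstance(doc.get("version"), str)


utc = timezone.utc
docs = [
    {"_id": 1, "version": 1.0, "updated_at": datetime(2013, 8, 1, tzinfo=utc)},
    {"_id": 2, "version": 1.0, "updated_at": datetime(2013, 8, 20, tzinfo=utc)},
    {"_id": 3, "version": "1.0.1", "updated_at": datetime(2013, 8, 21, tzinfo=utc)},
]
last_run = datetime(2013, 8, 15, tzinfo=utc)

print(incremental_validate(docs, last_run, version_is_string))
```

Document 1 is also invalid but is not rescanned, since it has not changed since the last run. That is the trade-off that keeps the working set "hot".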

Page 33: Semi Formal Model for Document Oriented Databases

Document Validator

[Diagram: Schema (JSON Schema), Collection (JSON), Validator]

Page 34: Semi Formal Model for Document Oriented Databases

“Eventual Integrity”

• Many NoSQL databases have eventual consistency

• With tools that validate and fix the data according to a set of rules, we get “eventual integrity”
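One way to picture "validate and fix" is as a list of rules, each pairing a check with a repair. The Python sketch below is only an illustration under that assumption: the single rule (store "version" as a string) and the names `RULES` and `repair` are invented for the example.

```python
def version_is_string(doc):
    """Check: 'version', when present, must be a string."""
    return "version" not in doc or isinstance(doc["version"], str)


def stringify_version(doc):
    """Fix: return a copy of the document with 'version' converted to a string."""
    return {**doc, "version": str(doc["version"])}


# Each rule is a (check, fix) pair; the fix runs only when the check fails.
RULES = [(version_is_string, stringify_version)]


def repair(docs, rules):
    """Apply every rule to every document; return (fixed docs, number of fixes)."""
    fixed, fixes = [], 0
    for doc in docs:
        for check, fix in rules:
            if not check(doc):
                doc = fix(doc)
                fixes += 1
        fixed.append(doc)
    return fixed, fixes


docs = [{"version": 1.0}, {"version": "1.0.1"}, {"title": "no version"}]
print(repair(docs, RULES))
```

Run periodically, a pass like this moves the data toward the rules over time, which is the sense in which integrity becomes "eventual".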

Page 35: Semi Formal Model for Document Oriented Databases

Tools to be developed

• UI to manipulate a schema graphically

• More Complete Validators:

• constraints

• relationships

• Per-language library to validate inserted/updated documents

Page 36: Semi Formal Model for Document Oriented Databases

Conclusion: Take Aways

• Design in this order: queries, indexes, data, application.

• Capture your model outside the application.

• Not having a schema is not a good thing! Use the attribute ‘schemaless’ wisely!

            NoSQL

Goal        current usage

Answer to   what questions do I have?

Step 1      write queries

Step 2      add indexes

Step 3      model data

Step 4      write application