semi formal model for document oriented databases

Post on 09-Jun-2015


Presentation from NoSQL Now! 2013


Semi Formal Model for Document Oriented Databases

Daniel Coupal

Universia.com

Agenda

1. Why Have a Model?

2. Modeling Steps

3. Capturing the Model

4. Tools

Why Have a Model?

• Documentation, common language

• Repeatable process

• Abstraction from database implementations

• Support for tools

• A document DB is supposed to be “schemaless”!

• No! Having a schema is a good thing. Having to declare everything up front is the problem.

What if you have many apps?

Info about the schema is in the code of Application A.

Application B wants to read the data in the DB. Where is the description of what it can read, write, ...?


Why do we choose NoSQL?

• Rewards

• Huge amounts of data

• Cheap hardware

• Blazing fast

• Compromises

• No joins, no transactions, less integrity

• Less mature technology

• Fewer tools

Tradeoff between Performance and Data Integrity

NoSQL Little Secrets

• No experience yet in maintaining these databases and apps over the years, and maintenance is the most expensive activity in software development.

• Not all of today’s vendors will still be around in a few years.

• What if your DB is not maintained anymore?

• What if there is a better DB available?

NoSQL State of the Art

• Designing by Example

• Used in most tutorials

• Works well on small examples, like blogs

• A database with more tables needs a better way to capture the design

{
  "_id" : ObjectId("508d27069cc1ae293b36928d"),
  "title" : "This is the title",
  "body" : "This is the body text.",
  "tags" : [ "chocolate", "spleen", "piano", "spatula" ],
  "created_date" : ISODate("2012-10-28T12:41:39.110Z"),
  "author_id" : ObjectId("508d280e9cc1ae293b36928e"),
  "category_id" : ObjectId("508d29709cc1ae293b369295"),
  "comments" : [
    {
      "subject" : "This is comment 1",
      "body" : "This is the body of comment 1.",
      "author_id" : ObjectId("508d345f9cc1ae293b369296"),
      "created_date" : ISODate("2012-10-28T13:34:23.929Z")
    },
    {
      "subject" : "This is comment 2",
      "body" : "This is the body of comment 2.",
      "author_id" : ObjectId("508d34739cc1ae293b369297"),
      "created_date" : ISODate("2012-10-28T13:34:43.192Z")
    }
  ]
}

NoSQL State of the Art

[Figure: Complex ER Diagram]

[Figure: Northwind ER Diagram]

Northwind Doc Diagram

11 tables in those 5 collections.

No need for:
- CustomerCustomerDemographics
- EmployeeTerritories
because they are N-to-N relationships and don’t contain any data.

[Diagram labels: Products, Suppliers, Orders, Employees, Customers, Customer Demographics, Shippers, OrderDetails, Region, Categories, Territories]
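The point about N-to-N tables can be illustrated with a small Python sketch. It folds the rows of a join table that carries no data of its own into an array on one side of the relationship. The function name, the `TerritoryIDs` field, the choice to embed on the employee side, and the sample rows are all assumptions made for illustration; the slides do not say how the five collections were actually built.

```python
from collections import defaultdict

# Illustrative rows in the shape of Northwind's Employees and
# EmployeeTerritories tables (sample values only).
EMPLOYEES = [
    {"EmployeeID": 1, "LastName": "Davolio"},
    {"EmployeeID": 2, "LastName": "Fuller"},
    {"EmployeeID": 3, "LastName": "Leverling"},
]
EMPLOYEE_TERRITORIES = [
    {"EmployeeID": 1, "TerritoryID": "06897"},
    {"EmployeeID": 1, "TerritoryID": "19713"},
    {"EmployeeID": 2, "TerritoryID": "01581"},
]

def embed_links(parents, links, parent_key, child_key, field):
    """Fold a pure N-to-N join table into an array on each parent document."""
    grouped = defaultdict(list)
    for row in links:
        grouped[row[parent_key]].append(row[child_key])
    # Parents without any link row get an empty array.
    return [{**p, field: grouped.get(p[parent_key], [])} for p in parents]

docs = embed_links(
    EMPLOYEES, EMPLOYEE_TERRITORIES, "EmployeeID", "TerritoryID", "TerritoryIDs"
)
```

After this step the join table is no longer needed as a separate collection, because each employee document carries its own list of territory ids.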


That was a bad example...

• Why?

• With a document database, you don’t model data as your first step!

• Data is modeled based on the usage

• SQL’s model-first approach leads to poor performance for every app. NoSQL does the opposite.

Modeling Steps

             SQL                       NoSQL
Goal         general usage             current usage
Answer to    what answer do I have?    what questions do I have?
Step 1       model data                write queries
Step 2       write application         add indexes
Step 3       write queries             model data
Step 4       add indexes               write application

Step 1: Write Queries

• Basic fields to retrieve

• Frequency of the query, requested speed

• Criticality of the query for the system

• Design notes

➡ Sort the queries by importance
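As a rough illustration of this step, the Python sketch below ranks a handful of query records by importance. The record shape borrows the `id`, `name`, `n` and `t` fields used in the deck's REQ002 example; the `criticality` field (1 = most critical), the ordering rule, and the records other than REQ002 are assumptions made for the example.

```python
import re

# Hypothetical query records; only REQ002 comes from the slides.
QUERIES = [
    {"id": "REQ001", "name": "Get order by id",
     "n": "500/day", "t": "10 ms", "criticality": 2},
    {"id": "REQ002", "name": "Get product by name",
     "n": "20000/day", "t": "2 ms", "criticality": 1},
    {"id": "REQ003", "name": "List customers by contact",
     "n": "3000/day", "t": "50 ms", "criticality": 2},
]

def leading_number(text):
    """Return the leading integer of strings such as '20000/day' or '2 ms'."""
    match = re.match(r"\s*(\d+)", text)
    if match is None:
        raise ValueError(f"no leading number in {text!r}")
    return int(match.group(1))

def by_importance(queries):
    """Most critical first; ties go to the more frequent, then the faster query."""
    return sorted(
        queries,
        key=lambda q: (
            q["criticality"],
            -leading_number(q["n"]),
            leading_number(q["t"]),
        ),
    )

ranked = [q["id"] for q in by_importance(QUERIES)]
```

The ranked list is what drives the next steps: indexes and data layout are chosen for the queries at the top first.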


Step 2: Add Indexes

• Which indexes do you need for the queries to go fast?

• Attributes of your indexes


Step 3: Model Data

• List the collections

• How many documents per collection?

➡ NoSQL is all about size and performance, no?

• Attributes on the collections (capped, ...)

• List the fields, their types, constraints

➡ Only for the important fields


Step 4: Write Application

• Integration code/driver/queries/database

• Balance between using the product functionality and isolating the layer that deals with the database.

• Interesting new tools to normalize to a common query language: JSONiq, BigSQL, ...


Capturing the Model

• JSON is a cool format!

• Your document database is a cool storage facility!

• Language for the model: JSON Schema

• supports things like: types, cardinality, references, acceptable values, ...

JSON Schema

JSON document:

{
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555-1234" }
  ]
}

Matching JSON Schema:

{
  "type": "object",
  "properties": {
    "address": {
      "type": "object",
      "properties": {
        "city": { "type": "string" },
        "streetAddress": { "type": "string" }
      }
    },
    "phoneNumber": {
      "type": "array",
      "items": {
        "properties": {
          "number": { "type": "string" },
          "type": { "type": "string" }
        }
      }
    }
  }
}
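To show how a schema like this can be checked mechanically, here is a minimal Python sketch. It is not a JSON Schema implementation: it only understands the `type`, `properties` and `items` keywords used on this slide, and the function name `type_errors` is made up for the example. A real project would more likely use an existing validator library.

```python
# Subset of JSON Schema type names mapped to Python types.
TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "boolean": bool,
}

SCHEMA = {
    "type": "object",
    "properties": {
        "address": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "streetAddress": {"type": "string"},
            },
        },
        "phoneNumber": {
            "type": "array",
            "items": {
                "properties": {
                    "number": {"type": "string"},
                    "type": {"type": "string"},
                }
            },
        },
    },
}

DOC = {
    "address": {"streetAddress": "21 2nd Street", "city": "New York"},
    "phoneNumber": [{"type": "home", "number": "212 555-1234"}],
}

def type_errors(schema, value, path="$"):
    """Return one message per value whose type contradicts the schema."""
    expected = schema.get("type")
    if expected in TYPES:
        ok = isinstance(value, TYPES[expected])
        # bool is a subclass of int in Python; do not accept it as a number.
        if expected == "number" and isinstance(value, bool):
            ok = False
        if not ok:
            return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(value, dict):
        for name, sub in schema.get("properties", {}).items():
            if name in value:  # fields missing from the document are allowed
                errors += type_errors(sub, value[name], f"{path}.{name}")
    if isinstance(value, list) and "items" in schema:
        for i, item in enumerate(value):
            errors += type_errors(schema["items"], item, f"{path}[{i}]")
    return errors

good = type_errors(SCHEMA, DOC)
bad = type_errors(SCHEMA, {"address": {"city": 10001}})
```

Fields that the schema does not mention are ignored, which matches the deck's idea that only the important fields need to be declared.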


Model: Query

• Use:

• the native DB notation

• or use SQL (everyone can read SQL)

• Avoid joins!!!

• Example:

• Product by ProductID, ProductName, SupplierID

• Order by OrderID, CustomerID, ContactName

• Customer by CustomerID, ContactName, OrderID

Example

{
  "id" : "REQ002",
  "name" : "Get product by name",
  "n" : "20000/day", "t" : "2 ms",
  "notes" : [
    "User asking about a product’s availability by product name"
  ],
  "sqlquery" : "select * from product where product.ProductName = 'abcde'",
  "mongoquery" : {
    "ProductName" : "abcde"
  }
}

Model: Index

• Again, use the native DB notation

• Example:

• Product.ProductID, .ProductName, .SupplierID

• Order.OrderID, .CustomerID, .ContactName

• Customer.CustomerID, .ContactName, .OrderID

• Why is it useful when it looks so trivial?

• Once it is written down, a tool can validate it or create estimates

Example

{
  "id" : "REQ002",
  "name" : "Get product by name",
  "n" : "20000/day", "t" : "2 ms",
  "notes" : [
    "User asking about a product’s availability by product name"
  ],
  "sqlquery" : "select * from product where product.ProductName = 'abcde'",
  "mongoquery" : {
    "ProductName" : "abcde"
  },
  "index" : {
    "collection" : "Products",
    "field" : "ProductName"
  }
}
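The kind of tool the deck has in mind could look like the following Python sketch, which reads model entries in the REQ002 format, reports queries whose filter fields are not covered by the declared index, and adds up a daily time budget. The function names and the REQ900 entry are hypothetical, the check only handles single-field indexes, and the budget assumes every call uses its full requested time.

```python
MODEL = [
    {   # Entry from the slides.
        "id": "REQ002", "name": "Get product by name",
        "n": "20000/day", "t": "2 ms",
        "mongoquery": {"ProductName": "abcde"},
        "index": {"collection": "Products", "field": "ProductName"},
    },
    {   # Hypothetical entry with no index declared.
        "id": "REQ900", "name": "Get products by supplier",
        "n": "100/day", "t": "5 ms",
        "mongoquery": {"SupplierID": 7},
    },
]

def uncovered_queries(model):
    """Ids of entries whose filter uses a field other than the indexed one."""
    missing = []
    for entry in model:
        indexed = entry.get("index", {}).get("field")
        if any(field != indexed for field in entry["mongoquery"]):
            missing.append(entry["id"])
    return missing

def daily_budget_ms(model):
    """Sum of calls per day times requested milliseconds per call."""
    total = 0
    for entry in model:
        per_day = int(entry["n"].split("/")[0])
        millis = int(entry["t"].split()[0])
        total += per_day * millis
    return total

missing = uncovered_queries(MODEL)
budget = daily_budget_ms(MODEL)
```

Because the model is plain JSON, checks like these can run without touching the application code or the database.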

Model: Data

• Collection

• One JSON-Schema document per collection

• Fields for collection and database

• Optionally, add a version number


Example for ‘Orders’

{
  "database" : "northwind",
  "collection" : "Orders",
  "version" : 1,
  "type" : "object",
  "$schema" : "http://json-schema.org/draft-03/schema",
  "id" : "http://jsonschema.net",
  "properties" : {
    "CustomerID" : {
      "type" : "string",
      "id" : "http://jsonschema.net/CustomerID"
    },
    "Details" : {
      "type" : "array",
      "id" : "http://jsonschema.net/Details",
      "items" : {
        "type" : "object",
        "id" : "http://jsonschema.net/Details/0",
        "required" : [ "ProductID", "Quantity" ],
        "properties" : {
          "ProductID" : {
            "type" : "number",
            "id" : "http://jsonschema.net/Details/0/ProductID"
          },
          "Quantity" : { "type" : "number" },
          ...

Simpler...

{
  "database" : "northwind",
  "collection" : "Orders",
  "version" : 1,
  "type" : "object",
  "properties" : {
    "CustomerID" : { "type" : "string" },
    "Details" : {
      "type" : "array",
      "items" : {
        "type" : "object",
        "properties" : {
          "ProductID" : { "type" : "number" },
          "Quantity" : { "type" : "number" },
          ...

Model: Versioning

• Each modified version of a collection is a new document

• db.<database>.find({ "version" : 2 })

➡ shows all collections for version ‘2’ of the schema for the DB.
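The versioning rule can be sketched in Python with an in-memory list standing in for the collection that holds the schema documents. The helper names, the rule of bumping the version by one, and the `ShipCity` and `ProductName` fields are assumptions for illustration; the slides only state that each modified version is stored as a new document.

```python
import copy

# In-memory stand-in for the collection of schema documents.
STORE = [
    {"database": "northwind", "collection": "Orders", "version": 1,
     "type": "object", "properties": {"CustomerID": {"type": "string"}}},
    {"database": "northwind", "collection": "Products", "version": 1,
     "type": "object", "properties": {"ProductName": {"type": "string"}}},
]

def add_version(store, collection, properties):
    """Append a new schema document instead of overwriting the previous one."""
    previous = max(
        (s for s in store if s["collection"] == collection),
        key=lambda s: s["version"],
    )
    updated = copy.deepcopy(previous)  # keep the old version untouched
    updated["version"] = previous["version"] + 1
    updated["properties"].update(properties)
    store.append(updated)
    return updated

def find_version(store, version):
    """In-memory equivalent of find({"version": version})."""
    return [s for s in store if s["version"] == version]

new_orders = add_version(STORE, "Orders", {"ShipCity": {"type": "string"}})
version_two = [s["collection"] for s in find_version(STORE, 2)]
```

Keeping old versions around means a validator can still check older documents against the schema they were written under.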


Partial Schema

• Example: you only want to validate the ‘version’ field, which is stored as a ‘string’ in some documents and as a ‘number’ in others

JSON Schema:

{
  "type": "object",
  "properties": {
    "version": { "type": "string" }
  }
}

JSON:

{ "version": 1.0, ... },
{ "version": "1.0.1", ... }
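A partial schema can be applied with very little code. The Python sketch below scans a list of documents and returns the ids of those whose declared fields have the wrong type, ignoring every field the schema does not mention. The `_id`, `title` and `tags` fields and the function name are assumptions added for the example.

```python
PARTIAL_SCHEMA = {
    "type": "object",
    "properties": {"version": {"type": "string"}},
}

# Sample documents; only the 'version' values come from the slide.
DOCS = [
    {"_id": 1, "version": 1.0, "title": "first"},
    {"_id": 2, "version": "1.0.1", "tags": []},
]

JSON_TYPES = {"string": str, "number": (int, float), "object": dict, "array": list}

def failing_ids(schema, docs):
    """Ids of documents where a field named in the schema has the wrong type."""
    bad = []
    for doc in docs:
        for field, rule in schema["properties"].items():
            if field in doc and not isinstance(doc[field], JSON_TYPES[rule["type"]]):
                bad.append(doc["_id"])
                break  # one report per document is enough
    return bad

offenders = failing_ids(PARTIAL_SCHEMA, DOCS)
```

Only the first document is reported, because its `version` is a number while the partial schema asks for a string.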

Tools

• Get some JSON Schema from JSON:

• http://www.jsonschema.net/

• Validate your schema

• http://jsonschemalint.com/

• https://github.com/dcoupal/godbtools.git

• Validate/edit JSON

• http://jsonlint.com/ or RoboMongo

• Import SQL into NoSQL

• Pentaho, Talend


Tools considerations

• NoSQL often relies on data being in RAM. Scanning all your data can turn the “hot” dataset in memory “cold”.

• Running incremental validations works better; make sure you have timestamps on insertions and updates.
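An incremental validation pass might look like this Python sketch. It assumes every document carries an `updated_at` timestamp (a hypothetical field name) and that the validator remembers the time of its previous run. The in-memory loop stands in for what would be an indexed range query on the timestamp in a real database, so that untouched documents are never loaded.

```python
from datetime import datetime, timezone

def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)

# Sample documents; field names and values are illustrative.
DOCS = [
    {"_id": 1, "version": 1.0, "updated_at": utc(2013, 7, 15)},
    {"_id": 2, "version": 2, "updated_at": utc(2013, 8, 10)},
    {"_id": 3, "version": "1.0.1", "updated_at": utc(2013, 8, 12)},
]

def version_is_string(doc):
    return isinstance(doc.get("version"), str)

def validate_changed(docs, last_run, is_valid):
    """Check only documents touched after last_run.

    Returns the invalid ids and the checkpoint to store for the next run.
    """
    invalid = []
    checkpoint = last_run
    for doc in docs:
        if doc["updated_at"] <= last_run:
            continue  # already covered by an earlier pass
        if not is_valid(doc):
            invalid.append(doc["_id"])
        checkpoint = max(checkpoint, doc["updated_at"])
    return invalid, checkpoint

invalid, checkpoint = validate_changed(DOCS, utc(2013, 8, 1), version_is_string)
```

Document 1 is also invalid, but it was last changed before the previous run and is left to the pass that covered it.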


Document Validator

[Diagram: Schema (JSON Schema), Collection (JSON), Validator]

“Eventual Integrity”

• NoSQL databases have eventual consistency

• With tools that validate and fix the data according to a set of rules, we get “eventual integrity”
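One way to picture “eventual integrity” is a set of rules, each pairing a check with a fix, applied to documents after they have been written. The Python sketch below is an assumption about how such a tool could be shaped; the rule shown (store `version` as a string) and all names are illustrative only.

```python
def version_ok(doc):
    """The rule holds when 'version' is absent or already a string."""
    return "version" not in doc or isinstance(doc["version"], str)

def stringify_version(doc):
    doc["version"] = str(doc["version"])

# Each rule is a (check, fix) pair.
RULES = [(version_ok, stringify_version)]

def repair(docs, rules):
    """Apply each fix where its check fails; return the number of fixes."""
    fixed = 0
    for doc in docs:
        for check, fix in rules:
            if not check(doc):
                fix(doc)
                fixed += 1
    return fixed

DOCS = [
    {"_id": 1, "version": 1.0},
    {"_id": 2, "version": "1.0.1"},
    {"_id": 3},
]

first_pass = repair(DOCS, RULES)
second_pass = repair(DOCS, RULES)
```

A second pass finds nothing left to fix, which is the sense in which the data converges toward the rules over time.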


Tools to be developed

• UI to manipulate a schema graphically

• More Complete Validators:

• constraints

• relationships

• Per-language libraries to validate inserted/updated documents

Conclusion: Take Aways

• Design in this order: queries, indexes, data, application.

• Capture your model outside the application.

• Not having a schema is not a good thing! Use the attribute ‘schemaless’ wisely!

             NoSQL
Goal         current usage
Answer to    what questions do I have?
Step 1       write queries
Step 2       add indexes
Step 3       model data
Step 4       write application

Questions?

• dcoupal@universia.com
