document databases

Document databases: The mystery revealed

Contents
- NoSQL
- Culture shock
- Document databases
  - Concepts
  - Benefits
- Schema design
- MongoDB
  - Internals
  - Use in .NET

NoSQL
- Collective term for a range of databases
- Non-relational
- Key/value pairs
  - key = field name

Document Databases

Comparison

Relational:
  Article: id, authorid, title, content
  Comment: id, articleid, message
  Author: id, name, email

Document db:
  Article: _id, title, content, author { _id, name, email }, comments[]
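As a rough illustration of the document-db side of the comparison, the article could be held as one nested document. The sketch below uses a plain JavaScript object; all values, and the fields inside each comment, are assumptions made up for the example.

```javascript
// One article stored as a single document (illustrative values only).
const article = {
  _id: "a1",
  title: "Document databases",
  content: "The mystery revealed",
  // The author is embedded instead of referenced through a foreign key.
  author: { _id: "u1", name: "Jane", email: "jane@example.com" },
  // Comments live inside the article as an array of sub-documents
  // (their fields are assumed here, mirroring the relational Comment table).
  comments: [
    { _id: "c1", message: "Nice overview" },
    { _id: "c2", message: "What about reporting?" },
  ],
};

// Everything needed to render the article comes back in one read, no join.
const summary = `${article.title} by ${article.author.name} (${article.comments.length} comments)`;
console.log(summary);
```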

Terminology
In parallel with SQL:

SQL       Document db
Table     Collection
Row       Document
Column    Field
Index     Index
Join      Embedding & linking
Schema    N/A

Data integrity
- Shift of responsibilities to the app
  - Manage data integrity and validity yourself
  - Database more efficient and more scalable
- Diagram: data integrity & validity checks move from the DB to the APPLICATION
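What "manage data integrity and validity yourself" might look like in the app is sketched below. The function name and the specific rules are hypothetical; the point is only that the check runs in application code before a document is saved.

```javascript
// Hypothetical application-side check; the database itself would accept any shape.
function validateArticle(doc) {
  const errors = [];
  if (typeof doc.title !== "string" || doc.title.trim() === "") {
    errors.push("title is required");
  }
  if (!doc.author || typeof doc.author.email !== "string" || !doc.author.email.includes("@")) {
    errors.push("author.email must look like an email address");
  }
  if (doc.comments !== undefined && !Array.isArray(doc.comments)) {
    errors.push("comments must be an array when present");
  }
  return errors; // an empty array means the document may be saved
}

const okErrors = validateArticle({ title: "Hello", author: { email: "a@b.c" } });
const badErrors = validateArticle({ title: "", author: {}, comments: "none" });
```

Returning a list of errors rather than throwing on the first one is a design choice for the sketch; it lets the app report every problem at once.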

Concepts

Joins
- No joins
- Joins at "design time", not at "query time"
- Due to embedded docs and arrays, fewer joins are needed

Constraints
- No foreign key constraints
- Unique indexes

Transactions
- No commit/rollback
- Atomic operations
  - Multiple actions inside the same document
  - Incl. embedded documents

Dynamic schema
- No schema
  - Implied: definition in the app, not the db
- A field can exist in certain docs and not in others
  - When indexing: a missing field is indexed as a null value
  - Sparse index: excludes docs without that field
- Writing to a non-existent collection or database
  - Lazy creation
- Reading from a non-existent collection
  - Empty value returned
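The difference between a regular and a sparse index can be sketched with an in-memory map. This is only a toy model under the assumptions above, not how MongoDB implements indexes; `buildIndex` and the sample data are invented for the example.

```javascript
// In-memory stand-in for an index on one field (not how MongoDB implements it).
function buildIndex(docs, field, { sparse = false } = {}) {
  const index = new Map(); // indexed value -> array of _id
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) continue; // sparse: skip docs without the field
    const key = hasField ? doc[field] : null; // regular: missing field counts as null
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

const people = [
  { _id: 1, name: "Ann", nickname: "annie" },
  { _id: 2, name: "Bob" }, // no nickname field at all
];

const regularIndex = buildIndex(people, "nickname");
const sparseIndex = buildIndex(people, "nickname", { sparse: true });
```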

Relations

Embedded fields
- Can be queried; the parent doc is returned
- Can be indexed
- Can't be used for ordering

Linking
- Get the 2nd doc yourself in the app via a reference
- Avoid where possible
- Use for:
  - Many-to-many relations
  - Subdocs that often need to be modified
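Linking means the app follows the reference itself. A minimal sketch, assuming two in-memory collections and a hypothetical `author_id` reference field:

```javascript
// In-memory stand-ins for two collections (hypothetical data).
const authorsById = new Map([["u1", { _id: "u1", name: "Jane" }]]);
const linkedArticles = [{ _id: "a1", title: "Intro", author_id: "u1" }];

// Linking: the database does not follow the reference, the app does.
function loadArticleWithAuthor(articleId) {
  const found = linkedArticles.find((a) => a._id === articleId); // 1st lookup
  if (!found) return null;
  const author = authorsById.get(found.author_id) ?? null; // 2nd lookup
  return { ...found, author };
}

const loaded = loadArticleWithAuthor("a1");
```

Against a real database each lookup would be a separate round trip, which is why the slide advises avoiding links where embedding works.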

Benefits

Scalable: good for a lot of data / traffic
- Horizontal scaling: to more nodes
- Good for web apps

Performance
- No joins and constraints

Dev/user friendly
- Data is modeled on how the app is going to use it
- No conversion between object-oriented and relational
- No static schema = agile

Drawbacks

More mistake-prone
- No data integrity checks

Database is app-specific
- Less flexibility for shared usage

Data aggregation is harder
- Less suitable for reporting

Schema Design

Schema design
- Start from application-specific queries
  - "What questions do I have?" vs "What answers"
  - "Data like the application wants it"
- Base parent documents on:
  - The most common usage
  - What do I want returned?

Schema design
Hybrid embed / link
- Changing the author name is a rare action
  - First update author.name
  - Then update the articles async

Article: _id, author { _id, name, email }, content
Author: _id, name, email
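The two-step rename can be sketched as follows. The collections are in-memory arrays and the background work is modeled as a simple job queue; `renameAuthor`, `runPendingJobs` and the data are all hypothetical.

```javascript
// Hypothetical in-memory collections; each article embeds a copy of its author.
const authorColl = [{ _id: "u1", name: "Jane", email: "jane@example.com" }];
const articleColl = [
  { _id: "a1", content: "...", author: { _id: "u1", name: "Jane", email: "jane@example.com" } },
  { _id: "a2", content: "...", author: { _id: "u1", name: "Jane", email: "jane@example.com" } },
];
const pendingJobs = []; // stand-in for a background queue

function renameAuthor(authorId, name) {
  const author = authorColl.find((a) => a._id === authorId);
  if (!author) throw new Error(`unknown author ${authorId}`);
  author.name = name; // step 1: update the author document first
  // Step 2 is deferred: the embedded copies are fixed up later.
  pendingJobs.push(() => {
    for (const doc of articleColl) {
      if (doc.author._id === authorId) doc.author.name = name;
    }
  });
}

function runPendingJobs() {
  while (pendingJobs.length > 0) pendingJobs.shift()();
}

renameAuthor("u1", "Janet");
const staleName = articleColl[0].author.name; // embedded copy not yet updated
runPendingJobs();
const freshName = articleColl[0].author.name; // embedded copy now updated
```

Between the two steps readers can see the old name on articles; the slide accepts that because renames are rare.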

Schema design
Data duplication & denormalisation
- Pro
  - simplicity
  - optimisation (fewer IO operations)
  - query processing
- Con
  - more disk usage
  - data integrity
- Embedded docs
  - Recommended < 250 kB

Product
Single collection inheritance

Classes:
  Product: _id, price
  Book: author, title
  Album: artist, title
  Jeans: size, color

Stored documents:
  Book: _id, price, author, title
  Jeans: _id, price, size, color

Product
Single collection inheritance (with a _type field)

Classes:
  Product: _id, price
  Book: author, title
  Album: artist, title
  Jeans: size, color

Stored documents:
  _type: Book, _id, price, author, title
  _type: Jeans, _id, price, size, color
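A small sketch of how the `_type` field might be used when all products share one collection. The collection is an in-memory array and the values are invented for the example.

```javascript
// One "products" collection holding several kinds of product (sample data).
const products = [
  { _type: "Book", _id: 1, price: 12.5, author: "A. Writer", title: "Some Book" },
  { _type: "Jeans", _id: 2, price: 60, size: 32, color: "blue" },
  { _type: "Album", _id: 3, price: 9.99, artist: "A. Band", title: "Some Album" },
];

// Fields shared by every product can be queried across all types...
const cheapIds = products.filter((p) => p.price < 20).map((p) => p._id);

// ...while the _type field selects one "subclass".
const books = products.filter((p) => p._type === "Book");
```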

One-to-many

Embedded array / array keys
- Some queries get harder
- You can index arrays!

Normalized approach
- More flexibility
- A lot less performance

Article: _id, content, tags: ["foo", "bar"], comments: ["id1", "id2"]
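Indexing an array field amounts to one index entry per array element. The sketch below models that with a map; it is a toy illustration with invented names and data, not MongoDB's actual index structure.

```javascript
// Sketch of indexing an array field: one index entry per array element.
function buildArrayIndex(docs, field) {
  const index = new Map(); // element value -> array of _id
  for (const doc of docs) {
    for (const value of doc[field] ?? []) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const taggedArticles = [
  { _id: "a1", content: "...", tags: ["foo", "bar"] },
  { _id: "a2", content: "...", tags: ["bar"] },
  { _id: "a3", content: "..." }, // no tags field
];

const tagIndex = buildArrayIndex(taggedArticles, "tags");
const withBar = tagIndex.get("bar"); // ids of all articles tagged "bar"
```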

Many-to-many

Using array keys
- No join table

References on both sides
- Advantage: simple queries
    articles.Where(p => p.CategoryIds.Contains(categoryId))
    categories.Where(c => c.ArticleIds.Contains(articleId))
- Disadvantage: duplication, update two docs

Article: _id, content, category_ids: ["id1", "id2"]
Category: _id, name, article_ids: ["id7", "id8"]
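The trade-off of references on both sides can be sketched in memory: linking touches two documents, after which either direction is a single query. `link` and the sample data are hypothetical.

```javascript
const m2mArticles = [{ _id: "a1", content: "...", category_ids: [] }];
const m2mCategories = [{ _id: "c1", name: "Databases", article_ids: [] }];

// Linking an article to a category means updating both documents.
function link(articleId, categoryId) {
  const art = m2mArticles.find((a) => a._id === articleId);
  const cat = m2mCategories.find((c) => c._id === categoryId);
  if (!art || !cat) throw new Error("unknown article or category");
  if (!art.category_ids.includes(categoryId)) art.category_ids.push(categoryId);
  if (!cat.article_ids.includes(articleId)) cat.article_ids.push(articleId);
}

link("a1", "c1");

// Either direction is then a single filter over one collection.
const articlesInC1 = m2mArticles.filter((a) => a.category_ids.includes("c1"));
const categoriesOfA1 = m2mCategories.filter((c) => c.article_ids.includes("a1"));
```

In a real database the two updates are separate operations, so the app has to cope with one succeeding and the other failing.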

Many-to-many

References on one side
- Advantage: data in one place
- Disadvantage: 2 queries

Articles in a category (1 query):
    articles.Where(p => p.CategoryIds.Contains(categoryId))

Categories of an article (2 queries):
    var article = articles.Single(p => p.Id == articleId);
    categories.Where(c => c.Id.In(article.CategoryIds))

Article: _id, content, category_ids: ["id1", "id2"]
Category: _id, name
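With references on one side only, one direction stays a single query and the other needs two. A minimal in-memory sketch with invented data and function names:

```javascript
const oneSideArticles = [
  { _id: "a1", content: "...", category_ids: ["c1", "c2"] },
  { _id: "a2", content: "...", category_ids: ["c2"] },
];
const oneSideCategories = [
  { _id: "c1", name: "Databases" },
  { _id: "c2", name: ".NET" },
  { _id: "c3", name: "Unused" },
];

// Articles of a category: one query, because the article holds the references.
function articlesInCategory(categoryId) {
  return oneSideArticles.filter((a) => a.category_ids.includes(categoryId));
}

// Categories of an article: fetch the article first, then its categories.
function categoriesOfArticle(articleId) {
  const art = oneSideArticles.find((a) => a._id === articleId); // query 1
  if (!art) return [];
  return oneSideCategories.filter((c) => art.category_ids.includes(c._id)); // query 2
}

const namesForA1 = categoriesOfArticle("a1").map((c) => c.name);
```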

To sum up

A new mindset
- Serialize complex .NET objects directly to the db
- Data duplication and denormalisation are key

Big shift of responsibilities to the app
- No built-in data integrity checks

Database has a single responsibility: storing data
- Quicker and easier to scale

MongoDB

Why MongoDB?
- Largest user base, mature
- Platform independent
- Open source, free

[Chart. Source: Google Trends]

MongoDB: internals
Durability
- By default through replication
- Single server durability: less performance
- Eventual consistency
- Configure fsync: sync between memory and disk
  - by default every 60 sec.
- Configure replicate before return

MongoDB: internals
Safe mode
- Turn off eventual consistency
  - sync directly to the disk
  - sufficiently replicate data, in replication sets
- Calls GetLastError to determine whether the action was successful
- Applies to actions without a return value
- On connection or action level

MongoDB: internals
Replica sets
- Nodes that are copies of each other
- Set-up of master and slave nodes
- If the master goes down, a slave automatically takes over and promotes itself to master

MongoDB: internals
Sharding
- Scale out
- Clusters of replica sets
- Connected to a central proxy used by clients
- Config servers contain metadata
- Write to multiple nodes

MongoDB: internals
Sharding
- Based on a shard key (= field)
- Commands are sent to the shard that includes the relevant range of the data
- Data is evenly distributed across the shards
- Automatic reallocation of data when adding or removing servers
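Routing by shard key range can be sketched as a lookup in a table of key ranges. The chunk table, shard names and `shardFor` are assumptions for illustration, and keys are assumed to be lowercase strings; real routing is handled by MongoDB itself.

```javascript
// Hypothetical chunk table, like the metadata the config servers would hold.
// Each shard owns the half-open key range [min, max).
const chunks = [
  { min: "a", max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: "{", shard: "shard2" }, // "{" sorts right after "z"
];

// Pick the shard whose range contains the shard key value.
function shardFor(shardKeyValue) {
  const chunk = chunks.find((c) => shardKeyValue >= c.min && shardKeyValue < c.max);
  if (!chunk) throw new Error(`no chunk covers ${shardKeyValue}`);
  return chunk.shard;
}

const targetShard = shardFor("mongo");
```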

MongoDB: internals

BSON
- Data storage and network transfer format
- Binary serialized JSON

System collections
- db.system.namespaces
- db.system.indexes

Geospatial indexing
- Find results closest to a coordinate
    db.places.find({ loc: { $near: [50, 4], $maxDistance: 5 } })
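What such a query returns can be approximated in memory: keep the documents within the maximum distance and order them closest first. The sketch assumes flat [x, y] coordinates and plain Euclidean distance; the `near` helper and the sample places are invented.

```javascript
// In-memory approximation of a "near" query on flat [x, y] pairs.
function near(docs, [x, y], maxDistance) {
  return docs
    .map((doc) => ({ doc, dist: Math.hypot(doc.loc[0] - x, doc.loc[1] - y) }))
    .filter((entry) => entry.dist <= maxDistance)
    .sort((a, b) => a.dist - b.dist) // closest first
    .map((entry) => entry.doc);
}

const places = [
  { name: "far", loc: [60, 4] },
  { name: "close", loc: [51, 4] },
  { name: "closest", loc: [50, 4.5] },
];

const nearbyNames = near(places, [50, 4], 5).map((p) => p.name);
```

A real geospatial index avoids scanning every document, which this sketch does not attempt to model.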

DEMO: MongoDB in .NET

Links
- http://www.mongodb.org/display/DOCS/CSharp+Language+Center
  - Quick-start
  - Documentation
  - LINQ
  - Serialization
- http://mongly.com/
  - Free eBook
  - Interactive tutorial

