document databases

Document databases: The mystery revealed

Upload: qframe

Post on 18-Dec-2014

Category:

Technology


DESCRIPTION

You’ve heard the buzzwords noSQL and doc db, but what are they really? In this talk we cover what you need to know to get started, beginning with a new mindset for approaching your data. We look at the major differences, get a crash course in data structure design, and see how to use all of that in .NET. To top it off, we dive into MongoDB to grasp some of its internals.

TRANSCRIPT

Page 1: Document databases

Document databases: The mystery revealed

Page 2: Document databases

Contents
- noSQL
- Culture shock
- Document databases
  - Concepts
  - Benefits
- Schema design
- MongoDB
  - Internals
  - Use in .NET

Page 3: Document databases

noSQL
- Collective term for a range of databases
- Non-relational
- Key/value pairs
  - key = field name

Page 4: Document databases

Document Databases

Page 5: Document databases

Comparison

Relational

Article
- id
- authorid
- title
- content

Comment
- id
- articleid
- message

Author
- id
- name
- email

Document db

Article
- _id
- title
- content
- author
  - _id
  - name
  - email
- comments[]
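
To make the comparison concrete, here is a minimal sketch in plain JavaScript (no database involved) that reshapes relational-style rows into one article document. The sample data and the helper name `toArticleDocument` are made up for illustration.

```javascript
// Relational-style rows, one array per table (sample data is made up).
const authorRows = [{ id: 1, name: "Ann", email: "ann@example.com" }];
const articleRows = [{ id: 10, authorid: 1, title: "Intro", content: "Hello" }];
const commentRows = [
  { id: 100, articleid: 10, message: "Nice" },
  { id: 101, articleid: 10, message: "Thanks" },
];

// The "join" happens once, when the document is shaped, not at query time.
function toArticleDocument(article) {
  const author = authorRows.find(a => a.id === article.authorid);
  return {
    _id: article.id,
    title: article.title,
    content: article.content,
    author: { _id: author.id, name: author.name, email: author.email },
    comments: commentRows
      .filter(c => c.articleid === article.id)
      .map(c => ({ _id: c.id, message: c.message })),
  };
}

const articleDocument = toArticleDocument(articleRows[0]);
console.log(JSON.stringify(articleDocument, null, 2));
```

The resulting object carries its author and comments with it, so reading an article needs no further lookups.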

Page 6: Document databases

Terminology
In parallel with SQL:

Relational    Document db
Table         Collection
Row           Document
Column        Field
Index         Index
Join          Embedding & linking
Schema        N/A

Page 7: Document databases

Data integrity
- Shift of responsibilities to the app
  - Manage data integrity and validity yourself
  - Database more efficient and more scalable

Diagram: data integrity & validity checks move from the DB to the APPLICATION
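
As a rough illustration of what "manage data integrity and validity yourself" can look like, the sketch below validates a document in application code before it would be stored. The function `validateArticle` and its rules are hypothetical, not part of any driver.

```javascript
// Hypothetical in-app validation: the database would accept any shape,
// so the application decides what a valid article looks like.
function validateArticle(doc) {
  const errors = [];
  if (typeof doc.title !== "string" || doc.title.trim() === "") {
    errors.push("title is required");
  }
  const email = doc.author && doc.author.email;
  if (typeof email !== "string" || !email.includes("@")) {
    errors.push("author.email must look like an email address");
  }
  return errors;
}

const validArticle = { title: "Hello", author: { name: "Ann", email: "ann@example.com" } };
const invalidArticle = { title: "", author: { name: "Bob" } };

console.log(validateArticle(validArticle));   // no errors
console.log(validateArticle(invalidArticle)); // both rules fail
```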

Page 8: Document databases

Concepts
- Joins
  - No joins
  - Joins at "design time", not at "query time"
  - Due to embedded docs and arrays, fewer joins are needed
- Constraints
  - No foreign key constraints
  - Unique indexes
- Transactions
  - No commit/rollback
  - Atomic operations
    - Multiple actions inside the same document
    - Incl. embedded documents

Page 9: Document databases

Dynamic schema
- No schema
  - Implied: definition in the app, not the db
- A field can exist in certain docs and not in others
  - When indexing: a missing field is indexed as null
  - Sparse index: exclude docs without that field
- Writing to a non-existent collection or database
  - Lazy creation
- Reading from a non-existent collection
  - Empty value returned
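
The difference between a regular and a sparse index can be sketched with an in-memory map. This is only a model of the idea, not how MongoDB stores indexes; `buildIndex` and the sample documents are invented for the example.

```javascript
const people = [
  { _id: 1, name: "Ann", twitter: "@ann" },
  { _id: 2, name: "Bob" }, // no twitter field at all
  { _id: 3, name: "Cid", twitter: "@cid" },
];

// Regular index: a missing field is indexed under null.
// Sparse index: documents without the field are skipped entirely.
function buildIndex(docs, field, { sparse = false } = {}) {
  const index = new Map();
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) continue;
    const key = hasField ? doc[field] : null;
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

const regularIndex = buildIndex(people, "twitter");
const sparseIndex = buildIndex(people, "twitter", { sparse: true });

console.log(regularIndex.get(null)); // ids of docs without the field
console.log(sparseIndex.has(null));  // false: those docs are not indexed
```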

Page 10: Document databases

Relations
- Embedded fields
  - Can be queried, the parent doc is returned
  - Can be indexed
  - Can’t be used for ordering
- Linking
  - Get the 2nd doc yourself in the app via a reference
  - Avoid where possible
  - Use for:
    - Many-to-many relations
    - Subdocs that often need to be modified
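
A small in-memory sketch of the two styles, with made-up data: matching on an embedded field hands back the whole parent document, while a link has to be resolved by the app with a second lookup.

```javascript
const authorDocs = [{ _id: "a1", name: "Ann" }];
const blogPosts = [
  { _id: "p1", title: "Intro", authorId: "a1",
    comments: [{ name: "Bob", message: "Nice" }] },
  { _id: "p2", title: "More", authorId: "a1", comments: [] },
];

// Embedded: filter on a field inside the comments array;
// the result is the parent post, not the individual comment.
const postsCommentedByBob = blogPosts.filter(post =>
  post.comments.some(comment => comment.name === "Bob"));

// Linked: the app follows the reference itself (second lookup).
const post = blogPosts.find(p => p._id === "p2");
const postAuthor = authorDocs.find(a => a._id === post.authorId);

console.log(postsCommentedByBob.map(p => p._id), postAuthor.name);
```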

Page 11: Document databases

Benefits
- Scalable: good for a lot of data / traffic
  - Horizontal scaling: to more nodes
  - Good for web apps
- Performance
  - No joins and constraints
- Dev/user friendly
  - Data is modeled to how the app is going to use it
  - No conversion between object-oriented and relational
  - No static schema = agile

Page 12: Document databases

Drawbacks
- More mistake-prone
  - No data integrity checks
- Database is app-specific
  - Less flexibility for shared usage
- Data aggregation is harder
  - Less suitable for reporting

Page 13: Document databases

Schema Design

Page 14: Document databases

Schema design
- Start from application-specific queries
  - “What questions do I have?” vs “What answers”
  - “Data like the application wants it”
- Base parent documents on:
  - The most common usage
  - What do I want returned?

Page 15: Document databases

Schema design
- Hybrid embed / link
  - Changing the author name rarely happens
  - First update author.name
  - Then update the articles async

Article
- _id
- author
  - _id
  - name
  - email
- content

Author
- _id
- name
- email
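
The two-step update could look roughly like this in-memory sketch: the Author document is changed first, and the copies embedded in articles are brought in line afterwards. The helper `renameAuthor` and the data are hypothetical; a real app would issue database updates instead of mutating arrays.

```javascript
const authorsById = new Map([
  ["a1", { _id: "a1", name: "Ann", email: "ann@example.com" }],
]);
const articleDocs = [
  { _id: "p1", content: "First", author: { _id: "a1", name: "Ann", email: "ann@example.com" } },
  { _id: "p2", content: "Second", author: { _id: "a1", name: "Ann", email: "ann@example.com" } },
];

function renameAuthor(authorId, newName) {
  // Step 1: update the authoritative Author document right away.
  authorsById.get(authorId).name = newName;
  // Step 2: propagate to the embedded copies asynchronously.
  return Promise.resolve().then(() => {
    for (const article of articleDocs) {
      if (article.author._id === authorId) article.author.name = newName;
    }
  });
}

renameAuthor("a1", "Anna").then(() => {
  console.log(articleDocs.map(a => a.author.name));
});
```

Between the two steps the embedded copies are briefly stale, which is the trade-off this hybrid accepts for a rarely changing field.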

Page 16: Document databases

Schema design
- Data duplication & denormalisation
  - Pro
    - simplicity
    - optimisation (fewer I/O operations)
    - query processing
  - Con
    - more disk usage
    - data integrity
- Embedded docs
  - Recommended < 250 kB

Page 17: Document databases

Single collection inheritance

Relational

Product
- _id
- price

Book
- author
- title

Album
- artist
- title

Jeans
- size
- color

Document db: Product collection

Book
- _id
- price
- author
- title

Jeans
- _id
- price
- size
- color

Page 18: Document databases

Single collection inheritance

Relational

Product
- _id
- price

Book
- author
- title

Album
- artist
- title

Jeans
- size
- color

Document db: Product collection

_type: Book
- _id
- price
- author
- title

_type: Jeans
- _id
- price
- size
- color
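
A minimal sketch of how a `_type` field lets different product shapes share one collection. The array stands in for the collection and the sample products are invented.

```javascript
// One "collection" holding differently shaped products.
const productCollection = [
  { _id: 1, _type: "Book", price: 12, author: "Ann", title: "Docs 101" },
  { _id: 2, _type: "Jeans", price: 60, size: 32, color: "blue" },
  { _id: 3, _type: "Album", price: 15, artist: "Cid", title: "Shards" },
];

// Subtype query: filter on the discriminator field.
const bookDocs = productCollection.filter(p => p._type === "Book");

// Base-type query: shared fields work across all subtypes.
const cheapTypes = productCollection
  .filter(p => p.price < 20)
  .map(p => p._type);

console.log(bookDocs.length, cheapTypes);
```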

Page 19: Document databases

One-to-many
- Embedded array / array keys
  - Some queries get harder
  - You can index arrays!
- Normalized approach
  - More flexibility
  - A lot less performance

Article
- _id
- content
- tags: ["foo", "bar"]
- comments: ["id1", "id2"]
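
Indexing an array means one index entry per element. The sketch below models that with a plain map; it illustrates the idea only and is not MongoDB's actual index structure.

```javascript
const taggedArticles = [
  { _id: "p1", tags: ["foo", "bar"] },
  { _id: "p2", tags: ["bar"] },
];

// One index entry per array element, each pointing back at its documents.
const tagIndex = new Map();
for (const article of taggedArticles) {
  for (const tag of article.tags) {
    if (!tagIndex.has(tag)) tagIndex.set(tag, []);
    tagIndex.get(tag).push(article._id);
  }
}

console.log(tagIndex.get("bar")); // both articles carry this tag
console.log(tagIndex.get("foo")); // only the first one
```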

Page 20: Document databases

Many-to-many
- Using array keys
  - No join table
- References on both sides
  - Advantage: simple queries
      articles.Where(p => p.CategoryIds.Contains(categoryId))
      categories.Where(c => c.ArticleIds.Contains(articleId))
  - Disadvantage: duplication, update two docs

Article
- _id
- content
- category_ids: ["id1", "id2"]

Category
- _id
- name
- article_ids: ["id7", "id8"]
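
The same both-sides layout, modeled in memory with invented data. Each query is a single filter, but creating a link touches two documents, which is the duplication cost the slide mentions.

```javascript
const bothSidesArticles = [
  { _id: "a1", category_ids: ["c1"] },
  { _id: "a2", category_ids: [] },
];
const bothSidesCategories = [{ _id: "c1", name: "News", article_ids: ["a1"] }];

// Either direction is one simple query.
const articlesInCategory = id =>
  bothSidesArticles.filter(a => a.category_ids.includes(id));
const categoriesOfArticle = id =>
  bothSidesCategories.filter(c => c.article_ids.includes(id));

// Linking means two writes that the app has to keep in step.
function linkArticleToCategory(articleId, categoryId) {
  bothSidesArticles.find(a => a._id === articleId).category_ids.push(categoryId);
  bothSidesCategories.find(c => c._id === categoryId).article_ids.push(articleId);
}

linkArticleToCategory("a2", "c1");
console.log(articlesInCategory("c1").map(a => a._id));
```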

Page 21: Document databases

Many-to-many
- References on one side
  - Advantage: data in one place
  - Disadvantage: 2 queries

Articles in a category (one query):
    articles.Where(p => p.CategoryIds.Contains(categoryId))

Categories of an article (two queries):
    var article = articles.Single(p => p.Id == articleId);
    categories.Where(c => c.Id.In(article.CategoryIds))

Article
- _id
- content
- category_ids: ["id1", "id2"]

Category
- _id
- name
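
With references kept only on the article, one direction stays a single query and the other needs two steps. A plain JavaScript model with made-up data:

```javascript
const oneSideArticles = [
  { _id: "a1", category_ids: ["c1", "c2"] },
  { _id: "a2", category_ids: ["c2"] },
];
const oneSideCategories = [
  { _id: "c1", name: "News" },
  { _id: "c2", name: "Tech" },
];

// One query: the reference lives on the article.
const articlesIn = categoryId =>
  oneSideArticles.filter(a => a.category_ids.includes(categoryId));

// Two queries: load the article, then the categories it points at.
function categoriesFor(articleId) {
  const article = oneSideArticles.find(a => a._id === articleId);
  return oneSideCategories.filter(c => article.category_ids.includes(c._id));
}

console.log(articlesIn("c2").map(a => a._id));
console.log(categoriesFor("a1").map(c => c.name));
```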

Page 22: Document databases

To sum up
- A new mindset
  - Serialize complex .NET objects directly to the db
  - Data duplication and denormalisation are key
- Big shift of responsibilities to the app
  - No built-in data integrity checks
- Database has a single responsibility: storing data
  - Quicker and easier to scale

Page 23: Document databases
Page 24: Document databases

MongoDB
- Why MongoDB?
  - Largest user base, mature
  - Platform independent
  - Open source, free

Source: Google Trends

Page 25: Document databases

MongoDB: internals
- Durability
  - By default through replication
  - Single server durability: less performance
- Eventual consistency
  - Configure fsync: sync between memory and disk, by default every 60 sec.
  - Configure replicate before return

Page 26: Document databases

MongoDB: internals
- Safe mode
  - Turn off eventual consistency
    - sync directly to the disk
    - sufficiently replicate data, in replica sets
  - Calls GetLastError to determine whether the action was successful
    - Applies to actions without a return value
  - On connection or action level

Page 27: Document databases

MongoDB: internals
- Replica sets
  - Nodes that are copies of each other
  - Set-up of master and slave nodes
  - If the master goes down, a slave automatically takes over and promotes itself to master

Page 28: Document databases

MongoDB: internals
- Sharding
  - Scale out
  - Clusters of replica sets
  - Connected to a central proxy used by clients
  - Config servers contain meta-data
  - Write to multiple nodes

Page 29: Document databases

MongoDB: internals
- Sharding
  - Based on a shard key (= field)
  - Commands are sent to the shard that includes the relevant range of the data
  - Data is evenly distributed across the shards
  - Automatic reallocation of data when adding or removing servers
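
Routing by shard key range can be pictured with the toy router below. The chunk table, the shard names, and `routeByShardKey` are all assumptions made for this sketch; in a real cluster the routing process and config servers hold this information.

```javascript
// Hypothetical chunk table: each shard owns a half-open key range [min, max).
const chunkTable = [
  { shard: "shard-a", min: "a", max: "i" },
  { shard: "shard-b", min: "i", max: "r" },
  { shard: "shard-c", min: "r", max: "{" }, // "{" sorts directly after "z"
];

// Pick the shard whose range contains the document's shard key value.
function routeByShardKey(doc, shardKey) {
  const value = doc[shardKey];
  const chunk = chunkTable.find(c => value >= c.min && value < c.max);
  if (!chunk) throw new Error(`no shard owns key ${value}`);
  return chunk.shard;
}

console.log(routeByShardKey({ username: "ann" }, "username"));
console.log(routeByShardKey({ username: "mia" }, "username"));
console.log(routeByShardKey({ username: "zoe" }, "username"));
```

Adding or removing a shard would then amount to splitting or merging ranges in the table and moving the affected documents.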

Page 30: Document databases

MongoDB: internals
- BSON
  - Data storage and network transfer format
  - Binary serialized JSON
- System collections
  - db.system.namespaces
  - db.system.indexes
- Geospatial indexing
  - Find results closest to a coordinate
      db.places.find({ loc: { $near: [50, 4], $maxDistance: 5 } })
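
What the `$near` query with `$maxDistance` asks for can be approximated in memory as "keep the points within the radius, nearest first". The sketch assumes flat x/y coordinates and straight-line distance, and the helper `near` and the sample places are invented; it does not reproduce MongoDB's geospatial index.

```javascript
const samplePlaces = [
  { name: "A", loc: [50, 4] },
  { name: "B", loc: [52, 7] },
  { name: "C", loc: [60, 20] },
  { name: "D", loc: [51, 4] },
];

// Keep documents within maxDistance of the point, sorted nearest first.
function near(docs, [x, y], maxDistance) {
  return docs
    .map(doc => ({ doc, distance: Math.hypot(doc.loc[0] - x, doc.loc[1] - y) }))
    .filter(entry => entry.distance <= maxDistance)
    .sort((a, b) => a.distance - b.distance)
    .map(entry => entry.doc);
}

const nearby = near(samplePlaces, [50, 4], 5);
console.log(nearby.map(p => p.name));
```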

Page 31: Document databases

DEMO: MongoDB in .NET

Page 32: Document databases

Links
- http://www.mongodb.org/display/DOCS/CSharp+Language+Center
  - Quick-start
  - Documentation
  - LINQ
  - Serialization
- http://mongly.com/
  - Free eBook
  - Interactive tutorial