document databases
DESCRIPTION
You’ve heard the buzzwords noSQL, doc db. But what are they really? In this talk we will cover what you need to know to get started, starting off with a new mindset of approaching your data. We will look at the major differences, get a crash course into data structure design, and see how we can use all that in .NET. To top it off, we’ll dive into MongoDB to grasp some of the internals.
TRANSCRIPT
Document databases: The mystery revealed
Contents noSQL
Culture shock
Document databases Concepts Benefits
Schema design
MongoDB Internals Use in .NET
noSQL
Collective term for a range of db’s
Non-relational
Key/value pairs key = field name
Document Databases
Comparison
Relational:
Article: id, authorid, title, content
Comment: id, articleid, message
Author: id, name, email
Document db:
Article: _id, title, content, author (_id, name, email), comments[]
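As a rough illustration of the comparison, the sketch below folds three relational-style row sets into one article document. The sample values and the helper name `toDocument` are invented for this example; nothing here touches a real database.

```javascript
// Hypothetical sample rows, shaped like the relational side of the comparison.
const authors = [{ id: 1, name: "Ann", email: "ann@example.com" }];
const articles = [{ id: 10, authorId: 1, title: "Hello", content: "..." }];
const comments = [
  { id: 100, articleId: 10, message: "Nice" },
  { id: 101, articleId: 10, message: "Thanks" },
];

// Fold the related rows into a single document (the "document db" side).
function toDocument(article) {
  const author = authors.find((a) => a.id === article.authorId);
  return {
    _id: article.id,
    title: article.title,
    content: article.content,
    author: { _id: author.id, name: author.name, email: author.email },
    comments: comments
      .filter((c) => c.articleId === article.id)
      .map((c) => ({ _id: c.id, message: c.message })),
  };
}

const articleDoc = toDocument(articles[0]);
console.log(JSON.stringify(articleDoc, null, 2));
```

Reading the article now needs one lookup instead of three.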
Terminology In parallel with SQL:
Relational    Document db
Table         Collection
Row           Document
Column        Field
Index         Index
Join          Embedding & linking
Schema        N/A
Data integrity Shift of responsibilities to the app
Manage data integrity and validity yourself Database more efficient and more scalable
[Diagram: data integrity & validity checks move from the DB to the APPLICATION]
Concepts Joins
No joins Joins at "design time", not at "query time" Due to embedded docs and arrays, fewer joins are needed
Constraints No foreign key constraints Unique indexes
Transactions No commit/rollback Atomic operations
Multiple actions inside the same document Incl. embedded documents
Dynamic schema No schema Implied: definition in the app, not the db A field can exist in certain docs and not in others
When indexing: a missing field is indexed as null Sparse index: excludes docs without that field
Writing to a non-existent collection or database Lazy creation
Reading from a non-existent collection Empty value returned
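The difference between a regular and a sparse index can be sketched with a plain in-memory lookup table. This is only a model of the idea, not MongoDB's index implementation; `buildIndex` and the sample users are hypothetical.

```javascript
// Hypothetical documents: only some of them have a "nickname" field.
const users = [
  { _id: 1, name: "Ann", nickname: "annie" },
  { _id: 2, name: "Bob" },
  { _id: 3, name: "Cy", nickname: "c" },
];

// Build a value -> [_id] lookup for one field.
// sparse = false: docs without the field are indexed under null.
// sparse = true: docs without the field are left out of the index.
function buildIndex(docs, field, sparse) {
  const index = new Map();
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) continue;
    const key = hasField ? doc[field] : null;
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

const regularIndex = buildIndex(users, "nickname", false);
const sparseIndex = buildIndex(users, "nickname", true);
console.log(regularIndex.get(null), sparseIndex.has(null));
```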
Relations Embedded fields
Can be queried, the parent doc is returned Can be indexed Can’t be used for ordering
Linking Get the 2nd doc yourself in the app via a reference Avoid where possible Use for:
Many-to-many relations Subdoc often needs to be modified
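A minimal sketch of the difference: embedded data arrives with the parent document, while a link has to be resolved by the app with a second lookup. The field name `author_id` and the helper `loadArticleWithAuthor` are assumptions made for this example.

```javascript
// Hypothetical "authors" collection, keyed by _id.
const authorsById = new Map([["a1", { _id: "a1", name: "Ann" }]]);

// The tags are embedded; the author is only linked through author_id.
const article = { _id: "p1", title: "Hello", author_id: "a1", tags: ["intro"] };

// Linking: the app fetches the second document itself.
function loadArticleWithAuthor(doc) {
  const author = authorsById.get(doc.author_id) || null; // second lookup
  return Object.assign({}, doc, { author });
}

const loaded = loadArticleWithAuthor(article);
console.log(loaded.tags, loaded.author);
```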
Benefits Scalable: good for a lot of data / traffic
Horizontal scaling: to more nodes Good for web-apps
Performance No joins and constraints
Dev/user friendly Data is modeled to how the app is going to use it No conversion between object-oriented and relational
No static schema = agile
Drawbacks More mistake-prone
No data integrity checks Database is app-specific
Less flexibility for shared usage Data aggregation is harder
Less suitable for reporting
Schema Design
Schema design Start from application-specific queries
“What questions do I have?” vs “What answers do I have?” “Data like the application wants it”
Base parent documents on: The most common usage What do I want returned?
Schema design Hybrid embed / link
Changing the author name is a rare action First update author.name Then update the articles async
Article: _id, author (_id, name, email), content
Author: _id, name, email
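The two-step update could look roughly like this. The job queue stands in for whatever async mechanism the app uses; `pendingJobs`, `renameAuthor` and `runPendingJobs` are hypothetical names, and the collections are plain arrays.

```javascript
// Hypothetical collections: authors are linked AND embedded in articles.
const authorsColl = [{ _id: "a1", name: "Ann", email: "ann@example.com" }];
const articlesColl = [
  { _id: "p1", content: "...", author: { _id: "a1", name: "Ann", email: "ann@example.com" } },
  { _id: "p2", content: "...", author: { _id: "a1", name: "Ann", email: "ann@example.com" } },
  { _id: "p3", content: "...", author: { _id: "a2", name: "Bob", email: "bob@example.com" } },
];

// Stand-in for a background worker queue.
const pendingJobs = [];

function renameAuthor(authorId, newName) {
  // Step 1: update the authoritative author document right away.
  const author = authorsColl.find((a) => a._id === authorId);
  author.name = newName;
  // Step 2: schedule the update of the embedded copies for later.
  pendingJobs.push(() => {
    let updated = 0;
    for (const article of articlesColl) {
      if (article.author._id === authorId) {
        article.author.name = newName;
        updated += 1;
      }
    }
    return updated;
  });
}

// Drain the queue; returns the number of embedded copies touched.
function runPendingJobs() {
  let total = 0;
  while (pendingJobs.length > 0) total += pendingJobs.shift()();
  return total;
}

renameAuthor("a1", "Anna");
const staleName = articlesColl[0].author.name; // still the old name here
const updatedCopies = runPendingJobs();
console.log(staleName, updatedCopies, articlesColl[0].author.name);
```

Between the two steps the embedded copies are stale, which is the price of the duplication.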
Schema design Data duplication & denormalisation
Pro simplicity optimisation (fewer IO operations) query processing
Con more disk usage data integrity
Embedded docs Recommended < 250 kB
Product
Single collection inheritance
Relational: Product (_id, price) Book (author, title) Album (artist, title) Jeans (size, color)
Document db: _type: Book (_id, price, author, title) _type: Jeans (_id, price, size, color)
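A small sketch of how one collection can hold all product types, with `_type` as the discriminator. The sample products and the helper `ofType` are made up for illustration.

```javascript
// Hypothetical "products" collection: every subtype lives in the same collection.
const products = [
  { _type: "Book", _id: 1, price: 12, author: "A. Writer", title: "Docs" },
  { _type: "Album", _id: 2, price: 9, artist: "The Band", title: "Songs" },
  { _type: "Jeans", _id: 3, price: 40, size: 32, color: "blue" },
];

// Query one subtype through the discriminator field.
const ofType = (type) => products.filter((p) => p._type === type);

// Query across all subtypes on a shared field.
const cheapIds = products.filter((p) => p.price < 20).map((p) => p._id);

console.log(ofType("Jeans"), cheapIds);
```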
One-to-many Embedded array / array keys
Some queries get harder You can index arrays!
Normalized approach More flexibility A lot less performance
Article: _id, content, tags: ["foo", "bar"], comments: ["id1", "id2"]
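Indexing an array means every element gets its own index entry pointing back at the parent document. The sketch below models that with a Map; it is not how MongoDB stores its indexes, and `buildArrayIndex` is a hypothetical helper.

```javascript
// Hypothetical articles with an embedded array of tags.
const articles = [
  { _id: "p1", content: "...", tags: ["foo", "bar"] },
  { _id: "p2", content: "...", tags: ["bar"] },
  { _id: "p3", content: "...", tags: [] },
];

// One index entry per array element, each pointing at the parent doc.
function buildArrayIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field] || []) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const tagIndex = buildArrayIndex(articles, "tags");
const withBar = tagIndex.get("bar") || [];
console.log(withBar);
```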
Many-to-many Using array keys No join table
References on both sides
Advantage: simple queries
articles.Where(p => p.CategoryIds.Contains(categoryId))
categories.Where(c => c.ArticleIds.Contains(articleId))
Disadvantage: duplication, update two docs
Article: _id, content, category_ids: ["id1", "id2"]
Category: _id, name, article_ids: ["id7", "id8"]
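The "update two docs" cost shows up as soon as a relation is written. A sketch with in-memory arrays; the `link` helper and the sample data are assumptions for this example.

```javascript
// Hypothetical collections with references on both sides.
const articles = [{ _id: "a1", content: "...", category_ids: [] }];
const categories = [{ _id: "c1", name: "News", article_ids: [] }];

// Writing one relation means touching two documents.
function link(articleId, categoryId) {
  const article = articles.find((a) => a._id === articleId);
  const category = categories.find((c) => c._id === categoryId);
  if (!article.category_ids.includes(categoryId)) article.category_ids.push(categoryId);
  if (!category.article_ids.includes(articleId)) category.article_ids.push(articleId);
}

// Reading stays a single, simple query in either direction.
const articlesIn = (categoryId) =>
  articles.filter((a) => a.category_ids.includes(categoryId));
const categoriesOf = (articleId) =>
  categories.filter((c) => c.article_ids.includes(articleId));

link("a1", "c1");
link("a1", "c1"); // linking twice does not duplicate the reference
console.log(articlesIn("c1"), categoriesOf("a1"));
```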
Many-to-many References on one side
Advantage: data in one place Disadvantage: 2 queries
articles.Where(p => p.CategoryIds.Contains(categoryId))
var article = articles.Single(p => p.Id == articleId)
categories.Where(c => c.Id.In(article.CategoryIds))
Article: _id, content, category_ids: ["id1", "id2"]
Category: _id, name
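With references on one side only, one direction stays a single query and the other needs two. A sketch over in-memory arrays with invented sample data and helper names:

```javascript
// Hypothetical collections: only the article holds the references.
const articles = [
  { _id: "a1", content: "...", category_ids: ["c1", "c2"] },
  { _id: "a2", content: "...", category_ids: ["c2"] },
];
const categories = [
  { _id: "c1", name: "News" },
  { _id: "c2", name: "Tech" },
  { _id: "c3", name: "Sport" },
];

// Articles in a category: one query, the ids live on the article.
function articlesIn(categoryId) {
  return articles.filter((a) => a.category_ids.includes(categoryId));
}

// Categories of an article: two queries.
function categoriesOf(articleId) {
  const article = articles.find((a) => a._id === articleId); // query 1
  if (!article) return [];
  return categories.filter((c) => article.category_ids.includes(c._id)); // query 2
}

console.log(articlesIn("c2").map((a) => a._id), categoriesOf("a1").map((c) => c.name));
```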
To sum up A new mind set
Serialize complex .NET objects directly to the db Data duplication and denormalisation are key
Big shift of responsibilities to the app No built-in data integrity checks
Database has a single responsibility: storing data Quicker and easier to scale
MongoDB Why MongoDB?
Largest user base, mature Platform independent Open source, free
Source: Google Trends
MongoDB: internals Durability
By default through replication Single server durability: less performance
Eventual consistency Configure fsync: sync between memory and disk, by default every 60 sec. Configure replicate before return
MongoDB: internals Safe mode
Turn off eventual consistency Sync directly to the disk Replicate data to enough nodes, in replica sets
Calls GetLastError to determine whether the action was successful
Applies to actions without a return value On connection or action level
MongoDB: internals Replica sets
Nodes that are copies of each other Set-up of master and slave nodes If the master goes down, the slave automatically
takes over and promotes itself to master
MongoDB: internals Sharding Scale out Clusters of replica sets, connected to a central proxy used by clients Config servers contain metadata Write to multiple nodes
MongoDB: internals Sharding
Based on a shard key (= field) Commands are sent to the shard that includes the
relevant range of the data Data is evenly distributed across the shards Automatic reallocation of data when adding or
removing servers
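Range-based routing on a shard key can be sketched as a lookup in a table of key ranges. The ranges, shard names and `routeByShardKey` below are invented for illustration and say nothing about how MongoDB's router is actually implemented.

```javascript
// Hypothetical key ranges on a numeric shard key (e.g. a user id).
// min is inclusive, max is exclusive.
const chunks = [
  { min: 0, max: 1000, shard: "shard0" },
  { min: 1000, max: 2000, shard: "shard1" },
  { min: 2000, max: Infinity, shard: "shard2" },
];

// Send a command to the shard whose range contains the key.
function routeByShardKey(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new RangeError("no shard covers key " + key);
  return chunk.shard;
}

console.log(routeByShardKey(42), routeByShardKey(1500), routeByShardKey(99999));
```

A query that does not include the shard key cannot be routed this way and would have to go to every shard.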
MongoDB: internals BSON
Data storage and network transfer format Binary serialized JSON
System collections db.system.namespaces (lists the collections) db.system.indexes
Geospatial indexing Find results closest to coordinate
db.places.find({ loc: { $near: [50, 4], $maxDistance: 5 } })
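What such a query returns can be approximated in memory: keep the places within the maximum distance and sort them nearest first. This sketch assumes flat (planar) distance in coordinate units and does a full scan, which is exactly what a geospatial index avoids; `near` and the sample places are hypothetical.

```javascript
// Hypothetical "places" collection with [x, y] coordinate pairs.
const places = [
  { name: "A", loc: [50, 4] },
  { name: "B", loc: [52, 6] },
  { name: "C", loc: [60, 10] },
  { name: "D", loc: [51, 4] },
];

// Keep docs within maxDistance of the point, nearest first.
function near(docs, point, maxDistance) {
  const [x, y] = point;
  return docs
    .map((doc) => ({ doc, dist: Math.hypot(doc.loc[0] - x, doc.loc[1] - y) }))
    .filter((entry) => entry.dist <= maxDistance)
    .sort((a, b) => a.dist - b.dist)
    .map((entry) => entry.doc);
}

const nearby = near(places, [50, 4], 5).map((p) => p.name);
console.log(nearby);
```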
DEMO: MongoDB in .NET
Links
http://www.mongodb.org/display/DOCS/CSharp+Language+Center Quick-start Documentation LINQ Serialization
http://mongly.com/ Free eBook Interactive tutorial