Apéro RubyBdx - MongoDB - 8-11-2011
![Page 1: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/1.jpg)
Pierre-Louis Gottfrois
Bastien Murzeau
Apéro Ruby Bordeaux, November 8, 2011
![Page 2: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/2.jpg)
• Brief introduction
• Case study
• Map / Reduce
![Page 3: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/3.jpg)
What is MongoDB?
MongoDB is a NoSQL database that is
schemaless
document-oriented
![Page 4: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/4.jpg)
schemaless
• Very useful for 'agile' development (iterations, fast changes, flexibility for developers)
• Supports features that, in relational databases, would be:
  • nearly impossible (storing open-ended sets of elements, e.g. tags)
  • too complex for what they are (migrations)
![Page 5: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/5.jpg)
document-oriented
• MongoDB stores documents, not rows
• documents are stored as JSON, more precisely as BSON (binary JSON)
• the query syntax is as rich as SQL's
• the 'embedded' documents mechanism solves many commonly encountered problems
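To make the embedded-document idea concrete, here is a minimal sketch in plain JavaScript (no MongoDB driver involved). The `conversation` object and all of its field values are hypothetical and only illustrative; the point is that the messages and the open-ended tag list live inside the parent document instead of in separate tables.

```javascript
// Hypothetical document: names and values are illustrative, not taken from the talk.
const conversation = {
  _id: 1,
  public: false,
  tags: ["ruby", "mongodb", "bordeaux"], // open-ended list, no join table needed
  messages: [                            // embedded documents
    { author: "alice", body: "Hello!" },
    { author: "bob", body: "Hi there!" },
  ],
};

// Reading the embedded data requires no join: it is part of the document itself.
const authors = conversation.messages.map((m) => m.author);
console.log(authors);
console.log(conversation.tags.length);
```

In a relational schema the same data would typically need a `messages` table and a tag join table, each with a foreign key back to the conversation.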
![Page 6: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/6.jpg)
document-oriented
• Documents are stored in a collection; in RoR, a collection corresponds to a model
• part of this data is indexed to optimize performance
• a document is not a trash can!
![Page 7: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/7.jpg)
storing large volumes of data
• MongoDB (and other NoSQL databases) perform better at horizontal scaling
• servers are added to increase storage capacity ("sharding")
• which also provides better availability
• optimized load balancing between nodes
• growth that is transparent to the application
![Page 8: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/8.jpg)
Case study
• The ORM becomes an ODM; the reference gem is mongoid
• alternatives: MongoMapper, DataMapper
• Creating an application backed by the MongoDB NoSQL database
  • rails new nosql
  • edit the Gemfile
    • gem 'mongoid'
    • gem 'bson_ext'
  • bundle install
  • rails generate mongoid:config
![Page 9: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/9.jpg)
Case study
• edit config/application.rb: comment out rails/all (which would also load ActiveRecord) and require the individual railties instead
  • # require 'rails/all'
  • require "action_controller/railtie"
  • require "action_mailer/railtie"
  • require "active_resource/railtie"
  • require "rails/test_unit/railtie"
![Page 10: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/10.jpg)
Case study
class Conversation
  include Mongoid::Document
  include Mongoid::Timestamps

  field :public, :type => Boolean, :default => false

  has_many :scores, :as => :scorable, :dependent => :delete
  has_and_belongs_to_many :subjects
  belongs_to :timeline
  embeds_many :messages
end

class Subject
  include Mongoid::Document
  include Mongoid::Timestamps

  has_many :scores, :as => :scorable, :dependent => :delete, :autosave => true
  has_many :requests, :dependent => :delete
  belongs_to :author, :class_name => 'User'
end
![Page 11: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/11.jpg)
Map Reduce
![Page 12: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/12.jpg)
Example
{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 215 }
{ "id": 4, "day": 20111017, "checkout": 73 }
A "ticket" collection
![Page 13: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/13.jpg)
The problem
• We want to
  • compute the sum of the 'checkout' values across all objects in our tickets collection
  • be able to distribute this operation over the network
  • be fast!
• We don't want to
  • go over every object again whenever one of them is updated
![Page 14: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/14.jpg)
Map: emit(checkout)
{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 215 }
{ "id": 4, "day": 20111017, "checkout": 73 }
Emitted values: 100, 42, 215, 73
The 'map' function emits (selects) the checkout value of each object in our collection.
![Page 15: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/15.jpg)
Reduce: sum(checkout)
{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 215 }
{ "id": 4, "day": 20111017, "checkout": 73 }
Emitted values: 100, 42, 215, 73
Partial sums: 100 + 42 = 142, 215 + 73 = 288
Total: 142 + 288 = 430
![Page 16: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/16.jpg)
Reduce function
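The 'reduce' function applies the aggregation logic to each key and the list of values received for it from the 'map' function.
This function has to be 'idempotent', and its result must not depend on the order of the values, so that it can be re-applied to its own output or run across a distributed system:
reduce(k, [A, B]) == reduce(k, [B, A])
reduce(k, [A, B]) == reduce(k, [reduce(k, [A, B])])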
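The properties above can be checked on a concrete reducer. This is a minimal sketch in plain JavaScript (no MongoDB involved), using a sum reducer and values from the ticket example; the variable names `commutative`, `regroupable` and `idempotent` are just labels chosen for this illustration.

```javascript
// A sum reducer: takes a key and an array of values, returns a single value.
function reduce(key, values) {
  let sum = 0;
  for (const v of values) sum += v;
  return sum;
}

const A = 100, B = 42, C = 215;

// The order of the values does not matter.
const commutative = reduce(null, [A, B]) === reduce(null, [B, A]);
// Re-reducing a partial result gives the same answer as reducing everything at once.
const regroupable = reduce(null, [C, reduce(null, [A, B])]) === reduce(null, [C, A, B]);
// Reducing an already-reduced value changes nothing.
const idempotent = reduce(null, [reduce(null, [A, B])]) === reduce(null, [A, B]);

console.log({ commutative, regroupable, idempotent });
```

A reducer that, for example, returned an average instead of a sum would fail the second check, which is why averages are usually carried as a (sum, count) pair.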
![Page 17: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/17.jpg)
Inherently Distributed
{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 215 }
{ "id": 4, "day": 20111017, "checkout": 73 }
Emitted values: 100, 42, 215, 73
Partial sums: 100 + 42 = 142, 215 + 73 = 288
Total: 142 + 288 = 430
![Page 18: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/18.jpg)
Distributed
Since the 'map' function emits the objects to be reduced, and the 'reduce' function processes each group of emitted objects independently, the work can be distributed across multiple workers.
[Diagram: map, then reduce]
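A minimal sketch of that idea in plain JavaScript, assuming two hypothetical "workers" that are simulated here by simply splitting the emitted values into two chunks. Nothing in it is MongoDB-specific; it only shows that reducing the partial results gives the same total as a single pass.

```javascript
// Sum reducer with the usual (key, values) shape.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Values emitted by 'map' for the four tickets.
const emitted = [100, 42, 215, 73];

// Pretend each chunk is handled by a separate worker.
const chunks = [emitted.slice(0, 2), emitted.slice(2)];
const partials = chunks.map((chunk) => reduce(null, chunk));

// Combine the partial results, and compare with a single-pass reduction.
const distributedTotal = reduce(null, partials);
const singlePassTotal = reduce(null, emitted);

console.log(partials, distributedTotal, singlePassTotal);
```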
![Page 19: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/19.jpg)
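Logarithmic Update
For the same reason, when an object is updated, we don't have to reprocess every object.
We can call the 'map' function on the updated objects only, then recompute just the partial results that depend on them. In a tree-shaped reduction, that is a number of steps that grows logarithmically with the size of the collection.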
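The update path can be sketched with a small binary sum tree in plain JavaScript. This is an illustration of the diagram in the slides, not a description of how MongoDB stores or refreshes map/reduce results; the helper names `buildSumTree` and `updateLeaf` are hypothetical, and the sketch assumes the number of values is a power of two.

```javascript
// Build a binary sum tree stored in an array: leaves at [n, 2n), root at index 1.
// Assumes values.length is a power of two, to keep the sketch short.
function buildSumTree(values) {
  const n = values.length;
  const tree = new Array(2 * n).fill(0);
  for (let i = 0; i < n; i++) tree[n + i] = values[i];
  for (let i = n - 1; i >= 1; i--) tree[i] = tree[2 * i] + tree[2 * i + 1];
  return tree;
}

// Update one leaf and recompute only its ancestors.
// Returns how many partial sums had to be recomputed.
function updateLeaf(tree, index, newValue) {
  const n = tree.length / 2;
  let i = n + index;
  tree[i] = newValue;
  let recomputed = 0;
  for (i = Math.floor(i / 2); i >= 1; i = Math.floor(i / 2)) {
    tree[i] = tree[2 * i] + tree[2 * i + 1];
    recomputed++;
  }
  return recomputed;
}

const tree = buildSumTree([100, 42, 215, 73]);
const before = tree[1];                      // total before the update
const recomputed = updateLeaf(tree, 2, 210); // third ticket: 215 becomes 210
const after = tree[1];                       // total after the update

console.log(before, after, recomputed);
```

With four values, only two sums are recomputed (288 becomes 283, then 430 becomes 425); the partial sum 142 is left untouched.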
![Page 20: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/20.jpg)
Logarithmic Update
{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 210 }
{ "id": 4, "day": 20111017, "checkout": 73 }
Emitted values: 100, 42, 215, 73 (the value emitted for id 3 is not yet refreshed)
Partial sums: 142, 288
Total: 430
![Page 21: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/21.jpg)
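Logarithmic Update
{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 210 }
{ "id": 4, "day": 20111017, "checkout": 73 }
Emitted values: 100, 42, 210, 73 (the value for id 3 has been re-emitted)
Partial sums: 142, 288 (the second partial sum is not yet refreshed)
Total: 430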
![Page 22: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/22.jpg)
Logarithmic Update
{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 210 }
{ "id": 4, "day": 20111017, "checkout": 73 }
Emitted values: 100, 42, 210, 73
Partial sums: 142, 283 (210 + 73, recomputed)
Total: 430 (not yet refreshed)
![Page 23: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/23.jpg)
Logarithmic Update
{ "id": 1, "day": 20111017, "checkout": 100 }
{ "id": 2, "day": 20111017, "checkout": 42 }
{ "id": 3, "day": 20111017, "checkout": 210 }
{ "id": 4, "day": 20111017, "checkout": 73 }
Emitted values: 100, 42, 210, 73
Partial sums: 142, 283
Total: 142 + 283 = 425
![Page 24: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/24.jpg)
Let’s do some code!
![Page 25: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/25.jpg)
$> mongo
> db.tickets.save({ "_id": 1, "day": 20111017, "checkout": 100 })
> db.tickets.save({ "_id": 2, "day": 20111017, "checkout": 42 })
> db.tickets.save({ "_id": 3, "day": 20111017, "checkout": 215 })
> db.tickets.save({ "_id": 4, "day": 20111017, "checkout": 73 })
> db.tickets.count()
4
> db.tickets.find()
{ "_id" : 1, "day" : 20111017, "checkout" : 100 }
...
> db.tickets.find({ "_id": 1 })
{ "_id" : 1, "day" : 20111017, "checkout" : 100 }
![Page 26: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/26.jpg)
> var map = function() {
...   emit(null, this.checkout)
... }
> var reduce = function(key, values) {
...   var sum = 0
...   for (var index in values) sum += values[index]
...   return sum
... }
![Page 27: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/27.jpg)
Temporary Collection
> sumOfCheckouts = db.tickets.mapReduce(map, reduce)
{
  "result" : "tmp.mr.mapreduce_123456789_4",
  "timeMillis" : 8,
  "counts" : { "input" : 4, "emit" : 4, "output" : 1 },
  "ok" : 1
}
> db.getCollectionNames()
[ "tickets", "tmp.mr.mapreduce_123456789_4" ]
> db[sumOfCheckouts.result].find()
{ "_id" : null, "value" : 430 }
![Page 28: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/28.jpg)
Persistent Collection
> db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })
> db.getCollectionNames()
[ "sumOfCheckouts", "tickets", "tmp.mr.mapreduce_123456789_4" ]
> db.sumOfCheckouts.find()
{ "_id" : null, "value" : 430 }
> db.sumOfCheckouts.findOne().value
430
![Page 29: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/29.jpg)
Reduce by Date
![Page 30: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/30.jpg)
> var map = function() {
...   emit(this.day, this.checkout)
... }
> var reduce = function(key, values) {
...   var sum = 0
...   for (var index in values) sum += values[index]
...   return sum
... }
![Page 31: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/31.jpg)
> db.tickets.mapReduce(map, reduce, { "out" : "sumOfCheckouts" })
> db.sumOfCheckouts.find()
{ "_id" : 20111017, "value" : 430 }
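To see the grouping by key without a running server, here is a small in-memory stand-in written in plain JavaScript. The `mapReduce` helper below is hypothetical and is not MongoDB's implementation: in particular it passes `emit` to the map function as an argument, whereas the mongo shell provides `emit` as a global. The fifth ticket (day 20111018) is invented for this illustration so that two groups appear.

```javascript
// Minimal in-memory stand-in for mapReduce (illustration only).
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Call map with 'this' bound to each document, as the shell does.
  for (const doc of docs) map.call(doc, emit);
  // Reduce each group of emitted values to a single value per key.
  return [...groups].map(([key, values]) => ({ _id: key, value: reduce(key, values) }));
}

const tickets = [
  { _id: 1, day: 20111017, checkout: 100 },
  { _id: 2, day: 20111017, checkout: 42 },
  { _id: 3, day: 20111017, checkout: 215 },
  { _id: 4, day: 20111017, checkout: 73 },
  { _id: 5, day: 20111018, checkout: 50 }, // hypothetical extra ticket
];

const map = function (emit) { emit(this.day, this.checkout); };
const reduce = function (key, values) {
  let sum = 0;
  for (const v of values) sum += v;
  return sum;
};

const sumOfCheckouts = mapReduce(tickets, map, reduce);
console.log(sumOfCheckouts);
```

Emitting `this.day` as the key is what turns the single global sum into one sum per day.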
![Page 32: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/32.jpg)
What we can do
![Page 33: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/33.jpg)
Scored Subjects per User
| Subject | User | Score |
|---|---|---|
| 1 | 1 | 2 |
| 1 | 1 | 2 |
| 1 | 2 | 2 |
| 2 | 1 | 2 |
| 2 | 2 | 10 |
| 2 | 2 | 5 |
![Page 34: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/34.jpg)
Scored Subjects per User (reduced)
| Subject | User | Score |
|---|---|---|
| 1 | 1 | 4 |
| 1 | 2 | 2 |
| 2 | 1 | 2 |
| 2 | 2 | 15 |
![Page 35: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/35.jpg)
$> mongo
> db.scores.save({ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 })
> db.scores.save({ "_id": 2, "subject_id": 1, "user_id": 1, "score": 2 })
> db.scores.save({ "_id": 3, "subject_id": 1, "user_id": 2, "score": 2 })
> db.scores.save({ "_id": 4, "subject_id": 2, "user_id": 1, "score": 2 })
> db.scores.save({ "_id": 5, "subject_id": 2, "user_id": 2, "score": 10 })
> db.scores.save({ "_id": 6, "subject_id": 2, "user_id": 2, "score": 5 })
> db.scores.count()
6
> db.scores.find()
{ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
...
> db.scores.find({ "_id": 1 })
{ "_id": 1, "subject_id": 1, "user_id": 1, "score": 2 }
![Page 36: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/36.jpg)
> var map = function() {
...   emit([this.user_id, this.subject_id].join("-"),
...        { subject_id: this.subject_id, user_id: this.user_id, score: this.score });
... }
> var reduce = function(key, values) {
...   var result = { user_id: "", subject_id: "", score: 0 };
...   values.forEach(function (value) {
...     result.score += value.score;
...     result.user_id = value.user_id;
...     result.subject_id = value.subject_id;
...   });
...   return result
... }
![Page 37: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/37.jpg)
ReducedScores Collection
> db.scores.mapReduce(map, reduce, { "out" : "reduced_scores" })
> db.getCollectionNames()
[ "reduced_scores", "scores" ]
> db.reduced_scores.find()
{ "_id" : "1-1", "value" : { "user_id" : 1, "subject_id" : 1, "score" : 4 } }
{ "_id" : "1-2", "value" : { "user_id" : 1, "subject_id" : 2, "score" : 2 } }
{ "_id" : "2-1", "value" : { "user_id" : 2, "subject_id" : 1, "score" : 2 } }
{ "_id" : "2-2", "value" : { "user_id" : 2, "subject_id" : 2, "score" : 15 } }
> db.reduced_scores.findOne().value.score
4
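The composite "user-subject" key can be checked by hand with a few lines of plain JavaScript. This sketch folds map and reduce into a single loop over an in-memory copy of the six score rows, so it is an equivalent computation for this data set rather than the shell session itself; the `reduced` object is a name chosen for this illustration.

```javascript
// In-memory copy of the six documents saved in the scores collection.
const scores = [
  { _id: 1, subject_id: 1, user_id: 1, score: 2 },
  { _id: 2, subject_id: 1, user_id: 1, score: 2 },
  { _id: 3, subject_id: 1, user_id: 2, score: 2 },
  { _id: 4, subject_id: 2, user_id: 1, score: 2 },
  { _id: 5, subject_id: 2, user_id: 2, score: 10 },
  { _id: 6, subject_id: 2, user_id: 2, score: 5 },
];

// Group by the same composite key as the map function: "<user_id>-<subject_id>".
const reduced = {};
for (const s of scores) {
  const key = [s.user_id, s.subject_id].join("-");
  if (!reduced[key]) {
    reduced[key] = { user_id: s.user_id, subject_id: s.subject_id, score: 0 };
  }
  reduced[key].score += s.score; // same accumulation as the reduce function
}

console.log(reduced);
```

Joining the two ids into one string is a simple way to group on a pair of fields when the key has to be a single value.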
![Page 38: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/38.jpg)
Dealing with Rails Query
ruby-1.9.2-p180 :007 > ReducedScores.first
 => #<ReducedScores _id: 1-1, _type: nil, value: {"user_id"=>BSON::ObjectId('...'), "subject_id"=>BSON::ObjectId('...'), "score"=>4.0}>
ruby-1.9.2-p180 :008 > ReducedScores.where("value.user_id" => u1.id).count
 => 2
ruby-1.9.2-p180 :009 > ReducedScores.where("value.user_id" => u1.id).first.value['score']
 => 4.0
ruby-1.9.2-p180 :010 > ReducedScores.where("value.user_id" => u1.id).last.value['score']
 => 2.0
![Page 39: Apéro RubyBdx - MongoDB - 8-11-2011](https://reader034.vdocuments.net/reader034/viewer/2022052311/556a541ed8b42a7a138b4961/html5/thumbnails/39.jpg)
Questions?