Miscellanea Assignment #4 DB comparisons Extensions Summary Midterm Conclusion References
CS-695 NoSQL Database
MongoDB (part 2 of 2)
Dr. Chuck Cartledge
15 Oct. 2015
Table of contents
1 Miscellanea
2 Assignment #4
3 DB comparisons
4 Extensions
5 Summary
6 Midterm
7 Conclusion
8 References
Corrections and additions since last lecture.
Assignment #04 is available
Corrected typos in lecture #007
Words of explanation.
The full text is available at: http://www.cs.odu.edu/~ccartled/Teaching/2015-Fall/NoSQL/Assignments/04/
In general terms:
1 Parse data
2 Create document database
3 Update documents based on number of movies filmed in each city
4 Query database
5 Create list of 10 most used locations
6 Solve TSP with 10 cities
7 Plot route on a map
8 List locations and movies
9 What about when a popular location doesn’t have a city?
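Step 6 asks for a traveling-salesman tour over the 10 most used locations. With only 10 cities an exhaustive search is still practical. Once the starting city is fixed there are 9! = 362,880 orderings. The sketch below searches them all and skips partial tours that are already longer than the best one found, which is safe as long as distances are non-negative. This is plain JavaScript, not MongoDB code. The function name `bruteForceTsp` and the distance-matrix input `dist` are hypothetical. Real inputs would need distances (for example, great-circle distances) computed from each location's coordinates.

```javascript
// Hypothetical helper: exhaustive TSP search over a distance matrix.
// dist[i][j] is the (assumed non-negative) distance from city i to city j.
function bruteForceTsp(dist) {
  const n = dist.length;
  let bestCost = Infinity;
  let bestTour = null;
  const tour = [0];                        // fix city 0 as the start to skip rotations
  const used = new Array(n).fill(false);
  used[0] = true;

  function extend(costSoFar) {
    if (costSoFar >= bestCost) return;     // prune: already no better than the best tour
    if (tour.length === n) {
      const total = costSoFar + dist[tour[n - 1]][0]; // close the loop back to city 0
      if (total < bestCost) {
        bestCost = total;
        bestTour = tour.slice();
      }
      return;
    }
    for (let next = 1; next < n; next++) {
      if (used[next]) continue;
      used[next] = true;
      tour.push(next);
      extend(costSoFar + dist[tour[tour.length - 2]][next]);
      tour.pop();
      used[next] = false;
    }
  }

  extend(0);
  return { cost: bestCost, tour: bestTour.concat(0) };
}

// Usage: four cities on the corners of a unit square.
const d = Math.SQRT2;
const square = [
  [0, 1, d, 1],
  [1, 0, 1, d],
  [d, 1, 0, 1],
  [1, d, 1, 0],
];
console.log(bruteForceTsp(square)); // cost 4, tour [0, 1, 2, 3, 0]
```

For many more than about a dozen cities this factorial search becomes impractical, and a heuristic would be needed instead.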
How different DBs compare to a RDBMS
We now have enough terms to compare [2]:
RDBMS        K/V          Columnar     Doc.
DB instance  cluster      cluster      instance
database     -            namespace    -
table        bucket       table        collection
row          key-value    row          document
rowid        key          -            id
col.         -            col. fam.    -
schema       -            -            database
join         -            -            DBRef
Aggregations
“Aggregation pipeline gives you a way to transform and combine documents in your collection. You do it by passing the documents through a pipeline that’s somewhat analogous to the Unix pipe where you send output from one command to another to a third, etc.”
Karl Seguin [3]
General format of the aggregate function:
db.collection.aggregate([<stage>, ...])
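The Unix-pipe analogy can be made concrete with a toy model in plain JavaScript. Each stage is a function from an array of documents to a new array, and the pipeline feeds the output of one stage into the next. The helpers `match`, `skip`, `limit`, and `runPipeline` are hypothetical stand-ins that mimic `$match`, `$skip`, and `$limit`. They are not the MongoDB API, and `match` handles only simple equality filters. The sample documents are made up.

```javascript
// Toy model of an aggregation pipeline: every stage maps docs -> docs.
// These helpers are illustrative only; they are not part of MongoDB.
const match = (criteria) => (docs) =>
  docs.filter((doc) =>
    Object.entries(criteria).every(([field, value]) => doc[field] === value));
const skip = (n) => (docs) => docs.slice(n);
const limit = (n) => (docs) => docs.slice(0, n);

// Pass the documents through each stage in order, like cmd1 | cmd2 | cmd3.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const unicorns = [
  { name: "Horny", gender: "m" },
  { name: "Aurora", gender: "f" },
  { name: "Unicrom", gender: "m" },
  { name: "Roooooodles", gender: "m" },
];

// Same shape as db.unicorns.aggregate([{$match: {gender: "m"}}, {$skip: 1}, {$limit: 1}]);
// note the real server does not guarantee document order without a $sort stage.
const result = runPipeline(unicorns, [match({ gender: "m" }), skip(1), limit(1)]);
console.log(result); // [ { name: 'Unicrom', gender: 'm' } ]
```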
Aggregations
A partial list of aggregation operators (1):
$match: filters the stream
$limit: passes the first n docs
$skip: skips the first n docs
$group: groups docs by an _id key and applies accumulator(s)
$geoNear: returns docs close to a location
$and, $or, $not: typical Boolean operators, applied to arrays of comparison expressions
$eq, $gt, $gte, $lt, $lte: the usual comparisons
$add, $subtract, $multiply, $divide, $mod: math operators
$cmp: compares two values and returns -1, 0, or 1
$ne: returns true if two values are not equal
Note: every operator name starts with a $.
(1) http://docs.mongodb.org/manual/reference/operator/aggregation/
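To see what `$group` does with an accumulator, here is a toy re-implementation in plain JavaScript. It groups documents by one field, applies a `$sum: 1` style counter, then sorts and keeps the top n, much like a `$group`/`$sort`/`$limit` pipeline (compare assignment step 5). The function `groupCount`, the field name `location`, and the sample documents are hypothetical. The real `$group` accepts arbitrary `_id` expressions and many accumulators.

```javascript
// Toy equivalent of:
//   [{$group: {_id: "$location", count: {$sum: 1}}},
//    {$sort: {count: -1}}, {$limit: n}]
// groupCount is a hypothetical helper, not a MongoDB API.
function groupCount(docs, field, n) {
  const counts = new Map();
  for (const doc of docs) {
    const key = doc[field];
    counts.set(key, (counts.get(key) || 0) + 1); // the {$sum: 1} accumulator
  }
  return Array.from(counts, ([key, count]) => ({ _id: key, count }))
    .sort((a, b) => b.count - a.count) // {$sort: {count: -1}}
    .slice(0, n);                      // {$limit: n}
}

// Made-up sample data.
const scenes = [
  { title: "Film A", location: "Harbor" },
  { title: "Film B", location: "Main Street" },
  { title: "Film C", location: "Harbor" },
  { title: "Film D", location: "Old Mill" },
  { title: "Film E", location: "Harbor" },
  { title: "Film F", location: "Main Street" },
];
console.log(groupCount(scenes, "location", 2)); // Harbor (3), then Main Street (2)
```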
Aggregations
Examples [1]:
// totalPay = salary + bonus - 401k contribution
db.employees.aggregate([
  {
    "$project": {
      "totalPay": {
        "$subtract": [{"$add": ["$salary", "$bonus"]}, "$401k"]
      }
    }
  }
])
// email = first initial + "." + last name + "@example.com"
db.employees.aggregate([
  {
    "$project": {
      "email": {
        "$concat": [
          {"$substr": ["$firstName", 0, 1]},
          ".",
          "$lastName",
          "@example.com"
        ]
      }
    }
  }
])
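To make the two `$project` expressions concrete, here is a toy evaluator in plain JavaScript. It handles only the operators they use (`$add`, `$subtract`, `$concat`, `$substr`) and treats strings that start with `$` as field paths. The function `evalExpr` and the sample employee document are hypothetical. The real server supports many more operators, nested field paths, and null or missing-field handling, none of which this sketch covers.

```javascript
// Hypothetical mini-evaluator covering only the operators used in the examples.
function evalExpr(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)];                   // field path: "$salary" -> doc.salary
  }
  if (expr === null || typeof expr !== "object" || Array.isArray(expr)) {
    return expr;                                 // literal such as "." or 0
  }
  const [op, rawArgs] = Object.entries(expr)[0]; // e.g. ["$add", ["$salary", "$bonus"]]
  const args = rawArgs.map((arg) => evalExpr(arg, doc));
  switch (op) {
    case "$add":
      return args.reduce((sum, x) => sum + x, 0);
    case "$subtract":
      return args[0] - args[1];
    case "$concat":
      return args.join("");
    case "$substr": // [string, start, length]; MongoDB counts bytes, this counts UTF-16 units
      return String(args[0]).slice(args[1], args[1] + args[2]);
    default:
      throw new Error("unsupported operator: " + op);
  }
}

// Made-up sample document.
const employee = { firstName: "Ada", lastName: "Lovelace", salary: 100000, bonus: 5000, "401k": 6000 };
const totalPay = evalExpr({ $subtract: [{ $add: ["$salary", "$bonus"] }, "$401k"] }, employee);
const email = evalExpr(
  { $concat: [{ $substr: ["$firstName", 0, 1] }, ".", "$lastName", "@example.com"] },
  employee
);
console.log(totalPay, email); // 99000 A.Lovelace@example.com
```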
Indices
MongoDB has indices.
They can be created:
db.unicorns.createIndex({name: 1});
Note: 1 means ascending and -1 means descending. (Older shells use ensureIndex, which is deprecated since MongoDB 3.0.)
They can be dropped:
db.unicorns.dropIndex({name: 1});
They can be unique:
db.unicorns.createIndex({name: 1}, {unique: true});
They can be on embedded fields and arrays, and they can be compound (several fields):
db.unicorns.createIndex({name: 1, vampires: -1});
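Conceptually, an index is an ordered list of (key, document reference) pairs that can be searched without scanning every document. Below is a toy sketch in plain JavaScript. It assumes a single-field index kept in a sorted array and searched by binary search; MongoDB itself uses B-tree indexes. The names `buildIndex` and `lookup` are hypothetical. The `direction` argument mirrors the 1/-1 convention, and the `unique` flag mimics `{unique: true}` by rejecting duplicate keys.

```javascript
// Toy single-field index: [key, position] pairs kept sorted, searched by binary search.
// buildIndex and lookup are hypothetical names, not MongoDB shell functions.
function buildIndex(docs, field, direction = 1, unique = false) {
  const entries = docs.map((doc, pos) => [doc[field], pos]);
  // direction 1 sorts ascending, -1 sorts descending (like {name: 1} vs {name: -1})
  entries.sort((a, b) => (a[0] < b[0] ? -direction : a[0] > b[0] ? direction : 0));
  if (unique) {
    for (let i = 1; i < entries.length; i++) {
      if (entries[i][0] === entries[i - 1][0]) {
        throw new Error("duplicate key: " + entries[i][0]); // mimics {unique: true}
      }
    }
  }
  return { direction, entries };
}

// Binary search for the first matching key, then collect all equal keys.
function lookup(index, key) {
  const { direction, entries } = index;
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    const k = entries[mid][0];
    const comesBefore = direction === 1 ? k < key : k > key;
    if (comesBefore) lo = mid + 1;
    else hi = mid;
  }
  const positions = [];
  for (let i = lo; i < entries.length && entries[i][0] === key; i++) {
    positions.push(entries[i][1]);
  }
  return positions;
}

const unicorns = [{ name: "Horny" }, { name: "Aurora" }, { name: "Unicrom" }, { name: "Aurora" }];
const byName = buildIndex(unicorns, "name", 1);
console.log(lookup(byName, "Aurora")); // [ 1, 3 ]
```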
Indices
Special indices and collection types [5]:
Capped collections: fixed-size collections that act like circular queues
TTL indices for caches: documents older than the TTL are purged automatically
Full-text indices: all strings are split, stemmed, and stored (heavyweight operations)
Geospatial (2d and 2dsphere): supports inclusion, intersection, and proximity queries
GridFS for large files: a virtual file system that supports sharding and stores large files as chunks (255 kB each by default), getting around the 16 MB document size limit
Image from [4].
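The “circular queue” behavior of a capped collection can be sketched as a fixed-size ring buffer. Once it is full, each insert overwrites the oldest entry, and reads come back in insertion order. This is a toy model in plain JavaScript. The `CappedCollection` class is hypothetical and is capped by document count. Real capped collections are sized in bytes, with an optional maximum document count, and are created with `db.createCollection(name, {capped: true, size: ...})`.

```javascript
// Toy capped collection: a ring buffer holding at most `max` documents.
// CappedCollection is hypothetical; it only models the overwrite-oldest behavior.
class CappedCollection {
  constructor(max) {
    this.max = max;
    this.slots = new Array(max);
    this.start = 0; // index of the oldest document
    this.count = 0; // number of documents currently stored
  }

  insert(doc) {
    const end = (this.start + this.count) % this.max;
    this.slots[end] = doc;
    if (this.count < this.max) {
      this.count++;
    } else {
      this.start = (this.start + 1) % this.max; // the oldest document was overwritten
    }
  }

  // Documents in insertion order, oldest first.
  find() {
    const out = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(this.start + i) % this.max]);
    }
    return out;
  }
}

const log = new CappedCollection(3);
for (let i = 1; i <= 5; i++) log.insert({ event: i });
console.log(log.find()); // [ { event: 3 }, { event: 4 }, { event: 5 } ]
```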
MapReduce
Classic (a.k.a. Hadoop) MapReduce
The classic MapReduce ecosystem operations:
1 Split the input file into chunks
2 Present each split to its own map function
3 Map function emits 0 or more <key, value> pairs
4 Ecosystem collects all <key, value> pairs
5 Ecosystem merges them into <key, values> sets, one per key
6 Ecosystem presents each <key, values> set to a reducer
7 Ecosystem collects the reducers' output
Image from [6].
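The seven steps can be traced in a single-process toy sketch. It splits the input, runs a map function on each split, groups the emitted pairs by key (the “shuffle”), and hands each key's values to a reduce function. This is plain JavaScript with hypothetical helper names (`mapReduce`, `wordMap`, `sumReduce`). It models only the data flow, not Hadoop's distribution, fault tolerance, or disk I/O.

```javascript
// Single-process model of the classic MapReduce data flow.
function mapReduce(input, splitSize, mapFn, reduceFn) {
  // 1. Split the input into chunks.
  const splits = [];
  for (let i = 0; i < input.length; i += splitSize) {
    splits.push(input.slice(i, i + splitSize));
  }
  // 2-4. Run each split through the map function and collect every <key, value> pair.
  const pairs = splits.flatMap((split) => split.flatMap(mapFn));
  // 5. Merge the pairs into <key, values> sets.
  const grouped = new Map();
  for (const [key, value] of pairs) {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }
  // 6-7. Present each <key, values> set to the reducer and collect the output.
  const output = {};
  for (const [key, values] of grouped) output[key] = reduceFn(key, values);
  return output;
}

// Word count: map emits <word, 1> for every word; reduce adds up the ones.
const wordMap = (line) => line.split(/\s+/).filter(Boolean).map((w) => [w.toLowerCase(), 1]);
const sumReduce = (key, values) => values.reduce((a, b) => a + b, 0);

const lines = ["the quick fox", "the lazy dog", "The end"];
console.log(mapReduce(lines, 2, wordMap, sumReduce));
// { the: 3, quick: 1, fox: 1, lazy: 1, dog: 1, end: 1 }
```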
MapReduce
MongoDB MapReduce:
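In MongoDB's map-reduce (the shell helper is `db.collection.mapReduce(map, reduce, options)`), the map function reads the current document through `this` and calls `emit(key, value)`. The reduce function receives `(key, values)` and returns a single value. The server may call reduce more than once for the same key, feeding earlier reduce output back in, so reduce must return something shaped like an emitted value and be associative. For a key with only one emitted value, reduce is not called at all. The harness below (`runMapReduce`, with a temporary stand-in `emit`) is a hypothetical plain-JavaScript simulation of that contract, not the server's implementation. The sample documents are made up.

```javascript
// Hypothetical harness that mimics the map-reduce contract in plain JavaScript.
function runMapReduce(docs, map, reduce) {
  const emitted = new Map();
  // Temporary stand-in for the emit() function the server provides to map functions.
  globalThis.emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  try {
    for (const doc of docs) map.call(doc); // inside map, `this` is the current document
  } finally {
    delete globalThis.emit;
  }
  const result = {};
  for (const [key, values] of emitted) {
    if (values.length === 1) {
      result[key] = values[0]; // single-value keys skip reduce
      continue;
    }
    // Reduce two batches separately, then re-reduce the partial results,
    // imitating how the server may call reduce repeatedly for one key.
    const half = Math.ceil(values.length / 2);
    const partials = [reduce(key, values.slice(0, half)), reduce(key, values.slice(half))];
    result[key] = reduce(key, partials);
  }
  return result;
}

// Total vampires, by gender.
const unicorns = [
  { name: "Horny", gender: "m", vampires: 63 },
  { name: "Aurora", gender: "f", vampires: 43 },
  { name: "Unicrom", gender: "m", vampires: 182 },
  { name: "Leia", gender: "f", vampires: 33 },
  { name: "Pilot", gender: "m", vampires: 54 },
];
const mapGender = function () { emit(this.gender, this.vampires); };
const reduceSum = function (key, values) { return values.reduce((a, b) => a + b, 0); };
console.log(runMapReduce(unicorns, mapGender, reduceSum)); // { m: 299, f: 76 }
```

Because `reduceSum` only adds numbers, running it on partial sums gives the same answer as running it once on all the values. That is the property the re-reduce step depends on.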
Strengths and weaknesses
Good and not so good
Strengths:
Handling huge amounts of data by replication and horizontal scaling
Very flexible data model
Ease of use (object-oriented bent)
Weaknesses:
Discourages normalization
Items can be inserted anywhere (lack of schema)
May require large infrastructure
Applicability
Good for, and not so good for
Good fit:
Event logging, especially where the data changes rapidly
Anything that can communicate via JSON documents (CRM, publishing websites, user comments, profiles, web-facing documents, . . . )
Web-analytics detailing transactions and changes
Intermediate database when transitioning from one db to another
Not so good fit:
Anything requiring complex transactions spanning multiple documents
Anything that requires normalized data
General notes and ideas
It will be about things we covered in class, including:
Information and ideas
Various lectures/presentations
Different database technologies
Open book, open notes (not open neighbor)
It will be about Sarah and Hasta. Sarah will have an idea for Hasta. Your task will be to discuss which DB technology to use and why.
What have we covered?
Reviewed Assignment #04
Covered some of the MongoDB “querying” capabilities
Remember: Assignment #04 is due before next class
Next time: CouchDB
References
[1] Kristina Chodorow, MongoDB: The Definitive Guide, O’Reilly Media, Inc., 2013.
[2] Eric Redmond and Jim R. Wilson, Seven Databases in Seven Weeks, Pragmatic Bookshelf, 2012.
[3] Karl Seguin, The Little MongoDB Book, http://github.com/karlseguin/the-little-mongodb-book, 2015.
[4] Alessandro Siletto, A quick start with MongoDB geospatial queries, http://www.siletto.it/blog/alessandro/2013/03/19/quick-start-mongodb-, 2013.
[5] MongoDB Staff, MongoDB Documentation, MongoDB Documentation Project, 2015.
[6] Tom White, Hadoop: The lay of the land, http://www.drdobbs.com/database/hadoop-the-lay-of-the-land/240150854, 2013.