Source: eldawy/20SCS167/slides/CS167-10-NoSQL.pdf

NoSQL and MongoDB

Introduction to NoSQL
Based on a presentation by Traversy Media

What is NoSQL?
- Not only SQL
- SQL means
  - Relational model
  - Strong typing
  - ACID compliance
  - Normalization
  - …
- NoSQL means more freedom or flexibility

Relevance to Big Data
- Data gets bigger
- Traditional RDBMS cannot scale well
- RDBMS is tied to its data and query processing models
- NoSQL relaxes some of the restrictions of RDBMS to provide better performance

Advantages of NoSQL
- Handles Big Data
- Data Models: No predefined schema
- Data Structure: NoSQL handles semi-structured data
- Cheaper to manage
- Scaling: Scale out / horizontal scaling

Advantages of RDBMS
- Better for relational data
- Data normalization
- Well-established query language (SQL)
- Data Integrity
- ACID Compliance

Types of NoSQL Databases
- Document Databases [MongoDB, CouchDB]
- Column Databases [Apache Cassandra]
- Key-Value Stores [Redis, Couchbase Server]
- Cache Systems [Redis, Memcached]
- Graph Databases [Neo4j]
- Streaming Systems [FlinkDB, Storm]

Structured/Semi-structured
ID  Name  Email               …
1   Jack  [email protected]
2   Jill  [email protected]
3   Alex  [email protected]
Document 1
{ "id": 1, "name": "Jack", "email": "[email protected]", "address": {"street": "900 university ave", "city": "Riverside", "state": "CA"}, "friend_ids": [3, 55, 123]}
Document 2
{ "id": 2, "name": "Jill", "email": "[email protected]", "hobbies": ["hiking", "cooking"]}

Columnar Data Store
Stored by column:
ID: 1, 2, 3
Name: Jack, Jill, Alex
…: …, …, …
The same data as a table:
ID  Name  Email               …
1   Jack  [email protected]
2   Jill  [email protected]
3   Alex  [email protected]
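
The row-to-column idea above can be sketched in a few lines of plain JavaScript. This is only an illustration of the layout, not how any particular column store is implemented; the function name `toColumns` and the choice to fill missing fields with `null` are assumptions made for this sketch.

```javascript
// Convert row-oriented records into a column-oriented layout.
// Assumption for this sketch: a missing field is stored as null so that
// every column has exactly one entry per row.
function toColumns(rows) {
  const names = new Set();
  for (const row of rows) {
    for (const key of Object.keys(row)) names.add(key);
  }
  const columns = {};
  for (const name of names) {
    columns[name] = rows.map((row) => (name in row ? row[name] : null));
  }
  return columns;
}

const sampleRows = [
  { id: 1, name: "Jack" },
  { id: 2, name: "Jill" },
  { id: 3, name: "Alex" },
];
const sampleColumns = toColumns(sampleRows);
// A query that needs only the names reads one array and skips the rest.
console.log(sampleColumns.name);
```

Keeping each column contiguous is what lets a scan over one attribute avoid reading the others.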

Key-value Stores
1 → Jack [email protected] …
2 → Jill [email protected] …
3 → Alex [email protected] …
The same data as a table:
ID  Name  Email               …
1   Jack  [email protected]
2   Jill  [email protected]
3   Alex  [email protected]

Document Database
MongoDB

Document Data Model
- Relational model (RDBMS)
  - Database
  - Relation (Table): Schema
  - Record (Tuple): Data
- Document Model
  - Database
  - Collection: No predefined schema
  - Document: Schema + data
- No need to define/update schema
- No need to create collections explicitly (a collection is created on the first insert)
Document 1
{ "id": 1, "name": "Jack", "email": "[email protected]", "address": {"street": "900 university ave", "city": "Riverside", "state": "CA"}, "friend_ids": [3, 55, 123]}

Document Format
- MongoDB natively works with JSON documents
- For efficiency, documents are stored in a binary format called BSON (i.e., binary JSON)
- Like JSON, both schema and data are stored in each document

How to Use MongoDB
Install: Check the MongoDB website
https://docs.mongodb.com/manual/installation/
Create collection and insert a document
db.users.insert({name: "Jack", email: "[email protected]"});
Retrieve all/some documents
db.users.find();
db.users.find({name: "Jack"});
Update
db.users.update({name: "Jack"}, {$set: {hobby: "cooking"}});
See also: updateOne, updateMany, replaceOne
Delete
db.users.remove({name: "Alex"});
See also: deleteOne, deleteMany
https://docs.mongodb.com/manual/crud/

Schema Validation
- You can still explicitly create collections and enforce schema validation
db.createCollection("students", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "year", "major", "address"],
      properties: {
        name: {
          bsonType: "string",
          description: "must be a string and is required"
        },
        …
      }
    }
  }
});
https://docs.mongodb.com/manual/core/schema-validation/

Storage Layer
- Prior to MongoDB 3.2, only B-tree was available in the storage layer
- To increase its scalability, MongoDB added LSM Tree in later versions after it acquired WiredTiger
Override default configuration
mongod --wiredTigerIndexConfigString "type=lsm,block_compressor=zlib"

LSM vs. B-tree
https://github.com/wiredtiger/wiredtiger/wiki/Btree-vs-LSM

Indexing
- Like RDBMS, document databases use indexes to speed up some queries
- MongoDB uses B-tree as an index structure
https://docs.mongodb.com/manual/indexes/

Index Types
- Default unique _id index
- Single field index
  db.collection.createIndex({name: -1});
- Compound index (multiple fields)
  db.collection.createIndex({name: 1, score: -1});
- Multikey indexes (for array fields)
  - Creates an index entry for each value
https://docs.mongodb.com/manual/indexes/
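
To make "an index entry for each value" concrete, here is a minimal plain-JavaScript sketch of how entries for an array field could be generated. It is an illustration only, not MongoDB's implementation: the function name `multikeyEntries`, the numeric keys, and skipping documents that lack the field are all assumptions of this sketch.

```javascript
// Build index entries for an array field: one (key, docId) pair per array
// element, kept sorted by key the way an ordered index would keep them.
// Assumptions for this sketch: keys are numbers, and documents without the
// field are skipped.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value === undefined) continue;
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) entries.push({ key: v, docId: doc.id });
  }
  entries.sort((a, b) => a.key - b.key || a.docId - b.docId);
  return entries;
}

const people = [
  { id: 1, friend_ids: [3, 55, 123] },
  { id: 2, friend_ids: [7, 3] },
];
const friendIndex = multikeyEntries(people, "friend_ids");
// Every document that lists friend 3 is found through its own entry.
console.log(friendIndex.filter((e) => e.key === 3).map((e) => e.docId));
```

Because each array element gets its own entry, a lookup on a single value finds every document whose array contains it.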

Index Types
- Geospatial index (for geospatial points)
  - Uses geohash to convert two dimensions to one dimension
  - 2d indexes: For Euclidean spaces
  - 2dsphere indexes: For spherical (earth) geometry
  - Works with multikey indexes for multiple locations (e.g., pickup and dropoff locations for taxis)
- Text indexes (for string fields)
  - Automatically removes stop words
  - Stems the words to store the root only
- Hashed indexes (for point lookups)

Geohashes
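
The slide's figure is not available here, so the following is a minimal sketch of the general geohash idea: halve the longitude and latitude ranges alternately and record one bit per step, which turns a two-dimensional point into a one-dimensional bit string. The function name `geohashBits` and the sample coordinates are assumptions for illustration; the exact encoding MongoDB uses internally may differ.

```javascript
// Interleave longitude and latitude bits (longitude first), as in the
// classic geohash scheme. Each step halves the current range and emits
// 1 if the value lies in the upper half, 0 otherwise.
function geohashBits(lon, lat, bits) {
  const lonRange = [-180, 180];
  const latRange = [-90, 90];
  let out = "";
  for (let i = 0; i < bits; i++) {
    const useLon = i % 2 === 0;
    const range = useLon ? lonRange : latRange;
    const value = useLon ? lon : lat;
    const mid = (range[0] + range[1]) / 2;
    if (value >= mid) {
      out += "1";
      range[0] = mid; // keep the upper half
    } else {
      out += "0";
      range[1] = mid; // keep the lower half
    }
  }
  return out;
}

// Two nearby points (hypothetical coordinates near Riverside, CA)
// share the same 10-bit prefix.
console.log(geohashBits(-117.3, 33.9, 10));   // "0100110110"
console.log(geohashBits(-117.32, 33.95, 10)); // "0100110110"
```

Points that are close together usually share a long prefix, which is why a one-dimensional ordered index such as a B-tree can serve many spatial lookups.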

Additional Index Features
- Unique indexes: Reject duplicate keys
- Sparse indexes: Skip documents without the index field
  - In contrast, non-sparse indexes assume a null value if the index field does not exist
- Partial indexes: Index only a subset of records based on a filter
db.restaurants.createIndex(
  { cuisine: 1, name: 1 },
  { partialFilterExpression: { rating: { $gt: 5 } } }
)

Comparison of data types
Order from lowest to highest:
1. Min key (internal type)
2. Null
3. Numbers (32-bit integer, 64-bit integer, double)
4. Symbol, String
5. Object
6. Array
7. Binary data
8. Object ID
9. Boolean
10. Date, timestamp
11. Regular expression
12. Max key (internal type)
https://docs.mongodb.com/v3.6/reference/bson-type-comparison-order/

Comparison of data types
- Numbers: All converted to a common type
- Strings
  - Alphabetically (default)
  - Collation (i.e., locale and language)
- Arrays
  - Less-than comparison (<): Uses the smallest value of the array
  - Greater-than comparison (>): Uses the largest value of the array
  - Empty arrays are treated as less than null
- Object
  - Compare fields in the order of appearance
  - Compare <name,value> for each field
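
The cross-type ordering can be sketched as a comparator over plain JavaScript values. This is an illustration of the rule "compare by type first, then by value", not MongoDB's code: the names `typeRank` and `compareValues` are made up for this sketch, only a few types are covered, arrays are deliberately excluded, and strings are compared by UTF-16 code units rather than by a collation.

```javascript
// Rank of each supported JavaScript type, following the order on the slides:
// Null < Numbers < String < Object < Boolean < Date < Regular expression.
function typeRank(value) {
  if (value === null) return 2;
  if (typeof value === "number") return 3;
  if (typeof value === "string") return 4;
  if (typeof value === "boolean") return 9;
  if (value instanceof Date) return 10;
  if (value instanceof RegExp) return 11;
  if (Array.isArray(value)) throw new TypeError("arrays are not handled in this sketch");
  if (typeof value === "object") return 5;
  throw new TypeError("unsupported type in this sketch");
}

// Compare by type rank first; fall back to a value comparison within a type.
function compareValues(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 9) return Number(a) - Number(b); // false sorts before true
  if (ra === 10) return a.getTime() - b.getTime();
  return 0; // sketch: same-type objects and regexes are treated as equal
}

const mixed = [true, "b", 2, null, "a", 1];
console.log(mixed.slice().sort(compareValues)); // [ null, 1, 2, 'a', 'b', true ]
```

A total order across types is what allows a single index to hold values of different types for the same field.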

Distributed Processing
- Two methods for distributed processing
  - Replication (similar to MySQL)
  - Sharding (true horizontal scaling)
Replication: https://docs.mongodb.com/manual/replication/
Sharding: https://docs.mongodb.com/manual/sharding/

Distributed Index Structure
Log-structured Merge Tree (LSM)

Big Data Indexing
- Hadoop and Spark are good at scanning large files
- We would like to speed up point and range queries on big data
- HDFS limitation: Random updates are not allowed
- Log-structured Merge Tree (LSM-Tree) is adopted to address this problem

RDBMS Indexing
[Figure labels: New record; Index; Log]

Index Update
[Figure labels: New record; Randomly updated disk page(s); Append a disk page]

LSM Tree
- Key idea: Use the log as the index
- Regularly: Merge the logs to consolidate the index (i.e., remove redundant entries)
[Figure labels: New records; Log (repeated); Flush; Merge; Bigger log]
O’Neil, Patrick, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. "The log-structured merge-tree (LSM-tree)." Acta Informatica 33, no. 4 (1996): 351-385.
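
The merge step can be sketched as a standard two-way merge of sorted logs. This is a simplified illustration rather than the algorithm from the paper: the name `mergeLogs` and the `{ key, value }` entry shape are assumptions, and each log is assumed to be sorted by key with no duplicate keys inside a single log.

```javascript
// Merge two logs sorted by key into one bigger sorted log.
// When both logs contain the same key, the entry from the newer log wins
// and the older (redundant) entry is dropped.
function mergeLogs(older, newer) {
  const merged = [];
  let i = 0;
  let j = 0;
  while (i < older.length && j < newer.length) {
    if (older[i].key < newer[j].key) {
      merged.push(older[i++]);
    } else if (older[i].key > newer[j].key) {
      merged.push(newer[j++]);
    } else {
      merged.push(newer[j++]); // same key: keep the newer entry
      i++;                     // and skip the redundant older one
    }
  }
  while (i < older.length) merged.push(older[i++]);
  while (j < newer.length) merged.push(newer[j++]);
  return merged;
}

const olderLog = [{ key: 1, value: "a" }, { key: 3, value: "b" }];
const newerLog = [{ key: 2, value: "c" }, { key: 3, value: "d" }];
console.log(mergeLogs(olderLog, newerLog));
```

Both inputs are read front to back and the output is written front to back, so the merge needs only sequential I/O.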

LSM in Big Data
- First major application: BigTable (Google)
[Figure: citations per year, 1997 to 2018 (axis 0 to 120), annotated with "First report from Google mentioning LSM" and "BigTable paper"]

LSM in Big Data
- Buffer data in memory (memory component)
- Flush records to disk into an LSM as a disk component (sequential write)
- Disk components are sorted by key
- Compact (merge) disk components in the background (sequential read/write)
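
The four steps above can be put together in a toy, in-memory sketch. Everything here is an assumption made for illustration: the class name `LsmTree`, the tiny `memoryLimit`, arrays standing in for disk components, and compaction being triggered by hand rather than in the background. Deletes and `undefined` values are not handled.

```javascript
// Order [key, value] pairs by key.
const byKey = (a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0);

// Binary search in a component sorted by key; returns the value or undefined.
function findInComponent(component, key) {
  let lo = 0;
  let hi = component.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [k, v] = component[mid];
    if (k === key) return v;
    if (k < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined;
}

class LsmTree {
  constructor(memoryLimit = 2) {
    this.memoryLimit = memoryLimit;
    this.memory = new Map(); // memory component: buffers recent writes
    this.disk = [];          // disk components, oldest first, each sorted by key
  }

  put(key, value) {
    this.memory.set(key, value);
    if (this.memory.size >= this.memoryLimit) this.flush();
  }

  // Write the memory component out as one sorted, immutable disk component.
  flush() {
    if (this.memory.size === 0) return;
    this.disk.push([...this.memory.entries()].sort(byKey));
    this.memory = new Map();
  }

  // Check memory first, then disk components from newest to oldest.
  get(key) {
    if (this.memory.has(key)) return this.memory.get(key);
    for (let i = this.disk.length - 1; i >= 0; i--) {
      const hit = findInComponent(this.disk[i], key);
      if (hit !== undefined) return hit;
    }
    return undefined;
  }

  // Merge all disk components into one, keeping the newest value per key.
  compact() {
    const latest = new Map();
    for (const component of this.disk) {
      for (const [key, value] of component) latest.set(key, value);
    }
    this.disk = latest.size ? [[...latest.entries()].sort(byKey)] : [];
  }
}

const tree = new LsmTree(2);
tree.put("b", 1);
tree.put("a", 2); // memory limit reached: first disk component is written
tree.put("a", 3);
tree.put("c", 4); // second disk component; "a" now exists in both
console.log(tree.get("a"), tree.disk.length); // 3 2
tree.compact();
console.log(tree.disk); // [ [ [ 'a', 3 ], [ 'b', 1 ], [ 'c', 4 ] ] ]
```

Writes never modify an existing component, which is why this design suits storage such as HDFS that does not allow random updates.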

Conclusion
- MongoDB is a document database that is geared towards high update rates and transactional queries
- It adopts JSON as a data model
- It provides the flexibility to insert any kind of data without schema definition
- LSM Tree is used for indexing
- Weak types are handled using a special comparison method for all types