CS 405G: Introduction to Database Systems
24 NoSQL
Reuse some slides of Jennifer Widom
Chen Qian University of Kentucky
Summary
Tree-based indexes: O(log N) for search and update; support range queries
Hash-based indexes: best for equality searches (O(1) expected); cannot support range searches
Static and dynamic
04/19/23 Chen Qian @ University of Kentucky
NoSQL: The Name
“SQL” = traditional relational DBMS
Recognition over the past decade or so: not every data management/analysis problem is best solved using a traditional relational DBMS
“NoSQL” = “No SQL” = not using a traditional relational DBMS
“No SQL” ≠ don’t use the SQL language
“NoSQL” = “Not Only SQL”
NoSQL Systems: Motivation
Not every data management/analysis problem is best solved using a traditional DBMS.
A Database Management System (DBMS) provides efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data.
NoSQL Systems
Alternative to traditional relational DBMS
+ Flexible schema
+ Quicker/cheaper to set up
+ Massive scalability
+ Relaxed consistency → higher performance & availability
– No declarative query language → more programming
– Relaxed consistency → fewer guarantees
Example #1: Web log analysis
Each record: UserID, URL, timestamp, additional-info
Task: Load into a database system. This requires data cleaning, data extraction, verification, and schema specification.
None of this up-front work is needed for NoSQL!
Task: Find all records for a given UserID, a given URL, a given timestamp, or a certain construct appearing in additional-info
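Because nothing has been loaded into a schema first, each of these lookups can simply scan the raw log. Below is a minimal sketch under assumed conditions: the log is a list of tab-separated lines, and `RAW_LOG`, `parse`, `find_by`, and `find_in_info` are hypothetical names for illustration, not part of any real system.

```python
# Hypothetical raw log lines, tab-separated: UserID, URL, timestamp, additional-info.
RAW_LOG = [
    "u1\thttp://a.com/x\t2023-04-19T10:00\tref=search",
    "u2\thttp://b.org/y\t2023-04-19T10:05\tref=ad",
    "u1\thttp://b.org/z\t2023-04-19T10:07\tref=ad;coupon",
]

FIELDS = ("UserID", "URL", "timestamp", "additional-info")

def parse(line):
    # Fields are split out only when a query runs ("schema on read").
    # maxsplit=3 lets additional-info itself contain tabs.
    return dict(zip(FIELDS, line.split("\t", 3)))

def find_by(field, value):
    """Full scan: records whose `field` equals `value` exactly."""
    return [rec for rec in map(parse, RAW_LOG) if rec[field] == value]

def find_in_info(construct):
    """Full scan: records whose additional-info contains `construct`."""
    return [rec for rec in map(parse, RAW_LOG) if construct in rec["additional-info"]]

print(find_by("UserID", "u1"))
print(find_in_info("coupon"))
```

Every query here is a full scan. An index on the looked-up field (tree-based or hash-based) would avoid that, at the cost of some of the up-front work the slide says NoSQL lets you skip.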
Separate records: UserID, name, age, gender, …
Task: Find the average age of users accessing a given URL
This task may not require strict consistency: an average computed over slightly stale data is still useful.
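This task has to combine the two kinds of records (a join on UserID). A minimal sketch, assuming hypothetical in-memory data (`ACCESSES`, `USERS`) and assuming each user should be counted once, however many times they accessed the URL:

```python
# Hypothetical access records: (UserID, URL, timestamp, additional-info).
ACCESSES = [
    ("u1", "http://a.com/x", "2023-04-19T10:00", ""),
    ("u2", "http://a.com/x", "2023-04-19T10:05", ""),
    ("u1", "http://a.com/x", "2023-04-19T11:00", ""),
    ("u3", "http://b.org/y", "2023-04-19T11:30", ""),
]

# Hypothetical separate user records, keyed by UserID.
USERS = {
    "u1": {"name": "Ann", "age": 30, "gender": "F"},
    "u2": {"name": "Bo", "age": 40, "gender": "M"},
    "u3": {"name": "Cy", "age": 22, "gender": "M"},
}

def average_age(url):
    # Distinct users who accessed the URL (assumption: count each user once).
    user_ids = {uid for uid, accessed_url, _, _ in ACCESSES if accessed_url == url}
    # "Join" step: look up each user's age; skip users with no user record.
    ages = [USERS[uid]["age"] for uid in user_ids if uid in USERS]
    return sum(ages) / len(ages) if ages else None

print(average_age("http://a.com/x"))
```

Averaging per access instead of per user would only change the first step (use a list instead of a set).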
Example #2: Social-network graph
Each record: UserID1, UserID2
Separate records: UserID, name, age, gender, …
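Task: Find all friends of friends of friends of … friends of a given user
In a relational DBMS each extra hop costs another join on the friendship table, so deep traversals require a large number of joins, which is not efficient at all. A specially designed graph database may be better.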
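A graph-oriented approach follows adjacency lists hop by hop instead of adding one self-join per hop. A minimal breadth-first search sketch, assuming hypothetical friendship records (`FRIEND_PAIRS`) and treating friendship as symmetric; `build_adjacency` and `within_k_hops` are illustrative names:

```python
from collections import defaultdict, deque

# Hypothetical friendship records (UserID1, UserID2), treated as undirected.
FRIEND_PAIRS = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e"), ("e", "f")]

def build_adjacency(pairs):
    adj = defaultdict(set)
    for u, v in pairs:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def within_k_hops(adj, start, k):
    """Users reachable from `start` in 1..k friendship hops (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    result = set()
    while frontier:
        user, depth = frontier.popleft()
        if depth == k:
            continue                      # do not expand beyond k hops
        for friend in adj[user]:
            if friend not in seen:        # visit each user at most once
                seen.add(friend)
                result.add(friend)
                frontier.append((friend, depth + 1))
    return result

adj = build_adjacency(FRIEND_PAIRS)
print(sorted(within_k_hops(adj, "a", 2)))
```

The cost grows with the number of edges actually visited, not with the number of joins a query planner must evaluate.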
Example #3: Wikipedia pages
Large collection of documents
Combination of structured and unstructured data
Task: Retrieve the introductory paragraph of all pages about U.S. presidents before 1900
The task mixes structured data (which pages are about presidents, and their dates) with unstructured data (the paragraph text).
NoSQL Systems
Overview
NoSQL Systems
Several incarnations:
MapReduce framework: OLAP
Key-value stores: OLTP
Document stores
Graph database systems
NoSQL Systems: Overview
MapReduce Framework
Originally from Google; open-source implementation: Hadoop
No data model; data stored in files
User provides specific functions: map() and reduce()
System provides data processing “glue”, fault tolerance, and scalability
Map and Reduce Functions
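Map: Divide the problem into subproblems
Reduce: Do work on subproblems, combine results
More precisely, map() is applied to each input record independently and emits intermediate (key, value) pairs; the system groups those pairs by key, and reduce() is called once per key with all of that key's values.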
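The map, group-by-key, reduce data flow can be sketched in a single process. This is only an illustration assuming in-memory lists; a real framework such as Hadoop runs map and reduce tasks in parallel on many machines and re-runs failed tasks. `run_mapreduce`, `map_user`, and `reduce_count` are hypothetical names.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process sketch of the MapReduce data flow (no parallelism, no fault tolerance)."""
    groups = defaultdict(list)
    # Map phase: each input record is processed independently and may emit
    # any number of intermediate (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            # Shuffle: collect all intermediate values that share a key.
            groups[key].append(value)
    # Reduce phase: one call per distinct key, with all of that key's values.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Hypothetical web log records: (UserID, URL, timestamp, additional-info).
LOG = [
    ("u1", "http://a.com/x", "2023-04-19T10:00", ""),
    ("u2", "http://b.org/y", "2023-04-19T10:05", ""),
    ("u1", "http://b.org/z", "2023-04-19T10:07", ""),
]

def map_user(record):
    user_id, _url, _timestamp, _info = record
    yield user_id, 1          # one access by this user

def reduce_count(_user_id, ones):
    return sum(ones)          # total accesses by this user

print(run_mapreduce(LOG, map_user, reduce_count))  # {'u1': 2, 'u2': 1}
```

The user writes only the two small functions; everything in `run_mapreduce` is the "glue" the framework provides.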
MapReduce Architecture
MapReduce Example: Web log analysis
Each record: UserID, URL, timestamp, additional-info
Task: Count the number of accesses for each domain (inside the URL)
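A minimal sketch of this task, assuming in-memory records and taking the domain to be the URL's host name as returned by Python's `urllib.parse.urlparse(url).hostname`. The function names and sample data are hypothetical; the small `run` helper stands in for the framework's grouping step.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical records: (UserID, URL, timestamp, additional-info).
LOG = [
    ("u1", "http://www.a.com/x", "2023-04-19T10:00", ""),
    ("u2", "https://b.org/y?q=1", "2023-04-19T10:05", ""),
    ("u3", "http://www.a.com/z", "2023-04-19T10:07", ""),
]

def map_fn(record):
    _user_id, url, _timestamp, _info = record
    # Emit (domain, 1) per access; hostname drops scheme, port, path, and query.
    yield urlparse(url).hostname, 1

def reduce_fn(domain, counts):
    return sum(counts)

def run(records):
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)   # shuffle: group by domain
    return {key: reduce_fn(key, values) for key, values in groups.items()}

print(run(LOG))  # {'www.a.com': 2, 'b.org': 1}
```

Under this assumption `www.a.com` and `a.com` count as different domains; normalizing them would change `map_fn` only.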
MapReduce Example (modified #1)
Each record: UserID, URL, timestamp, additional-info
Task: Total “value” of accesses for each domain, based on additional-info
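For this variant, only what map emits needs to change: a value derived from additional-info instead of 1 per access, while reduce still sums. The slide does not say how the value is encoded, so this sketch assumes, purely for illustration, that additional-info looks like `key=value;key=value` and may contain `value=<number>` (missing means 0). All names and data are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical records; additional-info uses an assumed "k=v;k=v" format.
LOG = [
    ("u1", "http://www.a.com/x", "2023-04-19T10:00", "value=2.5;ref=ad"),
    ("u2", "https://b.org/y", "2023-04-19T10:05", "ref=search"),
    ("u3", "http://www.a.com/z", "2023-04-19T10:07", "value=1.5"),
]

def value_of(info):
    # Parse "k=v" parts; an access without a "value" field is worth 0.
    fields = dict(part.split("=", 1) for part in info.split(";") if "=" in part)
    return float(fields.get("value", 0.0))

def map_fn(record):
    _user_id, url, _timestamp, info = record
    yield urlparse(url).hostname, value_of(info)

def reduce_fn(domain, values):
    return sum(values)

def run(records):
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)   # shuffle: group by domain
    return {key: reduce_fn(key, values) for key, values in groups.items()}

print(run(LOG))  # {'www.a.com': 4.0, 'b.org': 0.0}
```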
MapReduce Framework
Users miss schemas and declarative queries, so higher-level layers were built on top:
Hive: schemas and an SQL-like query language
Pig: more imperative, but with relational operators
Both compile to a “workflow” of Hadoop (MapReduce) jobs
Key-Value Stores
Extremely simple interface
Data model: (key, value) pairs
Operations: Insert(key, value), Fetch(key), Update(key, value), Delete(key)
Implementation: efficiency, scalability, fault tolerance
Records distributed to nodes based on key
Replication
Single-record transactions, “eventual consistency”
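A toy, single-process simulation of these implementation ideas: a record's node is chosen by hashing its key, and each record is copied to a fixed number of nodes. `PartitionedKVStore` and its parameters are hypothetical. Real systems such as Dynamo or Cassandra use more elaborate placement (consistent hashing) and may propagate replica updates asynchronously, which is where eventual consistency comes from; here, as a simplification, every write updates all copies immediately.

```python
import hashlib

class PartitionedKVStore:
    """Toy simulation of a distributed key-value store inside one process.

    Each key hashes to a "home" node; the record is also copied to the next
    (replicas - 1) nodes. No networking, failures, or async propagation.
    """

    def __init__(self, num_nodes=4, replicas=2):
        self.nodes = [{} for _ in range(num_nodes)]
        self.replicas = min(replicas, num_nodes)

    def _home(self, key):
        # Stable hash (Python's built-in hash() of str is randomized per
        # process), so a key always maps to the same node.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def _owners(self, key):
        home = self._home(key)
        return [self.nodes[(home + i) % len(self.nodes)] for i in range(self.replicas)]

    def insert(self, key, value):
        for node in self._owners(key):
            node[key] = value

    def fetch(self, key):
        # Read from the home node; a real system might read any replica.
        return self._owners(key)[0].get(key)

    def update(self, key, value):
        if key not in self._owners(key)[0]:
            raise KeyError(key)
        self.insert(key, value)

    def delete(self, key):
        for node in self._owners(key):
            node.pop(key, None)

store = PartitionedKVStore()
store.insert("u1", {"name": "Ann"})
print(store.fetch("u1"))
```

Every operation touches only the nodes that own one key, which is why single-record operations scale out easily while multi-record transactions do not.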
Some allow (non-uniform) columns within the value
Some allow Fetch on a range of keys
Example systems: Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, …
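Range Fetch is natural when keys are kept in sorted order (BigTable and HBase, for example, store rows sorted by row key), whereas a purely hash-partitioned store scatters neighboring keys across nodes. A single-node sketch using Python's `bisect` module; `SortedKVStore` and the sample keys are hypothetical, and the dict values illustrate non-uniform columns (each record carries different fields).

```python
import bisect

class SortedKVStore:
    """Toy single-node key-value store that keeps its keys sorted."""

    def __init__(self):
        self._keys = []      # sorted list of keys
        self._values = {}    # key -> value (any dict of "columns")

    def insert(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)   # keep _keys sorted
        self._values[key] = value

    def fetch(self, key):
        return self._values.get(key)

    def fetch_range(self, low, high):
        """All (key, value) pairs with low <= key < high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return [(k, self._values[k]) for k in self._keys[start:end]]

store = SortedKVStore()
store.insert("user#010", {"email": "x@example.com"})
store.insert("user#001", {"name": "Ann", "age": 30})
store.insert("visit#001", {"url": "http://a.com/x"})
store.insert("user#002", {"name": "Bo"})
# "user$" sorts just after every key starting with "user#" ('$' follows '#').
print([k for k, _ in store.fetch_range("user#", "user$")])
```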
Document Stores
Like key-value stores, except the value is a document
Data model: (key, document) pairs
Document: JSON, XML, other semistructured formats
Basic operations: Insert(key, document), Fetch(key), Update(key, document), Delete(key)
Also Fetch based on document contents
Example systems: CouchDB, MongoDB, SimpleDB, …
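A minimal in-memory sketch of these operations, including Fetch based on document contents, applied to a Wikipedia-like collection in the spirit of Example #3. `DocumentStore`, its `find` method (a dotted field path plus a predicate), and the sample documents are hypothetical; real document stores expose their own query languages.

```python
import json

class DocumentStore:
    """Toy document store of (key, document) pairs with JSON documents."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, document):
        # Round-trip through JSON so only JSON-representable documents are kept.
        self._docs[key] = json.loads(json.dumps(document))

    def fetch(self, key):
        return self._docs.get(key)

    def update(self, key, document):
        if key not in self._docs:
            raise KeyError(key)
        self.insert(key, document)

    def delete(self, key):
        self._docs.pop(key, None)

    def find(self, path, predicate):
        """Keys whose field at the dotted `path` satisfies `predicate`."""
        matches = []
        for key, doc in self._docs.items():
            node = doc
            for part in path.split("."):
                if not isinstance(node, dict) or part not in node:
                    break                 # field missing: no match
                node = node[part]
            else:                         # path fully resolved
                if predicate(node):
                    matches.append(key)
        return matches

store = DocumentStore()
store.insert("lincoln", {"title": "Abraham Lincoln",
                         "infobox": {"office": "President", "term_start": 1861},
                         "intro": "(introductory paragraph)"})
store.insert("obama", {"title": "Barack Obama",
                       "infobox": {"office": "President", "term_start": 2009},
                       "intro": "(introductory paragraph)"})
store.insert("kentucky", {"title": "Kentucky", "intro": "(introductory paragraph)"})
pre_1900 = store.find("infobox.term_start", lambda year: year < 1900)
print([store.fetch(k)["title"] for k in pre_1900])
```

Documents need not share a structure: the page without an `infobox` is simply skipped by the content-based query.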
Graph Database Systems
Data model: nodes and edges
Nodes may have properties (including ID)
Edges may have labels or roles
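One way to write this data model down, as a sketch: each node has an ID plus an arbitrary, per-node property dictionary, and each directed edge carries a label. `PropertyGraph`, `Node`, and `Edge` are hypothetical names, not the API of any particular graph database.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str                                          # every node has an ID
    properties: dict = field(default_factory=dict)   # arbitrary per-node properties

@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    label: str                                       # e.g. a relationship type or role

class PropertyGraph:
    def __init__(self):
        self.nodes = {}      # node ID -> Node
        self.edges = []      # list of Edge

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, dict(properties))

    def add_edge(self, source, target, label):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append(Edge(source, target, label))

    def out_edges(self, node_id, label=None):
        """Edges leaving node_id, optionally restricted to one label."""
        return [e for e in self.edges
                if e.source == node_id and (label is None or e.label == label)]

g = PropertyGraph()
g.add_node("ann", name="Ann", age=30)
g.add_node("acme", kind="company")
g.add_edge("ann", "acme", "works_at")
print(g.out_edges("ann"))
```

Like documents, nodes here need not share a schema: "ann" and "acme" carry different properties.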
Graph Database Systems
Interfaces and query languages vary
Single-step versus “path expressions” versus full recursion
Example systems: Neo4j, FlockDB, Pregel, …
RDF “triple stores” can map to graph databases
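The three query styles can be contrasted on a small set of labeled edges stored as (source, label, target) triples, the same shape as RDF triples. A minimal sketch with hypothetical data and function names: `step` is a single-step query, `path` follows a fixed label sequence (a simple path expression), and `closure` recurses until no new nodes are reached.

```python
# Hypothetical labeled edges as (source, label, target) triples.
EDGES = [
    ("ann", "friend", "bo"),
    ("bo", "friend", "cy"),
    ("cy", "friend", "dee"),
    ("ann", "works_at", "acme"),
    ("cy", "works_at", "initech"),
]

def step(nodes, label):
    """Single step: targets reachable from any node in `nodes` via one `label` edge."""
    return {t for s, l, t in EDGES if s in nodes and l == label}

def path(start, labels):
    """Path expression: follow a fixed sequence of edge labels."""
    current = {start}
    for label in labels:
        current = step(current, label)
    return current

def closure(start, label):
    """Full recursion: every node reachable via one or more `label` edges."""
    reached, frontier = set(), {start}
    while frontier:
        frontier = step(frontier, label) - reached   # only newly reached nodes
        reached |= frontier
    return reached

print(step({"ann"}, "friend"))
print(path("ann", ["friend", "friend", "works_at"]))
print(sorted(closure("ann", "friend")))
```

The recursive form is the one that, in plain SQL, needs either a recursive query or an unbounded number of joins.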
NoSQL Systems
“NoSQL” = “Not Only SQL”
Not every data management/analysis problem is best solved exclusively using a traditional DBMS
Current incarnations:
– MapReduce framework
– Key-value stores
– Document stores
– Graph database systems