torodb internals: how to create a nosql database on top of sql
TRANSCRIPT
ToroDB internalsHow to create a NoSQL db
on top of SQL
@NoSQLonSQL
Álvaro Hernández <[email protected]>
ToroDB @NoSQLonSQL
About *8Kdata*
● Research & Development in databases
● Consulting, Training and Support in PostgreSQL
● Founders of PostgreSQL España, 5th largest PUG in the world (>500 members as of today)
● About myself: CTO at 8Kdata:@ahachetehttp://linkd.in/1jhvzQ3
www.8kdata.com
ToroDB @NoSQLonSQL
ToroDB in one slide
● Document-oriented, JSON, NoSQL db
● Open source (AGPL)
● MongoDB compatibility (wire protocol level)
● Uses PostgreSQL as a storage backend
ToroDB @NoSQLonSQL
How to make a NoSQL database
● Varying schema (aka “schema-less”)
● Document-oriented API
● Replication
● Horizontal scalability
ToroDB @NoSQLonSQL
ToroDB's Document 2 Relational transformation
● Data is stored in tables. No blobs
● JSON documents are split by hierarchy levels into “subdocuments”, which contain no nested structures. Each subdocument level is stored separately
● Subdocuments are classified by “type”. Each “type” maps to a different table
ToroDB @NoSQLonSQL
● A “structure” table keeps the subdocument “schema”
● Keys in JSON are mapped to attributes, which retain the original name
● Tables are created dinamically and transparently to match the exact types of the documents
ToroDB's Document 2 Relational transformation
ToroDB @NoSQLonSQL
ToroDB storage internals
{ "name": "ToroDB", "data": { "a": 42, "b": "hello world!" }, "nested": { "j": 42, "deeper": { "a": 21, "b": "hello" } }}
ToroDB @NoSQLonSQL
ToroDB storage internals
The document is split into the following subdocuments:
{ "name": "ToroDB", "data": {}, "nested": {} }
{ "a": 42, "b": "hello world!"}
{ "j": 42, "deeper": {}}
{ "a": 21, "b": "hello"}
ToroDB @NoSQLonSQL
ToroDB storage internals
select * from demo.t_3┌─────┬───────┬────────────────────────────┬────────┐│ did │ index │ _id │ name │├─────┼───────┼────────────────────────────┼────────┤│ 0 │ ¤ │ \x5451a07de7032d23a908576d │ ToroDB │└─────┴───────┴────────────────────────────┴────────┘select * from demo.t_1┌─────┬───────┬────┬──────────────┐│ did │ index │ a │ b │├─────┼───────┼────┼──────────────┤│ 0 │ ¤ │ 42 │ hello world! ││ 0 │ 1 │ 21 │ hello │└─────┴───────┴────┴──────────────┘select * from demo.t_2┌─────┬───────┬────┐│ did │ index │ j │├─────┼───────┼────┤│ 0 │ ¤ │ 42 │└─────┴───────┴────┘
ToroDB @NoSQLonSQL
ToroDB storage internals
select * from demo.structures┌─────┬────────────────────────────────────────────────────────────────────────────┐│ sid │ _structure │├─────┼────────────────────────────────────────────────────────────────────────────┤│ 0 │ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} │└─────┴────────────────────────────────────────────────────────────────────────────┘
select * from demo.root;┌─────┬─────┐│ did │ sid │├─────┼─────┤│ 0 │ 0 │└─────┴─────┘
ToroDB @NoSQLonSQL
Why not jsonb?
● jsonb is really cool
● But keys (metadata) are repeated most of the time (storage, I/O savings)
● Docuements are not classified (normalized)
● ToroDB does all that
ToroDB @NoSQLonSQL
Metadata repetition
{ “a”: 1, “b”: 2 }{ “a”: 3 }{ “a”: 4, “c”: 5 }{ “a”: 6, “b”: 7 }{ “b”: 8 }{ “a”: 9, “b”: 10 }{ “a”: 11, “b”: 12, “j”: 13 }{ “a”: 14, “c”: 15 }
Counting “document types” in collections of millions: at most, 1000s of different types
ToroDB @NoSQLonSQL
ToroDB: query “by structure”
● ToroDB is effectively partitioning by type
● Structures (schemas, partitioning types) are cached in ToroDB memory
● Queries only scan a subset of the data
● Negative queries are served directly from memory
ToroDB @NoSQLonSQL
MongoDB WP and API
● ToroDB speaks MongoDB protocol natively, and implements Mongo API
● All MongoDB tools, drivers, GUIs, whould work as is on ToroDB
● Offer a widely used API on top of your current infrastructure
ToroDB @NoSQLonSQL
● NoSQL is trying to get back to SQL
● ToroDB is SQL native!
● Insert with Mongo, query with SQL!
● Powerful PostgreSQL SQL: window functions, recursive queries, hypothetical aggregates, lateral joins, CTEs, etc
ToroDB: native SQL
ToroDB @NoSQLonSQL
db.toroviews.insert({id:5, person: {
name: "Alvaro", surname: "Hernandez",contact: { email: "[email protected]", verified: true }}
})
db.toroviews.insert({id:6, person: {
name: "Name", surname: "Surname", age: 31,contact: { email: "[email protected]" }}
})
ToroDB VIEWs
ToroDB @NoSQLonSQL
torodb$ select * from toroviews.person ;┌─────┬───────────┬────────┬─────┐│ did │ surname │ name │ age │├─────┼───────────┼────────┼─────┤│ 0 │ Hernandez │ Alvaro │ ¤ ││ 1 │ Surname │ Name │ 31 │└─────┴───────────┴────────┴─────┘(2 rows)
torodb$ select * from toroviews."person.contact";┌─────┬──────────┬────────────────────────┐│ did │ verified │ email │├─────┼──────────┼────────────────────────┤│ 0 │ t │ [email protected] ││ 1 │ ¤ │ [email protected] │└─────┴──────────┴────────────────────────┘(2 rows)
ToroDB VIEWs
ToroDB @NoSQLonSQL
Replication protocol choice
● ToroDB is based on PostgreSQL
● PostgreSQL has either binary streaming replication (async or sync) or logical replication
● MongoDB has logical replication
● ToroDB uses MongoDB's protocol
ToroDB @NoSQLonSQL
MongoDB's replication protocol
● Every change is recorded in JSON documents, idempotent format (collection: local.oplog.rs)
● Slaves pull these documents from master (or other slaves) asynchronously
● Changes are applied and feedback is sent upstream
ToroDB @NoSQLonSQL
MongoDB replication implementation
https://github.com/stripe/mosql
ToroDB @NoSQLonSQL
ToroDB v0.4● ToroDB works as a secondary slave of a MongoDB master (or slave, chained rep)
● Implements the full replication protocol (not as an oplog tailable query)
● Replicates from Mongo to a PostgreSQL
● Open source github.com/torodb/torodb (repl branch, version 0.4-SNAPSHOT)
ToroDB @NoSQLonSQL
MongoDB's oplog & tailable cursors
Oplog is a system, capped collection which contains idempotent DML commands
Tailable cursors are “open queries” where data is streamed as it appears on the database
Both are used a lot (replication, Meteor...)
ToroDB @NoSQLonSQL
How to do oplog & tailable with PG?
Thanks to 9.4's logical decoding, it's easy
Replication (master) is just plain logical decoding, transformed to MongoDB's json format
Tailable cursors would be logical decoding plus filtering (to make sure streamed data matches query filters)
ToroDB @NoSQLonSQL
Write scalability(sharding)
● MongoDB's sharding API not implemented yet (roadmap: ToroDB 0.8)
● Will use MongoDB's mongos without modification, as well as config servers
● That might change in the future (pg_shard?)
ToroDB @NoSQLonSQL
Horizontal scalability(storage level)
● Another non-exclusive option is to have ToroDB store data in a distributed database
● Requires a distributed database like GreenPlum, CitusDb or RedShift
● Paired with replication as a slave:DW in NoSQL enabler
ToroDB @NoSQLonSQL
● GreenPlum was open sourced Oct/15
● ToroDB already runs on GP!CitusDB 5.0 port coming soon!
● Goal: ToroDB v0.4 replicates from a MongoDB master onto Toro-Greenplum and runs the reporting queries using distributed SQL
ToroDB on GP: SQL DW for MongoDB
ToroDB @NoSQLonSQL
● Amazon reviews datasetImage-based recommendations on styles and substitutesJ. McAuley, C. Targett, J. Shi, A. van den HengelSIGIR, 2015
● AWS c4.xlarge (4vCPU, 8GB RAM) 4KIOPS SSD EBS
● 4x shards, 3x config; 4x segments GP
● 83M records, 65GB plain json
Benchmark
ToroDB @NoSQLonSQL
Disk usage
Mongo 3.0, WT, SnappyGP columnar, zlib level 9
table size index size total size0
10000000000
20000000000
30000000000
40000000000
50000000000
60000000000
70000000000
80000000000
Storage requirements
MongoDB vs ToroDB on Greenplum
Mongo
ToroDB on GP
byt
es
ToroDB @NoSQLonSQL
SELECT count( distinct( "reviewerID" ))FROM reviews;
Queries: which one is easier?
db.reviews.aggregate([{ $group: { _id: "reviewerID"}},{ $group: {_id: 1, count: { $sum: 1}}}])
ToroDB @NoSQLonSQL
SELECT "reviewerName", count(*) as reviews FROM reviews GROUP BY "reviewerName" ORDER BY reviews DESC LIMIT 10;
Queries: which one is easier?
db.reviews.aggregate([ { $group : { _id : '$reviewerName', r : { $sum : 1 } } }, { $sort : { r : -1 } }, { $limit : 10 } ], {allowDiskUse: true})
ToroDB @NoSQLonSQL
Query times
3 different queriesQ3 on MongoDB: aggregate fails
27,95 74,87 00
200
400
600
800
1000
1200
9691007
035 13 31
Query duration (s)
MongoDB vs ToroDB on Greenplum
MongoDB
ToroDB on GP
speedup
seco
nd
s
ToroDB @NoSQLonSQL
Atomic operations
● There is no support for atomic bulk insert/update/delete operations
● Not even with $isolated:“Prevents a write operation that affects multiple documents from yielding to other reads or writes […] You can ensure that no client sees the changes until the operation completes or errors out. The $isolated isolation operator does not provide “all-or-nothing” atomicity for write operations.”http://docs.mongodb.org/manual/reference/operator/update/isolated/
ToroDB @NoSQLonSQL
Cheap single-node durability
● Without journaling, MongoDB is not durable nor crash-safe
● MongoDB requires “j: true” for true single-node durability. But who guarantees its consistent usage? Who uses it by default?j:true creates I/O storms equivalent to SQL CHECKPOINTs
ToroDB @NoSQLonSQL
“Clean” reads
http://docs.mongodb.org/manual/reference/write-concern/#read-isolation-behavior
“MongoDB will allow clients to read the results of a write operation before the write operation returns.”
“If the mongod terminates before the journal commits, even if a write returns successfully, queries may have read data that will not exist after the mongod restarts.”
Thus, MongoDB suffers from dirty reads. But let's call them just “tainted reads”.
ToroDB @NoSQLonSQL
“Clean” reads
What about $snapshot? Nope:
“The snapshot() does not guarantee that the data returned by the query will reflect a single moment in time nor does it provide isolation from insert or delete operations.”
http://docs.mongodb.org/manual/faq/developers/#faq-developers-isolate-cursors
Cursors in ToroDB run in repeatable read, read-only mode:globalCursorDataSource.setTransactionIsolation("TRANSACTION_REPEATABLE_READ"); globalCursorDataSource.setReadOnly(true);
ToroDB @NoSQLonSQL
https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads
Replication dirty and stale reads(“call me maybe”)
ToroDB @NoSQLonSQL
MongoDB's dirty and stale reads (repl)
Dirty readsA primary in minority accepts a write that other clients see, but it later steps down, write is rolled back (fixed in 3.2?)
Stale readsA primary in minority serves a value that ought to be current, but a newer value was written to the other primary in minority
ToroDB @NoSQLonSQL
Data discoverability, SQL connectors
● They are two of the major announcements for MongoDB 3.2
● To discover data, MongoDB samples data. ToroDB: just look at table structures! (and join with root if you want a count)
● SQL connectors: native, no emulation