torodb internals: how to create a nosql database on top of sql

ToroDB internalsHow to create a NoSQL db

on top of SQL

@NoSQLonSQL

Álvaro Hernández <[email protected]>

ToroDB @NoSQLonSQL

About *8Kdata*

● Research & Development in databases

● Consulting, Training and Support in PostgreSQL

● Founders of PostgreSQL España, 5th largest PUG in the world (>500 members as of today)

● About myself: CTO at 8Kdata:@ahachetehttp://linkd.in/1jhvzQ3

www.8kdata.com

ToroDB @NoSQLonSQL

ToroDB in one slide

● Document-oriented, JSON, NoSQL db

● Open source (AGPL)

● MongoDB compatibility (wire protocol level)

● Uses PostgreSQL as a storage backend

ToroDB @NoSQLonSQL

How to make a NoSQL database

● Varying schema (aka “schema-less”)

● Document-oriented API

● Replication

● Horizontal scalability

ToroDB @NoSQLonSQL

Varying schema(aka “schema-less”)

ToroDB @NoSQLonSQL

ToroDB's Document 2 Relational transformation

● Data is stored in tables. No blobs

● JSON documents are split by hierarchy levels into “subdocuments”, which contain no nested structures. Each subdocument level is stored separately

● Subdocuments are classified by “type”. Each “type” maps to a different table

ToroDB @NoSQLonSQL

● A “structure” table keeps the subdocument “schema”

● Keys in JSON are mapped to attributes, which retain the original name

● Tables are created dinamically and transparently to match the exact types of the documents

ToroDB's Document 2 Relational transformation

ToroDB @NoSQLonSQL

ToroDB storage internals

{ "name": "ToroDB", "data": { "a": 42, "b": "hello world!" }, "nested": { "j": 42, "deeper": { "a": 21, "b": "hello" } }}

ToroDB @NoSQLonSQL


The document is split into the following subdocuments:

{ "name": "ToroDB", "data": {}, "nested": {} }

{ "a": 42, "b": "hello world!"}

{ "j": 42, "deeper": {}}

{ "a": 21, "b": "hello"}

ToroDB @NoSQLonSQL


select * from demo.t_3┌─────┬───────┬────────────────────────────┬────────┐│ did │ index │ _id │ name │├─────┼───────┼────────────────────────────┼────────┤│ 0 │ ¤ │ \x5451a07de7032d23a908576d │ ToroDB │└─────┴───────┴────────────────────────────┴────────┘select * from demo.t_1┌─────┬───────┬────┬──────────────┐│ did │ index │ a │ b │├─────┼───────┼────┼──────────────┤│ 0 │ ¤ │ 42 │ hello world! ││ 0 │ 1 │ 21 │ hello │└─────┴───────┴────┴──────────────┘select * from demo.t_2┌─────┬───────┬────┐│ did │ index │ j │├─────┼───────┼────┤│ 0 │ ¤ │ 42 │└─────┴───────┴────┘

ToroDB @NoSQLonSQL


select * from demo.structures┌─────┬────────────────────────────────────────────────────────────────────────────┐│ sid │ _structure │├─────┼────────────────────────────────────────────────────────────────────────────┤│ 0 │ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} │└─────┴────────────────────────────────────────────────────────────────────────────┘

select * from demo.root;┌─────┬─────┐│ did │ sid │├─────┼─────┤│ 0 │ 0 │└─────┴─────┘

ToroDB @NoSQLonSQL

Why not jsonb?

● jsonb is really cool

● But keys (metadata) are repeated most of the time (storage, I/O savings)

● Docuements are not classified (normalized)

● ToroDB does all that

ToroDB @NoSQLonSQL

Metadata repetition

{ “a”: 1, “b”: 2 }{ “a”: 3 }{ “a”: 4, “c”: 5 }{ “a”: 6, “b”: 7 }{ “b”: 8 }{ “a”: 9, “b”: 10 }{ “a”: 11, “b”: 12, “j”: 13 }{ “a”: 14, “c”: 15 }

Counting “document types” in collections of millions: at most, 1000s of different types

ToroDB @NoSQLonSQL

ToroDB storage and I/O savings

29% - 68% storage required,compared to Mongo 2.6

ToroDB @NoSQLonSQL

ToroDB: query “by structure”

● ToroDB is effectively partitioning by type

● Structures (schemas, partitioning types) are cached in ToroDB memory

● Queries only scan a subset of the data

● Negative queries are served directly from memory

ToroDB @NoSQLonSQL

How data is stored in schema-less

Data normalization

ToroDB @NoSQLonSQL

This is how we store in ToroDB

ToroDB @NoSQLonSQL

Document oriented-API

ToroDB @NoSQLonSQL

MongoDB WP and API

● ToroDB speaks MongoDB protocol natively, and implements Mongo API

● All MongoDB tools, drivers, GUIs, whould work as is on ToroDB

● Offer a widely used API on top of your current infrastructure

ToroDB @NoSQLonSQL

● NoSQL is trying to get back to SQL

● ToroDB is SQL native!

● Insert with Mongo, query with SQL!

● Powerful PostgreSQL SQL: window functions, recursive queries, hypothetical aggregates, lateral joins, CTEs, etc

ToroDB: native SQL

ToroDB @NoSQLonSQL

ToroDB: native SQL

ToroDB @NoSQLonSQL

db.toroviews.insert({id:5, person: {

name: "Alvaro", surname: "Hernandez",contact: { email: "[email protected]", verified: true }}

})

db.toroviews.insert({id:6, person: {

name: "Name", surname: "Surname", age: 31,contact: { email: "[email protected]" }}

})

ToroDB VIEWs

ToroDB @NoSQLonSQL

torodb$ select * from toroviews.person ;┌─────┬───────────┬────────┬─────┐│ did │ surname │ name │ age │├─────┼───────────┼────────┼─────┤│ 0 │ Hernandez │ Alvaro │ ¤ ││ 1 │ Surname │ Name │ 31 │└─────┴───────────┴────────┴─────┘(2 rows)

torodb$ select * from toroviews."person.contact";┌─────┬──────────┬────────────────────────┐│ did │ verified │ email │├─────┼──────────┼────────────────────────┤│ 0 │ t │ [email protected] ││ 1 │ ¤ │ [email protected] │└─────┴──────────┴────────────────────────┘(2 rows)

ToroDB VIEWs

ToroDB @NoSQLonSQL

VIEWs, ToroDB from any SQL tool

ToroDB @NoSQLonSQL

Replication

ToroDB @NoSQLonSQL

Replication protocol choice

● ToroDB is based on PostgreSQL

● PostgreSQL has either binary streaming replication (async or sync) or logical replication

● MongoDB has logical replication

● ToroDB uses MongoDB's protocol

ToroDB @NoSQLonSQL

MongoDB's replication protocol

● Every change is recorded in JSON documents, idempotent format (collection: local.oplog.rs)

● Slaves pull these documents from master (or other slaves) asynchronously

● Changes are applied and feedback is sent upstream

ToroDB @NoSQLonSQL

MongoDB replication implementation

https://github.com/stripe/mosql

https://github.com/stripe/mosql

ToroDB @NoSQLonSQL

ToroDB v0.4● ToroDB works as a secondary slave of a MongoDB master (or slave, chained rep)

● Implements the full replication protocol (not as an oplog tailable query)

● Replicates from Mongo to a PostgreSQL

● Open source github.com/torodb/torodb (repl branch, version 0.4-SNAPSHOT)

ToroDB @NoSQLonSQL

MongoDB's oplog & tailable cursors

Oplog is a system, capped collection which contains idempotent DML commands

Tailable cursors are “open queries” where data is streamed as it appears on the database

Both are used a lot (replication, Meteor...)

ToroDB @NoSQLonSQL

How to do oplog & tailable with PG?

Thanks to 9.4's logical decoding, it's easy

Replication (master) is just plain logical decoding, transformed to MongoDB's json format

Tailable cursors would be logical decoding plus filtering (to make sure streamed data matches query filters)

ToroDB @NoSQLonSQL

Horizontal scalability(aka sharding)

ToroDB @NoSQLonSQL

Write scalability(sharding)

● MongoDB's sharding API not implemented yet (roadmap: ToroDB 0.8)

● Will use MongoDB's mongos without modification, as well as config servers

● That might change in the future (pg_shard?)

ToroDB @NoSQLonSQL

Horizontal scalability(storage level)

● Another non-exclusive option is to have ToroDB store data in a distributed database

● Requires a distributed database like GreenPlum, CitusDb or RedShift

● Paired with replication as a slave:DW in NoSQL enabler

ToroDB @NoSQLonSQL

● GreenPlum was open sourced Oct/15

● ToroDB already runs on GP!CitusDB 5.0 port coming soon!

● Goal: ToroDB v0.4 replicates from a MongoDB master onto Toro-Greenplum and runs the reporting queries using distributed SQL

ToroDB on GP: SQL DW for MongoDB

ToroDB @NoSQLonSQL

● Amazon reviews datasetImage-based recommendations on styles and substitutesJ. McAuley, C. Targett, J. Shi, A. van den HengelSIGIR, 2015

● AWS c4.xlarge (4vCPU, 8GB RAM) 4KIOPS SSD EBS

● 4x shards, 3x config; 4x segments GP

● 83M records, 65GB plain json

Benchmark

ToroDB @NoSQLonSQL

Disk usage

Mongo 3.0, WT, SnappyGP columnar, zlib level 9

table size index size total size0

10000000000

20000000000

30000000000

40000000000

50000000000

60000000000

70000000000

80000000000

Storage requirements

MongoDB vs ToroDB on Greenplum

Mongo

ToroDB on GP

byt

es

ToroDB @NoSQLonSQL

SELECT count( distinct( "reviewerID" ))FROM reviews;

Queries: which one is easier?

db.reviews.aggregate([{ $group: { _id: "reviewerID"}},{ $group: {_id: 1, count: { $sum: 1}}}])

ToroDB @NoSQLonSQL

SELECT "reviewerName", count(*) as reviews FROM reviews GROUP BY "reviewerName" ORDER BY reviews DESC LIMIT 10;

Queries: which one is easier?

db.reviews.aggregate([ { $group : { _id : '$reviewerName', r : { $sum : 1 } } }, { $sort : { r : -1 } }, { $limit : 10 } ], {allowDiskUse: true})

ToroDB @NoSQLonSQL

Query times

3 different queriesQ3 on MongoDB: aggregate fails

27,95 74,87 00

200

400

600

800

1000

1200

9691007

035 13 31

Query duration (s)

MongoDB vs ToroDB on Greenplum

MongoDB

ToroDB on GP

speedup

seco

nd

s

ToroDB @NoSQLonSQL

Doing what MongoDBcan't offer

ToroDB @NoSQLonSQL

Atomic operations

● There is no support for atomic bulk insert/update/delete operations

● Not even with $isolated:“Prevents a write operation that affects multiple documents from yielding to other reads or writes […] You can ensure that no client sees the changes until the operation completes or errors out. The $isolated isolation operator does not provide “all-or-nothing” atomicity for write operations.”http://docs.mongodb.org/manual/reference/operator/update/isolated/

http://docs.mongodb.org/manual/reference/operator/update/isolated/

ToroDB @NoSQLonSQL

Cheap single-node durability

● Without journaling, MongoDB is not durable nor crash-safe

● MongoDB requires “j: true” for true single-node durability. But who guarantees its consistent usage? Who uses it by default?j:true creates I/O storms equivalent to SQL CHECKPOINTs

ToroDB @NoSQLonSQL

“Clean” reads

Oh really?

ToroDB @NoSQLonSQL

“Clean” reads

http://docs.mongodb.org/manual/reference/write-concern/#read-isolation-behavior

“MongoDB will allow clients to read the results of a write operation before the write operation returns.”

“If the mongod terminates before the journal commits, even if a write returns successfully, queries may have read data that will not exist after the mongod restarts.”

Thus, MongoDB suffers from dirty reads. But let's call them just “tainted reads”.

http://docs.mongodb.org/manual/reference/write-concern/#read-isolation-behavior

ToroDB @NoSQLonSQL

“Clean” reads

What about $snapshot? Nope:

“The snapshot() does not guarantee that the data returned by the query will reflect a single moment in time nor does it provide isolation from insert or delete operations.”

http://docs.mongodb.org/manual/faq/developers/#faq-developers-isolate-cursors

Cursors in ToroDB run in repeatable read, read-only mode:globalCursorDataSource.setTransactionIsolation("TRANSACTION_REPEATABLE_READ"); globalCursorDataSource.setReadOnly(true);

http://docs.mongodb.org/manual/faq/developers/#faq-developers-isolate-cursors

ToroDB @NoSQLonSQL

https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads

Replication dirty and stale reads(“call me maybe”)

https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads

ToroDB @NoSQLonSQL

MongoDB's dirty and stale reads (repl)

Dirty readsA primary in minority accepts a write that other clients see, but it later steps down, write is rolled back (fixed in 3.2?)

Stale readsA primary in minority serves a value that ought to be current, but a newer value was written to the other primary in minority

ToroDB @NoSQLonSQL

Data discoverability, SQL connectors

● They are two of the major announcements for MongoDB 3.2

● To discover data, MongoDB samples data. ToroDB: just look at table structures! (and join with root if you want a count)

● SQL connectors: native, no emulation

ToroDB @NoSQLonSQL

Download, clone, PR, star it!

https://github.com/torodb/torodb

torodb internals: how to create a nosql database on top of sql

Software