Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)


Upload: leifwalsh

Post on 03-Jul-2015


Category: Technology

DESCRIPTION

Talk given to NYC.rb meetup on 2013-12-10 about TokuMX, a replacement engine for MongoDB.

TRANSCRIPT

Page 1: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

Introducing TokuMX: The Performance Engine for MongoDB

Leif Walsh Senior Engineer, Tokutek

[email protected] @leifwalsh

Page 2: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

What is TokuMX?

• TokuMX = MongoDB with improved storage
• Drop in replacement for MongoDB v2.4 applications
  • Including replication and sharding
  • Same data model
  • Same query language
  • Drivers just work
  • No Full Text or Geospatial
• Open Source – http://github.com/Tokutek/mongo

Page 3: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

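B-tree Limitations

[Figure: B-tree with root node (22), internal nodes (10) and (99), and leaf nodes (2, 3, 4), (10, 20), (22, 25), (99). Level labels, top to bottom: RAM, RAM, DISK.]

Performance is I/O-limited when the data is bigger than RAM: try to fit all internal nodes and some leaf nodes in RAM.

Plus, mmap.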
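The advice to keep all internal nodes in RAM can be made concrete with a rough node count per tree level. The sketch below is only an illustration: the function names, the fanout, the keys-per-leaf figure, and the node size are assumptions chosen for the example, not values from the talk or from MongoDB.

```python
import math

def btree_levels(num_keys, keys_per_leaf, fanout):
    """Return node counts per level, leaf level first and root last.

    All three parameters are illustrative assumptions.
    """
    levels = [math.ceil(num_keys / keys_per_leaf)]
    # Each level above needs one node per `fanout` children until one root remains.
    while levels[-1] > 1:
        levels.append(math.ceil(levels[-1] / fanout))
    return levels

def internal_node_bytes(levels, node_size):
    """Bytes needed to hold every non-leaf node (everything above the leaf level)."""
    return sum(levels[1:]) * node_size

if __name__ == "__main__":
    # Hypothetical workload: 1 billion keys, 100 keys per leaf, fanout 100, 4 KiB nodes.
    levels = btree_levels(1_000_000_000, 100, 100)
    print("nodes per level (leaves first):", levels)
    print("bytes for internal nodes:", internal_node_bytes(levels, 4096))
```

Under these assumptions the internal levels are a small fraction of the tree, so caching them is cheap; the leaf level is what spills to disk once the data outgrows RAM.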

Page 4: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

TokuMX : Indexed Insertion

Page 5: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

TokuMX : Indexed Insertion

Page 6: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

TokuMX : Concurrency (>RAM)

Page 7: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

TokuMX : Concurrency (<RAM)

Page 8: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

TokuMX : Raw Compression

BitTorrent data, size on disk, ~31 million inserts (lower is better)

TokuMX achieved 11.6:1 compression
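To read the 11.6:1 figure as a fraction of the original size, the arithmetic can be sketched as follows. This assumes the ratio compares uncompressed size to on-disk size; the helper names are hypothetical.

```python
def compressed_fraction(ratio):
    """Fraction of the original size that remains at a given N:1 compression ratio."""
    return 1.0 / ratio

def space_saved_percent(ratio):
    """Percentage of the original size saved at a given N:1 compression ratio."""
    return (1.0 - compressed_fraction(ratio)) * 100.0

if __name__ == "__main__":
    ratio = 11.6  # figure quoted on the slide
    print(f"remaining: {compressed_fraction(ratio):.1%}")
    print(f"saved:     {space_saved_percent(ratio):.1f}%")
```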

Page 9: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

TokuMX : Compression : Field Names

synthetic data, size on disk, 100 million inserts (lower is better)

TokuMX is substantially smaller, even without compression

Page 10: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

TokuMX : Compression : Field Names

synthetic data, size on disk, 100 million inserts (lower is better)

In TokuMX, field name length has almost no impact on size due to compression

MongoDB was ~10% smaller with short field names

Page 11: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

TokuMX : ACID + MVCC

• ACID
  – In MongoDB, multi-insertion operations allow for partial success
    o Asked to store 5 documents, 3 succeeded
  – TokuMX offers “all or nothing” behavior (atomic)
• MVCC
  – In MongoDB, queries can be interrupted by writers.
    o The effects of these writers are visible to the reader
  – We offer MVCC
    o Reads are consistent as of the operation start
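The difference between partial success and "all or nothing", and what a read that is consistent as of its start means, can be sketched with a toy in-memory store. `ToyStore` and its methods are hypothetical names for illustration only; this is neither the MongoDB nor the TokuMX API, and real engines implement these guarantees very differently.

```python
import copy

class ToyStore:
    """In-memory stand-in for a collection with a unique _id (illustration only)."""

    def __init__(self):
        self.docs = {}

    def insert_partial(self, batch):
        """Insert one document at a time; stop at the first duplicate _id.

        Documents inserted before the failure stay in the store.
        """
        inserted = 0
        for doc in batch:
            if doc["_id"] in self.docs:
                break
            self.docs[doc["_id"]] = doc
            inserted += 1
        return inserted

    def insert_atomic(self, batch):
        """Apply the whole batch or none of it."""
        staged = dict(self.docs)  # work on a copy, publish only on success
        for doc in batch:
            if doc["_id"] in staged:
                return 0  # abort: the store is left untouched
            staged[doc["_id"]] = doc
        self.docs = staged
        return len(batch)

    def snapshot(self):
        """Return a view of the data as of now; later writes do not affect it."""
        return copy.deepcopy(self.docs)

if __name__ == "__main__":
    batch = [{"_id": i} for i in range(1, 6)]  # 5 documents; _id 4 already exists

    a = ToyStore()
    a.docs[4] = {"_id": 4}
    print("partial:", a.insert_partial(batch), "of", len(batch), "stored")

    b = ToyStore()
    b.docs[4] = {"_id": 4}
    print("atomic: ", b.insert_atomic(batch), "of", len(batch), "stored")

    view = b.snapshot()              # reader starts here
    b.insert_atomic([{"_id": 9}])    # writer commits during the read
    print("reader sees _id 9:", 9 in view)
```

Copying the whole data set for a snapshot is only workable in a toy; it stands in for the idea that a reader keeps seeing the version of the data that existed when its operation began.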


Page 12: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

Leif Walsh Senior Engineer, Tokutek

[email protected] @leifwalsh

Questions?

Page 13: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

TokuMX : Indexed Insertion

• indexed insertion workload (iibench)
  • http://github.com/tmcallaghan/iibench-mongodb

{ dateandtime: <date-time>,
  cashregisterid: 1..1000,
  customerid: 1..100000,
  productid: 1..10000,
  price: <double> }

• insert only, 1000 documents per insert, 100 million inserts
• indexes
  • price + customerid
  • cashregister + price + customerid
  • price + dateandtime + customerid
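The document shape and batch size of this workload can be sketched as a small generator. This is not the iibench-mongodb code: the function names, the price range, and the reading of "cashregister" in the index list as the `cashregisterid` field are assumptions, and nothing here talks to a database.

```python
import random
from datetime import datetime, timezone

def make_doc(rng):
    """Build one document with the field ranges listed on the slide."""
    return {
        "dateandtime": datetime.now(timezone.utc),
        "cashregisterid": rng.randint(1, 1000),
        "customerid": rng.randint(1, 100000),
        "productid": rng.randint(1, 10000),
        "price": rng.uniform(0.0, 500.0),  # price range is an assumption
    }

def make_batch(rng, size=1000):
    """One insert call in the benchmark carries 1000 documents."""
    return [make_doc(rng) for _ in range(size)]

# Secondary indexes from the slide, as ordered field lists.
# "cashregister" is assumed to refer to the cashregisterid field.
INDEXES = [
    ["price", "customerid"],
    ["cashregisterid", "price", "customerid"],
    ["price", "dateandtime", "customerid"],
]

if __name__ == "__main__":
    rng = random.Random(42)
    batch = make_batch(rng)
    print(len(batch), "documents per insert")
    print(batch[0])
```

Every insert has to update all three secondary indexes with effectively random keys, which is what makes this workload hard on a B-tree once the indexes no longer fit in RAM.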

Page 14: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

TokuMX : Concurrency

• Sysbench read-write workload
  • point and range queries, update, delete, insert
  • http://github.com/tmcallaghan/sysbench-mongodb

{ _id: 1..10000000,
  k: 1..10000000,
  c: <120 char random string ###-###-###>,
  pad: <60 char random string ###-###-###> }
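A document of this shape can be generated as follows. This is a sketch, not the sysbench-mongodb code: the function names are hypothetical, and the exact grouping of digits in the `c` and `pad` strings is an assumption based on the `###-###-###` pattern shown on the slide.

```python
import random

DIGITS = "0123456789"

def dashed_digits(rng, length):
    """Random digits with a dash after every third digit, exactly `length` characters.

    The final character is always a digit, so the last group may hold four digits.
    """
    return "".join(
        "-" if i % 4 == 3 and i < length - 1 else rng.choice(DIGITS)
        for i in range(length)
    )

def make_doc(rng, doc_id):
    """Build one document with the field sizes listed on the slide."""
    return {
        "_id": doc_id,
        "k": rng.randint(1, 10_000_000),
        "c": dashed_digits(rng, 120),
        "pad": dashed_digits(rng, 60),
    }

if __name__ == "__main__":
    rng = random.Random(7)
    print(make_doc(rng, 1))
```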

Page 15: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

TokuMX : Raw Compression

• BitTorrent Peer Snapshot Data (~31 million documents)
• 3 Indexes : peer_id + created, torrent_snapshot_id + created, created

{ id: 1,
  peer_id: 9222,
  torrent_snapshot_id: 4,
  upload_speed: 0.0000,
  download_speed: 0.0000,
  payload_upload_speed: 0.0000,
  payload_download_speed: 0.0000,
  total_upload: 0,
  total_download: 0,
  fail_count: 0,
  hashfail_count: 0,
  progress: 0.0000,
  created: "2008-10-28 01:57:35" }

http://cs.brown.edu/~pavlo/torrent/

Page 16: Introducing TokuMX: The Performance Engine for MongoDB (NYC.rb 2013-12-10)

TokuMX : Compression : Field Names

schema 1 - long field names (10/20/20)
{ first_name : "Tim",
  last_name : "Callaghan",
  email_address : "[email protected]" }

schema 2 - short field names (26 fewer bytes per doc)
{ fn : "Tim",
  ln : "Callaghan",
  ea : "[email protected]" }
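The "26 bytes per doc" figure follows directly from the field names in the two schemas, and can be checked with a few lines. The helper names are hypothetical, and the total counts only raw field-name bytes before any storage overhead or compression.

```python
LONG_NAMES = ["first_name", "last_name", "email_address"]
SHORT_NAMES = ["fn", "ln", "ea"]

def name_bytes(names):
    """Total bytes taken by the field names of one document (UTF-8)."""
    return sum(len(name.encode("utf-8")) for name in names)

def saved_per_doc():
    """Bytes saved per document by switching from long to short field names."""
    return name_bytes(LONG_NAMES) - name_bytes(SHORT_NAMES)

def saved_total(num_docs):
    """Raw field-name bytes saved across `num_docs` documents."""
    return saved_per_doc() * num_docs

if __name__ == "__main__":
    print("saved per document:", saved_per_doc(), "bytes")
    print("saved over 100 million documents:", saved_total(100_000_000), "bytes")
```

Because every document repeats its field names, the names themselves are highly redundant data, which is why compression can make their length nearly irrelevant.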
