MongoDB and Fractal Tree Indexes
![Page 1: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/1.jpg)
MongoDB and Fractal Tree® Indexes
Tim Callaghan*, VP/Engineering, Tokutek
MongoDB Boston 2012
* not [yet] a MongoDB expert
![Page 2: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/2.jpg)
B-trees
![Page 3: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/3.jpg)
B-tree Definition
In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches,
sequential access, insertions, and deletions in logarithmic time.
http://en.wikipedia.org/wiki/B-tree
![Page 4: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/4.jpg)
B-tree Overview
I will use a simple single-pivot example throughout this presentation
![Page 5: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/5.jpg)
Basic B-tree
Internal Nodes - Path to data
Leaf Nodes - Actual Data
Pointers
Pivots
![Page 6: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/6.jpg)
B-tree example
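Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]
* Pivot rule is >= (keys greater than or equal to a pivot go to its right)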
![Page 7: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/7.jpg)
B-tree - insert
Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 15, 20] [22, 25] [99]
“Insert 15”
Value stored in leaf node
![Page 8: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/8.jpg)
B-tree - search
Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]
“Find 25”
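The insert and search walks on the two slides above can be sketched in a few lines. This is a minimal in-memory model, not MongoDB's implementation: the node layout and the names `leaf`, `node`, `findLeaf`, `find`, and `insert` are assumptions made for illustration, and node splits are left out.

```javascript
// Hypothetical in-memory model of the slide's tree (names are illustrative).
const leaf = (...keys) => ({ keys });
const node = (pivots, children) => ({ pivots, children });

const tree = node([22], [
  node([10], [leaf(2, 3, 4), leaf(10, 20)]),
  node([99], [leaf(22, 25), leaf(99)]),
]);

// Pivot rule is >=: step past a pivot when key >= pivot.
function findLeaf(n, key) {
  while (n.children) {
    let i = 0;
    while (i < n.pivots.length && key >= n.pivots[i]) i++;
    n = n.children[i];
  }
  return n;
}

// "Find 25": descend to the leaf, then look inside it.
function find(root, key) {
  return findLeaf(root, key).keys.includes(key);
}

// "Insert 15": the value is stored in the leaf, kept in sorted order.
// Splitting a full leaf is omitted from this sketch.
function insert(root, key) {
  const target = findLeaf(root, key);
  const pos = target.keys.findIndex((k) => k >= key);
  if (pos === -1) target.keys.push(key);
  else if (target.keys[pos] !== key) target.keys.splice(pos, 0, key);
}

insert(tree, 15);
console.log(findLeaf(tree, 15).keys); // [ 10, 15, 20 ]
console.log(find(tree, 25));          // true
```

Every operation touches exactly one root-to-leaf path, which is why the cost is logarithmic in the number of keys.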
![Page 9: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/9.jpg)
B-tree - storage
Root: 22 (RAM)
Internal nodes: 10 | 99 (RAM)
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99] (DISK)
Performance is IO-limited once the tree is bigger than RAM: try to fit all internal nodes and some leaf nodes in memory
![Page 10: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/10.jpg)
B-tree – serial insertions
Root: 22 (RAM)
Internal nodes: 10 | 99 (RAM)
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99] (DISK)
Serial insertion workloads stay in memory; think of MongoDB’s “_id” index
![Page 11: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/11.jpg)
Fractal Tree Indexes
![Page 12: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/12.jpg)
Fractal Tree Indexes
Similar to B-trees:
- store data in leaf nodes
- use PK for ordering
Different than B-trees:
- message buffer in all internal nodes
- doesn’t need to update leaf node immediately
- much larger nodes (4MB vs. 8KB*)
All internal nodes have message buffers
![Page 13: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/13.jpg)
Fractal Tree Indexes – “insert 15”
Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10, 20] [22, 25] [99]
Buffered message: insert(15)
No IO is required; all internal nodes usually fit in RAM
![Page 14: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/14.jpg)
Fractal Tree Indexes – “find 25”
Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10] [22, 25] [99]
Buffered messages: insert(15); insert(20), insert(25); delete(3)
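A lookup in this state has to consult the message buffers on its way down, because a leaf may be out of date. The sketch below is one way to model that. The transcript does not say which buffer holds which message, so the placement here (root, left node, right node) is an assumption, as are the names `tree` and `find`.

```javascript
// Assumed placement of the slide's pending messages in the internal nodes.
const tree = {
  pivots: [22],
  buffer: [{ op: "insert", key: 15 }],
  children: [
    {
      pivots: [10],
      buffer: [{ op: "insert", key: 20 }, { op: "delete", key: 3 }],
      children: [{ keys: [2, 3, 4] }, { keys: [10] }],
    },
    {
      pivots: [99],
      buffer: [{ op: "insert", key: 25 }],
      children: [{ keys: [22, 25] }, { keys: [99] }],
    },
  ],
};

function find(root, key) {
  let n = root;
  while (n.children) {
    // Messages flow downward, so one found nearer the root is newer than
    // anything below it. Within a buffer, scan newest first.
    for (let i = n.buffer.length - 1; i >= 0; i--) {
      if (n.buffer[i].key === key) return n.buffer[i].op === "insert";
    }
    let c = 0;
    while (c < n.pivots.length && key >= n.pivots[c]) c++; // pivot rule is >=
    n = n.children[c];
  }
  return n.keys.includes(key); // no pending message: the leaf is authoritative
}

console.log(find(tree, 25)); // true
console.log(find(tree, 15)); // true, although no leaf contains 15 yet
console.log(find(tree, 3));  // false, although the leaf still contains 3
```

Since internal nodes usually fit in RAM, checking their buffers adds no IO to the search.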
![Page 15: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/15.jpg)
Fractal Tree Indexes – “insert 8”
Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 3, 4] [10] [22, 25] [99]
Buffered messages: insert(15); insert(20), insert(25); delete(3)
The buffer is full, so push its messages down to the next level.
![Page 16: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/16.jpg)
Fractal Tree Indexes – “insert 8”
Root: 22
Internal nodes: 10 | 99
Leaf nodes: [2, 4, 8] [10, 20, 25] [22, 25] [99]
Buffered message: insert(15)
Inserted 8, 20, 25. Deleted 3.
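The buffer-and-flush behaviour from the last few slides can be sketched as follows. This is a simplified model, not Tokutek's implementation: `CAPACITY`, the flush-everything policy, and all function names are assumptions, and the message sequence is made up so it does not reproduce the slide's exact end state.

```javascript
const CAPACITY = 2; // assumed: messages a buffer holds before it must flush

const makeLeaf = (...keys) => ({ keys });
const makeNode = (pivots, children) => ({ pivots, children, buffer: [] });

function childIndex(n, key) {
  let i = 0;
  while (i < n.pivots.length && key >= n.pivots[i]) i++; // pivot rule is >=
  return i;
}

// Only here does a leaf actually change.
function applyToLeaf(leaf, msg) {
  const at = leaf.keys.indexOf(msg.key);
  if (msg.op === "delete") {
    if (at !== -1) leaf.keys.splice(at, 1);
  } else if (at === -1) {
    leaf.keys.push(msg.key);
    leaf.keys.sort((a, b) => a - b);
  }
}

// An operation is just a message dropped into a node's buffer.
function put(n, msg) {
  if (!n.children) {
    applyToLeaf(n, msg);
    return;
  }
  n.buffer.push(msg);
  if (n.buffer.length > CAPACITY) flush(n);
}

// Buffer is full: push its messages down one level, oldest first.
function flush(n) {
  const pending = n.buffer;
  n.buffer = [];
  for (const msg of pending) put(n.children[childIndex(n, msg.key)], msg);
}

const ftree = makeNode([22], [
  makeNode([10], [makeLeaf(2, 3, 4), makeLeaf(10)]),
  makeNode([99], [makeLeaf(22, 25), makeLeaf(99)]),
]);

for (const msg of [
  { op: "insert", key: 15 },
  { op: "insert", key: 20 },
  { op: "delete", key: 3 },
  { op: "insert", key: 8 },
]) put(ftree, msg);

console.log(ftree.buffer);                       // insert(8) is still buffered in the root
console.log(ftree.children[0].children[0].keys); // [ 2, 4 ]
console.log(ftree.children[0].children[1].keys); // [ 10, 15, 20 ]
```

The point of the design is that leaves are touched only when a flush reaches them, so several changes to the same leaf are applied in one write instead of one write each.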
![Page 17: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/17.jpg)
Fractal Tree Indexes – compression
• Large node size (4MB) leads to high compression ratios.
• Supports zlib, quicklz, and lzma compression algorithms.
• Compression is generally 5x to 25x, similar to what gzip and 7z can do to your data.
• Significantly less disk space needed
• Fewer writes, bigger writes
• Both of which are great for SSDs
• Reads are highly compressed, so each IO returns more data
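The effect of node size on compression ratio can be checked with a small experiment. The sketch below uses Node.js's built-in `zlib` module and synthetic records; the data is made up, so the ratios it prints say nothing about the 5x to 25x figure above. It only shows that compressing one 4MB block yields a smaller total than compressing the same bytes as independent 8KB blocks.

```javascript
const zlib = require("zlib");

// Synthetic, repetitive records standing in for documents (made-up data).
function makeData(bytes) {
  const rows = [];
  let size = 0;
  for (let i = 0; size < bytes; i++) {
    const row = JSON.stringify({
      uri: `http://example.com/page/${i % 5000}`,
      name: `user${i % 300}`,
      origin: "crawler",
    }) + "\n";
    rows.push(row);
    size += row.length; // ASCII only, so characters equal bytes
  }
  return Buffer.from(rows.join("")).subarray(0, bytes);
}

// Compress the data as independent blocks of blockSize bytes and add up the results.
function compressedSize(data, blockSize) {
  let total = 0;
  for (let off = 0; off < data.length; off += blockSize) {
    total += zlib.deflateSync(data.subarray(off, off + blockSize)).length;
  }
  return total;
}

const data = makeData(4 * 1024 * 1024);
const small = compressedSize(data, 8 * 1024);        // 8KB blocks
const large = compressedSize(data, 4 * 1024 * 1024); // one 4MB block

console.log({
  ratioWith8KBBlocks: (data.length / small).toFixed(1),
  ratioWith4MBBlock: (data.length / large).toFixed(1),
});
```

Each small block starts with an empty dictionary, so redundancy across blocks is never exploited. Deflate's window is only 32KB, so algorithms with larger windows, such as lzma, would likely gain more from large nodes than this sketch shows.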
![Page 18: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/18.jpg)
So what does this have to do with MongoDB?
![Page 19: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/19.jpg)
So what does this have to do with MongoDB?
* Watch Tyler Brock’s presentation “Indexing and Query Optimization”
![Page 20: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/20.jpg)
MongoDB Storage
db.test.insert({foo:55})
db.test.ensureIndex({foo:1})
PK index (_id + pointer)
Root: 25
Internal nodes: 10 | 99
Leaf nodes: [(2,ptr2), (4,ptr4)] [(10,ptr10)] [(25,ptr25), (98,ptr98)] [(101,ptr101)]
Secondary index (foo + pointer)
Root: 85
Internal nodes: 40 | 120
Leaf nodes: [(2,ptr10), (35,ptr101)] [(55,ptr4)] [(90,ptr2)] [(2599,ptr98)]
The “pointer” tells MongoDB where to look in the data files for the actual document data.
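The two-step lookup described above (index entry first, then the data files) can be sketched with plain arrays and a map. The pointer names and key values come from the slide; the document contents, the omission of `_id` 25 (which has no entry in the slide's secondary index), and the names `dataFiles`, `pkIndex`, `fooIndex`, and `lookup` are assumptions for illustration.

```javascript
// Data files: "pointer" -> document. Document bodies are made up to match the slide's keys.
const dataFiles = new Map([
  ["ptr2", { _id: 2, foo: 90 }],
  ["ptr4", { _id: 4, foo: 55 }],
  ["ptr10", { _id: 10, foo: 2 }],
  ["ptr98", { _id: 98, foo: 2599 }],
  ["ptr101", { _id: 101, foo: 35 }],
]);

// Each index is a sorted list of [key, pointer] pairs, standing in for a B-tree.
const pkIndex = [[2, "ptr2"], [4, "ptr4"], [10, "ptr10"], [98, "ptr98"], [101, "ptr101"]];
const fooIndex = [[2, "ptr10"], [35, "ptr101"], [55, "ptr4"], [90, "ptr2"], [2599, "ptr98"]];

function lookup(index, key) {
  // Step 1: find the key in the index (binary search in place of a tree descent).
  let lo = 0;
  let hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] === key) {
      // Step 2: follow the pointer into the data files for the document itself.
      return dataFiles.get(index[mid][1]);
    }
    if (index[mid][0] < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined;
}

console.log(lookup(fooIndex, 55)); // the document with _id 4
console.log(lookup(pkIndex, 4));   // the same document, reached through the PK index
```

Step 2 is the part that becomes a random disk read once the data files no longer fit in RAM.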
![Page 21: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/21.jpg)
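MongoDB Storage
Both indexes are B-trees
PK index (_id + pointer)
Root: 25
Internal nodes: 10 | 99
Leaf nodes: [(2,ptr2), (4,ptr4)] [(10,ptr10)] [(25,ptr25), (98,ptr98)] [(101,ptr101)]
Secondary index (foo + pointer)
Root: 85
Internal nodes: 40 | 120
Leaf nodes: [(2,ptr10), (35,ptr101)] [(55,ptr4)] [(90,ptr2)] [(2599,ptr98)]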
![Page 22: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/22.jpg)
Who is Tokutek and what have we done?
• Tokutek’s Fractal Tree Index Implementations
• MySQL Storage Engine (TokuDB)
• BerkeleyDB API
• File System (TokuFS)
• Recently added Fractal Tree Indexes to MongoDB 2.2
• Existing indexes are still supported
• Source changes are available via our blog at www.tokutek.com/tokuview
• This is a work in progress (see roadmap slides)
![Page 23: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/23.jpg)
MongoDB and Fractal Tree Indexes
as simple as
db.test.ensureIndex({foo:1}, {v:2})
![Page 24: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/24.jpg)
Indexing Options #1
db.test.ensureIndex({foo:1}, {v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})
• blocksize: node size, defaults to 4MB.
![Page 25: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/25.jpg)
Indexing Options #2
db.test.ensureIndex({foo:1}, {v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})
• basementsize: basement node size, defaults to 128K.
• Smallest retrievable unit of a leaf node, for efficient point queries
![Page 26: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/26.jpg)
Indexing Options #3
db.test.ensureIndex({foo:1}, {v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})
• compression: compression algorithm, defaults to quicklz.
• Supports quicklz, lzma, zlib, and none.
• LZMA provides 40% additional compression beyond quicklz, but needs more CPU.
• Decompression performance of quicklz and lzma is similar.
![Page 27: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/27.jpg)
Indexing Options #4
db.test.ensureIndex({foo:1}, {v:2, blocksize:4194304, basementsize:131072, compression:"quicklz", clustering:false})
• clustering: clustering indexes store data by key and include the entire document as the payload (rather than a pointer to the document)
• Always “cover” a query, no need to retrieve the document data
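The difference between a pointer payload and a document payload shows up as extra fetches on a range query. The sketch below counts them for a toy collection; the documents and the names `buildIndex` and `rangeQuery` are hypothetical, and a sorted array stands in for the index.

```javascript
// Toy collection; the heap maps _id to the stored document.
const docs = [
  { _id: 1, foo: 30 },
  { _id: 2, foo: 10 },
  { _id: 3, foo: 20 },
  { _id: 4, foo: 40 },
];
const heap = new Map(docs.map((d) => [d._id, d]));

// clustering:false -> payload is a pointer (here the _id)
// clustering:true  -> payload is the entire document
function buildIndex(collection, field, clustering) {
  return collection
    .map((d) => ({ key: d[field], payload: clustering ? d : d._id }))
    .sort((a, b) => a.key - b.key);
}

function rangeQuery(index, min, max) {
  let fetches = 0;
  const results = [];
  for (const entry of index) {
    if (entry.key < min || entry.key > max) continue;
    if (typeof entry.payload === "object") {
      results.push(entry.payload); // document is already in the index
    } else {
      fetches++; // one extra lookup in the data files per match
      results.push(heap.get(entry.payload));
    }
  }
  return { results, fetches };
}

const plain = rangeQuery(buildIndex(docs, "foo", false), 15, 35);
const clustered = rangeQuery(buildIndex(docs, "foo", true), 15, 35);
console.log(plain.fetches, clustered.fetches); // 2 0
```

Both queries return the same documents; only the number of trips to the data files differs, and on a data set larger than RAM each of those trips can be a random IO.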
![Page 28: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/28.jpg)
How well does it perform?
Three Benchmarks
• Benchmark 1 : Raw insertion performance
• Benchmark 2 : Insertion plus queries
• Benchmark 3 : Covered indexes vs. clustering indexes
![Page 29: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/29.jpg)
Benchmarks…
Race Results
• First Place = John
• Second Place = Tim
• Third Place = Frank
![Page 30: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/30.jpg)
Benchmarks…
Race Results
• First Place = John
• Second Place = Tim
• Third Place = Frank
Frank can say the following: “I finished third, but Tim was second to last.”
![Page 31: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/31.jpg)
Benchmarks…
Race Results
• First Place = John
• Second Place = Tim
• Third Place = Frank
Frank can say the following: “I finished third, but Tim was second to last.”
Understand benchmark specifics and review all results.
![Page 32: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/32.jpg)
Benchmark 1 : Overview
• Measure single-threaded insertion performance
• Document is URI (character), name (character), origin (character), creation date (timestamp), and expiration date (timestamp)
• Secondary indexes on URI, name, origin, expiration
• Machine specifics:
– Sun x4150, (2) Xeon 5460, 8GB RAM, StorageTek Controller (256MB, write-back), 4x10K SAS/RAID 0
– Ubuntu 10.04 Server (64-bit), ext4 filesystem
– MongoDB v2.2.RC0
![Page 33: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/33.jpg)
Benchmark 1 : Without Journaling
![Page 34: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/34.jpg)
Benchmark 1 : With Journaling
![Page 35: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/35.jpg)
Benchmark 1 : Observations
• Fractal Tree Indexing insertion performance is 8x better than standard MongoDB indexing with journaling, and 11x without journaling
• Fractal Tree Indexing insertion performance reaches steady state, even at 200 million insertions. MongoDB insertion performance seems to be in continual decline at only 50 million insertions
• B-tree performance is great until the working data set > RAM
![Page 36: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/36.jpg)
Benchmark 2 : Overview
• Measure single-threaded insertion performance while querying for 1000 documents with a URI greater than or equal to a randomly selected value once every 60 seconds
• Document is same as benchmark 1
• Secondary indexes on URI, name, origin, expiration
• Fractal Tree Index on URI is clustering
– clustering indexes store entire document inline
– Compression controls disk usage
– no need to get document data from elsewhere
– db.tokubench.ensureIndex({URI:1}, {v:2, clustering:true})
• Same hardware as benchmark 1
![Page 37: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/37.jpg)
Benchmark 2 : Insertion Performance
![Page 38: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/38.jpg)
Benchmark 2 : Query Latency
![Page 39: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/39.jpg)
Benchmark 2 : Observations
• Fractal Tree Indexing insertion performance is 10x better than standard MongoDB indexing
• Fractal Tree Indexing query latency is 268x better than standard MongoDB indexing
• B-tree performance is great until the working data set > RAM
• Random lookups are bad
...but what about MongoDB’s covered indexes?
![Page 40: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/40.jpg)
Benchmark 3 : Overview
• Same workload and hardware as benchmark 2
• Create a MongoDB covered index on URI to eliminate lookups in the data files.
– db.tokubench.ensureIndex({URI:1,creation:1,name:1,origin:1})
![Page 41: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/41.jpg)
Benchmark 3 : Insertion Performance
![Page 42: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/42.jpg)
Benchmark 3 : Query Latency
![Page 43: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/43.jpg)
Benchmark 3 : Observations
• Fractal Tree Indexing insertion performance is still 3.7x better than standard MongoDB indexing
• Fractal Tree Indexing query latency is 3.2x better than standard MongoDB indexing (although the MongoDB performance is highly variable)
• B-tree performance is great until the working data set > RAM
• MongoDB’s covered indexes can help a lot
– But what happens when I add new fields to my document?
o Do I drop and re-create by including my new field?
o Do I live without it?
– Clustered Fractal Tree Indexes keep on covering your queries!
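The maintenance problem raised in the last bullet comes down to a simple rule: an index covers a query only if every field the query filters on or returns is part of the index. A minimal sketch of that check follows; `isCovered` is a hypothetical helper, not a MongoDB API, and `owner` is an invented example of a newly added field.

```javascript
// An index covers a query when it contains every filtered and every returned field.
// (In MongoDB the projection must also exclude _id unless _id is part of the index.)
function isCovered(indexFields, queryFields, projectedFields) {
  const inIndex = new Set(indexFields);
  return [...queryFields, ...projectedFields].every((f) => inIndex.has(f));
}

// Fields of the benchmark 3 index.
const coveredIndex = ["URI", "creation", "name", "origin"];

// Query on URI returning fields that are all in the index: covered.
console.log(isCovered(coveredIndex, ["URI"], ["URI", "name", "origin"])); // true

// The application starts returning a new field: no longer covered.
console.log(isCovered(coveredIndex, ["URI"], ["URI", "name", "owner"]));  // false
```

A clustering index stores the whole document as its payload, so the set of available fields grows with the document and the check cannot fail.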
![Page 44: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/44.jpg)
Roadmap : Continuing the Implementation
• Optimize Indexing Insert/Update/Delete Operations
– Each of our secondary indexes is currently creating and committing a transaction for each operation
– A single transaction envelope will improve performance
![Page 45: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/45.jpg)
Roadmap : Continuing the Implementation
• Add Support for Parallel Array Indexes
– MongoDB does not support a compound index over two fields that are both arrays:
o {a: [1, 2], b: [1, 2]}
– “it could get out of hand”
– Ticketed on 3/24/2010, jira.mongodb.org/browse/SERVER-826
– Benchmark coming soon…
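The “out of hand” concern is easy to see with arithmetic. If a compound index over parallel arrays stored one key per combination of elements (an assumption about how such an index might be keyed; the slides do not specify it), the number of keys per document would be the product of the array lengths. `compoundKeys` below is a hypothetical helper that enumerates them.

```javascript
// One index key per combination of array elements; scalars count as one-element arrays.
function compoundKeys(doc, fields) {
  return fields.reduce(
    (keys, field) => {
      const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
      return keys.flatMap((key) => values.map((v) => [...key, v]));
    },
    [[]]
  );
}

// The slide's document: 2 x 2 = 4 keys.
console.log(compoundKeys({ a: [1, 2], b: [1, 2] }, ["a", "b"]));
// [ [ 1, 1 ], [ 1, 2 ], [ 2, 1 ], [ 2, 2 ] ]

// Two 100-element arrays: 100 x 100 = 10000 keys for a single document.
const big = {
  a: Array.from({ length: 100 }, (_, i) => i),
  b: Array.from({ length: 100 }, (_, i) => i),
};
console.log(compoundKeys(big, ["a", "b"]).length); // 10000
```

Every one of those keys is a separate index insertion, so the cost of a single document write grows multiplicatively with the array lengths.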
![Page 46: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/46.jpg)
Roadmap : Continuing the Implementation
• Add Crash Safety
– Our implementation is not [yet] crash safe with the MongoDB PK/heap storage mechanism.
– MongoDB journal is separate from Fractal Tree Index logs.
– Need to create a transactional envelope around both of them
![Page 47: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/47.jpg)
Roadmap : Continuing the Implementation
• Replace MongoDB data store and PK index
– A clustering index on _id eliminates the need for two storage systems
– Compression greatly reduces disk footprint
– This is a large task
![Page 49: MongoDB and Fractal Tree Indexes](https://reader030.vdocuments.net/reader030/viewer/2022013121/5562536bd8b42a1b4b8b4f4d/html5/thumbnails/49.jpg)
Questions?
Tim Callaghan [email protected]
@tmcallaghan
More detailed benchmark information in my blogs at
www.tokutek.com/tokuview