A New MongoDB Sharding Architecture, Leif Walsh (Tokutek)
Talk by Leif Walsh at HighLoad++ 2014.
![Page 1: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/1.jpg)
A New MongoDB Sharding Architecture for Higher Availability and Better Resource Utilization
Leif Walsh @leifwalsh
![Page 2: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/2.jpg)
A Traditional MongoDB Cluster
• 3 shards.
• 3 replicas per shard.
![Page 3: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/3.jpg)
A Traditional MongoDB Cluster
• 3x write throughput.
• 3x read throughput.
![Page 4: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/4.jpg)
A Traditional MongoDB Cluster
• 1 node can go down without losing availability.
![Page 5: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/5.jpg)
A Traditional MongoDB Cluster
• Data can survive destruction of 2 nodes.
![Page 6: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/6.jpg)
General MongoDB Cluster
• S shards, R replicas per shard.
• Sx write throughput.
• Rx read throughput.
• Fewer than R/2 nodes per shard can go down without losing availability.
• Data can survive destruction of R-1 nodes.
• S×R hardware & maintenance cost.
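The bullets above can be written down as a small calculator. This is a minimal sketch, not anything from the talk: the class and method names are made up for illustration, and the availability rule assumes that a replica set needs a strict majority of its members to keep a primary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterShape:
    """Hypothetical helper describing a traditional sharded cluster."""

    shards: int    # S: number of shards, one replica set each
    replicas: int  # R: members per replica set

    def machines(self) -> int:
        # Traditional layout: one machine per replica set member.
        return self.shards * self.replicas

    def write_scaling(self) -> int:
        # Only the primary of each shard accepts writes.
        return self.shards

    def tolerated_failures_per_shard(self) -> int:
        # Assumption: a strict majority of members must stay up.
        return (self.replicas - 1) // 2

    def survivable_losses_per_shard(self) -> int:
        # Data survives as long as one copy of the shard remains.
        return self.replicas - 1


traditional = ClusterShape(shards=3, replicas=3)
print(traditional.machines(), traditional.tolerated_failures_per_shard())
```

With S = 3 and R = 3 this reproduces the figures from the earlier slides: 9 machines, 1 tolerated failure, 2 survivable losses.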
![Page 7: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/7.jpg)
TokuMX: MongoDB with Fractal Trees
• MongoDB fork.
• Compression, performance, transactions.
• Details about Fractal Trees after lunch.
![Page 8: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/8.jpg)
TokuMX: MongoDB with Fractal Trees
• Read-free Replication
• Fast Updates
• Optimized Sharding Migrations
• Ark Consensus for Replication Failover
• Partitioned Collections
• Clustering Indexes & Primary Keys
• tokutek.com/tokumx
![Page 9: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/9.jpg)
Fractal Tree Performance Basics
Writes are cheap:
• O(1/B) I/Os per op.
• ≈10k/s
Reads are expensive:
• Ω(1) I/O per op.
• ≈100/s
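A back-of-envelope check shows how the two rates could relate. The slide does not give the disk speed or the value of B; the numbers below (about 100 random I/Os per second, B = 100 operations amortized per I/O) are assumptions chosen only because they are consistent with the quoted ≈100/s and ≈10k/s.

```python
def ops_per_second(disk_iops: float, ops_per_io: float) -> float:
    """Sustained operations per second when one I/O serves ops_per_io operations."""
    return disk_iops * ops_per_io


DISK_IOPS = 100.0  # assumption: roughly one spinning disk
BATCH = 100.0      # assumption: B, the number of writes amortized over one I/O

# A read needs at least one I/O, so one I/O serves at most one read.
reads = ops_per_second(DISK_IOPS, 1.0)

# A write costs about 1/B of an I/O, so one I/O serves about B writes.
writes = ops_per_second(DISK_IOPS, BATCH)

print(reads, writes)
```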
![Page 10: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/10.jpg)
Read-free Replication
Updates are reads + writes.
Secondaries can trust the primary, only do writes.
![Page 11: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/11.jpg)
Read-free Replication
Updates are reads + writes.
Secondaries can trust the primary, only do writes.
Looking at I/O utilization, secondaries are very cheap compared to primaries.
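The idea can be illustrated with a toy model. This sketch is not TokuMX code, and the slides do not describe the oplog format; the `Store` class and both functions are hypothetical. It assumes only that the primary ships the resulting value, so a secondary can apply it without first reading the old document.

```python
class Store:
    """Toy key-value store that counts reads and writes (hypothetical)."""

    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

    def put(self, key, value):
        self.writes += 1
        self.data[key] = value


def update_on_primary(primary, key, fn):
    """Read-modify-write on the primary; returns a log entry holding the new value."""
    new_value = fn(primary.get(key))
    primary.put(key, new_value)
    return (key, new_value)


def apply_on_secondary(secondary, entry):
    """Blind write: the secondary trusts the primary's result and skips the read."""
    key, new_value = entry
    secondary.put(key, new_value)


primary, secondary = Store(), Store()
for _ in range(1000):
    entry = update_on_primary(primary, "counter", lambda v: (v or 0) + 1)
    apply_on_secondary(secondary, entry)

print(primary.reads, primary.writes, secondary.reads, secondary.writes)
```

In this model the secondary ends up with the same data as the primary while doing no reads at all, which is the I/O gap the slide points at.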
![Page 12: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/12.jpg)
A Traditional TokuMX Cluster
• 9 machines, only 3x throughput benefit.
• Secondaries are under-utilized.
![Page 13: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/13.jpg)
A TokuMX Cluster With Read-free Replication
• 3x write throughput.
• 3x read throughput.
• (maybe separately)
![Page 14: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/14.jpg)
A TokuMX Cluster With Read-free Replication
• 1 node can go down without losing availability.
![Page 15: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/15.jpg)
A TokuMX Cluster With Read-free Replication
• Data can survive destruction of 2 nodes.
![Page 16: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/16.jpg)
A TokuMX Cluster With Read-free Replication
• Only 3x hardware cost, down from 9x.
![Page 17: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/17.jpg)
Dynamo Architecture
• Developed at Amazon.
• Used by Cassandra, Riak, Voldemort.
• Many components; I will focus on data partitioning.
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
![Page 18: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/18.jpg)
Dynamo Architecture
• Servers are equal peers, not separate primaries and secondaries.
• Store overlapping subsets of data (MongoDB shards store disjoint subsets).
• Data partitioning determined by consistent hashing.
![Page 19: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/19.jpg)
Dynamo Partitioning
• N servers in a ring.
• hash(K) is a location around the ring.
• Store data for K on the next R servers on the ring.
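The three steps above can be sketched in a few lines. This is a minimal illustration of consistent hashing, not Dynamo's implementation: the `Ring` class, the use of MD5 as the ring hash, and the server names are all assumptions, and real systems add refinements such as virtual nodes.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a point on the ring (assumption: first 64 bits of MD5)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Hypothetical consistent-hashing ring with R copies per key."""

    def __init__(self, servers, replicas):
        if replicas > len(servers):
            raise ValueError("need at least R servers")
        self.replicas = replicas
        # Servers sorted by their own position on the ring.
        self.points = sorted((ring_position(s), s) for s in servers)

    def owners(self, key: str):
        """Return the next R servers clockwise from hash(key)."""
        positions = [p for p, _ in self.points]
        start = bisect.bisect_right(positions, ring_position(key))
        n = len(self.points)
        return [self.points[(start + i) % n][1] for i in range(self.replicas)]


servers = ["node-a", "node-b", "node-c", "node-d", "node-e"]
ring = Ring(servers, replicas=3)
print(ring.owners("user:42"))
```

Because a key's owners depend only on the hash and the sorted server positions, any client holding the server list can locate the data without a central lookup.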
![Page 20: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/20.jpg)
Dynamo Partitioning
• All nodes accept writes: ~linear write scaling.
• Data replicated R times: Rx read performance/reliability.
![Page 21: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/21.jpg)
Dynamo-style Sharding in TokuMX
• Each node is primary for some chunks, secondary for others.
• Nodes store overlapping subsets of the data set.
![Page 22: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/22.jpg)
Dynamo-style Sharding in TokuMX
• S primaries in the ring: Sx write throughput.
• R copies of each chunk on separate machines: Rx read throughput, availability & recovery guarantees.
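One way to picture this layout is sketched below. It assumes that replica set i keeps its primary on machine i and its secondaries on the next R-1 machines around the ring; naming each replica set after its primary's machine is a convention invented for this sketch, and both function names are hypothetical.

```python
def ring_layout(nodes, replicas):
    """Replica set i: primary on nodes[i], secondaries on the next R-1 nodes."""
    n = len(nodes)
    if replicas > n:
        raise ValueError("need at least R machines")
    return {
        nodes[i]: [nodes[(i + k) % n] for k in range(replicas)]
        for i in range(n)
    }


def roles_by_node(layout):
    """Invert the layout: which replica sets each machine hosts, and in what role."""
    roles = {node: [] for node in layout}
    for replica_set, members in layout.items():
        for k, node in enumerate(members):
            roles[node].append((replica_set, "primary" if k == 0 else "secondary"))
    return roles


layout = ring_layout(["A", "B", "C"], replicas=3)
roles = roles_by_node(layout)
for node, hosted in roles.items():
    print(node, hosted)
```

With S = 3 and R = 3, each of the 3 machines runs one primary and two secondaries, so the cluster still has 9 replica set members but only 3 machines.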
![Page 23: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/23.jpg)
Dynamo-style Sharding in TokuMX
• Adding a node:
– Move one secondary from each of next 2 nodes to the new node.
– Initialize a new replica set on the new node and next 2 nodes.
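The two steps can be traced on a small ring. This is a sketch under the assumption that R = 3 and that each replica set lives on its primary's machine plus the next two machines around the ring; the function names and the dictionary representation are hypothetical, not TokuMX tooling.

```python
def ring_layout(nodes, replicas=3):
    """Replica set i: primary on nodes[i], secondaries on the next R-1 nodes."""
    n = len(nodes)
    return {nodes[i]: [nodes[(i + k) % n] for k in range(replicas)] for i in range(n)}


def add_node(nodes, layout, new_node, index):
    """Insert new_node at ring position index using the two steps above (R = 3)."""
    n = len(nodes)
    next1, next2 = nodes[index % n], nodes[(index + 1) % n]
    prev1, prev2 = nodes[(index - 1) % n], nodes[(index - 2) % n]
    new_layout = {rs: list(members) for rs, members in layout.items()}
    # Step 1: move one secondary from each of the next two nodes to the new node.
    # The set two machines back gives up its secondary on next1 ...
    new_layout[prev2] = [prev2, prev1, new_node]
    # ... and the set one machine back gives up its secondary on next2.
    new_layout[prev1] = [prev1, new_node, next1]
    # Step 2: start a new replica set on the new node and the next two nodes.
    new_layout[new_node] = [new_node, next1, next2]
    new_nodes = nodes[:index] + [new_node] + nodes[index:]
    return new_nodes, new_layout


nodes = ["A", "B", "C", "D"]
layout = ring_layout(nodes)
new_nodes, new_layout = add_node(nodes, layout, "X", index=2)
print(new_nodes)
print(new_layout)
```

After the two steps the layout matches what `ring_layout` would produce for the enlarged ring, and only the new machine's two ring neighbors had to give up data.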
![Page 24: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/24.jpg)
Future Work
Chunk balancer is not sophisticated:
• Adding/removing machines is rough, overloads the machine’s neighbors.
• Can we use ideas from Cassandra & Riak to improve this?
MongoDB architecture requires managing multiple processes on each machine.
• We can do better with good tools. Talk to me if you want to write them.
![Page 25: Новая архитектура шардинга MongoDB, Leif Walsh (Tokutek)](https://reader034.vdocuments.net/reader034/viewer/2022052507/5585ba73d8b42a40548b4c3f/html5/thumbnails/25.jpg)
Thanks! Come to my talk after lunch for details about Fractal Trees.
@leifwalsh
tokutek.com/tokumx slidesha.re/13pxgH8