A New MongoDB Sharding Architecture, Leif Walsh (Tokutek)
Post on 21-Jun-2015
A New MongoDB Sharding Architecture for Higher Availability and Better Resource Utilization Leif Walsh @leifwalsh
A Traditional MongoDB Cluster
• 3 shards, 3 replicas per shard.
• 3x write throughput, 3x read throughput.
• 1 node can go down without losing availability.
• Data can survive destruction of 2 nodes.
General MongoDB Cluster
• Sx write throughput.
• Rx read throughput.
• R/2 nodes can go down without losing availability.
• Data can survive destruction of R-1 nodes.
• S×R hardware & maintenance cost.
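The resource math on this slide can be sketched as a small model. The helper below is illustrative only (the function name and the dictionary keys are my own, not anything from TokuMX or MongoDB):

```python
def cluster_properties(S, R):
    """Throughput and fault-tolerance model from the slide:
    S shards, R replicas per shard."""
    return {
        "write_throughput": S,            # Sx: one primary per shard accepts writes
        "read_throughput": R,             # Rx: reads spread across a shard's replicas
        "nodes_down_ok": R // 2,          # a majority per shard must stay up
        "copies_survive_loss_of": R - 1,  # each document lives on R nodes
        "hardware_cost": S * R,           # S*R machines to buy and maintain
    }

# The traditional 3x3 cluster from the previous slides:
print(cluster_properties(3, 3))
```

Plugging in S=3, R=3 recovers the earlier slide's numbers: 3x writes, 3x reads, 9 machines.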
TokuMX: MongoDB with Fractal Trees
• MongoDB fork.
• Compression, performance, transactions.
• Details about Fractal Trees after lunch.
• Read-free Replication
• Fast Updates
• Optimized Sharding Migrations
• Ark Consensus for Replication Failover
• Partitioned Collections
• Clustering Indexes & Primary Keys
• tokutek.com/tokumx
Fractal Tree Performance Basics
Writes are cheap:
• O(1/B) I/Os per op.
• ≈10k/s
Reads are expensive:
• Ω(1) I/O per op.
• ≈100/s
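The asymmetry above can be made concrete with back-of-the-envelope arithmetic. The numbers below are toy assumptions (a disk doing ~100 random I/Os per second and a batching factor of 100), chosen only to reproduce the slide's rough rates:

```python
# Toy cost model for the slide's numbers (assumed, not Tokutek's benchmarks).
DISK_IOPS = 100   # a spinning disk: ~100 random I/Os per second
B = 100           # hypothetical fractal-tree batching factor

# Reads are expensive: each read needs at least one random I/O.
reads_per_sec = DISK_IOPS                # ≈100/s

# Writes are cheap: messages are buffered and flushed B at a time,
# so each write costs O(1/B) I/Os amortized.
writes_per_sec = DISK_IOPS * B           # ≈10k/s

print(reads_per_sec, writes_per_sec)     # 100 10000
```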
Read-free Replication
Updates are reads + writes. Secondaries can trust the primary, so they only do writes.
Looking at I/O utilization, secondaries are very cheap compared to primaries.
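A minimal sketch of the idea, under my own simplified model (plain dicts for the stores, a list for the oplog): the primary does the read to resolve an update and logs the resulting value, so a secondary can apply the oplog with blind writes and no reads at all.

```python
# Simplified sketch of read-free replication (illustrative, not TokuMX code).
oplog = []

def primary_update(store, key, delta):
    # The primary must READ the old value to compute the update...
    new_value = store.get(key, 0) + delta
    store[key] = new_value              # ...then WRITE the result,
    oplog.append((key, new_value))      # logging the fully resolved value.

def secondary_apply(store):
    # The secondary trusts the primary: blind writes only, no reads.
    for key, value in oplog:
        store[key] = value

primary_store, secondary_store = {}, {}
primary_update(primary_store, "counter", 5)
primary_update(primary_store, "counter", 2)
secondary_apply(secondary_store)
print(secondary_store)                  # {'counter': 7}
```

Since each read costs ≈1 I/O and each buffered write ≈1/B, a secondary that skips the reads does a small fraction of the primary's I/O.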
A Traditional TokuMX Cluster
• 9 machines, only 3x throughput benefit.
• Secondaries are under-utilized.
A TokuMX Cluster With Read-free Replication
• 3x write throughput. • 3x read throughput (maybe separately).
• 1 node can go down without losing availability.
• Data can survive destruction of 2 nodes.
• Only 3x hardware cost, down from 9x.
Dynamo Architecture
• Developed at Amazon.
• Used by Cassandra, Riak, Voldemort.
• Many components; I will focus on data partitioning.
• Servers are equal peers, not separate primaries and secondaries.
• Store overlapping subsets of data (MongoDB shards store disjoint subsets).
• Data partitioning determined by consistent hashing.
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Dynamo Partitioning
• N servers in a ring.
• hash(K) is a location around the ring.
• Store data for K on the next R servers on the ring.
• All nodes accept writes: ~linear write scaling.
• Data replicated R times: Rx read performance/reliability.
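The placement rule above can be sketched in a few lines. This is a bare-bones consistent-hashing ring (one point per server; real Dynamo-style systems add virtual nodes, and the `Ring` class and hash choice here are my own):

```python
import hashlib
from bisect import bisect_right

def h(s: str) -> int:
    """Hash a string to a position on the ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    """Minimal consistent-hashing sketch: data for key K lives on the
    next R servers clockwise from hash(K)."""

    def __init__(self, servers, R=3):
        self.R = R
        self.ring = sorted((h(s), s) for s in servers)
        self.hashes = [hv for hv, _ in self.ring]

    def owners(self, key: str):
        # First server at or after hash(key), wrapping around the ring.
        i = bisect_right(self.hashes, h(key))
        n = len(self.ring)
        return [self.ring[(i + j) % n][1] for j in range(self.R)]

ring = Ring(["s0", "s1", "s2", "s3", "s4"], R=3)
print(ring.owners("user:42"))   # some 3 consecutive servers on the ring
```

Because every key maps to R consecutive ring positions, adding or removing one server only moves data between that server and its ring neighbors.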
Dynamo-style Sharding in TokuMX
• Each node is primary for some chunks, secondary for others.
• Nodes store overlapping subsets of the data set.
• S primaries in the ring: Sx write throughput.
• R copies of each chunk on separate machines: Rx read throughput, availability & recovery guarantees.
• Adding a node:
  – Move one secondary from each of the next 2 nodes to the new node.
  – Initialize a new replica set on the new node and the next 2 nodes.
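The layout above can be sketched with a simplified model of my own (not TokuMX code): node i is primary for replica set i, and each replica set is hosted on R consecutive nodes around the ring, so a new node takes over secondary roles from its two successors and anchors one new replica set.

```python
# Sketch (assumed model): replica set i lives on R consecutive ring nodes,
# with nodes[i] as its primary.
def membership(nodes, R=3):
    """Map each replica set (named after its primary) to its R hosts."""
    n = len(nodes)
    return {nodes[i]: [nodes[(i + j) % n] for j in range(R)]
            for i in range(n)}

before = membership(["n0", "n1", "n2"])
# Insert n3 between n2 and n0: n3 picks up a secondary role in the sets
# of n1 and n2, and a new replica set is initialized with n3 as primary
# plus the next 2 nodes (n0, n1).
after = membership(["n0", "n1", "n2", "n3"])
print(before)
print(after)
```

With 3 nodes every node hosts every set (the 3x-for-3x cluster from earlier); with the fourth node added, each set still has exactly R=3 hosts but the data is spread over 4 machines.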
Future Work
• Chunk balancer is not sophisticated: adding/removing machines is rough and overloads the machine's neighbors.
• Can we use ideas from Cassandra & Riak to improve this?
• MongoDB architecture requires managing multiple processes on each machine. We can do better with good tools; talk to me if you want to write them.
Thanks! Come to my talk after lunch for details about Fractal Trees.
leif@tokutek.com @leifwalsh
tokutek.com/tokumx slidesha.re/13pxgH8