MongoSV 2011 - Sharding
Sharding
Jared Rosoff (@forjared)
Overview
• Architecture
• How it works
• Use cases
SHARDING ARCHITECTURE
Architecture
mongos
• Shard router
• Acts just like a mongod
• 1 or as many as you want
• Can run on app servers
• Caches metadata from config servers
Config Server
• 3 of them
• Changes use 2-phase commit
• If any are down, metadata goes read-only
• System stays online as long as 1 of the 3 is up
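The availability rules above can be sketched as a small function. This is only an illustration of the two bullets: `clusterState` and its field names are hypothetical and are not part of MongoDB.

```javascript
// Hypothetical helper: derives cluster state from how many of the
// 3 config servers are reachable, following the rules on this slide.
function clusterState(configServersUp, total = 3) {
  return {
    // The system stays online while at least 1 config server is up.
    online: configServersUp >= 1,
    // Metadata changes use 2-phase commit, so they need every config server.
    metadataWritable: configServersUp === total,
  };
}

console.log(clusterState(3)); // { online: true, metadataWritable: true }
console.log(clusterState(2)); // { online: true, metadataWritable: false }
console.log(clusterState(0)); // { online: false, metadataWritable: false }
```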
HOW IT WORKS
Keys
{ name: "Jared", email: "[email protected]" }
{ name: "Scott", email: "[email protected]" }
{ name: "Dan", email: "[email protected]" }
> db.runCommand( { shardcollection: "test.users", key: { email: 1 } } )
Chunks
[Diagram: the full key range from -∞ to +∞ is one chunk.]
Chunks
[Diagram: Split! The range from -∞ to +∞ is divided at a key into two chunks.]
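A split can be sketched as cutting one key range into two adjacent ranges. The sketch below is a simplification under stated assumptions: `splitChunk` is a hypothetical helper, `null` stands in for the -∞ and +∞ bounds, and the split key `m@example.com` is made up for illustration.

```javascript
// Split one chunk [min, max) at splitKey into [min, splitKey) and
// [splitKey, max). A null bound stands for -∞ (min) or +∞ (max).
// Both halves stay on the shard that owned the original chunk.
function splitChunk(chunk, splitKey) {
  return [
    { min: chunk.min, max: splitKey, shard: chunk.shard },
    { min: splitKey, max: chunk.max, shard: chunk.shard },
  ];
}

const whole = { min: null, max: null, shard: 1 };
const [left, right] = splitChunk(whole, "m@example.com");
console.log(left.max === right.min); // true
```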
Chunks
Min Key             Max Key             Shard
-∞                  [email protected]     1
[email protected]     [email protected]     1
[email protected]     [email protected]     1
[email protected]     +∞                  1
• Stored in the config servers
• Cached in mongos
• Used to route requests and keep the cluster balanced
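To make "used to route requests" concrete, here is a minimal sketch of a lookup against a cached chunk table. The table contents, the `findChunk` helper, and the example addresses are all assumptions for illustration. The sketch compares keys as plain strings and treats each range as min-inclusive, max-exclusive.

```javascript
// Hypothetical chunk table: sorted, contiguous ranges [min, max).
// A null min means -∞ and a null max means +∞.
const chunks = [
  { min: null, max: "d@example.com", shard: 1 },
  { min: "d@example.com", max: "k@example.com", shard: 1 },
  { min: "k@example.com", max: "r@example.com", shard: 2 },
  { min: "r@example.com", max: null, shard: 3 },
];

// Return the chunk whose range contains the given shard key value.
function findChunk(table, key) {
  return table.find(
    (c) => (c.min === null || key >= c.min) && (c.max === null || key < c.max)
  );
}

console.log(findChunk(chunks, "jared@example.com").shard); // 1
console.log(findChunk(chunks, "scott@example.com").shard); // 3
```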
Balancing
[Diagram: mongos running the balancer, three config servers, and Shards 1 to 4 holding chunks 1 to 48, twelve chunks per shard. Label: Chunks!]
Balancing
[Diagram: Shard 1 holds chunks 1 to 12, while Shards 2 to 4 hold only chunks 21 to 24, 33 to 36, and 45 to 48. Label: Imbalance]
Balancing
[Diagram: the balancer issues "Move chunk 1 to Shard 2".]
Balancing
[Diagram: chunk 1 leaves Shard 1 and moves toward Shard 2.]
Balancing
[Diagram: the config servers record "Chunk 1 now lives on Shard 2".]
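The balancing sequence above can be sketched as a loop that keeps moving one chunk from the fullest shard to the emptiest one. This is a simplified illustration, not the real balancer: the `balance` function, the threshold of 2, and the choice of which chunk to move are assumptions, and the sketch ignores the config-server update that each move involves.

```javascript
// Hypothetical balancer: while the fullest shard holds at least
// `threshold` more chunks than the emptiest, move one chunk across.
function balance(shards, threshold = 2) {
  // A threshold below 2 could make chunks bounce back and forth forever.
  if (threshold < 2) throw new RangeError("threshold must be at least 2");
  const moves = [];
  for (;;) {
    // Shard names ordered from fewest to most chunks.
    const names = Object.keys(shards).sort(
      (a, b) => shards[a].length - shards[b].length
    );
    const to = names[0];
    const from = names[names.length - 1];
    if (shards[from].length - shards[to].length < threshold) return moves;
    const chunk = shards[from].shift(); // take one chunk from the fullest shard
    shards[to].push(chunk);
    moves.push({ chunk, from, to });
  }
}

// Starting layout taken from the "Imbalance" diagram.
const shards = {
  "Shard 1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
  "Shard 2": [21, 22, 23, 24],
  "Shard 3": [33, 34, 35, 36],
  "Shard 4": [45, 46, 47, 48],
};
const moves = balance(shards);
console.log(moves[0]); // { chunk: 1, from: 'Shard 1', to: 'Shard 2' }
```

With this starting layout the loop stops once every shard holds six chunks.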
ROUTING
Routed Request
[Diagram: mongos in front of Shards 1 to 3, with arrows numbered to match the steps below.]
1. Query arrives at mongos
2. mongos routes query to a single shard
3. Shard returns results of query
4. Results returned to client
Scatter Gather
[Diagram: mongos in front of Shards 1 to 3, with arrows numbered to match the steps below.]
1. Query arrives at mongos
2. mongos broadcasts query to all shards
3. Each shard returns results for query
4. Results combined and returned to client
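The scatter gather steps can be sketched with in-memory arrays standing in for shards. Everything here is illustrative: the sample documents and the `scatterGather` helper are assumptions, and a real router would send the query over the network rather than filter arrays.

```javascript
// Hypothetical in-memory shards, each holding some user documents.
const shards = [
  [{ name: "Jared", state: "CA" }],
  [{ name: "Scott", state: "NY" }, { name: "Dan", state: "CA" }],
  [],
];

// Broadcast the predicate to every shard (step 2), collect each shard's
// matches (step 3), then combine them into one result (step 4).
function scatterGather(allShards, predicate) {
  const partials = allShards.map((docs) => docs.filter(predicate));
  return partials.flat();
}

const result = scatterGather(shards, (d) => d.state === "CA");
console.log(result.map((d) => d.name)); // [ 'Jared', 'Dan' ]
```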
Distributed Merge Sort
[Diagram: mongos in front of Shards 1 to 3, with arrows numbered to match the steps below.]
1. Query arrives at mongos
2. mongos broadcasts query to all shards
3. Each shard locally sorts results
4. Results returned to mongos
5. mongos merge sorts individual results
6. Combined sorted result returned to client
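Steps 3 and 5 can be sketched as a local sort on each shard followed by a k-way merge of the sorted lists. The sample documents and the helper names `localSort` and `mergeSorted` are assumptions made for this illustration.

```javascript
// Each shard sorts its own matches locally (step 3).
function localSort(docs, field) {
  return [...docs].sort((a, b) =>
    a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0
  );
}

// The router merges the already-sorted lists (step 5) by repeatedly
// taking the smallest head element among them.
function mergeSorted(lists, field) {
  const heads = lists.map(() => 0); // next unread index in each list
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (heads[i] >= lists[i].length) continue; // this list is exhausted
      if (best === -1 || lists[i][heads[i]][field] < lists[best][heads[best]][field]) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(lists[best][heads[best]++]);
  }
}

const shardDocs = [
  [{ state: "NY" }, { state: "CA" }],
  [{ state: "TX" }, { state: "AZ" }],
  [{ state: "FL" }],
];
const sortedPerShard = shardDocs.map((docs) => localSort(docs, "state"));
const merged = mergeSorted(sortedPerShard, "state");
console.log(merged.map((d) => d.state)); // [ 'AZ', 'CA', 'FL', 'NY', 'TX' ]
```

The merge never re-sorts the full result set: it only compares the current head of each shard's list.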
Writes
Inserts
Requires shard key: db.users.insert({ name: "Jared", email: "[email protected]" })
Removes
Routed: db.users.remove({ email: "[email protected]" })
Scattered: db.users.remove({ name: "Jared" })
Updates
Routed: db.users.update({ email: "[email protected]" }, { $set: { state: "CA" } })
Scattered: db.users.update({ state: "FZ" }, { $set: { state: "CA" } })
Queries
By shard key
Routed: db.users.find({ email: "[email protected]" })
Sorted by shard key
Routed in order: db.users.find().sort({ email: -1 })
Find by non shard key
Scatter gather: db.users.find({ state: "CA" })
Sorted by non shard key
Distributed merge sort: db.users.find().sort({ state: 1 })
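The four query cases can be summarized as a small decision function. This is a rough sketch, not how mongos is implemented: `classifyQuery` is a hypothetical helper, it assumes a single-field shard key, and it only looks at which fields appear in the filter and sort documents.

```javascript
// Hypothetical classifier for the four cases on this slide.
// `filter` and `sort` are plain objects as passed to find() and sort().
function classifyQuery(shardKey, filter = {}, sort = {}) {
  const sortFields = Object.keys(sort);
  if (shardKey in filter) return "routed";
  if (sortFields.length === 0) return "scatter gather";
  if (sortFields[0] === shardKey) return "routed in order";
  return "distributed merge sort";
}

console.log(classifyQuery("email", { email: "a@example.com" })); // routed
console.log(classifyQuery("email", {}, { email: -1 }));          // routed in order
console.log(classifyQuery("email", { state: "CA" }));            // scatter gather
console.log(classifyQuery("email", {}, { state: 1 }));           // distributed merge sort
```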
EXAMPLES
User Profiles
{ name: "Jared", email: "[email protected]", addresses: [ { state: "CA" } ] }
• Shard by email
• Lookup by email hits 1 node
• Index on { "addresses.state": 1 }
Activity Stream
{ user_id: "[email protected]", event_id: "Logged in", data: "…" }
• Shard by user_id
• Looking up a stream hits 1 node
• Writing is evenly distributed
• Index on { "event_id": 1 } for deletes
Photos
{ photo_id: ???, data: BinData(…) }
• What's the right key?
– Auto increment?
– MD5(data)
– Now() + MD5(data)
– Month() + MD5(data)
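The slide leaves the question open. As a point of comparison, the sketch below builds the last candidate, a month prefix followed by an MD5 digest, using Node's built-in `crypto` module. The helper names, the `YYYY-MM` prefix format, and the sample bytes are assumptions for illustration only.

```javascript
const crypto = require("crypto");

// MD5 hex digest of the photo bytes.
function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// Candidate key: a "YYYY-MM" prefix (UTC) followed by the MD5 of the data.
function monthPlusMd5(data, date) {
  const month = date.toISOString().slice(0, 7);
  return `${month}-${md5Hex(data)}`;
}

const photo = Buffer.from("example photo bytes");
const key = monthPlusMd5(photo, new Date(Date.UTC(2011, 11, 9)));
console.log(key.slice(0, 7)); // 2011-12
```

One way to read the candidates: an ever-increasing key sends every new insert to the chunk that ends at +∞, a pure hash spreads inserts across the whole key range, and a month prefix keeps photos from the same month in neighboring chunks while the hash still spreads writes within that month.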