Introduction to Sharding

Craig Wilson, Software Engineer, MongoDB
#MongoDBDays @craiggwilson

Upload: mongodb

Post on 09-May-2015

DESCRIPTION

Sharding allows you to distribute load across multiple servers and keep your data balanced across those servers. This session will review MongoDB’s sharding support, including an architectural overview, design principles, and automation.

TRANSCRIPT

Page 1: Introduction to Sharding

Introduction to Sharding

Software Engineer, MongoDB

Craig Wilson

#MongoDBDays

@craiggwilson

Page 2: Introduction to Sharding

Sharding is a Solution for Scalability

Page 3: Introduction to Sharding

Examining Growth

•  User Growth
   –  1995: 0.4% of the world’s population
   –  Today: 30% of the world is online (~2.2B)
   –  Emerging Markets & Mobile

•  Data Set Growth
   –  Facebook’s data set is around 100 petabytes
   –  4 billion photos taken in the last year (4x a decade ago)

Page 4: Introduction to Sharding

Do you need to Shard?

Page 5: Introduction to Sharding

Read/Write Throughput Exceeds I/O

Page 6: Introduction to Sharding

Working Set Exceeds Physical Memory

Page 7: Introduction to Sharding

Sharding in MongoDB

Page 8: Introduction to Sharding

Horizontally Scalable

Page 9: Introduction to Sharding

Application Independent

Page 10: Introduction to Sharding

One API

Page 11: Introduction to Sharding

What is a Shard?

Page 12: Introduction to Sharding

Replica Set

[Diagram: a replica set with one primary and two secondaries]

Page 13: Introduction to Sharding

Single Node in a Cluster

[Diagram: three shards, each a replica set with one primary (P) and two secondaries (S)]

Page 14: Introduction to Sharding

Composed of Chunks

•  Grouping of data based on a range

•  Default Max Size: 64 MB

Page 15: Introduction to Sharding

Chunks Have Ranges

[Diagram: chunks covering the key ranges A-B, M, and S-Z]

Page 16: Introduction to Sharding

Chunks Get Split

[Diagram: chunks A-B and M are unchanged; the S-Z chunk has been split into S-V and W-Z]
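
A chunk split can be sketched in a few lines. This is an illustrative in-memory model, not MongoDB's implementation: the `keys` and `sizeBytes` fields, the median split point, and the use of `"["` (the character that sorts just after `"Z"`) as an upper bound are all assumptions made for the example. Only the 64 MB default comes from the slides.

```javascript
// Default maximum chunk size quoted on the "Composed of Chunks" slide.
const MAX_CHUNK_BYTES = 64 * 1024 * 1024;

// A chunk is modeled as a half-open key range [min, max) plus the
// shard key values it holds. The field names are hypothetical.
function needsSplit(chunk) {
  return chunk.sizeBytes > MAX_CHUNK_BYTES;
}

// Split at the median key so each half holds about the same number of keys.
function splitChunk(chunk) {
  const keys = [...chunk.keys].sort();
  const splitPoint = keys[Math.floor(keys.length / 2)];
  return [
    { min: chunk.min, max: splitPoint, keys: keys.filter((k) => k < splitPoint) },
    { min: splitPoint, max: chunk.max, keys: keys.filter((k) => k >= splitPoint) },
  ];
}

const chunk = {
  min: "S",
  max: "[", // assumed upper bound: "[" sorts just after "Z"
  sizeBytes: 70 * 1024 * 1024,
  keys: ["Sam", "Tia", "Uma", "Wes", "Xan", "Zoe"],
};

if (needsSplit(chunk)) {
  const [left, right] = splitChunk(chunk);
  console.log(`[${left.min}, ${left.max})`, left.keys);
  console.log(`[${right.min}, ${right.max})`, right.keys);
}
```

If every key in a chunk had the same value, no split point would exist, which is one reason shard key cardinality matters later in the deck.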

Page 17: Introduction to Sharding

Chunks Get Migrated

•  Triggered automatically when one shard has 7 more chunks than another

•  Can also be triggered manually
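
The migration trigger can be sketched as a small balancing loop. This is a simplified model under stated assumptions: the threshold of 7 is taken from the slide, the `chunkCounts` map is hypothetical, and the loop stops as soon as the difference drops below the threshold, which may not match the real balancer's stopping rule.

```javascript
const MIGRATION_THRESHOLD = 7; // from the slide; real thresholds may differ

// chunkCounts maps a shard name to the number of chunks it holds.
// Returns the next move, or null if the cluster is balanced enough.
function planMigration(chunkCounts, threshold = MIGRATION_THRESHOLD) {
  const shards = Object.keys(chunkCounts);
  if (shards.length < 2) return null;
  let most = shards[0];
  let least = shards[0];
  for (const s of shards) {
    if (chunkCounts[s] > chunkCounts[most]) most = s;
    if (chunkCounts[s] < chunkCounts[least]) least = s;
  }
  if (chunkCounts[most] - chunkCounts[least] < threshold) return null;
  return { from: most, to: least };
}

// Move one chunk at a time from the fullest shard to the emptiest one.
function balance(chunkCounts, threshold = MIGRATION_THRESHOLD) {
  if (threshold < 2) throw new Error("threshold must be at least 2");
  const counts = { ...chunkCounts };
  let move;
  while ((move = planMigration(counts, threshold)) !== null) {
    counts[move.from] -= 1;
    counts[move.to] += 1;
  }
  return counts;
}

console.log(balance({ shard0000: 12, shard0001: 3, shard0002: 5 }));
// → { shard0000: 10, shard0001: 5, shard0002: 5 }
```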

Page 20: Introduction to Sharding

How does it all work?

Page 21: Introduction to Sharding

Configuration

•  3 Config Servers
   –  Just mongod
   –  Stores chunk ranges and location
   –  Not a replica set

[Diagram: three config servers]

Page 22: Introduction to Sharding

Routers

•  Mongos
   –  Both a router and a balancer
   –  No local data
   –  Can have 1 or many

[Diagram: a single mongos]

Page 23: Introduction to Sharding

Cluster

[Diagram: a cluster with two applications, two mongos routers, three config servers, and three shards; each shard is a replica set with one primary (P) and two secondaries (S)]

Page 24: Introduction to Sharding

Query Routing

Page 25: Introduction to Sharding

Shard Key

•  Defines the range of data called a Key Space

•  Defines the distribution of documents in a collection

•  Every document must contain the Shard Key

•  Shard Keys are immutable
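
The rule that every document must contain the shard key can be illustrated with a small check. This is a sketch only: `extractShardKey` is a hypothetical helper, not a MongoDB API, and the key pattern `{ user: 1, time: 1 }` is borrowed from the email example later in the deck.

```javascript
// Pull the shard key values out of a document, following a key pattern
// such as { user: 1, time: 1 }. Throws if a key field is missing,
// mirroring the rule stated on the slide.
function extractShardKey(doc, keyPattern) {
  const key = {};
  for (const field of Object.keys(keyPattern)) {
    if (!(field in doc)) {
      throw new Error(`document is missing shard key field "${field}"`);
    }
    key[field] = doc[field];
  }
  return key;
}

const email = { _id: 1, user: 123, time: 1431129600, subject: "..." };
console.log(extractShardKey(email, { user: 1, time: 1 }));
// → { user: 123, time: 1431129600 }
```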

Page 26: Introduction to Sharding

Chunks

•  Each chunk contains a non-overlapping range of Shard Key values
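
Because chunk ranges do not overlap, finding the chunk for a given shard key value is a lookup in a sorted list. The sketch below assumes string keys, half-open ranges `[min, max)`, and a made-up routing table. A real cluster stores this metadata on the config servers and uses special minimum and maximum key values rather than `"A"` and `"["`.

```javascript
// Hypothetical routing table, sorted by min. Ranges are half-open: [min, max).
const chunkTable = [
  { min: "A", max: "C", shard: "shard0000" },
  { min: "C", max: "N", shard: "shard0001" },
  { min: "N", max: "S", shard: "shard0000" },
  { min: "S", max: "[", shard: "shard0002" }, // "[" sorts just after "Z"
];

// Binary search for the one chunk whose range contains the key.
function findChunk(chunks, key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = chunks[mid];
    if (key < c.min) hi = mid - 1;
    else if (key >= c.max) lo = mid + 1;
    else return c;
  }
  return null; // key falls outside every range in this toy table
}

console.log(findChunk(chunkTable, "Mia").shard); // → shard0001
console.log(findChunk(chunkTable, "Sam").shard); // → shard0002
```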

Page 27: Introduction to Sharding

3 Types of Queries

•  Targeted Queries

•  Scatter Gather Queries

•  Scatter Gather Queries with Sorting

Page 28: Introduction to Sharding

Targeted Queries

•  Query contains the shard key

[Diagram: a mongos in front of three shards, each a replica set with one primary (P) and two secondaries (S)]

Page 29: Introduction to Sharding

Scatter Gather Queries

•  Query does not contain the shard key

[Diagram: a mongos in front of three shards, each a replica set with one primary (P) and two secondaries (S)]
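
The difference between a targeted and a scatter-gather query comes down to whether the router can narrow the query to one chunk. The following sketch assumes a single-field shard key and handles equality matches only. `routeQuery`, the numeric ranges, and the shard names are illustrative, not how mongos is implemented.

```javascript
// Hypothetical routing table keyed on a numeric `user` field: [min, max).
const userChunks = [
  { min: 0, max: 100, shard: "shard0000" },
  { min: 100, max: 200, shard: "shard0001" },
  { min: 200, max: Infinity, shard: "shard0002" },
];
const allShards = ["shard0000", "shard0001", "shard0002"];

// Decide which shards must receive the query.
function routeQuery(query, shardKeyField, chunks, shards) {
  if (Object.prototype.hasOwnProperty.call(query, shardKeyField)) {
    const value = query[shardKeyField];
    const chunk = chunks.find((c) => value >= c.min && value < c.max);
    return { type: "targeted", shards: chunk ? [chunk.shard] : [] };
  }
  // No shard key in the query: every shard has to be asked.
  return { type: "scatter-gather", shards: [...shards] };
}

console.log(routeQuery({ user: 123 }, "user", userChunks, allShards));
// → { type: 'targeted', shards: [ 'shard0001' ] }
console.log(routeQuery({ subject: "hello" }, "user", userChunks, allShards).type);
// → scatter-gather
```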

Page 30: Introduction to Sharding

Scatter Gather Queries with Sort

•  Query does not contain the shard key

•  Each shard sorts its own results first

•  The sorted results are then merged in Mongos

[Diagram: a mongos in front of three shards, each a replica set with one primary (P) and two secondaries (S)]
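
Sorting on each shard and merging in the router amounts to a k-way merge of already sorted lists. The sketch below shows that idea under simple assumptions: the per-shard result arrays and the `time` field are made up for the example, and a real router would stream results rather than hold them in arrays.

```javascript
// Merge several individually sorted arrays into one sorted array.
// On ties the earlier shard wins, so the merge is stable.
function mergeSorted(shardResults, compare) {
  const positions = shardResults.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (positions[i] >= shardResults[i].length) continue; // shard exhausted
      if (
        best === -1 ||
        compare(shardResults[i][positions[i]], shardResults[best][positions[best]]) < 0
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // every shard exhausted
    merged.push(shardResults[best][positions[best]++]);
  }
}

const byTime = (a, b) => a.time - b.time;

// Step 1: each shard sorts its own matching documents.
const perShard = [
  [{ time: 5 }, { time: 1 }],
  [{ time: 3 }],
  [{ time: 4 }, { time: 2 }],
].map((docs) => [...docs].sort(byTime));

// Step 2: the router merges the sorted streams.
console.log(mergeSorted(perShard, byTime).map((d) => d.time));
// → [ 1, 2, 3, 4, 5 ]
```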

Page 31: Introduction to Sharding

How do I pick a good Shard Key?

Page 32: Introduction to Sharding

Considerations

•  Cardinality

•  Write Distribution

•  Query Isolation

•  Reliability

•  Index Locality

Page 33: Introduction to Sharding

Example: Email Storage

>  db.emails.find({ user: 123 })
{
    _id: ObjectId(),
    user: 123,
    time: Date(),
    subject: "...",
    recipients: [],
    body: "...",
    attachments: []
}

Page 38: Introduction to Sharding

Example: Email Storage

Shard Key  | Cardinality | Write Scaling | Query Isolation | Reliability         | Index Locality
-----------|-------------|---------------|-----------------|---------------------|---------------
_id        | Doc level   | One shard     | Scatter/gather  | All users affected  | Good
hash(_id)  | Hash level  | All Shards    | Scatter/gather  | All users affected  | Poor
user       | Many docs   | All Shards    | Targeted        | Some users affected | Good
user, time | Doc level   | All Shards    | Targeted        | Some users affected | Good
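
The Cardinality column can be made concrete by counting distinct shard key values over a handful of documents. The sample data below is invented for the illustration. The point it shows is that `user` alone yields one key value per user ("Many docs" per value), while `user, time` yields a distinct value for nearly every document ("Doc level").

```javascript
// Made-up sample of email documents (only the fields that matter here).
const emails = [
  { user: 123, time: 1 },
  { user: 123, time: 2 },
  { user: 123, time: 3 },
  { user: 456, time: 1 },
  { user: 456, time: 2 },
];

// Count how many distinct shard key values a candidate key produces.
function distinctKeyValues(docs, fields) {
  const seen = new Set();
  for (const doc of docs) {
    seen.add(JSON.stringify(fields.map((f) => doc[f])));
  }
  return seen.size;
}

console.log(distinctKeyValues(emails, ["user"]));         // → 2
console.log(distinctKeyValues(emails, ["user", "time"])); // → 5
```

Since a chunk cannot be split between documents that share the same shard key value, a higher count leaves more room for splitting and rebalancing.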

Page 39: Introduction to Sharding

How do I get up and running?

Page 40: Introduction to Sharding

5 Steps

•  Launch Config Servers

•  Launch Mongos

•  Launch Shards

•  Add Shards

•  Enable Sharding

Page 41: Introduction to Sharding

Launch Config Servers

•  mongod --configsvr

•  Starts 1 config server on the default port 27019

[Diagram: three config servers]

Page 42: Introduction to Sharding

Launch Mongos

•  mongos --configdb hostname:27019,hostname2:27019,hostname3:27019

[Diagram: a mongos connected to three config servers]

Page 43: Introduction to Sharding

Launch Shards

•  Nothing special, just like a normal replica set

[Diagram: one shard (a replica set with one primary and two secondaries) alongside a mongos and three config servers]

Page 44: Introduction to Sharding

Add Shards

•  Connect to mongos via the shell

•  sh.addShard("<rsname>/<seedlist>")

[Diagram: one shard (a replica set with one primary and two secondaries) alongside a mongos and three config servers]

Page 45: Introduction to Sharding

Verify that the shard was added (run against the admin database)

>  db.runCommand({ listShards: 1 })
{
    shards: [
        { _id: "shard0000", host: "<hostname>:27017" }
    ],
    "ok": 1
}

Page 46: Introduction to Sharding

Enable Sharding

•  Enable sharding on a database
   –  sh.enableSharding("<dbname>")

•  Shard a collection with the given key
   –  sh.shardCollection("<dbname>.people", { country: 1 })
   –  sh.shardCollection("<dbname>.cars", { year: 1, uniqueid: 1 })
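
Applied to the email example from earlier, the two calls might be wrapped as follows. This is a sketch: the function `shardEmailCollection`, the database name `mail`, and the recording stand-in for the shell's `sh` object are all hypothetical. The stand-in exists only so the snippet can run outside the mongo shell; inside the shell, the real `sh` helper would be passed instead.

```javascript
// Enable sharding on a database, then shard its `emails` collection
// on the compound key discussed in the email storage example.
function shardEmailCollection(sh, dbName) {
  sh.enableSharding(dbName);
  sh.shardCollection(`${dbName}.emails`, { user: 1, time: 1 });
}

// Stand-in for the shell's `sh` object that just records the calls made.
function makeRecordingSh() {
  const calls = [];
  return {
    calls,
    enableSharding: (db) => calls.push(["enableSharding", db]),
    shardCollection: (ns, key) => calls.push(["shardCollection", ns, key]),
  };
}

const fakeSh = makeRecordingSh();
shardEmailCollection(fakeSh, "mail"); // "mail" is a made-up database name
console.log(fakeSh.calls);
```

In the mongo shell, connected to a mongos, the equivalent call would be `shardEmailCollection(sh, "mail")`.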

Page 47: Introduction to Sharding

Tag Aware Sharding

•  Tag aware sharding allows you to control the distribution of your data

•  Tag a range of shard keys
   –  sh.addTagRange(<collection>,<min>,<max>,<tag>)

•  Tag a shard
   –  sh.addShardTag(<shard>,<tag>)
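
The effect of tags on placement can be modeled as a lookup: find the tag range containing a key, then keep only the shards carrying that tag. The sketch below is a simplified model with invented tags, ranges, and shard names. It assumes ranges include their lower bound and exclude their upper bound, and that keys outside every tagged range may live on any shard.

```javascript
// Hypothetical state, as it might look after calls such as
// sh.addShardTag("shard0000", "east") and sh.addTagRange(...).
const shardTags = { shard0000: ["east"], shard0001: ["west"] };
const tagRanges = [
  { min: "A", max: "M", tag: "east" },
  { min: "M", max: "[", tag: "west" }, // "[" sorts just after "Z"
];

// Return the shards allowed to hold a given shard key value.
function shardsForKey(key, ranges, tagsByShard) {
  const range = ranges.find((r) => key >= r.min && key < r.max);
  if (!range) return Object.keys(tagsByShard); // untagged: any shard
  return Object.keys(tagsByShard).filter((s) => tagsByShard[s].includes(range.tag));
}

console.log(shardsForKey("DE", tagRanges, shardTags)); // → [ 'shard0000' ]
console.log(shardsForKey("US", tagRanges, shardTags)); // → [ 'shard0001' ]
```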

Page 48: Introduction to Sharding

Conclusion

Page 49: Introduction to Sharding

Read/Write Throughput Exceeds I/O

Page 50: Introduction to Sharding

Working Set Exceeds Physical Memory

Page 51: Introduction to Sharding

Sharding Enables Scale

MongoDB’s Auto-Sharding

–  Easy to Configure
–  Consistent Interface
–  Free and Open Source

Page 52: Introduction to Sharding

Thank You
