Slides for MongoDB Beijing 2012
![Page 1: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/1.jpg)
Challenges with MongoDB
MongoDB Beijing 2012
Stone Gao
Monday, April 2, 2012
![Page 2: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/2.jpg)
About Me
Tech Lead at Umeng.com
![Page 3: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/3.jpg)
MongoDB is Awesome
• Document-oriented storage
• Full Index Support
• Replication & High Availability
• Auto-Sharding
• Querying
• Fast In-Place Updates
• Map/Reduce
• GridFS
![Page 4: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/4.jpg)
But...
This talk is not Yet Another Talk about its awesomeness,
but about the challenges with MongoDB
![Page 5: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/5.jpg)
Outline
1. Global Write Lock Sucks
2. Auto-Sharding is not that Reliable
3. Schema-less is Overrated
4. Community Contribution is Quite Low
5. Attitude Matters
![Page 6: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/6.jpg)
1. Global Write Lock Sucks
http://www.clker.com/cliparts/3/3/5/D/X/b/locked-exclamation-mark-padlock-hi.png
![Page 7: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/7.jpg)
1. Global Write Lock Sucks
[Diagram: mongod vs. mysqld. Each process holds databases db-1 ... db-n. Each mongod database holds collection1 and collection2; each mysqld database holds table1 and table2. Each collection or table holds doc1 and doc2.]
DB Process Lock vs. Row Lock
mongod: a single global write lock for the entire server (process)
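To make the difference in lock granularity concrete, here is a toy model, not a benchmark. It assumes every write takes one time unit and that writes contending for the same lock run one after another. `roundsNeeded`, `globalLock`, and `rowLock` are hypothetical names, and the sample writes reuse the db/collection/doc labels from the diagram.

```javascript
// Toy model: how many time units a batch of writes needs when writes that
// contend for the same lock must be serialized.
function roundsNeeded(writes, lockKeyOf) {
  const perLock = new Map();
  for (const w of writes) {
    const key = lockKeyOf(w);
    perLock.set(key, (perLock.get(key) || 0) + 1);
  }
  // Writes under different locks can overlap, so the busiest lock sets the pace.
  return Math.max(0, ...perLock.values());
}

const writes = [
  { db: "db-1", collection: "collection1", doc: "doc1" },
  { db: "db-1", collection: "collection1", doc: "doc2" },
  { db: "db-1", collection: "collection2", doc: "doc1" },
  { db: "db-n", collection: "collection1", doc: "doc1" },
];

// One lock for the whole process vs. one lock per record.
const globalLock = () => "process";
const rowLock = (w) => `${w.db}/${w.collection}/${w.doc}`;

console.log(roundsNeeded(writes, globalLock)); // 4
console.log(roundsNeeded(writes, rowLock));    // 1
```

With a process-wide lock, the four unrelated writes queue behind each other; with a per-record lock they can all proceed at once.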
![Page 8: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/8.jpg)
1. Global Write Lock Sucks
Intel SSD 320 RAID10 & mongostat
Nearly all data is in RAM, yet the lock ratio is pretty high and there is a bunch of queued writes (qw)
39.5K Read IOPS / 23K Write IOPS
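The lock ratio that mongostat reports can be approximated from two `db.serverStatus().globalLock` snapshots. The sketch below assumes the pre-2.2 layout of that document, with cumulative `totalTime` and `lockTime` counters and queued writers under `currentQueue.writers`. The helper name and the sample numbers are made up for illustration; they are not the measurements from this slide.

```javascript
// Hypothetical helper: share of elapsed time during which the global lock was
// held between two snapshots of db.serverStatus().globalLock.
function lockedPercent(before, after) {
  const elapsed = after.totalTime - before.totalTime;
  if (elapsed <= 0) return 0; // no time passed, or the counters were reset
  return (100 * (after.lockTime - before.lockTime)) / elapsed;
}

// Made-up snapshots, shaped like the assumed pre-2.2 globalLock document.
const before = { totalTime: 1000000, lockTime: 200000, currentQueue: { readers: 0, writers: 3 } };
const after = { totalTime: 2000000, lockTime: 950000, currentQueue: { readers: 1, writers: 12 } };

console.log(lockedPercent(before, after)); // 75
console.log(after.currentQueue.writers);   // 12
```

A high percentage together with a growing writer queue is the pattern described on this slide: the disks are not the bottleneck, the lock is.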
![Page 12: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/12.jpg)
Possible Solutions/Workarounds #1
Wait for lock related issues on JIRA
• SERVER-1240 : Collection level locking
https://jira.mongodb.org/browse/SERVER-1240
Planning Bucket A, Vote (154)
• SERVER-1241 : Intra collection locking (maybe extent)
https://jira.mongodb.org/browse/SERVER-1241
Planning Bucket A, Vote (25)
• SERVER-2563 : When hitting disk, yield lock - phase 1
https://jira.mongodb.org/browse/SERVER-2563
Fixed in 1.9.1, Vote (25)
Yield any time we actually have to hit disk: if a memory-mapped page is not in RAM, then we should yield. Phase 1 covers update by _id, remove, and long cursor iteration.
• SERVER-1169 : Record level locking
https://jira.mongodb.org/browse/SERVER-1169
Rejected, Vote (1)
and more ...
![Page 13: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/13.jpg)
Possible Solutions/Workarounds #2
One Collection per DB to Reduce Lock Ratio
But you can go no further
Auto-Sharding to the rescue?
![Page 14: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/14.jpg)
2. Auto-Sharding is not that Reliable
http://www.autoinsurancecompanies.com/wp-content/uploads/2011/11/reliable.jpg
![Page 15: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/15.jpg)
Auto-Sharding is not that Reliable
![Page 16: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/16.jpg)
Problems with Auto-Sharding
• MongoDB can’t figure out how many docs are in a collection after sharding
• Balancer deadlock
[Balancer] skipping balancing round during ongoing split or move activity.)
[Balancer] dist_lock lock failed because taken by....
[Balancer] Assertion failure cm s/balance.cpp...
• Uneven shard load distribution
• ...
(Note: I did the experiment before 2.0, so some of the issues might be fixed or improved in newer versions of MongoDB, because it’s evolving very fast.)
![Page 17: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/17.jpg)
Possible Solutions/Workarounds #1
Manual Chunk Pre-Splitting
0) Turn off the balancer (balancing won't understand your locations, but it shouldn't matter b/c you're using hashed shard keys)
1) Shard the empty collection over the shard key { location : 1, hash : 1 }
2) run db.runCommand({ split : "<coll>", middle : { "location":"DEN", "hash": "8000...0" }})
3) run db.runCommand({ split : "<coll>", middle : { "location":"SC", "hash": "0000...0" }})
4) move those empty chunks to whatever shards you want
- Greg Studer
http://www.mongodb.org/display/DOCS/Splitting+Shard+Chunks
https://groups.google.com/d/msg/mongodb-user/tYBFKSMM3cU/TiYtoOiNMgEJ
http://blog.zawodny.com/2011/03/06/mongodb-pre-splitting-for-faster-data-loading-and-importing/
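The split commands in the recipe above can be generated instead of typed by hand. The following is a minimal sketch under stated assumptions: the namespace `mydb.events`, the helper names, and the choice of 4-digit hex prefixes as hash boundaries are all hypothetical. Short prefixes work as split points only if the `hash` field is stored as a hex string, because strings compare lexicographically.

```javascript
// Evenly spaced hex boundaries that cut the hash range into `chunks` pieces.
// Hypothetical helper; keep hexDigits small so the arithmetic stays exact.
function hashBoundaries(chunks, hexDigits) {
  const space = 16 ** hexDigits;
  const out = [];
  for (let i = 1; i < chunks; i++) {
    out.push(Math.floor((space * i) / chunks).toString(16).padStart(hexDigits, "0"));
  }
  return out;
}

// Build one split command document per (location, hash boundary) pair.
function presplitCommands(namespace, locations, chunksPerLocation) {
  const commands = [];
  for (const location of locations) {
    for (const hash of hashBoundaries(chunksPerLocation, 4)) {
      commands.push({ split: namespace, middle: { location, hash } });
    }
  }
  return commands;
}

// "DEN" and "SC" are the locations used in the recipe; the namespace is made up.
const commands = presplitCommands("mydb.events", ["DEN", "SC"], 2);
console.log(JSON.stringify(commands, null, 2));
```

In the mongo shell each generated document would be passed to `db.adminCommand(...)`, since `split` is an admin command that takes the full `<db>.<collection>` namespace. Step 4 of the recipe corresponds to the `moveChunk` admin command.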
![Page 18: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/18.jpg)
Possible Solutions/Workarounds #2
https://github.com/twitter/gizzard/raw/master/doc/forwarding_table.png
SERVER-2001 : Option to hash shard key
https://jira.mongodb.org/browse/SERVER-2001
Unresolved, Fix Version/s: 2.1.1, Vote (27)
“The lack of hashing based read/write distribution amongst available shards is a huge issue for us now. We're actually considering implementing an app-side layer to do this but that obviously has a number of serious drawbacks.”
- Remon van Vliet
“Seems like a good idea : we implemented hashed shard key on client-side : operation rate sky rocked ( x3 and less variability). Balancing is moreover quicker and done during our very heavy insertion process : perfect !”
- Grégoire Seux
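A client-side hashed shard key can be as simple as storing a digest of the natural key next to it and sharding on that field. This is a sketch of the general idea, not the implementation the commenters describe; the function name, the `userId` and `event` fields, and the choice of MD5 are assumptions.

```javascript
const crypto = require("crypto");

// Return a copy of the document with a `hash` field derived from the natural
// key, so that consecutive key values scatter across the hash range.
function withHashedKey(doc, keyField) {
  const hash = crypto
    .createHash("md5")
    .update(String(doc[keyField]))
    .digest("hex");
  return { ...doc, hash };
}

// Two documents with adjacent natural keys (made-up data).
const a = withHashedKey({ userId: 1001, event: "launch" }, "userId");
const b = withHashedKey({ userId: 1002, event: "launch" }, "userId");

console.log(a.hash);
console.log(b.hash);
```

The collection would then be sharded on `{ hash : 1 }` (or on a compound key that includes it). The drawback mentioned in the first quote still applies: every writer and every query that targets a single shard has to compute the same hash.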
![Page 19: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/19.jpg)
Possible Solutions/Workarounds #3
Plain-old Application Level Sharding
https://github.com/twitter/gizzard/raw/master/doc/forwarding_table.png
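Application-level sharding in the style of a forwarding table can be sketched in a few lines. The table contents, shard names, and the use of the first MD5 byte as the bucket are assumptions for illustration; a real deployment would map each entry to a connection for a separate mongod or replica set.

```javascript
const crypto = require("crypto");

// Forwarding table sorted by lowerBound. Each entry owns the buckets from its
// lowerBound up to the next entry's lowerBound. Shard names are placeholders.
const forwardingTable = [
  { lowerBound: 0, shard: "shard-a" },
  { lowerBound: 86, shard: "shard-b" },
  { lowerBound: 171, shard: "shard-c" },
];

// Map a key to a bucket in 0..255 using the first byte of its MD5 digest.
function bucketOf(key) {
  return crypto.createHash("md5").update(String(key)).digest()[0];
}

// Find the last entry whose lowerBound is <= bucket.
function lookup(bucket) {
  let owner = forwardingTable[0].shard;
  for (const entry of forwardingTable) {
    if (entry.lowerBound <= bucket) owner = entry.shard;
  }
  return owner;
}

// The application calls this before every read or write.
function shardFor(key) {
  return lookup(bucketOf(key));
}

console.log(lookup(0), lookup(85), lookup(86), lookup(255));
// shard-a shard-a shard-b shard-c
```

Rebalancing then means copying a bucket range to another shard and editing the table, which the application has to coordinate itself.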
![Page 20: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/20.jpg)
3. Schema-less is Overrated
http://images.sodahead.com/polls/001635729/1863780_overrated_answer_2_xlarge.jpeg
![Page 21: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/21.jpg)
Schema-less is Overrated
Schema-free (schema-less) is not free. It means repeating the schema in every doc (record)!
![Page 22: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/22.jpg)
Possible Solutions/Workarounds #1
Use Short Key Names
ref : http://christophermaier.name/blog/2011/05/22/MongoDB-key-names
Long key names (1.6 billion documents, 243 GB):
{"sequence":"AHAHSPGPGSAVKLPAPHSVGKSALR", "location":{ "chromosome":"19", "strand":"-", "begin":"51067007", "end":"51067085" }}
Short key names (1.6 billion documents, 183 GB):
{"s":"AHAHSPGPGSAVKLPAPHSVGKSALR", "l":{ "c":"19", "s":"-", "b":"51067007", "e":"51067085" }}
60 GB saved!
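The renaming can be kept out of application code with a small mapping layer. The sketch below is one possible shape: `keyMap` and `shortenKeys` are hypothetical names, and the flat map only works here because "sequence" and "strand", which both shorten to "s", never appear at the same nesting level.

```javascript
// Long field name -> short field name, taken from the two sample documents.
const keyMap = {
  sequence: "s",
  location: "l",
  chromosome: "c",
  strand: "s",
  begin: "b",
  end: "e",
};

// Recursively rename keys; unknown keys and all values are left untouched.
function shortenKeys(value) {
  if (Array.isArray(value)) return value.map(shortenKeys);
  if (value === null || typeof value !== "object") return value;
  const out = {};
  for (const [key, inner] of Object.entries(value)) {
    out[keyMap[key] || key] = shortenKeys(inner);
  }
  return out;
}

const longDoc = {
  sequence: "AHAHSPGPGSAVKLPAPHSVGKSALR",
  location: { chromosome: "19", strand: "-", begin: "51067007", end: "51067085" },
};
const shortDoc = shortenKeys(longDoc);

// Characters saved per document by the shorter key names.
const savedPerDoc = JSON.stringify(longDoc).length - JSON.stringify(shortDoc).length;
console.log(JSON.stringify(shortDoc));
console.log(savedPerDoc); // 34
```

Thirty-four bytes of key text per document, times 1.6 billion documents, is about 54 GB, which is in the same range as the 60 GB reported above.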
![Page 23: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/23.jpg)
Possible Solutions/Workarounds #2
SERVER-863 : Tokenize the field names
https://jira.mongodb.org/browse/SERVER-863
planned but not scheduled, Vote (66)
“Most collections, even if they don’t contain the same structure , they contain similar. So it would make a lot of sense and save a lot of space to tokenize the field names.”
“The overall benefit as mentioned by other users is that you reduce the amount of storage/RAM taken up by redundant data in each document (so you can use less resources per request, hence gain more throughput and capacity), while importantly also freeing the developer from having to pick short and hard to read field names as a workaround for a technical limitation.”
- Andrew Armstrong
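Field-name tokenization can be illustrated with a per-collection dictionary that replaces each name with a small integer on write and restores it on read. This is only a sketch of the idea discussed in the ticket, not how MongoDB implements or plans to implement it; the class name and the pair-based encoding are assumptions, and it handles only string or number values and nested objects (no arrays).

```javascript
// Hypothetical per-collection table of field names.
class FieldNameTable {
  constructor() {
    this.tokenOf = new Map(); // field name -> token
    this.nameOf = [];         // token -> field name
  }

  // Return the token for a name, assigning the next free one if it is new.
  token(name) {
    if (!this.tokenOf.has(name)) {
      this.tokenOf.set(name, this.nameOf.length);
      this.nameOf.push(name);
    }
    return this.tokenOf.get(name);
  }

  // Encode a document as a list of [token, value] pairs.
  encode(doc) {
    return Object.entries(doc).map(([name, value]) => [
      this.token(name),
      value !== null && typeof value === "object" ? this.encode(value) : value,
    ]);
  }

  // Rebuild the original document from its [token, value] pairs.
  decode(pairs) {
    const doc = {};
    for (const [token, value] of pairs) {
      doc[this.nameOf[token]] = Array.isArray(value) ? this.decode(value) : value;
    }
    return doc;
  }
}

const table = new FieldNameTable();
const doc = {
  sequence: "AHAHSPGPGSAVKLPAPHSVGKSALR",
  location: { chromosome: "19", strand: "-", begin: "51067007", end: "51067085" },
};
const stored = table.encode(doc);
console.log(JSON.stringify(stored));
console.log(JSON.stringify(table.decode(stored)));
```

Each field name is stored once in the table however many documents use it, and developers keep readable names, which is the benefit the second quote points out.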
![Page 24: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/24.jpg)
Possible Solutions/Workarounds #3
SERVER-164 : Option to store data compressed
https://jira.mongodb.org/browse/SERVER-164
planned but not scheduled, Vote (126)
“The way oracle handles this is transparent to the database server at the block engine level. They compress the blocks similar to how SAN store's handle it rather than at a record level. They use zlib type compression and the overhead is less than 5 percent. Due to the IO access reduction in both number of blocks touched, and amount of data transferred, the overall effect is a cumulative speed increase.
Should MongoDB do it this way? Maybe? But at the end of the day, the architecture must make Mongo more scalable, as well as increase the ability limit the storage footprint.”
- Michael D. Joy
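The effect of block-level compression on repeated field names is easy to see with Node's built-in `zlib` module. This is an illustration only: the documents are synthetic, JSON stands in for BSON, and a single deflate call over 1,000 documents stands in for a storage block.

```javascript
const zlib = require("zlib");

// Synthetic documents that repeat the same field names, as every record in a
// schema-less collection does.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({
    sequence: "AHAHSPGPGSAVKLPAPHSVGKSALR",
    location: {
      chromosome: "19",
      strand: "-",
      begin: String(51067007 + i),
      end: String(51067085 + i),
    },
  });
}

// Compress the whole block at once instead of record by record.
const block = Buffer.from(JSON.stringify(docs), "utf8");
const compressed = zlib.deflateSync(block);

// Reading requires inflating the block again.
const restored = JSON.parse(zlib.inflateSync(compressed).toString("utf8"));

console.log(block.length, compressed.length);
console.log(restored.length);
```

The repeated key names compress away almost entirely, so developers could keep long field names. The cost is CPU time on every read and write of a block.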
![Page 25: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/25.jpg)
4. Community Contribution is Quite Low
http://www.thompsoncrg.com/wp-content/themes/zoomtechnic/images/slide/img3.jpg
![Page 26: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/26.jpg)
Community Contribution is Quite Low
https://github.com/mongodb/mongo/graphs/impact
https://github.com/mongodb/mongo/contributors
![Page 27: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/27.jpg)
5. Attitude Matters
![Page 28: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/28.jpg)
5. Attitude Matters
MongoDB already has the sweetest API in the NoSQL world.
I wish more effort were invested in fixing the hard problems: locking, sharding, storage engine...
http://www.mongodb.org/display/DOCS/SQL+to+Mongo+Mapping+Chart
![Page 29: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/29.jpg)
We are hiring
• Backend Engineer (MongoDB, Hadoop, HBase, Storm, Scala, Java, Ruby, Clojure)
• Data Mining Engineer
• DevOps Engineer
• Front End Engineer
We are doing bigdata analytics
[email protected]
![Page 31: Challenges with MongoDB](https://reader033.vdocuments.net/reader033/viewer/2022052820/54b4ec6a4a79598f728b4594/html5/thumbnails/31.jpg)
Thanks
Q & A