blog serverdensity com mongodb vs cassandra
7/30/2019 Blog Serverdensity Com Mongodb vs Cassandra
Written by David

Over the 2 years we've been using MongoDB in production with our server monitoring tool, Server Density, we've built up significant experience and knowledge about how it works. Back in 2009, when I was looking at a replacement for MySQL, I looked at Cassandra but dismissed it because MongoDB had several advantages and Cassandra was still at an extremely early stage (even more so than MongoDB at the time). Having been invited to give a comparison at the Cassandra London Meetup, I thought I'd revisit it to see how it compares today.

Disclaimer: It's important to note that much of what I know about MongoDB has been learnt through using it in production. We don't use Cassandra so any comparisons are going to be fairly superficial, but they will still be relevant because that's the stage most people will be in when they are considering which database to pick. As a result of this I will try to avoid making technical comparisons about specific features because this will be biased towards my extensive understanding of MongoDB vs a limited understanding of Cassandra.

As such, this comparison is split into 2 types of difference: usage and operations.

Usage: The actual usage as a developer implementing the application with the database.
Operations: Points which are not directly about the core database but its suitability for production and management on an operational level.

That said, I will start with several technical comparisons because these are important to understand.

Usage - Structure

MongoDB acts much like a relational database. Its data model consists of a database at the top level, then collections, which are like tables in MySQL (for example), and then documents, which are contained within the collection like rows in MySQL. Each document has fields and values, similar to columns and values in MySQL. Fields can be simple key/value pairs, e.g. { 'name': 'David Mytton' }, but they can also contain other documents, e.g. { 'name': { 'first': 'David', 'last': 'Mytton' } }.

In Cassandra documents are known as columns, which are really just a single key and value, e.g. { 'key': 'name', 'value': 'David Mytton' }. There's also a timestamp field which is for internal replication and consistency. The value can be a single value but can also contain another column. These columns then exist within column families, which order data based on a specific value in the columns, referenced by a key. At the top level there is a keyspace, which is similar to the MongoDB database.

A good set of data model diagrams for Cassandra can be found here (http://www.javageneration.com/?p=70).
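
To make the two shapes concrete, here is a minimal sketch using nothing but Python dictionaries. It does not talk to either database. The helper make_column, the row key 'dmytton', the 'users' column family and the microsecond timestamp unit are all assumptions made for illustration, not part of any driver API.

```python
import time

# A MongoDB-style document: fields map to values, and a value may itself
# be another (embedded) document.
mongo_document = {
    "name": {"first": "David", "last": "Mytton"},
}


def make_column(key, value, timestamp=None):
    """Build a plain-dict stand-in for a Cassandra column (hypothetical helper)."""
    if timestamp is None:
        # Microseconds since the epoch; the unit is an assumption for this sketch.
        timestamp = int(time.time() * 1_000_000)
    return {"key": key, "value": value, "timestamp": timestamp}


# keyspace -> column family -> row key -> columns by name
keyspace = {
    "users": {  # column family (illustrative name)
        "dmytton": {  # row key (illustrative name)
            "name": make_column("name", "David Mytton", timestamp=1),
        },
    },
}

# Reaching the same piece of data in each model.
first_name = mongo_document["name"]["first"]
full_name = keyspace["users"]["dmytton"]["name"]["value"]
```

The point of the sketch is only the nesting: a document carries its structure inside itself, whereas a column is a flat key/value/timestamp triple that gets its context from the column family and keyspace around it.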

Usage - Indexes

MongoDB indexes work very similarly to those in relational databases. You create single or compound indexes at the collection level and every document inserted into that collection has those fields indexed. Querying by index is extremely fast so long as you have all your indexes in memory.

Prior to Cassandra 0.7 it was essentially a key/value store, so if you wanted to query by the contents of a key (i.e. the value) then you needed to create a separate column which referenced the other columns, i.e. you created your own indexes. This changed in Cassandra 0.7, which allowed secondary indexes on column values, but only through the column families mechanism.
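
The "create your own indexes" approach can be sketched in plain Python. This is a toy model under stated assumptions, not Cassandra client code: users and users_by_state stand in for two column families, and insert_user and find_by_state are hypothetical helpers. The application, not the database, is responsible for keeping the second structure in step with the first.

```python
from collections import defaultdict

# Primary "column family": row key -> {column name: value}
users = {}
# Hand-maintained index "column family": state value -> set of row keys
users_by_state = defaultdict(set)


def insert_user(row_key, full_name, state):
    """Write the row and keep the hand-built index in step with it."""
    old = users.get(row_key)
    if old is not None:
        # Remove the stale index entry before re-indexing the row.
        users_by_state[old["state"]].discard(row_key)
    users[row_key] = {"full_name": full_name, "state": state}
    users_by_state[state].add(row_key)


def find_by_state(state):
    """Look up by value: go via the index, then fetch each row by key."""
    keys = users_by_state.get(state, ())
    return sorted(users[key]["full_name"] for key in keys)


insert_user("u1", "Alice Example", "UT")
insert_user("u2", "Bob Example", "TX")
insert_user("u3", "Carol Example", "UT")
insert_user("u2", "Bob Example", "UT")  # Bob moves; the index must follow
```

Every write touches two structures, and forgetting the clean-up step on update leaves stale index entries behind. That bookkeeping is what a built-in secondary index takes off the application's hands.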

Cassandra requires a lot more metadata for indexes and requires secondary indexes if you want to do range queries. E.g. if we define a new column family with 1 index:

    $ bin/cassandra-cli --host localhost
    Connected to: "Test Cluster" on localhost/9160
    Welcome to cassandra CLI.
    Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
    [default@unknown] create keyspace demo;
    [default@unknown] use demo;
    [default@demo] create column family users with comparator=UTF8Type
    ... and column_metadata=[{column_name: full_name, validation_class: UTF8Type},
    ... {column_name: birth_date, validation_class: LongType, index_type: KEYS}];

then we cannot do range queries, because no indexed column appears in the query with an equality operator:

    [default@demo] get users where state = 'UT' and birth_date > 1970;
    No indexed columns present in index clause with operator EQ

We must create a secondary index:

    update column family users with comparator=UTF8Type
    ... and column_metadata=[{column_name: full_name, validation_class: UTF8Type},
    ... {column_name: birth_date, validation_class: LongType, index_type: KEYS},
    ... {column_name: state, validation_class: UTF8Type, index_type: KEYS}];

Then Cassandra can use the state as the primary and filter based on the birth_date:

    get users where state = 'UT' and birth_date > 1970;

(Code samples taken from this blog post: http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes)

Usage - Deployment

MongoDB is written in C++ and provided in binary form for Linux, OS X, Windows and several other platforms. It's extremely easy to install: download, extract and run mongod.

Cassandra is written in Java and has the overhead that brings, but also the easy ability to integrate into existing Java projects. It takes a little longer to get started but there is a demonstration of setting up a 4 node cluster in less than 2 minutes, which you'd struggle to beat with MongoDB.

I know plenty of people running MongoDB on Windows but would be interested to hear if that's the same with Cassandra (I suspect it's more Linux).

Operations/Usage - Consistency/Replication

In MongoDB replication is achieved through replica sets. This is an enhanced master/slave model where you have a set of nodes where one is the master. Data is replicated to all nodes so that if the master fails, another member will take over. There are configuration options to determine which nodes have priority and you can set options like sync delay to have nodes lag behind (for disaster recovery, for example).

Writes in MongoDB are "unsafe" by default; data isn't written right away, so it's possible that a write operation could return success but be lost if the server fails before the data is flushed to disk. This is how Mongo attains high performance. If you need increased durability then you can specify a safe write, which will guarantee the data is written to disk before returning. Further, you can require that the data also be successfully written to n replication slaves.

MongoDB drivers also support the ability to read from slaves. This can be done on a connection, database, collection or even query level and the drivers handle sending the right queries to the right slaves, but there is no guarantee of consistency (unless you are using the option to write to all slaves before returning). In contrast, Cassandra queries go to every node and the most up to date column is returned (based on the timestamp value).

Cassandra has much more advanced support for replication by being aware of the network topology. The server can be set to use a specific consistency level to ensure that queries are replicated locally, or to remote data centres. This means you can let Cassandra handle redundancy across nodes where it is aware of which rack and data centre those nodes are on. Cassandra can also monitor nodes and route queries away from slow responding nodes.

The only disadvantage with Cassandra is that these settings are done on a node level with configuration files, whereas MongoDB allows very granular ad-hoc control down to the query level through driver options which can be called in code at run time.
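
The "most up to date column is returned" behaviour described above for Cassandra reads can be sketched as a last-write-wins merge. This is a toy model, not Cassandra's actual read path: read_latest is a hypothetical function, and each replica's answer is assumed to be a (value, timestamp) tuple, or None when that replica has no copy.

```python
def read_latest(replica_responses):
    """Return the value with the highest timestamp among replica responses.

    Each response is a (value, timestamp) tuple, or None if that replica
    has no copy of the column.
    """
    present = [r for r in replica_responses if r is not None]
    if not present:
        return None
    value, _timestamp = max(present, key=lambda r: r[1])
    return value


# Three replicas answer; one is stale and one never received the write.
responses = [("David", 100), ("David Mytton", 250), None]
latest = read_latest(responses)
```

Because the newest timestamp wins, a stale replica cannot cause an old value to be returned as long as at least one queried replica holds the latest write. Reading from a single MongoDB slave has no such merge step, which is why it offers no consistency guarantee.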

Operations - Who's behind it?

Both Cassandra (Apache 2.0 license) and MongoDB (AGPL) are open source. You can freely download the code, write patches and submit them upstream. However, Cassandra is purely an open source project whereas MongoDB is owned by a commercial company, 10gen. The original authors of MongoDB are core contributors to the code and work for 10gen (indeed, 10gen was founded specifically to support MongoDB and the CEO and CTO are the original creators).

In contrast, Cassandra was created by 2 engineers from Facebook and is incubated by the Apache Foundation. This is not a disadvantage (indeed, the Apache Web server used by the majority of websites has similar roots and is part of the Apache Foundation) but is important to understand when it comes to support, ongoing development and the community (below).

Operations - Support

Although there are independent consultants for MongoDB, the best place to get support is from 10gen themselves because they wrote the database so they know it best. They're able to provide support contracts with phone and e-mail SLAs.

In contrast, Cassandra has several companies offering commercial support and whilst they do have committers to the core Cassandra code, I'd argue it's not the same as having access to the entire engineering team and original authors from a single contact point, as is the case with MongoDB.

Operations - Ongoing development

Interacting directly with the company that controls the main project, especially for support purposes, means you can have bug fixes and changes implemented to the code base. We've had numerous fixes committed as a result of problems discovered in our production usage of MongoDB. We pay 10gen for support now but even before we did they were very responsive to bugs. We also get votes for features and improvements.

In theory this is the same in Cassandra: you'd want bugs to be fixed and features implemented, but that doesn't have to happen because of the nature of open source projects run by volunteers (it becomes more complex when companies are paying developers to work on the project, e.g. Eric Evans from Rackspace working on Cassandra full time).

Of course there is a risk that the company behind the project disappears and all the engineers move on somewhere else, but the project is still open source and this is the same with any piece of software you might use.

You could also argue there is more direction and focus from a commercial company working solely on the product (and more engineers dedicated to it) but I don't want to go any further with this point as this post isn't about open source vs commercial. This is just one point to be aware of.

Operations - Documentation

The official Cassandra documentation is poor. Researching for this I had to visit several websites and watch videos even to get explanations for key concepts like indexes. There is better documentation from Datastax but that is still lacking in explaining concepts in any depth.

The MongoDB documentation was good when I first looked at it but is even better nowadays. It's actually kept up to date and covers all the features, with examples. Nobody likes writing documentation and it shows with many open source projects; another advantage of having a company behind the project, forcing developers to write the docs! Incidentally, one of the biggest advantages of the PHP language is the extensive documentation, examples and user submitted notes.

When you're using a completely new data store then documentation is important, and is one of the reasons why I chose MongoDB back in 2009.

Operations - Community

MongoDB has to be a case study in how to build a community around a product. There have been almost 40 MongoDB conferences in the last year, a very active mailing list, and user groups around the world. You know you're well known when a phrase like "web scale" is associated with your product (as a parody). Again, this is because there is a company behind the product actively promoting it and encouraging and managing these events.

Cassandra has had 1 conference in that time, and whilst there are user groups (I presented this talk at the London one) it's certainly not on the same scale as MongoDB.

Does that matter? None of that existed when we chose MongoDB so we learnt everything ourselves. But for new users today, there's a huge forum of people who are using MongoDB and are sharing their knowledge freely in an easily accessible way.

Operations/Usage - Drivers

The other main reason I chose MongoDB was the driver support. All the key drivers for MongoDB were available and, most importantly, maintained by 10gen themselves. MongoDB has official drivers for C, C#, C++, Erlang, Javascript, Java, Perl, PHP, Python, Ruby and Scala. All fully supported.

The Python and PHP drivers were most important to us but we also use the C# driver in our Windows monitoring agent, and to have these well maintained just like the core server makes a massive difference.

Cassandra only has official Java and Python drivers with a few others written by 3rd parties. I've found that Python is usually well catered for when it comes to libraries that work well. PHP is another story and we've had issues with RabbitMQ and ZeroMQ in the past (specifically not working well under heavy load; they all work fine for playing around). Good PHP libraries are hard to come by.

Conclusion

There is no conclusion. This post isn't about which is best, it's about comparing the two. Both have advantages and disadvantages and to truly compare you need to run them both in production under significant load for a long period of time. MongoDB has worked well for us and has proven itself at scale and to have the flexibility to do things like building a queueing system as well as be the main data store for our server monitoring service.

For me, the operational considerations play a major part in making a decision because these types of databases are so new. I would suspect they're also important to companies looking to adopt this technology. We don't need a support contract for Apache, for example, because it's so well proven. Our support contract with 10gen has been well worth the money!

Other references

Mongodb vs. Cassandra on Stackoverflow: http://stackoverflow.com/questions/2892729/mongodb-vs-cassandra
Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Membase vs Neo4j comparison: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

A little more about David Mytton

David Mytton is the founder of Server Density. He has been programming in PHP and Python for over 10 years, regularly speaks about MongoDB (including running the London MongoDB User Group), co-founded the Open Rights Group and can often be found cycling in London or drinking tea in Japan. Follow him on Twitter and Google+.