blog serverdensity com mongodb vs cassandra

Upload: savio77

Post on 14-Apr-2018


  • 7/30/2019 Blog Serverdensity Com Mongodb vs Cassandra

    MongoDB vs Cassandra

    Written by David

    Over the 2 years we've been using MongoDB in production with our server monitoring tool, Server Density, we've built up significant experience and knowledge about how it works. Back in 2009, when I was looking at a replacement for MySQL, I looked at Cassandra (http://blog.boxedice.com/2009/07/25/choosing-a-non-relational-database-why-we-migrated-from-mysql-to-mongodb/) but dismissed it because MongoDB had several advantages and Cassandra was still at an extremely early stage (even more so than MongoDB at the time). Having been invited to give a comparison at the Cassandra London Meetup (http://www.meetup.com/Cassandra-London/events/16972731/), I thought I'd revisit it to see how it compares today.

    Disclaimer: It's important to note that much of what I know about MongoDB has been learnt through using it in production. We don't use Cassandra, so any comparisons are going to be fairly superficial, but they will still be relevant because that's the stage most people will be in when they are considering which database to pick. As a result, I will try to avoid making technical comparisons about specific features because this would be biased towards my extensive understanding of MongoDB vs a limited understanding of Cassandra.

    As such, this comparison is split into 2 types of difference: usage and operations.

    Usage: The actual usage as a developer implementing the application with the database.

    Operations: Points which are not directly about the core database but about its suitability for production and management on an operational level.

    That said, I will start with several technical comparisons because these are important to understand.

    Usage - Structure

    MongoDB acts much like a relational database. Its data model consists of a database at the top level, then collections, which are like tables in MySQL (for example), and then documents, which are contained within the collection like rows in MySQL. Each document has fields and values, similar to columns and values in MySQL. Fields can be simple key/value pairs, e.g. { 'name': 'David Mytton' }, but they can also contain other documents, e.g. { 'name': { 'first': 'David', 'last': 'Mytton' } }.

    In Cassandra, documents are known as columns, which are really just a single key and value, e.g. { 'key': 'name', 'value': 'David Mytton' }. There's also a timestamp field which is for internal replication and consistency. The value can be a single value but can also contain another column. These columns then exist within column families, which order data based on a specific value in the columns, referenced by a key. At the top level there is a keyspace, which is similar to the MongoDB database.

    A good set of data model diagrams for Cassandra can be found here (http://www.javageneration.com/?p=70).
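
    To make the two layouts described above concrete, here is a minimal sketch in plain Python. It uses no database driver: the Column class, the "users" column family and the "dmytton" row key are hypothetical names chosen for illustration, and the timestamps are placeholder integers.

```python
from dataclasses import dataclass

# MongoDB-style document: a field can hold a simple value or a nested document.
mongo_document = {"name": {"first": "David", "last": "Mytton"}}


@dataclass
class Column:
    """Cassandra-style column: a single key and value plus a timestamp."""
    key: str
    value: str
    timestamp: int  # placeholder; used for replication and consistency


# keyspace -> column family -> row key -> columns (all names hypothetical).
keyspace = {
    "users": {
        "dmytton": [
            Column("first", "David", timestamp=1),
            Column("last", "Mytton", timestamp=1),
        ],
    },
}


def row_as_dict(columns):
    """Flatten a row's columns into a plain key/value mapping."""
    return {column.key: column.value for column in columns}
```

    Flattening the row with row_as_dict gives the same mapping as the nested 'name' document, which is roughly the sense in which a set of columns under one key plays the role of a document.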

    Usage - Indexes

    MongoDB indexes work very similarly to those in relational databases. You create single or compound indexes at the collection level and every document inserted into that collection has those fields indexed. Querying by index is extremely fast so long as you have all your indexes in memory.

    Prior to Cassandra 0.7 it was essentially a key/value store, so if you wanted to query by the contents of a key (i.e. the value) you needed to create a separate column which references the other columns, i.e. you create your own indexes. This changed in Cassandra 0.7, which allowed secondary indexes on column values, but only through the column families mechanism.
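
    The hand-rolled index described above for a plain key/value store can be sketched as follows. This is an illustrative in-memory model in Python, not Cassandra's API: KeyValueStore, put and find are hypothetical names, and the second dictionary plays the role of the separate column that references the other columns.

```python
from collections import defaultdict


class KeyValueStore:
    """Toy key/value store that maintains its own value index."""

    def __init__(self):
        self.rows = {}                  # row key -> {column name: value}
        self.index = defaultdict(set)   # (column name, value) -> row keys

    def put(self, row_key, column, value):
        row = self.rows.setdefault(row_key, {})
        if column in row:
            # Drop the stale index entry before overwriting the value.
            self.index[(column, row[column])].discard(row_key)
        row[column] = value
        self.index[(column, value)].add(row_key)

    def find(self, column, value):
        """Return row keys whose column equals value, via the manual index."""
        return sorted(self.index.get((column, value), set()))


store = KeyValueStore()
store.put("alice", "state", "UT")
store.put("bob", "state", "UT")
store.put("bob", "state", "NY")  # bob moves; the index must follow
```

    The cost of this approach is that every write touches two structures and the application has to keep them in step, which is roughly the bookkeeping a built-in secondary index takes over.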

    Cassandra requires a lot more meta data for indexes and requires secondary indexes if you want to do range queries. E.g. if we define a new column family with 1 index:

        $ bin/cassandra-cli --host localhost
        Connected to: "Test Cluster" on localhost/9160
        Welcome to cassandra CLI.
        Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
        [default@unknown] create keyspace demo;
        [default@unknown] use demo;
        [default@demo] create column family users with comparator=UTF8Type
        ... and column_metadata=[{column_name: full_name, validation_class: UTF8Type},
        ... {column_name: birth_date, validation_class: LongType, index_type: KEYS}];

    then we cannot do range queries:

        [default@demo] get users where state = 'UT' and birth_date > 1970;
        No indexed columns present in index clause with operator EQ

    We must create a secondary index:

        update column family users with comparator=UTF8Type
        ... and column_metadata=[{column_name: full_name, validation_class: UTF8Type},
        ... {column_name: birth_date, validation_class: LongType, index_type: KEYS},
        ... {column_name: state, validation_class: UTF8Type, index_type: KEYS}];

    Then Cassandra can use the state as the primary and filter based on the birth_date:

        get users where state = 'UT' and birth_date > 1970;

    (Code samples taken from this blog post: http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes)

    Usage - Deployment

    MongoDB is written in C++ and provided in binary form for Linux, OS X, Windows and several other platforms. It's extremely easy to install: download, extract and run mongod.

    Cassandra is written in Java and has the overhead that brings, but also the easy ability to integrate into existing Java projects. It takes a little longer to get started, but there is a demonstration of setting up a 4 node cluster in less than 2 minutes (http://www.screenr.com/5G6), which you'd struggle to beat with MongoDB.

    I know plenty of people running MongoDB on Windows but would be interested to hear if that's the same with Cassandra (I suspect it's more Linux).

    Operations/Usage - Consistency/Replication

    In MongoDB replication is achieved through replica sets. This is an enhanced master/slave model where you have a set of nodes where one is the master. Data is replicated to all nodes so that if the master fails, another member will take over. There are configuration options to determine which nodes have priority and you can set options like sync delay to have nodes lag behind (for disaster recovery, for example).

    Writes in MongoDB are "unsafe" by default: data isn't written right away, so it's possible that a write operation could return success but be lost if the server fails before the data is flushed to disk. This is how Mongo attains high performance. If you need increased durability then you can specify a safe write, which will guarantee the data is written to disk before returning. Further, you can require that the data also be successfully written to n replication slaves.
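
    The trade-off described above can be illustrated with a toy model. This is a simulation in plain Python, not MongoDB's driver API: ToyServer, its safe flag and crash() are hypothetical, and the in-memory list stands in for data that has not yet been flushed to disk.

```python
class ToyServer:
    """Simulates a server that buffers writes in memory before flushing."""

    def __init__(self):
        self.buffer = []  # acknowledged but not yet on disk
        self.disk = []    # survives a crash

    def flush(self):
        self.disk.extend(self.buffer)
        self.buffer.clear()

    def write(self, doc, safe=False):
        self.buffer.append(doc)
        if safe:
            # A safe write only returns once the data is on disk.
            self.flush()
        return True  # success is reported either way

    def crash(self):
        # Anything still in the buffer is lost.
        self.buffer.clear()


server = ToyServer()
server.write({"n": 1}, safe=True)
server.write({"n": 2})  # returns True, but is only buffered
server.crash()
```

    Both calls report success, yet only the safe write survives the simulated crash. The unsafe path skips the flush on every call, which is where the extra speed comes from.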

    MongoDB drivers also support the ability to read from slaves. This can be done on a connection, database, collection or even query level and the drivers handle sending the right queries to the right slaves, but there is no guarantee of consistency (unless you are using the option to write to all slaves before returning). In contrast, Cassandra queries go to every node and the most up to date column is returned (based on the timestamp value).
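
    The read behaviour described for Cassandra, returning the most up to date column based on its timestamp, can be sketched like this. It is a simplified illustration in Python with made-up replica responses; resolve_read is a hypothetical helper, not part of any Cassandra client.

```python
def resolve_read(replica_responses):
    """Pick the newest version of a column from several replicas' answers."""
    return max(replica_responses, key=lambda column: column["timestamp"])


# Three replicas answer with different versions of the same column.
responses = [
    {"key": "name", "value": "David", "timestamp": 100},
    {"key": "name", "value": "David Mytton", "timestamp": 250},
    {"key": "name", "value": "D. Mytton", "timestamp": 180},
]

latest = resolve_read(responses)
```

    A replica that missed the latest write simply loses the comparison, so the caller still sees the newest value as long as at least one queried node has it.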

    Cassandra has much more advanced support for replication by being aware of the network topology. The server can be set to use a specific consistency level to ensure that queries are replicated locally, or to remote data centres. This means you can let Cassandra handle redundancy across nodes where it is aware of which rack and data centre those nodes are on. Cassandra can also monitor nodes and route queries away from slow responding nodes.

    The only disadvantage with Cassandra is that these settings are done on a node level with configuration files, whereas MongoDB allows very granular ad-hoc control down to the query level through driver options which can be called in code at run time.

    Operations - Who's behind it?

    Both Cassandra (Apache 2.0 license) and MongoDB (AGPL) are open source. You can freely download the code, write patches and submit them upstream. However, Cassandra is purely an open source project whereas MongoDB is owned by a commercial company, 10gen. The original authors of MongoDB are core contributors to the code and work for 10gen (indeed, 10gen was founded specifically to support MongoDB and the CEO and CTO are the original creators).

    In contrast, Cassandra was created by 2 engineers from Facebook and is incubated by the Apache Foundation. This is not a disadvantage (indeed, the Apache Web server used by the majority of websites has similar roots and is part of the Apache Foundation) but is important to understand when it comes to support, ongoing development and the community (below).

    Operations - Support

    Although there are independent consultants for MongoDB, the best place to get support is from 10gen themselves because they wrote the database so they know it best. They're able to provide support contracts with phone and e-mail SLAs.

    In contrast, Cassandra has several companies offering commercial support (http://wiki.apache.org/cassandra/ThirdPartySupport) and whilst they do have committers to the core Cassandra code, I'd argue it's not the same as having access to the entire engineering team and original authors from a single contact point, as is the case with MongoDB.

    Operations - Ongoing development

    Interacting directly with the company that controls the main project, especially for support purposes, means you can have bug fixes and changes implemented to the code base. We've had numerous fixes committed as a result of problems discovered in our production usage of MongoDB. We pay 10gen for support now but even before we did they were very responsive to bugs. We also get votes for features and improvements.

    In theory this is the same in Cassandra: you'd want bugs to be fixed and features implemented, but that doesn't have to happen because of the nature of open source projects run by volunteers (it becomes more complex when companies are paying developers to work on the project, e.g. Eric Evans from Rackspace working on Cassandra full time: http://video.disruptivecode.com/video/840645).

    Of course there is a risk that the company behind the project disappears and all the engineers move on somewhere else, but the project is still open source and this is the same with any piece of software you might use.

    You could also argue there is more direction and focus from a commercial company working solely on the product (and more engineers dedicated to it) but I don't want to go any further with this point as this post isn't about open source vs commercial. This is just one point to be aware of.

    Operations - Documentation

    The official Cassandra documentation (http://wiki.apache.org/cassandra/) is poor. Researching for this I had to visit several websites and watch videos even to get explanations for key concepts like indexes. There is better documentation from Datastax (http://www.datastax.com/docs/0.8/index) but that is still lacking in explaining concepts in any depth.

    The MongoDB documentation was good when I first looked at it but is even better nowadays. It's actually kept up to date and covers all the features, with examples. Nobody likes writing documentation and it shows with many open source projects; another advantage of having a company behind the project, forcing developers to write the docs! Incidentally, one of the biggest advantages of the PHP language is the extensive documentation, examples and user submitted notes.

    When you're using a completely new data store then documentation is important, and is one of the reasons why I chose MongoDB back in 2009.

    Operations - Community

    MongoDB has to be a case study in how to build a community (http://noshpetigara.com/post/4750801615/open-source-advantages-lessons-from-10gen-and-mongodb) around a product. There have been almost 40 MongoDB conferences in the last year, a very active mailing list, and user groups around the world. You know you're well known when a phrase like "web scale" is associated with your product (as a parody: http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale). Again, this is because there is a company behind the product actively promoting it and encouraging and managing these events.

    Cassandra has had 1 conference in that time, and whilst there are user groups (I presented this talk at the London one) it's certainly not on the same scale as MongoDB.

    Does that matter? None of that existed when we chose MongoDB so we learnt everything ourselves. But for new users today, there's a huge forum of people who are using MongoDB and are sharing their knowledge freely, and it is easily accessible.

    Operations/Usage - Drivers

    The other main reason I chose MongoDB was the driver support. All the key drivers for MongoDB were available and, most importantly, maintained by 10gen themselves. MongoDB has official drivers for C, C#, C++, Erlang, JavaScript, Java, Perl, PHP, Python, Ruby and Scala. All fully supported.

    The Python and PHP drivers were most important to us but we also use the C# driver in our Windows monitoring agent, and to have these well maintained just like the core server makes a massive difference.

    Cassandra only has official Java and Python drivers with a few others written by 3rd parties. I've found that Python is usually well catered for when it comes to libraries that work well. PHP is another story and we've had issues with RabbitMQ and ZeroMQ in the past (specifically not working well under heavy load; they all work fine for playing around). Good PHP libraries are hard to come by.

    Conclusion

    There is no conclusion. This post isn't about which is best, it's about comparing the two. Both have advantages and disadvantages and to truly compare you need to run them both in production under significant load for a long period of time. MongoDB has worked well for us and has proven itself at scale and to have the flexibility to do things like building a queueing system (http://blog.boxedice.com/2011/04/13/queueing-mongodb-using-mongodb/) as well as be the main data store for our server monitoring service.

    For me, the operational considerations play a major part in making a decision because these types of databases are so new. I would suspect they're also important to companies looking to adopt this technology. We don't need a support contract for Apache, for example, because it's so well proven. Our support contract with 10gen has been well worth the money!

    Other references

    Mongodb vs. Cassandra on Stackoverflow: http://stackoverflow.com/questions/2892729/mongodb-vs-cassandra

    Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Membase vs Neo4j comparison: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

    A little more about David Mytton

    David Mytton is the founder of Server Density. He has been programming in PHP and Python for over 10 years, regularly speaks about MongoDB (including running the London MongoDB User Group), co-founded the Open Rights Group and can often be found cycling in London or drinking tea in Japan. Follow him on Twitter and Google+.
