NoSQL
Now we know what it’s not... what is it?
What are we running from?
• Relational databases are the de facto standard for storing data in a web application.
• A lot of times, that data isn’t really relational at all.
• RDBMSes enforce lots of rules that can hurt performance.
Rules? What Rules?
• Classic relational databases follow the ACID rules:
• Atomicity
• Consistency
• Isolation
• Durability
Atomicity
• If any part of a transaction fails, the whole transaction fails and is rolled back.
• Databases have to be able to lock tables and rows for operations, which can block or delay other incoming requests.
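To make "all or nothing" concrete, here is a toy sketch in Ruby. The `atomically` helper is hypothetical and only snapshots an in-memory Hash; real databases get the same guarantee with logs and locks, which is exactly where the cost comes from.

```ruby
# Toy illustration of atomicity (hypothetical helper, not a database API):
# apply every change in the block, or restore the original state if any step raises.
def atomically(store)
  snapshot = Marshal.load(Marshal.dump(store)) # deep copy of the current state
  yield store
  true
rescue StandardError
  store.replace(snapshot) # undo every partial change
  false
end

accounts = { "alice" => 100, "bob" => 50 }

# Both steps succeed, so both changes stick.
ok = atomically(accounts) do |a|
  a["alice"] -= 30
  a["bob"]   += 30
end

# The first step runs, then the check fails, so the first step is undone too.
failed = atomically(accounts) do |a|
  a["alice"] -= 500
  raise "insufficient funds" if a["alice"].negative?
  a["bob"] += 500
end
```

After both calls, `accounts` holds only the first transfer; the half-finished second one leaves no trace.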
Consistency
• A transaction must take the database from one valid state to another: every integrity constraint (keys, uniqueness, foreign keys) holds afterward.
• Keeping that guarantee across lots of replicas and shards is expensive, especially if there’s locking involved.
Isolation
• Concurrent transactions must not see each other’s in-progress changes; each one behaves as if it ran alone.
• Remember the thing about locked rows and tables?
• It’s a bummer.
Durability
• Once a user is notified that a transaction has committed, its changes must survive crashes, restarts and power loss.
I come not to bury MySQL...
• Relational databases are great for a lot of uses.
• If your data is actually relational, you need transactions and joins, and you have a limited number of data types, then an RDBMS will work for you.
But...
• RDBMSes have been treated like hammers and used for things they’re not good at and weren’t designed for.
• Like the web...
Thus were born...
• Key-Value Stores
• Wide-Column Stores
• Document Stores/Databases
• Graph Databases
All thrown together & clumsily dubbed...
NoSQL
Which, despite its negative sound, supposedly means:
“Not Only SQL”
Yeah, I don’t believe it either...
Key-Value
Just what it sounds like. You set a key to a value and can then retrieve it.
Key-Value Benefits
• Simple
• High performance (usually): there are no transactions or relations, so it’s just a bucket and a lookup.
• Extremely flexible
• Commonly used as caches in front of slower resources (like MySQL - bazinga!)
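The "cache in front of a slower resource" pattern is usually a read-through cache: check the cache, and only on a miss go to the slow store and remember the answer. A minimal sketch, assuming a plain Hash stands in for memcached or Redis and a hypothetical `slow_lookup` lambda stands in for the MySQL query:

```ruby
# Minimal read-through cache sketch. A Hash stands in for memcached/Redis;
# the loader block is a hypothetical stand-in for a slow database query.
class ReadThroughCache
  attr_reader :misses

  def initialize(&loader)
    @store  = {}      # key => cached value
    @loader = loader  # called only when a key is missing
    @misses = 0
  end

  # Return the cached value, loading and storing it on a miss.
  def fetch(key)
    return @store[key] if @store.key?(key)

    @misses += 1
    @store[key] = @loader.call(key)
  end
end

slow_lookup = ->(id) { "song-#{id}" } # pretend this hits MySQL
cache = ReadThroughCache.new(&slow_lookup)

first  = cache.fetch(237) # miss: runs the loader
second = cache.fetch(237) # hit: served from the Hash
```

Rails exposes the same idea as `Rails.cache.fetch(key) { ... }`, where the block only runs on a miss.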
Popular Players
• memcached - in memory only; clients hash each key to pick a server and the servers never talk to each other, so it scales easily to hundreds of nodes.
• Redis - persistent, slightly more complex than memcached (supports lists, sets and hashes) but still highly performant.
• Riak - The Rails Machine guys love it. Jesse?
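How memcached clients spread keys is worth seeing once. Many client libraries use consistent hashing, so adding a server only moves the keys that land on the new server instead of reshuffling everything. A sketch under stated assumptions: the `HashRing` class, server names and the 100 virtual nodes per server are illustrative choices, not any particular client's settings.

```ruby
require "digest/md5"

# Sketch of consistent hashing as used by many memcached clients.
# Class name, server names and replica count are illustrative assumptions.
class HashRing
  def initialize(servers, replicas: 100)
    @ring = {} # position on the ring => server
    servers.each do |server|
      replicas.times do |i|
        @ring[point("#{server}-#{i}")] = server # virtual nodes even out the load
      end
    end
    @points = @ring.keys.sort
  end

  # Walk clockwise to the first ring position at or after the key's hash.
  def server_for(key)
    h = point(key)
    idx = @points.bsearch_index { |p| p >= h } || 0 # wrap around to the start
    @ring[@points[idx]]
  end

  private

  # First 8 hex digits of MD5, read as a 32-bit integer position.
  def point(str)
    Digest::MD5.hexdigest(str)[0, 8].to_i(16)
  end
end

servers = %w[cache1:11211 cache2:11211 cache3:11211]
ring    = HashRing.new(servers)
bigger  = HashRing.new(servers + ["cache4:11211"])

keys  = (1..1000).map { |i| "song:#{i}" }
moved = keys.count { |k| ring.server_for(k) != bigger.server_for(k) }
```

With a naive `hash(key) % servers.size`, adding a fourth server would move most keys; here only the keys now owned by `cache4` move, roughly a quarter of them.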
My Uses
• memcached: Read-through cache for Rails with cache-money.
• redis: persistent cache for results from our algorithm, partitioned by version and instance.
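Partitioning by version and instance mostly comes down to how keys are named. A minimal sketch, assuming a hypothetical `result_key` helper and key format, with a Hash standing in for Redis:

```ruby
# Sketch of version/instance partitioning via key names.
# The helper name and key format are hypothetical; a Hash stands in for Redis.
def result_key(version:, instance:, item_id:)
  "results:v#{version}:#{instance}:#{item_id}"
end

cache = {}
cache[result_key(version: 2, instance: "east", item_id: 42)] = [0.91, 0.35]

# Bumping the algorithm version changes every key, so stale results are never read.
stale = cache[result_key(version: 3, instance: "east", item_id: 42)]
fresh = cache[result_key(version: 2, instance: "east", item_id: 42)]
```

In real Redis, old-version keys could be given a TTL with `EXPIRE` and left to age out rather than deleted by hand.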
Wide Column
• Family of databases modeled on either Google’s BigTable or Amazon’s Dynamo.
• Pick two out of three from the CAP theorem in order to get horizontal scalability.
• Data stored by column (grouped into column families) instead of by row.
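A toy picture of "by column instead of by row", assuming three hypothetical log records. Real BigTable-style stores group columns into families and key sparse rows by a row key, so this only shows the basic layout idea:

```ruby
# Toy illustration of row vs column layout (field names are hypothetical).
rows = [
  { "ts" => 1, "status" => 200, "bytes" => 512 },
  { "ts" => 2, "status" => 404, "bytes" => 128 },
  { "ts" => 3, "status" => 200, "bytes" => 1024 }
]

# Column layout: one array per column, values lined up by position.
columns = rows.first.keys.map { |col| [col, rows.map { |r| r[col] }] }.to_h

# Aggregating one column reads a single array instead of touching every row.
total_bytes = columns["bytes"].sum
```

That is why these stores shine when you know your query ahead of time: a scan over one column skips everything else.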
CAP?
• Consistency: All clients always have the same view of the data.
• Availability: Each client can always read and write.
• Partition Tolerance: The system keeps working despite physical network partitions.
Use cases
• Making sense out of large amounts of data where you know your query scenario ahead of time.
• Large = 100s of millions of records.
• Data-mining log files and other sources of similar data.
Big Players
• HBase
• Cassandra
• Hypertable
• Amazon’s SimpleDB
• Google’s BigTable (the granddaddy of all of them)
Graph Databases
• Store nodes, edges and properties
• Think of them as Things, Connections and Properties
• Good for storing properties and relationships.
• Honestly, I don’t fully understand them... anyone?
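Things, connections and properties map directly onto a small data structure. This is a minimal in-memory sketch with hypothetical names and data, not how Neo4j or FlockDB store anything; the point is that a "friends of friends" question becomes a two-hop walk instead of a self-join.

```ruby
# Minimal in-memory property graph: nodes (things), edges (connections),
# and properties on both. All names and data are hypothetical.
Node = Struct.new(:id, :props)
Edge = Struct.new(:from, :to, :type, :props)

class Graph
  def initialize
    @nodes = {}
    @out   = Hash.new { |h, k| h[k] = [] } # node id => outgoing edges
  end

  def add_node(id, **props)
    @nodes[id] = Node.new(id, props)
  end

  def add_edge(from, to, type, **props)
    @out[from] << Edge.new(from, to, type, props)
  end

  # Follow edges of one type out of a node.
  def neighbors(id, type)
    @out[id].select { |e| e.type == type }.map(&:to)
  end

  # Two hops out, excluding the start node and its direct friends.
  def friends_of_friends(id)
    direct = neighbors(id, :friend)
    (direct.flat_map { |f| neighbors(f, :friend) } - direct - [id]).uniq
  end
end

g = Graph.new
%w[alice bob carol dave].each { |name| g.add_node(name, name: name.capitalize) }
g.add_edge("alice", "bob", :friend, since: 2009)
g.add_edge("bob", "carol", :friend)
g.add_edge("bob", "alice", :friend)
g.add_edge("carol", "dave", :friend)
```

`g.friends_of_friends("alice")` walks alice to bob, then bob to carol and back to alice, and keeps only carol.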
The Players
• Neo4j
• FlockDB
• HyperGraphDB
Document Stores
• Short on relationships, tall on rich data types.
• Big on eventual consistency and flexible schemas.
• Hybrid of traditional RDBMS and Key-Value stores.
Use Cases
• Content Management Systems
• Applications with rapid partial updates
• Anything you would normally put in an RDBMS but that doesn’t need joins or transactions.
The Players
• CouchDB
• MongoDB
• Terrastore
MongoDB
• Support for rich data types: arrays, hashes, embedded documents, etc.
• Support for adding and removing things from arrays and embedded documents (addToSet, for example).
• Map/Reduce support and secondary indexes on any field, including array fields
• Regular expression support in queries
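To show what those array operators do to a document, here is a plain-Ruby sketch. The `add_to_set` and `pull` helpers are hypothetical stand-ins that mimic the semantics of `$addToSet` and `$pull`; they are not a driver API. In the MongoDB shell the real calls look like `db.posts.update({_id: id}, {$addToSet: {tags: "nosql"}})`, and a regex query looks like `db.posts.find({title: /nosql/i})`.

```ruby
# Plain-Ruby stand-ins mimicking MongoDB's $addToSet and $pull on an array
# field (hypothetical helpers, not a driver API).
def add_to_set(doc, field, value)
  arr = (doc[field] ||= [])
  arr << value unless arr.include?(value) # $addToSet never adds duplicates
  doc
end

def pull(doc, field, value)
  (doc[field] || []).delete(value) # $pull removes every matching element
  doc
end

post = {
  "title"    => "NoSQL Now",
  "tags"     => ["mongodb"],
  "comments" => [{ "author" => "alice", "body" => "Nice!" }] # embedded document
}

add_to_set(post, "tags", "nosql")
add_to_set(post, "tags", "nosql") # second call is a no-op
pull(post, "tags", "mongodb")

# Rough equivalent of a {title: /nosql/i} query over a collection.
posts   = [post, { "title" => "MySQL tips", "tags" => [] }]
matches = posts.select { |d| d["title"] =~ /nosql/i }
```

Because the server applies these operators in place, a client can change one tag without reading and rewriting the whole document.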
Design Considerations
• Embedded Documents - Use them only if the embedded document will always be selected with the parent.
• Indexes - MongoDB punishes you for missing indexes much sooner than MySQL does.
• Document size - Currently, documents are limited to 4MB, which should be large enough, but if it’s not...
Real-World MongoDB
• We use MongoDB heavily at MIS.
• Statistics application and reporting
• Top-secret new application
• Web crawler and indexer
• CMS
Real-World Example
Let’s do tags. Everything is taggable now, right?
The MySQL Way
Schema
And to get a “thing’s” tags?
SELECT `tags`.* FROM `tags`
INNER JOIN `taggings` ON `tags`.id = `taggings`.tag_id
WHERE ((`taggings`.taggable_id = 237)
AND (`taggings`.taggable_type = 'Song'))
Yuck! That’s a lot of pain for something so simple.
And I didn’t even show you finding things with tag “x”. Or how to set and unset tags on a “thing”.
Ouch.
The MongoDB Way
Using MongoMapper and Rails 3
class Post
  include MongoMapper::Document

  key :title, String
  key :body,  String
  key :tags,  Array

  ensure_index :tags
end
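Let’s Make This Easy...
# Add a tag locally and, for saved posts, atomically in MongoDB ($addToSet)
def add_tag(tag)
  tag = Post.clean_tag(tag)
  self.tags << tag unless self.tags.include?(tag) # keep the local copy duplicate-free too
  self.add_to_set(:tags => tag) unless self.new_record?
end

# Remove a tag locally and, for saved posts, atomically in MongoDB ($pull)
def remove_tag(tag)
  tag = Post.clean_tag(tag)
  self.tags.delete(tag)
  self.pull(:tags => tag) unless self.new_record?
end

# Normalize a tag: trim, lowercase, spaces to dashes, drop everything else
def self.clean_tag(str)
  str.strip.downcase.gsub(" ", "-").gsub(/[^a-z0-9-]/, "")
end

# Split a comma-separated string into cleaned tags
def self.clean_tags(str)
  str.split(",").map { |t| clean_tag(t) }
end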
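The cleaning rules are easy to check without MongoMapper or a database. This sketch copies the same normalization into a standalone module; the `TagCleaner` name is an assumption made only so it runs on its own.

```ruby
# The slide's tag-normalizing logic, pulled out of the Post model so it runs
# without MongoMapper. The module name is hypothetical.
module TagCleaner
  # Trim, lowercase, turn spaces into dashes, and drop other characters.
  def self.clean_tag(str)
    str.strip.downcase.gsub(" ", "-").gsub(/[^a-z0-9-]/, "")
  end

  # Split a comma-separated list and clean each entry.
  def self.clean_tags(str)
    str.split(",").map { |t| clean_tag(t) }
  end
end

single = TagCleaner.clean_tag("  Ruby on Rails! ")
list   = TagCleaner.clean_tags("NoSQL, Mongo DB,C++")
```

`single` comes out as `"ruby-on-rails"` and `list` as `["nosql", "mongo-db", "c"]`. Note the trade-off in the character filter: "C++" and "C" collapse into the same tag.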
Demo Time
Sorry if you’re looking at this later, but it’s console time!
Why I Love MongoDB
• Document model fits how I build web apps.
• For most apps, I don’t need transactions.
• Eventual consistency is actually OK.
• Partial updates and arrays make things that are a pain in SQL-land absolutely painless.
• It’s just smart enough without getting in the way.
What’s NoSQL, really?
• The right tool for the job.
• We’ve got lots of options for storing application data.
• The key is picking the one that solves our real problem.
• And if an RDBMS is the right tool, that’s OK too.
Questions?
Further Reading
• Visual NoSQL: http://blog.nahurst.com/visual-guide-to-nosql-systems
• MongoDB: http://mongodb.org
• MongoMapper: http://mongomapper.com/
Thanks!
• Kevin Lawver
• @kplawver
• http://kevinlawver.com