Silicon Valley Code Camp: 2011 Introduction to MongoDB

Introduction to MongoDB, Silicon Valley Code Camp, Oct 8th, 2011

Upload: manish-pandit

Post on 14-May-2015

DESCRIPTION

My Talk today at Silicon Valley Code Camp 2011 on Introduction to MongoDB.

TRANSCRIPT

Page 1: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Introduction to MongoDB

Silicon Valley Code Camp

Oct 8th, 2011

Page 2: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Before we start…

• NoSQL is a movement; it's not anti-SQL
• Relational databases have their place, but they are not the only solution
• Diversify: use the best tool for the job
• The footers contain quotes from the video

Mongo DB is Web Scale

Page 3: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Agenda

• Relational Databases vs. NoSQL
• CAP Theorem
• MongoDB at a high level
  – Collections, Documents
  – Inserting, Querying and Updating
• Other MongoDB Commands
• Replication Topologies
• Using MongoDB via a driver
• A Few Internals
• Administration

Page 4: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Relational Databases

• Have been around for years
• De facto standard for any persistence
• ACID compliant
• Rigid schema
• Usually hard to scale over a distributed network
• Normalization is almost always a requirement
• ORMs tend to limit the optimizations you can do to the queries

Relational Databases weren't built for Web Scale. They have impotence mismatch.

Page 5: Silicon Valley Code Camp: 2011 Introduction to MongoDB

NoSQL

• Why?
  – Not everything can be modeled in a relational construct
  – Cluster-aware out of the box. Replication, sharding etc. are built into the core
  – Schemaless
  – (Mostly) open source, community supported
  – High performance by design, and not ball-and-chained with ACID

Page 6: Silicon Valley Code Camp: 2011 Introduction to MongoDB

CAP Theorem : Pick Two

• Consistency – Each client sees the same data

• Availability – The system is always available for any reads and writes

• Partition Tolerance – The system can tolerate any communication failure across the network (except someone pulling the plug across the datacenters).

A distributed datastore can guarantee at most two of the above at any given point in time.

If that's what they need to do to get those kick ass benchmarks, then it's a great design.

Page 7: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Visual Guide to NoSQL Databases

Source: http://blog.nahurst.com/visual-guide-to-nosql-systems

Page 8: Silicon Valley Code Camp: 2011 Introduction to MongoDB

How do they make up?

Usually the NoSQL databases are AP or CP.
• Consistency
  – Eventually consistent
  – Write concerns
• Availability
  – Read-only
  – Stale data

Page 9: Silicon Valley Code Camp: 2011 Introduction to MongoDB

MongoDB : High Level

• Document-based Database
• Schemaless
• Cluster-aware
• Easy Querying/JavaScript Support
• Memory Mapped
• Drivers in all the popular languages
• Excellent developer velocity (Supported by 10gen)
• Durable via Journaling
• C-P System based on the CAP theorem

MongoDB handles WebScale. You turn it on and it scales right up.

Page 10: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Collections

• The closest comparison to a MongoDB Collection in the relational world is a Table
• A collection is not bound by a schema
• A collection has a namespace
• Can be a capped collection
• It contains BSON documents
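A capped collection has a fixed capacity and discards its oldest documents once it is full. The behaviour can be imitated in plain Python as a rough sketch that does not touch MongoDB at all. The `CappedCollection` class and its `max_docs` parameter are hypothetical names chosen for illustration; real capped collections are bounded by a size in bytes, with an optional document count.

```python
from collections import deque


class CappedCollection:
    """Toy stand-in for a capped collection: fixed capacity, insertion order kept."""

    def __init__(self, max_docs):
        # deque(maxlen=...) silently drops the oldest item when full
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)

    def find(self):
        # Documents come back in insertion order, oldest first
        return list(self._docs)


log = CappedCollection(max_docs=3)
for n in range(5):
    log.insert({"seq": n})

# Only the three most recent documents survive
remaining = [d["seq"] for d in log.find()]
```

Because old entries age out on their own, this shape suits logs and similar rolling data.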

Page 11: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Documents

• Closest comparison in the relational world is a Row in a Table.
• Must reside within a Collection
• Looks like (structured) JSON, stored as BSON within a collection
• Limited to 16MB (as of 2.0)
• Larger sizes supported via GridFS

Reference: http://www.bsonspec.org. BSON is defined there as a binary-encoded serialization of JSON-like documents.

Page 12: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Inserting Documents

• Console defaults to localhost port 27017
• show databases
• show collections
• Insert a document in a collection
• Bulk inserts via JavaScript

Page 13: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Querying and Updating Documents

• Query a document
• Select certain fields
• Using limit, skip, sort and count
• Using explain
• In Place Updates
• $inc, $push, $pull, $pop, $slice, $in, $nin
• Indexing on fields
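In-place updates change individual fields of a stored document instead of replacing the whole document. The sketch below imitates what four of the operators above do, using a plain Python dict. `apply_update` is a hypothetical helper, not part of any MongoDB driver, and it only handles top-level fields.

```python
import copy


def apply_update(doc, update):
    """Return a copy of doc with a small subset of update operators applied.

    Hypothetical helper for illustration; it only handles top-level fields.
    """
    result = copy.deepcopy(doc)
    for field, amount in update.get("$inc", {}).items():
        # $inc adds to a numeric field, creating it if absent
        result[field] = result.get(field, 0) + amount
    for field, value in update.get("$push", {}).items():
        # $push appends a value to an array field
        result.setdefault(field, []).append(value)
    for field, value in update.get("$pull", {}).items():
        # $pull removes every array element equal to the value
        if field in result:
            result[field] = [v for v in result[field] if v != value]
    for field, direction in update.get("$pop", {}).items():
        # $pop removes the last element (1) or the first element (-1)
        if field in result:
            items = result[field]
            result[field] = items[:-1] if direction == 1 else items[1:]
    return result


post = {"title": "Intro", "views": 10, "tags": ["nosql", "draft"]}

updated = apply_update(post, {"$inc": {"views": 1}, "$push": {"tags": "mongodb"}})
updated = apply_update(updated, {"$pull": {"tags": "draft"}})
popped = apply_update(updated, {"$pop": {"tags": -1}})
```

The push and the pull on `tags` are issued as separate updates on purpose, since they touch the same field.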

MongoDB is a Document Database, that does not need joins. It uses Map Reduce.

Page 14: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Other console commands

• db.stats()
• db.collection.stats()
• db.isMaster()
• rs.status()
• db.currentOp()
• db.serverStatus()

Page 15: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Replication: Master Slave

• Achieved by “declaring” 1 node as the master, and “declaring” many nodes as its slaves

• Single point of failure/No failover
• Can add any number of slaves easily
• May need to put slaves behind a load balancer

Page 16: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Replication : ReplicaSets

• Achieved by creating a cluster, called a replSet, and adding “members” to it.

• The “primary” and “secondary” roles are decided among the nodes. There is no permanent “master” or “slave”.

• Automatic Failover via voting
• An arbiter may be needed to break a tie if there is an even number of nodes
• Easy to add new members
• Adding load-balancing will void failover
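The arbiter point comes down to simple arithmetic: a primary is elected only by a strict majority of the voting members. A minimal sketch of that rule follows; `majority` and `can_elect_primary` are hypothetical names for illustration, not MongoDB functions.

```python
def majority(voting_members):
    """Smallest number of votes that is more than half of the voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members, reachable):
    """A partition can elect a primary only if it sees a majority of voters."""
    return reachable >= majority(voting_members)


# Four data nodes split 2/2 by a network partition: neither side has 3 votes.
even_split = can_elect_primary(voting_members=4, reachable=2)

# Adding an arbiter makes five voters; the side with the arbiter sees 3 of 5.
with_arbiter = can_elect_primary(voting_members=5, reachable=3)
```

An arbiter votes but holds no data, so it breaks the tie without adding another full copy of the dataset.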

Page 17: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Accessing MongoDB Programmatically

• Scala – Using casbah

• Code to insert a document
• Code to find/query
• Code to update

Page 18: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Object-Document Mappers

• Mongo Drivers understand Hashes, or DBObjects. A DBObject essentially is a Map

• The class needs to be converted to a DBObject, either by the developer or by the driver.

• Some such mappers also provide a DAO which makes it easy to perform CRUD operations.
  – MongoMapper for Ruby
  – Salat for Scala
  – Morphia for Java
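The class-to-map conversion that a mapper performs can be sketched with Python dataclasses. This is only an illustration of the idea, not how MongoMapper, Salat or Morphia work internally; `Post`, `Comment`, `to_document` and `from_document` are hypothetical names.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Comment:
    author: str
    text: str


@dataclass
class Post:
    title: str
    tags: list = field(default_factory=list)
    comments: list = field(default_factory=list)


def to_document(obj):
    """Convert a dataclass instance into a nested dict (the 'Map' a driver stores)."""
    return asdict(obj)


def from_document(doc):
    """Rebuild a Post, including embedded comments, from a stored dict."""
    comments = [Comment(**c) for c in doc.get("comments", [])]
    return Post(title=doc["title"], tags=list(doc.get("tags", [])), comments=comments)


post = Post("Intro to MongoDB", ["nosql"], [Comment("ann", "Nice talk")])
doc = to_document(post)          # nested dicts and lists only
round_trip = from_document(doc)  # back to typed objects
```

The embedded comments travel inside the post's own document, so no join is needed to read them back.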

Page 19: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Internals

• Data is memory mapped, so writes can scale as no disk IO is performed with every write.

• Delayed writes to the disk, default 60 seconds.
• Always better to keep the indices and the working set of the data in memory to avoid swapping
• Pre-allocated files in increments
• Smart algorithm to add padding to the storage when the document sizes are inconsistent
• Durability is achieved by journaling, introduced in 1.7
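The memory-mapping idea can be tried with Python's standard `mmap` module. This is a sketch of the general mechanism only, not of MongoDB's storage engine: a file is pre-allocated, mapped into memory, written to like a byte array, and flushed to disk on request. The one-page file size is an arbitrary choice.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE

with tempfile.TemporaryFile() as f:
    f.truncate(PAGE)                      # pre-allocate the file
    with mmap.mmap(f.fileno(), PAGE) as view:
        view[0:5] = b"hello"              # a write is a memory store
        view.flush()                      # ask the OS to write dirty pages out
    f.seek(0)
    on_disk = f.read(5)                   # read back through the file itself
```

Without the explicit flush, the operating system decides when the dirty pages reach the disk, which is why journaling matters for durability.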

Page 20: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Replication Internals

• The almighty Oplog
  – Capped Collection
  – Acts like a tx log which the slaves or secondaries read from and apply.
• getmore on the primary/master every 4s
• Failover and voting
• Delayed sync
• Using rs.slaveOk() to query the secondaries in a replSet
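The read-and-apply loop can be sketched in a few lines of Python. The entry format here (a timestamp paired with an `op` dict) is invented for illustration and is not the real oplog schema; `apply_op` and `sync` are hypothetical names.

```python
def apply_op(data, op):
    """Apply one oplog-style entry to a dict keyed by document _id."""
    if op["op"] == "insert":
        data[op["doc"]["_id"]] = dict(op["doc"])
    elif op["op"] == "update":
        data[op["_id"]].update(op["set"])
    elif op["op"] == "delete":
        data.pop(op["_id"], None)


def sync(secondary, oplog, last_applied):
    """Apply every entry newer than last_applied; return the new position."""
    for ts, op in oplog:
        if ts > last_applied:
            apply_op(secondary, op)
            last_applied = ts
    return last_applied


oplog = [
    (1, {"op": "insert", "doc": {"_id": "a", "n": 1}}),
    (2, {"op": "insert", "doc": {"_id": "b", "n": 2}}),
    (3, {"op": "update", "_id": "a", "set": {"n": 10}}),
    (4, {"op": "delete", "_id": "b"}),
]

secondary = {}
position = sync(secondary, oplog[:2], last_applied=0)     # first batch
position = sync(secondary, oplog, last_applied=position)  # catch up later
```

Tracking the last applied timestamp lets a secondary resume where it left off instead of replaying the whole log.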

Page 21: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Scaling MongoDB

• Be smart with your schema design
• Know ahead of time if the system will be read-heavy or write-heavy
• Use explain(), use indices
• Do not fetch the entire document - select fields.
• Keep an eye on index misses and page faults via mongostat
• Denormalize - avoid links, use embeds.
• You can never replicate enough
• Horizontal scaling via sharding
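Horizontal scaling via sharding means each document lives on one shard, chosen by its shard key. A minimal sketch of range-based routing follows, assuming a made-up layout: the split points, shard names and `route` function are all hypothetical, and a real cluster keeps this mapping in its config servers.

```python
from bisect import bisect_right

# Hypothetical layout: split points on the shard key divide it into chunks,
# and each chunk is assigned to a shard.
split_points = ["g", "n", "t"]                              # chunk boundaries
chunk_to_shard = ["shard0", "shard1", "shard0", "shard2"]   # one per chunk


def route(shard_key):
    """Return the shard owning the chunk whose range contains shard_key."""
    # bisect_right makes each chunk's lower bound inclusive
    return chunk_to_shard[bisect_right(split_points, shard_key)]


targets = {name: route(name) for name in ["alice", "mallory", "nina", "zed"]}
```

A query that includes the shard key can be sent to a single shard this way; one that does not has to be sent to all of them.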

If /dev/null is faster than WebScale, I’ll use it. Does /dev/null support sharding?

Page 22: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Backups

• Lock the database for a cold backup
• Use filer snapshots
• Use mongodump -> BSON, mongorestore to restore
• Use mongoexport -> JSON, mongoimport to restore
• Spare slaves always help

Page 23: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Monitoring

• MMS
  – Developed by 10gen
• Munin
  – Plugins available to monitor MongoDB Server
• Nagios
  – For Machine Health Check

Page 24: Silicon Valley Code Camp: 2011 Introduction to MongoDB

Comparison of NoSQL Solutions

Source: http://perfectmarket.com/blog/not_only_nosql_review_solution_evaluation_guide_chart

Page 25: Silicon Valley Code Camp: 2011 Introduction to MongoDB

We’re hiring! corp.ign.com/careers, and @ignjobs

• Scala
• Java
• PHP/Zend
• Rails
• ElasticSearch
• MongoDB
• MySQL
• HTML5
• jQuery Mobile
• Sencha Touch
• PhoneGap
• WordPress
• ActionScript/Flash
• Redis/Memcached
• CI/CD

Page 26: Silicon Valley Code Camp: 2011 Introduction to MongoDB

About

Manish Pandit
Sr. Engineering Manager
IGN Entertainment

http://linkedin.com/in/mpandit
@lobster1234