webinar: general technical overview of mongodb

Solutions Architect, 10gen

Sandeep Parikh

MongoDB Technical Overview

Agenda

Relational Databases

MongoDB Features

MongoDB Functionality

Scaling and Deployment

Aggregates, Statistics, Analytics

Advanced Topics

About 10gen

•  Background –  Founded in 2007 –  First release of MongoDB in 2009 –  74M+ in funding

•  MongoDB –  Core server –  Native drivers

•  Subscriptions, Consulting, Training

•  Monitoring


[Diagram: example relational schema]

User: Name, Email address

Category: Name, URL

Comment: Comment, Date, Author

Article: Name, Slug, Publish date, Text

Tag: Name, URL

RDBMS Strengths

•  Data stored is very compact

•  Rigid schemas have led to powerful query capabilities

•  Data is optimized for joins and storage

•  Robust ecosystem of tools, libraries, integrations

•  40+ years old!

Enter “Big Data”

•  Gartner defines it with 3Vs

•  Volume –  Vast amounts of data being collected

•  Variety –  Evolving data –  Uncontrolled formats, no single schema –  Unknown at design time

•  Velocity –  Inbound data speed –  Fast read/write operations –  Low latency

Mapping Big Data to RDBMS

•  Difficult to store uncontrolled data formats

•  Scaling via big iron or custom data marts/partitioning schemes

•  Schema must be known at design time

•  Impedance mismatch with agile development and deployment techniques

•  Doesn’t map well to native language constructs

MongoDB Features

Goals

•  Scale horizontally over commodity systems

•  Incorporate what works for RDBMSs –  Rich data models, ad-hoc queries, full indexes

•  Drop what doesn’t work well –  Multi-row transactions, complex joins

•  Do not homogenize APIs

•  Match agile development and deployment workflows

Key Features

•  Data stored as documents (JSON) –  Flexible-schema

•  Full CRUD support (Create, Read, Update, Delete) –  Atomic in-place updates –  Ad-hoc queries: Equality, RegEx, Ranges, Geospatial

•  Secondary indexes

•  Replication – redundancy, failover

•  Sharding – partitioning for read/write scalability

Document Oriented, Dynamic Schema

{name: "jeff", eyes: "blue", height: 72, boss: "ben"}

{name: "brendan", aliases: ["el diablo"]}

{name: "ben", hat: "yes"}

{name: "matt", pizza: "DiGiorno", height: 72, boss: "555.555.1212"}

{name: "will", eyes: "blue", birthplace: "NY", aliases: ["bill", "la ciacco"], gender: "???", boss: "ben"}
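A minimal sketch of what a dynamic schema means in practice. It uses a plain JavaScript array as a stand-in for a collection (no MongoDB server is involved), and the helper name fieldNames is hypothetical. Documents in the same collection can carry different fields:

```javascript
// Stand-in for a collection: documents need not share the same fields.
const people = [
  { name: "jeff", eyes: "blue", height: 72, boss: "ben" },
  { name: "brendan", aliases: ["el diablo"] },
  { name: "ben", hat: "yes" },
  { name: "matt", pizza: "DiGiorno", height: 72, boss: "555.555.1212" }
];

// Hypothetical helper: collect every field name seen across the documents.
function fieldNames(docs) {
  const names = new Set();
  for (const doc of docs) {
    for (const key of Object.keys(doc)) names.add(key);
  }
  return [...names].sort();
}

// Names of the documents that have a "boss" field at all.
const withBoss = people.filter(doc => "boss" in doc).map(doc => doc.name);

console.log(fieldNames(people));
console.log(withBoss);
```

No field is declared ahead of time; the set of fields is simply whatever the stored documents contain.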

Disk seeks and data locality

Seek = 5+ ms

Read = really really fast

[Diagram: User, Comment, and Article shown as separate records]

Disk seeks and data locality

[Diagram: Article shown together with its User and five Comments]
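A rough worked example of the seek arithmetic, assuming the worst case where every separately stored record costs one seek and nothing is cached. The 5 ms figure comes from the slide; the function name and the record counts are illustrative assumptions:

```javascript
// From the slide: a disk seek costs 5+ ms; sequential reads are comparatively cheap.
const SEEK_MS = 5;

// Assumption: one seek per separately stored record, no caching.
function seekCostMs(separateRecords) {
  return separateRecords * SEEK_MS;
}

// Article, user, and five comments stored in separate places: 1 + 1 + 5 seeks.
const scatteredMs = seekCostMs(1 + 1 + 5);

// Article stored as one document with user and comments embedded: 1 seek.
const embeddedMs = seekCostMs(1);

console.log(scatteredMs, embeddedMs);
```

Under these assumptions the scattered layout costs at least 35 ms in seeks and the embedded layout at least 5 ms, which is the motivation for keeping related data in one document.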

MongoDB Security

•  SSL –  Between your app and MongoDB –  Between nodes in MongoDB cluster

•  Authorization at the database level –  Read Only / Read + Write / Administrator

•  Roadmap –  2.4: SASL, Kerberos authentication –  2.6: Pluggable authentication

Use Cases

Content Management

Operational Intelligence

High Volume Data Feeds

E-Commerce

User Data Management

MongoDB Functionality

> var new_article = {
    author: "roger",
    date: new Date(),
    title: "My Favorite 2012 Movies",
    body: "Here are my favorite movies from 2012…",
    tags: ["horror", "action", "independent"]
  }

> db.articles.save(new_article)

Documents

> db.articles.find()

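{
  _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "roger",
  date: ISODate("2013-01-08T22:10:19.880Z"),
  title: "My Favorite 2012 Movies",
  body: "Here are my favorite movies from 2012…",
  tags: ["horror", "action", "independent"]
}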

// _id is unique but can be anything you like

Querying

// create an ascending index on “author”

> db.articles.ensureIndex({author:1})

> db.articles.find({author: "roger"})

{
  _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  …
}

Indexes

// Query Operators:

// $all, $exists, $mod, $ne, $in, $nin, $nor, $or,

// $size, $type, $lt, $lte, $gt, $gte

// find articles that have a tags field
> db.articles.find({tags: {$exists: true}})

// find articles whose author matches a regular expression
> db.articles.find({author: /^rog/i})

// count articles by a given author
> db.articles.find({author: "roger"}).count()

Ad-Hoc Queries
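To make the operator semantics concrete, here is a simplified in-memory matcher for a few of the operators listed above. It is only a sketch: the function name matches and the sample documents are hypothetical, and real MongoDB query semantics (array matching, nested paths, type ordering) are richer than this.

```javascript
// Simplified matcher for a handful of query operators on top-level fields.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond instanceof RegExp) {
      return typeof value === "string" && cond.test(value);
    }
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      return Object.entries(cond).every(([op, arg]) => {
        switch (op) {
          case "$exists": return (field in doc) === arg;
          case "$gt": return value > arg;
          case "$gte": return value >= arg;
          case "$lt": return value < arg;
          case "$lte": return value <= arg;
          case "$ne": return value !== arg;
          case "$in": return arg.includes(value);
          default: throw new Error("unsupported operator: " + op);
        }
      });
    }
    return value === cond; // plain equality
  });
}

// Hypothetical sample data standing in for db.articles.
const articles = [
  { author: "roger", tags: ["horror", "action", "independent"] },
  { author: "Rogelio" },
  { author: "fred", tags: [] }
];
const find = query => articles.filter(doc => matches(doc, query));

console.log(find({ tags: { $exists: true } }).length); // 2
console.log(find({ author: /^rog/i }).length);         // 2
console.log(find({ author: "roger" }).length);         // 1
```

Note that $exists only checks for the presence of the field, so the document with an empty tags array still matches.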

// Update Modifiers

// $set, $unset, $inc, $push, $pushAll, $pull,

// $pullAll, $bit

> var comment = {
    author: "Fred",
    date: new Date(),
    text: "Best list ever!"
  }

> db.articles.update({ _id: "..." }, {
    $push: {comments: comment}
  });

Atomic Updates
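The effect of a few update modifiers can be sketched on a plain JavaScript object. This only illustrates what the modifiers do to a document; the atomicity itself is provided by the server and is not modeled here. The function name applyUpdate and the comment_count field are hypothetical.

```javascript
// Apply a small subset of update modifiers to an in-memory document.
function applyUpdate(doc, update) {
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      switch (op) {
        case "$set":
          doc[field] = arg;
          break;
        case "$unset":
          delete doc[field];
          break;
        case "$inc":
          doc[field] = (doc[field] || 0) + arg;
          break;
        case "$push":
          // Create the array if the field is missing; refuse non-array fields.
          if (doc[field] === undefined) doc[field] = [];
          if (!Array.isArray(doc[field])) throw new Error(field + " is not an array");
          doc[field].push(arg);
          break;
        case "$pull":
          if (Array.isArray(doc[field])) doc[field] = doc[field].filter(v => v !== arg);
          break;
        default:
          throw new Error("unsupported modifier: " + op);
      }
    }
  }
  return doc;
}

const article = { author: "roger", title: "My Favorite 2012 Movies" };
const comment = { author: "Fred", date: new Date(), text: "Best list ever!" };

applyUpdate(article, { $push: { comments: comment }, $inc: { comment_count: 1 } });
console.log(article.comments.length, article.comment_count);
```

Because the modifiers describe a change rather than a whole new document, the client never has to read the article, modify it, and write it back.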

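{
  _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "roger",
  date: ISODate("2013-01-08T22:10:19.880Z"),
  title: "My Favorite 2012 Movies",
  body: "Here are my favorite movies from 2012…",
  tags: ["horror", "action", "independent"],
  comments: [
    { author: "Fred",
      date: ISODate("2013-01-08T23:44:15.458Z"),
      text: "Best list ever!" }
  ]
}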

Nested Documents

// Index nested documents
> db.articles.ensureIndex({"comments.author": 1})
> db.articles.find({"comments.author": "Fred"})

// Index on tags
> db.articles.ensureIndex({tags: 1})
> db.articles.find({tags: "Manga"})

// Geospatial indexes
> db.articles.ensureIndex({location: "2d"})
> db.articles.find({location: {$near: [22, 42]}})

Secondary Indexes
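An index on an array field or on a field inside an array of nested documents gets one entry per array element. The sketch below imitates that with a Map from key to document ids. It is an illustration only: real MongoDB indexes are B-trees, and the helper names valuesAt and buildIndex and the sample documents are hypothetical.

```javascript
// Resolve a dotted path such as "comments.author", fanning out over arrays.
function valuesAt(doc, path) {
  let current = [doc];
  for (const part of path.split(".")) {
    const next = [];
    for (const item of current) {
      if (item === null || typeof item !== "object") continue;
      const value = item[part];
      if (value === undefined) continue;
      if (Array.isArray(value)) next.push(...value);
      else next.push(value);
    }
    current = next;
  }
  return current;
}

// Build a key -> [_id, ...] lookup, one entry per distinct value in each document.
function buildIndex(docs, path) {
  const index = new Map();
  for (const doc of docs) {
    for (const key of new Set(valuesAt(doc, path))) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

// Hypothetical sample data.
const docs = [
  { _id: 1, tags: ["horror", "action"], comments: [{ author: "Fred" }] },
  { _id: 2, tags: ["action"], comments: [{ author: "Fred" }, { author: "Ann" }] }
];

const byCommentAuthor = buildIndex(docs, "comments.author");
const byTag = buildIndex(docs, "tags");
console.log(byCommentAuthor.get("Fred"), byTag.get("horror"));
```

A lookup by "Fred" or by a single tag then goes straight to the matching document ids instead of scanning every document.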

Scaling MongoDB

Scaling MongoDB

•  Replica Sets –  Redundancy, failover, read scalability

•  Sharding –  Auto-partitions data, read/write scalability

•  Multi-datacenter deployments

•  Tunable consistency

•  Engineering for zero downtime

[Diagram: Primary with two Secondaries; the Client Application's Driver sends Write and Read]

Replica Sets

[Diagram: Node 1 (Secondary), Node 2 (Secondary), Node 3 (Primary); Replication, Heartbeat]

Replica Set – Initialize

[Diagram: Node 1 (Secondary), Node 2 (Secondary), Node 3 (failed); Heartbeat, Primary Election]

Replica Set – Failure

[Diagram: Node 1 (Secondary), Node 2 (Primary), Node 3 (failed); Replication, Heartbeat]

Replica Set – Failover

[Diagram: Node 1 (Secondary), Node 2 (Primary), Node 3 (Recovery); Replication, Heartbeat]

Replica Set – Recovery

[Diagram: Node 1 (Secondary), Node 2 (Primary), Node 3 (Secondary); Replication, Heartbeat]

Replica Set – Recovered
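The failover sequence above depends on the surviving nodes being able to elect a new primary, which requires a majority of the voting members. A minimal sketch of that majority check follows. The function name is hypothetical, and a real election also takes member priority and replication state into account.

```javascript
// A primary can be elected only when a strict majority of voting members is reachable.
function canElectPrimary(totalVotingMembers, reachableMembers) {
  return reachableMembers > Math.floor(totalVotingMembers / 2);
}

// Three-node set, one node fails: the two survivors still form a majority.
console.log(canElectPrimary(3, 2)); // true

// Three-node set, two nodes fail: the remaining node cannot become primary.
console.log(canElectPrimary(3, 1)); // false
```

This is why an even number of voting members adds no extra fault tolerance: a four-node set split two and two has no majority on either side.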

[Diagram: Primary with two Secondaries; the Client Application's Driver sends one Write and two Reads]

Scaling Reads

[Diagram: three App Server and Mongos pairs, three Shards, and the Config Servers]

Sharding

Data stored in shard

•  Shard is a node of the cluster

•  For production deployments a shard is a replica set

[Diagram: a Shard is either a replica set (Primary, Secondary, Secondary) or a single Mongod]

Config server stores metadata

•  Config Server – Stores cluster chunk ranges and locations – Production deployments need 3 nodes – Two-phase commit (not a replica set)




Mongos manages the data

•  Mongos – Acts as a router / balancer – No local data (persists to config database) – Can have 1 or many

[Diagram: App Servers connecting through a single Mongos, or through several Mongos instances]
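A sketch of how a router such as mongos can use chunk ranges to pick a shard. It assumes range-based partitioning on a hypothetical shard key (an author name); the chunk boundaries, shard names, and the function name routeToShard are made up for illustration. Each chunk covers a range that includes its lower bound and excludes its upper bound.

```javascript
// Hypothetical chunk table, as the config servers might describe it.
const chunks = [
  { min: -Infinity, max: "h", shard: "shard0" },
  { min: "h", max: "q", shard: "shard1" },
  { min: "q", max: Infinity, shard: "shard2" }
];

// Find the chunk whose range [min, max) contains the shard key value.
function routeToShard(shardKeyValue) {
  const chunk = chunks.find(c =>
    (c.min === -Infinity || shardKeyValue >= c.min) &&
    (c.max === Infinity || shardKeyValue < c.max)
  );
  return chunk.shard;
}

console.log(routeToShard("fred"));  // shard0
console.log(routeToShard("matt"));  // shard1
console.log(routeToShard("roger")); // shard2
```

A query that includes the shard key can be sent to a single shard this way; a query without it has to be sent to every shard.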





Aggregates, Statistics, Analytics

Analyzing Data in MongoDB

•  Custom application code –  Run your queries, compute your results

•  Aggregation framework –  Declarative, pipeline-based approach

•  Native Map/Reduce in MongoDB –  JavaScript functions distributed across cluster

•  Hadoop –  Offline batch processing/computation

db.articles.aggregate(
  { $project: {
      author: 1,
      tags: 1
  }},
  { $unwind: "$tags" },
  { $group: {
      _id: "$tags",
      authors: { $addToSet: "$author" }
  }}
);

Aggregation Framework

{
  title: "this is my title",
  author: "bob",
  posted: new Date(),
  tags: ["fun", "good", "fun"],
  comments: [
    { author: "joe", text: "this is cool" },
    { author: "sam", text: "this is bad" }
  ],
  other: { foo: 5 }
}

// Operations: $project, $match, $limit, $skip, $unwind, $group, $sort
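To show what the $project, $unwind, and $group stages do to the data, the pipeline can be imitated step by step on plain JavaScript arrays. This is a sketch, not how the server executes it: the second document is hypothetical, and a real $project would also keep _id by default.

```javascript
// Sample documents; the first mirrors the slide, the second is hypothetical.
const articles = [
  { title: "this is my title", author: "bob", tags: ["fun", "good", "fun"] },
  { title: "another title", author: "joe", tags: ["good"] }
];

// $project: keep only author and tags.
const projected = articles.map(({ author, tags }) => ({ author, tags }));

// $unwind "$tags": one output document per array element.
const unwound = projected.flatMap(doc =>
  doc.tags.map(tag => ({ author: doc.author, tags: tag }))
);

// $group by tag, collecting distinct authors ($addToSet).
const groups = new Map();
for (const doc of unwound) {
  if (!groups.has(doc.tags)) groups.set(doc.tags, new Set());
  groups.get(doc.tags).add(doc.author);
}
const result = [...groups].map(([tag, authors]) => ({ _id: tag, authors: [...authors] }));

console.log(JSON.stringify(result));
```

The duplicate "fun" tag produces two unwound documents, but $addToSet semantics keep "bob" only once in that group.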

Mapping SQL to Aggregation

SQL statement: SELECT COUNT(*) FROM users
MongoDB command: db.users.aggregate([ { $group: {_id: null, count: {$sum: 1}} } ])

SQL statement: SELECT SUM(price) FROM orders
MongoDB command: db.orders.aggregate([ { $group: {_id: null, total: {$sum: "$price"}} } ])

SQL statement: SELECT cust_id, SUM(price) FROM orders GROUP BY cust_id
MongoDB command: db.orders.aggregate([ { $group: {_id: "$cust_id", total: {$sum: "$price"}} } ])

SQL statement: SELECT cust_id, SUM(price) FROM orders WHERE active=true GROUP BY cust_id
MongoDB command: db.orders.aggregate([ { $match: {active: true} }, { $group: {_id: "$cust_id", total: {$sum: "$price"}} } ])
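As a check on the last mapping (WHERE active=true, GROUP BY cust_id, SUM(price)), the same computation can be written over a plain array. The sample orders and the function name sumPriceByCustomer are hypothetical.

```javascript
// Hypothetical orders data.
const orders = [
  { cust_id: "A", price: 10, active: true },
  { cust_id: "A", price: 5, active: false },
  { cust_id: "B", price: 7, active: true },
  { cust_id: "A", price: 2, active: true }
];

// $match {active: true}, then $group by cust_id with a $sum of price.
function sumPriceByCustomer(docs) {
  const totals = {};
  for (const doc of docs.filter(d => d.active === true)) {
    totals[doc.cust_id] = (totals[doc.cust_id] || 0) + doc.price;
  }
  return Object.entries(totals).map(([custId, total]) => ({ _id: custId, total }));
}

const totalsByCustomer = sumPriceByCustomer(orders);
console.log(JSON.stringify(totalsByCustomer));
```

The inactive order is filtered out before grouping, so customer A totals 12 and customer B totals 7.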

Native Map/Reduce

•  More complex aggregation tasks

•  Map and Reduce functions written in JS

•  Can be distributed across sharded cluster for increased parallelism
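A sketch of the map/reduce programming model using a tiny in-memory runner. The runner runMapReduce is hypothetical and differs from MongoDB in one visible way: here emit is passed to the map function as an argument, whereas in MongoDB it is available globally inside map. As in MongoDB, the document is bound to this in map, and reduce is skipped for keys that have only one emitted value.

```javascript
// Hypothetical in-memory runner for map and reduce functions.
function runMapReduce(docs, map, reduce) {
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);

  const results = {};
  for (const [key, values] of emitted) {
    results[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return results;
}

// Map: emit (tag, 1) for every tag of the current document.
function mapTags(emit) {
  for (const tag of this.tags || []) emit(tag, 1);
}

// Reduce: add up the counts emitted for one tag.
function reduceCounts(key, values) {
  return values.reduce((sum, value) => sum + value, 0);
}

// Hypothetical sample data.
const tagCounts = runMapReduce(
  [
    { tags: ["horror", "action", "independent"] },
    { tags: ["action"] }
  ],
  mapTags,
  reduceCounts
);
console.log(tagCounts);
```

Because reduce only combines values for a single key, the work for different keys can be split across shards, which is what makes the distributed execution possible.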

Hadoop and MongoDB

•  MongoDB-Hadoop adapter

•  1.0 released, 1.1 in development

•  Supports Hadoop –  Map/Reduce, Streaming, Pig

•  MongoDB as input/output storage for Hadoop jobs –  No need to go through HDFS

•  Leverage power of Hadoop ecosystem against operational data in MongoDB

MongoDB Resources

•  Presentations, Webinars –  www.10gen.com/presentations

•  MongoDB documentation –  docs.mongodb.org

•  Community –  groups.google.com/group/mongodb-user –  stackoverflow.com/questions/tagged/mongodb

Questions
