Webinar: General Technical Overview of MongoDB


Post on 05-Dec-2014


DESCRIPTION

MongoDB is the leading open-source document database. In this webinar we'll dive into the technical details of MongoDB by first mapping it from relational concepts. Next we'll discuss an example data model and the associated query functionality, using commands pulled straight from the MongoDB shell. Finally, we'll delve into some of the deployment functionality provided by MongoDB, including solutions for data redundancy, node failover and auto-sharding.

TRANSCRIPT

Page 1: Webinar: General Technical Overview of MongoDB

MongoDB Technical Overview

Sandeep Parikh

Solutions Architect, 10gen

Page 2: Webinar: General Technical Overview of MongoDB

Agenda

Relational Databases

MongoDB Features

MongoDB Functionality

Scaling and Deployment

Aggregates, Statistics, Analytics

Advanced Topics

Page 3: Webinar: General Technical Overview of MongoDB

About 10gen

•  Background
   –  Founded in 2007
   –  First release of MongoDB in 2009
   –  74M+ in funding

•  MongoDB
   –  Core server
   –  Native drivers

•  Subscriptions, Consulting, Training

•  Monitoring

Page 4: Webinar: General Technical Overview of MongoDB

Relational Databases

Page 5: Webinar: General Technical Overview of MongoDB

Relational Databases

User: Name, Email address

Category: Name, URL

Comment: Comment, Date, Author

Article: Name, Slug, Publish date, Text

Tag: Name, URL

Page 6: Webinar: General Technical Overview of MongoDB

RDBMS Strengths

•  Data stored is very compact

•  Rigid schemas have led to powerful query capabilities

•  Data is optimized for joins and storage

•  Robust ecosystem of tools, libraries, integrations

•  40+ years old!

Page 7: Webinar: General Technical Overview of MongoDB

Enter “Big Data”

•  Gartner defines it with 3Vs

•  Volume
   –  Vast amounts of data being collected

•  Variety
   –  Evolving data
   –  Uncontrolled formats, no single schema
   –  Unknown at design time

•  Velocity
   –  Inbound data speed
   –  Fast read/write operations
   –  Low latency

Page 8: Webinar: General Technical Overview of MongoDB

Mapping Big Data to RDBMS

•  Difficult to store uncontrolled data formats

•  Scaling via big iron or custom data marts/partitioning schemes

•  Schema must be known at design time

•  Impedance mismatch with agile development and deployment techniques

•  Doesn’t map well to native language constructs

Page 9: Webinar: General Technical Overview of MongoDB

MongoDB Features

Page 10: Webinar: General Technical Overview of MongoDB

Goals

•  Scale horizontally over commodity systems

•  Incorporate what works for RDBMSs
   –  Rich data models, ad-hoc queries, full indexes

•  Drop what doesn’t work well
   –  Multi-row transactions, complex joins

•  Do not homogenize APIs

•  Match agile development and deployment workflows

Page 11: Webinar: General Technical Overview of MongoDB

Key Features

•  Data stored as documents (JSON)
   –  Flexible schema

•  Full CRUD support (Create, Read, Update, Delete)
   –  Atomic in-place updates
   –  Ad-hoc queries: Equality, RegEx, Ranges, Geospatial

•  Secondary indexes

•  Replication – redundancy, failover

•  Sharding – partitioning for read/write scalability

Page 12: Webinar: General Technical Overview of MongoDB

Document Oriented, Dynamic Schema

{name: "jeff", eyes: "blue", height: 72, boss: "ben"}

{name: "brendan", aliases: ["el diablo"]}

{name: "ben", hat: "yes"}

{name: "matt", pizza: "DiGiorno", height: 72, boss: "555.555.1212"}

{name: "will", eyes: "blue", birthplace: "NY", aliases: ["bill", "la ciacco"], gender: "???", boss: "ben"}

Page 13: Webinar: General Technical Overview of MongoDB

Disk seeks and data locality

Seek = 5+ ms

Read = really really fast

[Diagram: User, Comment, Article]

Page 14: Webinar: General Technical Overview of MongoDB

Disk seeks and data locality

[Diagram: Article, User, five Comments]
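A back-of-the-envelope sketch of why locality matters, using the 5 ms seek figure from the slide. It assumes, purely for illustration, that every separately stored record costs one seek and that reading after a seek is effectively free; `estimateLoadMs` is a hypothetical helper, not a MongoDB function.

```javascript
// Assumption: one seek per separately stored record, reads are free.
const SEEK_MS = 5; // lower bound quoted on the slide ("5+ ms")

// Hypothetical helper: estimated time to load a set of records.
function estimateLoadMs(recordsFetchedSeparately, seekMs = SEEK_MS) {
  return recordsFetchedSeparately * seekMs;
}

// Separate records: 1 article + 1 user + 5 comments in 7 different places.
const normalizedMs = estimateLoadMs(1 + 1 + 5);

// One document holding the article, its author and its comments.
const embeddedMs = estimateLoadMs(1);

console.log(normalizedMs, embeddedMs); // 35 5
```

Under these assumptions the embedded layout is about seven times cheaper to load; real numbers depend on caching and storage hardware.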

Page 15: Webinar: General Technical Overview of MongoDB

MongoDB Security

•  SSL
   –  Between your app and MongoDB
   –  Between nodes in MongoDB cluster

•  Authorization at the database level
   –  Read Only / Read + Write / Administrator

•  Roadmap
   –  2.4: SASL, Kerberos authentication
   –  2.6: Pluggable authentication

Page 16: Webinar: General Technical Overview of MongoDB

Use Cases

Content Management

Operational Intelligence

High Volume Data Feeds

E-Commerce

User Data Management

Page 17: Webinar: General Technical Overview of MongoDB

MongoDB Functionality

Page 18: Webinar: General Technical Overview of MongoDB

Documents

> var new_article = {
    author: "roger",
    date: new Date(),
    title: "My Favorite 2012 Movies",
    body: "Here are my favorite movies from 2012…",
    tags: ["horror", "action", "independent"]
  }

> db.articles.save(new_article)

Page 19: Webinar: General Technical Overview of MongoDB

Querying

> db.articles.find()
{
  _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "roger",
  date: ISODate("2013-01-08T22:10:19.880Z"),
  title: "My Favorite 2012 Movies",
  body: "Here are my favorite movies from 2012…",
  tags: ["horror", "action", "independent"]
}

// _id is unique but can be anything you like

Page 20: Webinar: General Technical Overview of MongoDB

Indexes

// create an ascending index on "author"
> db.articles.ensureIndex({author: 1})

> db.articles.find({author: "roger"})
{
  _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "roger",
  ...
}

Page 21: Webinar: General Technical Overview of MongoDB

Ad-Hoc Queries

// Query Operators:
// $all, $exists, $mod, $ne, $in, $nin, $nor, $or,
// $size, $type, $lt, $lte, $gt, $gte

// find articles that have a tags field
> db.articles.find({tags: {$exists: true}})

// find articles whose author matches a regular expression
> db.articles.find({author: /^rog*/i})

// count articles by author
> db.articles.find({author: "roger"}).count()
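A quick check of what the pattern `/^rog*/i` actually matches, run in plain JavaScript rather than against a database. MongoDB evaluates regular expressions with PCRE, but anchors, `*` and the `i` flag behave the same way for a pattern this simple. The author names below are hypothetical, chosen to show the difference from the stricter prefix pattern `/^rog/i`.

```javascript
// Hypothetical author values, chosen to show where the two patterns differ.
const authors = ["roger", "Rogelio", "robert", "frog"];

// Pattern from the slide: "ro" at the start, then zero or more "g".
const slidePattern = /^rog*/i;

// Stricter variant: the literal prefix "rog".
const prefixPattern = /^rog/i;

const slideMatches = authors.filter((name) => slidePattern.test(name));
const prefixMatches = authors.filter((name) => prefixPattern.test(name));

console.log(slideMatches);  // [ 'roger', 'Rogelio', 'robert' ]
console.log(prefixMatches); // [ 'roger', 'Rogelio' ]
```

Because `g*` also matches zero occurrences, the slide's pattern is effectively a case-insensitive "starts with ro" test.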

Page 22: Webinar: General Technical Overview of MongoDB

Atomic Updates

// Update Modifiers
// $set, $unset, $inc, $push, $pushAll, $pull,
// $pullAll, $bit

> comment = {
    author: "Fred",
    date: new Date(),
    text: "Best list ever!"
  }

> db.articles.update({ _id: "..." }, {
    $push: {comments: comment}
  });

Page 23: Webinar: General Technical Overview of MongoDB

Nested Documents

{
  _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "roger",
  date: ISODate("2013-01-08T22:10:19.880Z"),
  title: "My Favorite 2012 Movies",
  body: "Here are my favorite movies from 2012…",
  tags: ["horror", "action", "independent"],
  comments: [
    { author: "Fred",
      date: ISODate("2013-01-08T23:44:15.458Z"),
      text: "Best list ever!" }
  ]
}

Page 24: Webinar: General Technical Overview of MongoDB

Secondary Indexes

// Index nested documents
> db.articles.ensureIndex({"comments.author": 1})
> db.articles.find({"comments.author": "Fred"})

// Index on tags
> db.articles.ensureIndex({tags: 1})
> db.articles.find({tags: "Manga"})

// Geospatial indexes
> db.articles.ensureIndex({location: "2d"})
> db.articles.find({location: {$near: [22, 42]}})

Page 25: Webinar: General Technical Overview of MongoDB

Scaling MongoDB

Page 26: Webinar: General Technical Overview of MongoDB

Scaling MongoDB

•  Replica Sets
   –  Redundancy, failover, read scalability

•  Sharding
   –  Auto-partitions data, read/write scalability

•  Multi-datacenter deployments

•  Tunable consistency

•  Engineering for zero downtime

Page 27: Webinar: General Technical Overview of MongoDB

Replica Sets

[Diagram: Client Application (Driver), Primary, two Secondaries; arrows labeled Write and Read]

Page 28: Webinar: General Technical Overview of MongoDB

Replica Set – Initialize

[Diagram: Node 1 (Secondary), Node 2 (Secondary), Node 3 (Primary); links labeled Replication and Heartbeat]

Page 29: Webinar: General Technical Overview of MongoDB

Replica Set – Failure

[Diagram: Node 1 (Secondary), Node 2 (Secondary), Node 3 (no role shown); labels: Heartbeat, Primary Election]

Page 30: Webinar: General Technical Overview of MongoDB

Replica Set – Failover

[Diagram: Node 1 (Secondary), Node 2 (Primary), Node 3 (no role shown); links labeled Replication and Heartbeat]

Page 31: Webinar: General Technical Overview of MongoDB

Replica Set – Recovery

[Diagram: Node 1 (Secondary), Node 2 (Primary), Node 3 (Recovery); links labeled Replication and Heartbeat]

Page 32: Webinar: General Technical Overview of MongoDB

Replica Set – Recovered

[Diagram: Node 1 (Secondary), Node 2 (Primary), Node 3 (Secondary); links labeled Replication and Heartbeat]
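The failure and failover steps above can be traced with a toy model. This is a sketch under stated assumptions, not MongoDB's election protocol: the node objects, the `optime` field, and the helpers `markDown` and `electPrimary` are all hypothetical. The one rule it keeps from real replica sets is that a new primary is only chosen while a majority of members is reachable.

```javascript
// Toy model only: real elections also weigh priorities, votes and timeouts.

// Hypothetical helper: mark one node as unreachable (missed heartbeats).
function markDown(nodes, name) {
  return nodes.map((n) => (n.name === name ? { ...n, state: "DOWN" } : n));
}

// Hypothetical helper: promote one secondary if an election is possible.
function electPrimary(nodes) {
  const reachable = nodes.filter((n) => n.state !== "DOWN");
  const hasPrimary = reachable.some((n) => n.state === "PRIMARY");
  // No election if a primary is still up, or if no majority is reachable.
  if (hasPrimary || reachable.length * 2 <= nodes.length) return nodes;
  const candidates = reachable.filter((n) => n.state === "SECONDARY");
  if (candidates.length === 0) return nodes;
  // Assumed tie-breaker: prefer the secondary with the newest data.
  const winner = candidates.reduce((a, b) => (b.optime > a.optime ? b : a));
  return nodes.map((n) =>
    n.name === winner.name ? { ...n, state: "PRIMARY" } : n
  );
}

// Initial state from the "Initialize" slide; optime values are made up.
const initial = [
  { name: "Node 1", state: "SECONDARY", optime: 100 },
  { name: "Node 2", state: "SECONDARY", optime: 102 },
  { name: "Node 3", state: "PRIMARY", optime: 102 },
];

const afterFailure = markDown(initial, "Node 3");
const afterFailover = electPrimary(afterFailure);

console.log(afterFailover.map((n) => `${n.name}: ${n.state}`).join(", "));
// Node 1: SECONDARY, Node 2: PRIMARY, Node 3: DOWN
```

If two of the three nodes are down, `electPrimary` returns the set unchanged, which mirrors the majority requirement: the surviving node stays a secondary.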

Page 33: Webinar: General Technical Overview of MongoDB

Scaling Reads

[Diagram: Client Application (Driver), Primary, two Secondaries; arrows labeled Write, Read, Read]

Page 34: Webinar: General Technical Overview of MongoDB

Sharding

[Diagram: three App Servers, each paired with a Mongos; three Config Servers; three Shards]

Page 35: Webinar: General Technical Overview of MongoDB

Data stored in shard

•  Shard is a node of the cluster

•  For production deployments a shard is a replica set

[Diagram: a Shard as a replica set (Primary, Secondary, Secondary), or a Shard as a single Mongod]

Page 36: Webinar: General Technical Overview of MongoDB

Config server stores meta data

•  Config Server
   –  Stores cluster chunk ranges and locations
   –  Production deployments need 3 nodes
   –  Two phase commit (not a replica set)

[Diagram: one Config Server, or three Config Servers]

Page 37: Webinar: General Technical Overview of MongoDB

Mongos manages the data

•  Mongos
   –  Acts as a router / balancer
   –  No local data (persists to config database)
   –  Can have 1 or many

[Diagram: App Servers and Mongos instances, shown in two alternative layouts]

Page 38: Webinar: General Technical Overview of MongoDB

Sharding

[Diagram: three App Servers, each paired with a Mongos; three Config Servers; three Shards]
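A minimal sketch of the routing idea: the config servers hold chunk ranges and their locations, and a router looks up which shard owns a given shard-key value. The chunk table, the shard names, the choice of `author` as the shard key, and `routeToShard` are all hypothetical. Real chunk bounds use MinKey/MaxKey and BSON comparison rather than `null` and JavaScript string ordering.

```javascript
// Hypothetical chunk metadata of the kind a config server stores.
// Each chunk covers shard-key values in [min, max); null means unbounded.
const chunks = [
  { min: null, max: "h", shard: "shard0" },
  { min: "h", max: "q", shard: "shard1" },
  { min: "q", max: null, shard: "shard2" },
];

// Hypothetical router lookup: find the chunk whose range contains the key.
function routeToShard(chunkTable, key) {
  const chunk = chunkTable.find(
    (c) => (c.min === null || key >= c.min) && (c.max === null || key < c.max)
  );
  return chunk ? chunk.shard : undefined;
}

console.log(routeToShard(chunks, "bob"));   // shard0
console.log(routeToShard(chunks, "jeff"));  // shard1
console.log(routeToShard(chunks, "roger")); // shard2
```

A query that includes the shard key can be sent to a single shard this way; a query without it has to go to every shard.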

Page 39: Webinar: General Technical Overview of MongoDB

Aggregates, Statistics, Analytics

Page 40: Webinar: General Technical Overview of MongoDB

Analyzing Data in MongoDB

•  Custom application code
   –  Run your queries, compute your results

•  Aggregation framework
   –  Declarative, pipeline-based approach

•  Native Map/Reduce in MongoDB
   –  JavaScript functions distributed across cluster

•  Hadoop
   –  Offline batch processing/computation

Page 41: Webinar: General Technical Overview of MongoDB

Aggregation Framework

// Sample document
{
  title: "this is my title",
  author: "bob",
  posted: new Date(),
  tags: ["fun", "good", "fun"],
  comments: [
    { author: "joe", text: "this is cool" },
    { author: "sam", text: "this is bad" }
  ],
  other: { foo: 5 }
}

// Pipeline: collect the set of authors for each tag
db.article.aggregate(
  { $project: { author: 1, tags: 1 } },
  { $unwind: "$tags" },
  { $group: {
      _id: "$tags",
      authors: { $addToSet: "$author" }
  }}
);

// Operations: $project, $match, $limit, $skip, $unwind, $group, $sort
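To see what the `$unwind` and `$group` stages compute, here is a plain JavaScript sketch that does the same work over an in-memory array. It illustrates the semantics only and is not how the server executes a pipeline. `authorsByTag` is a hypothetical helper, and the second document (author "joe") is made up so that one tag ends up with two authors.

```javascript
// Hypothetical in-memory equivalent of: $unwind "$tags", then
// $group by tag with $addToSet on author.
function authorsByTag(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // $unwind: handle each array element as if it were its own document.
    for (const tag of doc.tags || []) {
      if (!groups.has(tag)) groups.set(tag, new Set());
      // $addToSet: a Set keeps each author once per tag.
      groups.get(tag).add(doc.author);
    }
  }
  return [...groups].map(([tag, authors]) => ({
    _id: tag,
    authors: [...authors],
  }));
}

const docs = [
  { author: "bob", tags: ["fun", "good", "fun"] }, // from the slide
  { author: "joe", tags: ["fun"] },                // made-up second document
];

console.log(JSON.stringify(authorsByTag(docs)));
// [{"_id":"fun","authors":["bob","joe"]},{"_id":"good","authors":["bob"]}]
```

The duplicate "fun" tag on bob's document produces two unwound rows, but the set semantics keep "bob" listed once. The server does not guarantee the order of groups or of set elements; the order shown here comes from JavaScript's Map and Set.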

Page 42: Webinar: General Technical Overview of MongoDB

Mapping SQL to Aggregation

SQL statement:    SELECT COUNT(*) FROM users
MongoDB command:  db.users.aggregate([ { $group: {_id: null, count: {$sum: 1}} } ])

SQL statement:    SELECT SUM(price) FROM orders
MongoDB command:  db.orders.aggregate([ { $group: {_id: null, total: {$sum: "$price"}} } ])

SQL statement:    SELECT cust_id, SUM(price) FROM orders GROUP BY cust_id
MongoDB command:  db.orders.aggregate([ { $group: {_id: "$cust_id", total: {$sum: "$price"}} } ])

SQL statement:    SELECT cust_id, SUM(price) FROM orders WHERE active=true GROUP BY cust_id
MongoDB command:  db.orders.aggregate([ { $match: {active: true} }, { $group: {_id: "$cust_id", total: {$sum: "$price"}} } ])
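The last mapping (WHERE, then GROUP BY with SUM) can be checked against a plain JavaScript sketch over an in-memory array. `totalByCustomer` and the sample orders are hypothetical, and the code mirrors the semantics only, not the server's execution.

```javascript
// Hypothetical in-memory equivalent of:
//   { $match: {active: true} },
//   { $group: {_id: "$cust_id", total: {$sum: "$price"}} }
function totalByCustomer(orders) {
  const totals = new Map();
  for (const order of orders) {
    if (order.active !== true) continue; // $match / WHERE active=true
    const current = totals.get(order.cust_id) || 0;
    totals.set(order.cust_id, current + order.price); // $sum / SUM(price)
  }
  return [...totals].map(([custId, total]) => ({ _id: custId, total }));
}

// Made-up sample data.
const orders = [
  { cust_id: "a", price: 10, active: true },
  { cust_id: "b", price: 5, active: true },
  { cust_id: "a", price: 7, active: true },
  { cust_id: "b", price: 100, active: false },
];

console.log(JSON.stringify(totalByCustomer(orders)));
// [{"_id":"a","total":17},{"_id":"b","total":5}]
```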

Page 43: Webinar: General Technical Overview of MongoDB

Native Map/Reduce

•  More complex aggregation tasks

•  Map and Reduce functions written in JS

•  Can be distributed across sharded cluster for increased parallelism

Page 44: Webinar: General Technical Overview of MongoDB

Map/Reduce Functions

var map = function() {
  emit(this.author, {votes: this.votes});
};

var reduce = function(key, values) {
  var sum = 0;
  values.forEach(function(doc) {
    sum += doc.votes;
  });
  return {votes: sum};
};
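To see what these two functions produce, the sketch below runs them over an in-memory array in plain JavaScript. The `emit` function, the `emitted` map, and `runMapReduce` are stand-ins written for this example, and the sample documents are made up. On a real server the equivalent call would be along the lines of `db.articles.mapReduce(map, reduce, {out: {inline: 1}})`.

```javascript
// Stand-in for the server's grouping step: emitted values collected by key.
const emitted = new Map();

// Stand-in for the emit() the server provides inside map functions.
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// map and reduce as on the slide.
var map = function () {
  emit(this.author, { votes: this.votes });
};

var reduce = function (key, values) {
  var sum = 0;
  values.forEach(function (doc) {
    sum += doc.votes;
  });
  return { votes: sum };
};

// Hypothetical driver: run map on each document, then reduce each key.
function runMapReduce(docs) {
  emitted.clear();
  docs.forEach((doc) => map.call(doc)); // map phase: `this` is the document
  return [...emitted].map(([key, values]) => ({
    _id: key,
    // As in MongoDB, reduce is skipped for keys with a single value.
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

// Made-up sample documents.
const articles = [
  { author: "roger", votes: 3 },
  { author: "fred", votes: 1 },
  { author: "roger", votes: 4 },
];

console.log(JSON.stringify(runMapReduce(articles)));
// [{"_id":"roger","value":{"votes":7}},{"_id":"fred","value":{"votes":1}}]
```

Note that `reduce` returns the same shape that `map` emits. That matters because the server may call `reduce` more than once for a key, feeding earlier results back in as values.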

Page 45: Webinar: General Technical Overview of MongoDB

Hadoop and MongoDB

•  MongoDB-Hadoop adapter

•  1.0 released, 1.1 in development

•  Supports Hadoop
   –  Map/Reduce, Streaming, Pig

•  MongoDB as input/output storage for Hadoop jobs
   –  No need to go through HDFS

•  Leverage power of Hadoop ecosystem against operational data in MongoDB

Page 46: Webinar: General Technical Overview of MongoDB

MongoDB Resources

•  Presentations, Webinars
   –  www.10gen.com/presentations

•  MongoDB documentation
   –  docs.mongodb.org

•  Community
   –  groups.google.com/group/mongodb-user
   –  stackoverflow.com/questions/tagged/mongodb

Page 47: Webinar: General Technical Overview of MongoDB

Questions