mongodb: advantages of an open source nosql database

MongoDB: Advantages of an Open Source NoSQL Database: An Introduction FITC {spotlight on the MEAN stack}

Upload: fitc

Post on 06-May-2015


DESCRIPTION

MongoDB: Advantages of an Open Source NoSQL Database with Kevin Cearns. The presentation will present an overview of the MongoDB NoSQL database, its history and current status as the leading NoSQL database. It will focus on how NoSQL, and in particular MongoDB, benefits developers building big data or web scale applications. Discuss the community around MongoDB and compare it to commercial alternatives. An introduction to installing, configuring and maintaining standalone instances and replica sets will be provided. Presented live at FITC's Spotlight:MEAN Stack on March 28th, 2014. More info at FITC.ca

TRANSCRIPT

Page 1: MongoDB: Advantages of an Open Source NoSQL Database

MongoDB: Advantages of an Open Source NoSQL Database:

An Introduction

FITC {spotlight on the MEAN stack}

Page 2: MongoDB: Advantages of an Open Source NoSQL Database

Who am I?

Page 3: MongoDB: Advantages of an Open Source NoSQL Database

The cand.IO { Candy-oh! } Platform

We've made it our mission to become the premier provider of infrastructure, platform and operations services for big data, web and mobile applications.

We effectively manage your operations, allowing you to create, deploy and iterate

DevOps * SysOps * NoOps

Page 4: MongoDB: Advantages of an Open Source NoSQL Database
Page 5: MongoDB: Advantages of an Open Source NoSQL Database

What is NoSQL?

Page 6: MongoDB: Advantages of an Open Source NoSQL Database

“...when ‘NoSQL’ is applied to a database, it refers to an ill-defined set of mostly open-source databases, mostly developed in the early 21st century, and mostly not using SQL”

Martin Fowler: NoSQL Distilled

Page 7: MongoDB: Advantages of an Open Source NoSQL Database

● NoSQL databases don’t use SQL

● Generally open source projects

● Driven by need to scale and run on clusters

● Operate without a schema

● Shift away from relational model

● NoSQL models: key-value, document, column-family, graph

Page 8: MongoDB: Advantages of an Open Source NoSQL Database

What is MongoDB?

Page 9: MongoDB: Advantages of an Open Source NoSQL Database

Not used with permission, please keep to yourself, appreciated, thanks!

Page 10: MongoDB: Advantages of an Open Source NoSQL Database

History

● Development began in 2007

● Initially conceived as a persistent data store for a larger platform as a service offering

● In 2009, MongoDB was open sourced with an AGPL license

● Version 1.4 was released in March 2010 and considered the first production ready version

Page 11: MongoDB: Advantages of an Open Source NoSQL Database

mongodb.org/downloads

Page 12: MongoDB: Advantages of an Open Source NoSQL Database

DB-Engines Ranking

Page 13: MongoDB: Advantages of an Open Source NoSQL Database

MongoDB is a ______________ database

● Document

● Open Source

● High performance

● Horizontally scalable

● Full featured

Pop Quiz!

Page 14: MongoDB: Advantages of an Open Source NoSQL Database

Document Database

● Not for .PDF and .DOC files

● A document is essentially an associative array

○ Document = JSON object

○ Document = PHP Array

○ Document = Python Dict

○ Document = Ruby Hash

○ etc.
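
As a rough illustration of the equivalences above, the sketch below builds a document as a plain JavaScript object and round-trips it through JSON. The field names are made up for the example, and MongoDB's on-disk BSON encoding is not modelled here.

```javascript
// A document is nested key/value data. All field names here are illustrative.
const doc = {
  username: "kcearns",
  name: { first: "Kevin", last: "Cearns" }, // embedded sub-document
  tags: ["mongodb", "nosql"]                // array value
};

// Serialize to JSON text and parse it back into an equivalent object.
const json = JSON.stringify(doc);
const copy = JSON.parse(json);

console.log(copy.name.first);
console.log(copy.tags.length);
```

The same structure would be a PHP array, a Python dict or a Ruby hash in those languages; only the literal syntax changes.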

Page 15: MongoDB: Advantages of an Open Source NoSQL Database

Open Source

● MongoDB is an open source project

● On GitHub and Jira

● Licensed under the AGPL

● Started and sponsored by 10gen (now MongoDB Inc.)

● Commercial licenses available

● Contributions welcome

Page 16: MongoDB: Advantages of an Open Source NoSQL Database

High Performance

● Written in C++

● Extensive use of memory-mapped files, i.e. read-through, write-through memory caching

● Runs nearly everywhere

● Data serialized as BSON (fast parsing)

● Full support for primary and secondary indexes

Page 17: MongoDB: Advantages of an Open Source NoSQL Database
Page 18: MongoDB: Advantages of an Open Source NoSQL Database

Full Featured

● Ad Hoc queries

● Real time aggregation

● Rich query capabilities

● Geospatial features

● Support for most programming languages

● Flexible schema

Page 19: MongoDB: Advantages of an Open Source NoSQL Database

Document Database

Page 20: MongoDB: Advantages of an Open Source NoSQL Database

Terminology

RDBMS          MongoDB

Table, View    Collection
Row            Document
Index          Index
Join           Embedded Document
Foreign Key    Reference
Partition      Shard
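
To make the "Join" and "Foreign Key" rows concrete, here is a minimal sketch in plain JavaScript, with no database involved. The field names (`article_id`, `comments`) and the helper `commentsFor` are assumptions made up for the illustration.

```javascript
// Option 1: embed the comments inside the article (takes the place of a join).
const embedded = {
  title: "Hello World",
  comments: [{ author: "alice", text: "Nice post" }]
};

// Option 2: keep comments separate and reference the article by its _id
// (plays the role of a foreign key; the application resolves it).
const article = { _id: 1, title: "Hello World" };
const comments = [
  { article_id: 1, author: "alice", text: "Nice post" },
  { article_id: 2, author: "bob", text: "Unrelated" }
];

// Hypothetical helper: resolving a reference is a second lookup.
function commentsFor(articleDoc, allComments) {
  return allComments.filter(c => c.article_id === articleDoc._id);
}

console.log(embedded.comments.length);
console.log(commentsFor(article, comments).length);
```

Embedding returns everything in one read; referencing keeps documents small but costs an extra lookup.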

Page 21: MongoDB: Advantages of an Open Source NoSQL Database

Typical (relational) ERD

Page 22: MongoDB: Advantages of an Open Source NoSQL Database

Schema Design

Page 23: MongoDB: Advantages of an Open Source NoSQL Database

MongoDB has native bindings for over 12 languages

Page 24: MongoDB: Advantages of an Open Source NoSQL Database

MongoDB Drivers

● Drivers connect to mongo servers

● Drivers translate BSON to native types

● mongo shell is not a driver, but works like one in some ways

● Installed using typical means (npm, pecl, gem, pip)

Page 25: MongoDB: Advantages of an Open Source NoSQL Database

Running MongoDB

$ tar -xzf mongodb-linux-x86_64-2.4.7.tgz
$ cd mongodb-linux-x86_64-2.4.7/bin
$ sudo mkdir -p /data/db
$ sudo ./mongod

Page 26: MongoDB: Advantages of an Open Source NoSQL Database

Mongo Shell

$ mongo

MongoDB shell version: 2.4.4

connecting to: test

> db.test.insert({text: 'Welcome to MongoDB'})

> db.test.find().pretty()

{

"_id" : ObjectId("51c34130fbd5d7261b4cdb55"),

"text" : "Welcome to MongoDB"

}

Page 27: MongoDB: Advantages of an Open Source NoSQL Database

Start with an object (or array, hash, dict, etc.)

var user = {
    username: 'kcearns',
    first_name: 'Kevin',
    last_name: 'Cearns'
}

Page 28: MongoDB: Advantages of an Open Source NoSQL Database

Switch to your DB

> db

test

> use blog

switching to db blog

Page 29: MongoDB: Advantages of an Open Source NoSQL Database

Insert the record (no collection creation required)

> db.users.insert(user)

Page 30: MongoDB: Advantages of an Open Source NoSQL Database

Find one record

> db.users.findOne()
{
    "_id" : ObjectId("50804d0bd94ccab2da652599"),
    "username" : "kcearns",
    "first_name" : "Kevin",
    "last_name" : "Cearns"
}

Page 31: MongoDB: Advantages of an Open Source NoSQL Database

_id

● _id is the primary key in MongoDB

● Automatically indexed

● Automatically created as an ObjectID if not provided

● Any unique immutable value can be used

Page 32: MongoDB: Advantages of an Open Source NoSQL Database

ObjectId

● ObjectId is a special 12 byte value

● Guaranteed to be unique across your cluster

● ObjectId("50804d0bd94ccab2da652599")
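
The 12 bytes are not arbitrary: in the standard ObjectId layout the first four bytes hold the creation time in seconds since the Unix epoch. The sketch below decodes that prefix from the hex string in plain JavaScript. `objectIdTimestamp` is a helper name invented for this example, not a shell or driver function.

```javascript
// Hypothetical helper: read the 4-byte timestamp prefix of an ObjectId hex string.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters (12 bytes)");
  }
  // First 8 hex characters = first 4 bytes = seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("50804d0bd94ccab2da652599");
console.log(created.toISOString());
```

Because the timestamp leads the value, sorting by _id roughly sorts by creation time.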

Page 33: MongoDB: Advantages of an Open Source NoSQL Database

Creating a Blog Post

> db.article.insert({
    title: 'Hello World',
    body: 'This is my first blog post',
    date: new Date('2013-06-20'),
    username: 'kcearns',
    tags: ['adventure', 'mongodb'],
    comments: [ ]
})

Page 34: MongoDB: Advantages of an Open Source NoSQL Database

Finding the Post

> db.article.find().pretty()

{

"_id" : ObjectId("51c3bafafbd5d7261b4cdb5a"),

"title" : "Hello World",

"body" : "This is my first blog post",

"date" : ISODate("2013-06-20T00:00:00Z"),

"username" : "kcearns",

"tags" : [

"adventure",

"mongodb"

],

"comments" : [ ]

}

Page 35: MongoDB: Advantages of an Open Source NoSQL Database

Querying An Array

> db.article.find({tags:'adventure'}).pretty()

{

"_id" : ObjectId("51c3bcddfbd5d7261b4cdb5b"),

"title" : "Hello World",

"body" : "This is my first blog post",

"date" : ISODate("2013-06-20T00:00:00Z"),

"username" : "kcearns",

"tags" : [

"adventure",

"mongodb"

],

"comments" : [ ]

}
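
The query above matches because an equality condition on an array field is satisfied when any element equals the value. The sketch below imitates that rule over an in-memory array in plain JavaScript. It illustrates the matching semantics only, not how MongoDB evaluates queries internally, and `matchesField` plus the second sample article are made up for the example.

```javascript
// Hypothetical helper: true if doc[field] equals value, or, when the field
// holds an array, if any element of the array equals value.
function matchesField(doc, field, value) {
  const actual = doc[field];
  return Array.isArray(actual) ? actual.includes(value) : actual === value;
}

const articles = [
  { title: "Hello World", tags: ["adventure", "mongodb"] },
  { title: "Second Post", tags: ["cooking"] }
];

// Rough in-memory equivalent of db.article.find({tags: 'adventure'}).
const found = articles.filter(d => matchesField(d, "tags", "adventure"));
console.log(found.map(d => d.title));
```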

Page 36: MongoDB: Advantages of an Open Source NoSQL Database

Prime Time

What are your production options?

Page 37: MongoDB: Advantages of an Open Source NoSQL Database
Page 38: MongoDB: Advantages of an Open Source NoSQL Database
Page 39: MongoDB: Advantages of an Open Source NoSQL Database
Page 40: MongoDB: Advantages of an Open Source NoSQL Database

Roll your own...

Page 41: MongoDB: Advantages of an Open Source NoSQL Database

Operations Best practices

● Setup and configuration

● Hardware

● Operating system and file system configurations

● Networking

Page 42: MongoDB: Advantages of an Open Source NoSQL Database

Setup and configuration

● Only 64 bit versions of operating systems should be used

● Configuration files should be used for consistent setups

● Upgrades should be done as often as possible

● Data migration - don’t simply import your legacy dump

Page 43: MongoDB: Advantages of an Open Source NoSQL Database

Hardware

● MongoDB makes extensive use of RAM (the more RAM the better)

● Shared storage is not required

● Disk access patterns are not sequential

○ SSD where possible; better to spend money on more RAM or SSD than on faster spinning drives

● RAID 10

● Faster clock speeds vs. numerous cores

Page 44: MongoDB: Advantages of an Open Source NoSQL Database

Operating system and file system configurations

● Ext4 and XFS file systems are recommended

● Turn off atime for the storage volume with the database files

● Disable NUMA (non-uniform memory access) in BIOS or start mongod with NUMA disabled

● Ensure readahead is small for the block devices where the database files live (e.g. a readahead of 32 sectors, which is 16KB)

● Modify ulimit values
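
The readahead figure above is counted in 512-byte sectors, which is where the 16KB comes from. A small sketch of the arithmetic, with `readaheadKiB` as a helper name invented for the example:

```javascript
// Readahead values are expressed in 512-byte sectors.
const SECTOR_BYTES = 512;

// Hypothetical helper: convert a readahead setting in sectors to kilobytes.
function readaheadKiB(sectors) {
  return (sectors * SECTOR_BYTES) / 1024;
}

console.log(readaheadKiB(32));  // the setting suggested above
console.log(readaheadKiB(256)); // a larger setting, for comparison
```

With random access patterns, a large readahead pulls in data that is never used and pushes useful pages out of RAM, which is why the smaller value is suggested.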

Page 45: MongoDB: Advantages of an Open Source NoSQL Database

Networking

● Run mongod in a trusted environment, prevent access from all unknown entities

● By default MongoDB binds to all available network interfaces; bind your mongod to the private or internal interface if you have one

Page 46: MongoDB: Advantages of an Open Source NoSQL Database

Replica sets

“...a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.”

Page 47: MongoDB: Advantages of an Open Source NoSQL Database
Page 48: MongoDB: Advantages of an Open Source NoSQL Database
Page 49: MongoDB: Advantages of an Open Source NoSQL Database
Page 50: MongoDB: Advantages of an Open Source NoSQL Database

● Secondaries apply operations from the primary asynchronously

● Replica sets support dedicated members for reporting, disaster recovery and backup

● Automatic failover occurs when a primary does not communicate with other members of the set for more than 10 seconds

Page 51: MongoDB: Advantages of an Open Source NoSQL Database
Page 52: MongoDB: Advantages of an Open Source NoSQL Database

Sharding

● MongoDB approach to scaling out

● Data is split up and stored on different machines (usually a replica set)

● Supports Autosharding

● The cluster balances data across machines automatically

Page 53: MongoDB: Advantages of an Open Source NoSQL Database
Page 54: MongoDB: Advantages of an Open Source NoSQL Database

DEMO

Page 55: MongoDB: Advantages of an Open Source NoSQL Database

Backup

● expect failure when you feel most prepared

● any backup is better than no backup

● backup the backup

Page 56: MongoDB: Advantages of an Open Source NoSQL Database

Backup Considerations:

the business recovery expectation

ALWAYS

dictates the backup method

Page 57: MongoDB: Advantages of an Open Source NoSQL Database

● geography

● system errors

● production constraints

● system capabilities

● database configuration

● actual requirements

● business requirements

Page 58: MongoDB: Advantages of an Open Source NoSQL Database

geography

● OFF SITE (away from your primary infrastructure)

● MULTIPLE COPIES OFF SITE

Page 59: MongoDB: Advantages of an Open Source NoSQL Database

System Errors

● ensure the integrity and availability of backups

● MULTIPLE COPIES OFF SITE

Page 60: MongoDB: Advantages of an Open Source NoSQL Database

Production constraints

● backup operations themselves require system resources

● consider backup schedules and availability of resources

Page 61: MongoDB: Advantages of an Open Source NoSQL Database

System capabilities:

some backup methods like LVM require the system tools to support them

Page 62: MongoDB: Advantages of an Open Source NoSQL Database

Consider the database configuration:

replication and sharding affect the backup method

Page 63: MongoDB: Advantages of an Open Source NoSQL Database

Actual requirements

● what needs to be backed up

● how timely does it need to be

● what's your recovery window

Page 64: MongoDB: Advantages of an Open Source NoSQL Database

Backup methods

● binary dumps of the database using mongodump/mongorestore

● filesystem snapshots like lvm

Page 65: MongoDB: Advantages of an Open Source NoSQL Database

Filesystem backup

● utilized with system level tools like LVM (logical volume manager)

● creates a filesystem snapshot or "block level" backup

● same premise as "hard links" - creates pointers between the live data and the snapshot volume

● requires configuration outside of MongoDB

Page 66: MongoDB: Advantages of an Open Source NoSQL Database

Snapshot limitations

● all writes to the database need to be written fully to disk (journal or data files)

● the journal must reside on the same volume as the data

● snapshots create an image of the entire disk

● Isolate data files and journal on a single logical disk that contains no other data

Page 67: MongoDB: Advantages of an Open Source NoSQL Database

Snapshots

● if mongod has journaling enabled you can use any kind of file system or volume/block level snapshot tool

# lvcreate --size 100M --snapshot --name snap01 /dev/vg0/mongodb

● creates an LVM snapshot named snap01 of the mongodb volume in the vg0 volume group

Page 68: MongoDB: Advantages of an Open Source NoSQL Database

Snapshots

● mount the snapshot and move the data to separate storage

# mount /dev/vg0/snap01
# dd if=/dev/vg0/snap01 | gzip > snap01.gz

(block level copy of the snapshot image, compressed into a gzipped file)

# lvcreate --size 1G --name mongodb-new vg0
# gzip -d -c snap01.gz | dd of=/dev/vg0/mongodb-new

Page 69: MongoDB: Advantages of an Open Source NoSQL Database

Mongodump & Mongorestore

● write the entire contents of the instance to a file in binary format

● can backup the entire server, database or collection

● queries allow you to backup part of a collection

Page 70: MongoDB: Advantages of an Open Source NoSQL Database

# mongodump

connects to the local database instance and creates a database backup named dump/ in the current directory

Page 71: MongoDB: Advantages of an Open Source NoSQL Database

# mongodump --dbpath /data/db --out /data/backup

Connects directly to local data files with no mongod process and saves output to /data/backup. Access to the data directory is restricted during the dump.

# mongodump --host mongodb.example.net --port 27017

Connects to host mongodb.example.net on port 27017 and saves output to a dump subdirectory of the current working directory

# mongodump --collection collection --db test

Creates a backup of the collection named collection from the database test in a dump subdirectory of the current working directory

Page 72: MongoDB: Advantages of an Open Source NoSQL Database

--oplog

mongodump copies data from the source database as well as all of the oplog entries from the beginning of the backup procedure until the backup procedure completes

--oplogReplay

Page 73: MongoDB: Advantages of an Open Source NoSQL Database

Mongorestore

● restores a backup created by mongodump

● by default mongorestore looks for a database backup in the dump/ directory

● can connect to an active mongod process or write to a local database path without mongod

● can restore an entire database or subset of the backup

Page 74: MongoDB: Advantages of an Open Source NoSQL Database

# mongorestore --port 27017 /data/backup

Connects to local mongodb instance on port 27017 and restores the dump from /data/backup

# mongorestore --dbpath /data/db /data/backup

Restore writes to data files inside /data/db from the dump in /data/backup

# mongorestore --filter '{"field": 1}'

Restore only adds documents from the dump located in the dump subdirectory of the current working directory if the documents have a field named field that holds a value of 1

Page 75: MongoDB: Advantages of an Open Source NoSQL Database
Page 76: MongoDB: Advantages of an Open Source NoSQL Database

When things go wrong

...and they will!

Page 77: MongoDB: Advantages of an Open Source NoSQL Database

Tools for Diagnostics

● Know your DB (e.g., working set)

● Logs

● MMS Monitoring

● mongostat

● OS tools (e.g., vmstat)

Page 78: MongoDB: Advantages of an Open Source NoSQL Database

Know your DB

● Determine working set

● Database profiler

● Scale for Read or Write

● db.serverStatus()

● rs.status()

● db.stats()

Page 79: MongoDB: Advantages of an Open Source NoSQL Database

Working Set

● db.runCommand( { serverStatus: 1, workingSet: 1 } )

"workingSet" : {
    "note" : "thisIsAnEstimate",
    "pagesInMemory" : 17,
    "computationTimeMicros" : 10085,
    "overSeconds" : 999
},

Page 80: MongoDB: Advantages of an Open Source NoSQL Database

Working Set

pagesInMemory: contains a count of the total number of pages accessed by mongod over the period displayed in overSeconds. The default page size is 4 kilobytes; to convert this value to the amount of data in memory, multiply it by 4 kilobytes.

overSeconds: returns the amount of time elapsed between the newest and oldest pages tracked in the pagesInMemory data point. If overSeconds is decreasing, or if pagesInMemory equals physical RAM and overSeconds is very small, the working set may be much larger than physical RAM. When overSeconds is large, MongoDB’s data set is equal to or smaller than physical RAM.
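
The conversion described above can be sketched as follows, using the sample workingSet output from the previous slide. `workingSetKB` is a helper name invented here, and the 4 kilobyte page size is the default stated above, so adjust it if your system differs.

```javascript
// Default page size stated above; an assumption if your system differs.
const PAGE_KB = 4;

// Hypothetical helper: estimated data touched over the overSeconds window.
function workingSetKB(workingSet) {
  return workingSet.pagesInMemory * PAGE_KB;
}

// Sample values copied from the serverStatus output on the previous slide.
const sample = {
  note: "thisIsAnEstimate",
  pagesInMemory: 17,
  computationTimeMicros: 10085,
  overSeconds: 999
};

console.log(workingSetKB(sample) + " KB over " + sample.overSeconds + " seconds");
```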

Page 81: MongoDB: Advantages of an Open Source NoSQL Database

Performance of Database Operations

● Database profiler collects fine grained data about write operations, cursors and database commands

● Enable profiling on a per database or per instance basis

● Minor effect on performance

● system.profile collection is a capped collection with a default size of 1 megabyte

● db.setProfilingLevel(0)

Page 82: MongoDB: Advantages of an Open Source NoSQL Database

Performance of Database Operations

● 0 - the profiler is off

● 1 - collects profiling data for slow operations only. By default slow operations are those slower than 100 milliseconds. You can modify the threshold for slow operations with the slowms option

● 2 - collects profiling data for all database operations

● db.getProfilingStatus()
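
The three levels boil down to a simple decision per operation. The sketch below restates that rule in plain JavaScript. `wouldProfile` is a name made up for the illustration, it does not call the server, and the 100 millisecond default is the one quoted above.

```javascript
// Hypothetical helper: would an operation taking opMillis be recorded
// at the given profiling level? slowms defaults to 100 as stated above.
function wouldProfile(level, opMillis, slowms = 100) {
  if (level === 0) return false;             // profiler off
  if (level === 1) return opMillis > slowms; // slow operations only
  if (level === 2) return true;              // every operation
  throw new RangeError("profiling level must be 0, 1 or 2");
}

console.log(wouldProfile(1, 250));     // slow op at level 1
console.log(wouldProfile(1, 20));      // fast op at level 1
console.log(wouldProfile(1, 20, 10));  // same op with a lower threshold
```

Level 1 with a tuned threshold is usually the least intrusive way to find slow queries, since level 2 records every operation.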

Page 83: MongoDB: Advantages of an Open Source NoSQL Database

Verbose Logs

● Set verbosity in config file

● use admin
  db.runCommand( { setParameter: 1, logLevel: 2 } )

v = Alternate form of verbose
vv = Additional increase in verbosity
vvv = Additional increase in verbosity
vvvv = Additional increase in verbosity
vvvvv = Additional increase in verbosity

Page 84: MongoDB: Advantages of an Open Source NoSQL Database

MMS Monitoring

Page 85: MongoDB: Advantages of an Open Source NoSQL Database

mongostat

● provides an overview of the status of a currently running mongod or mongos instance

● similar to vmstat but specific to mongodb instances

inserts: the number of objects inserted in the db per second
query: the number of query operations per second
mapped: the total amount of data mapped in megabytes
faults: the number of page faults per second
locked: the percent of time in a global write lock
qr: length of queue of clients waiting to read data
qw: length of queue of clients waiting to write data

Page 86: MongoDB: Advantages of an Open Source NoSQL Database

OS tools

Network latency: ping and traceroute (especially helpful troubleshooting replica set issues and communication between members)

Disk throughput: iostat or vmstat (disk related issues can cause all kinds of problems)

Page 87: MongoDB: Advantages of an Open Source NoSQL Database

meetup.com/Toronto-MongoDB-User-Group

Page 88: MongoDB: Advantages of an Open Source NoSQL Database

Google Plus: Toronto MongoDB Users

Page 89: MongoDB: Advantages of an Open Source NoSQL Database

References

● github.com/mongodb/mongo

● jira.mongodb.org

● education.mongodb.com

● docs.mongodb.org

Page 90: MongoDB: Advantages of an Open Source NoSQL Database

education.mongodb.com

Page 91: MongoDB: Advantages of an Open Source NoSQL Database

Not used with permission, please keep to yourself, appreciated, thanks!

Page 92: MongoDB: Advantages of an Open Source NoSQL Database

Thank You!

@kcearns

@candiocloud

entuit.com cand.io

FITC {spotlight on the MEAN stack}