
Introducing: MongoDB

David J. C. Beach

Sunday, August 1, 2010


David Beach

Software Consultant (past 6 years)

Python since v1.4 (late 90’s)

Design, Algorithms, Data Structures

Sometimes Database stuff

not a “frameworks” guy

Organizer: Front Range Pythoneers


Outline

Part I: Trends in Databases

Part II: Mongo Basic Usage

Part III: Advanced Features


Part I: Trends in Databases


Database Trends

Past: “Relational” (RDBMS)

Data stored in Tables, Rows, Columns

Relationships designated by Primary, Foreign keys

Data is controlled & queried via SQL


Trends: Criticisms of RDBMS

Rigid data model

Hard to scale / distribute

Slow (transactions, disk seeks)

SQL not well standardized

Awkward for modern/dynamic languages


Trends: Fragmentation

Relational with ORM (Hibernate, SQLAlchemy)

ODBMS / ORDBMS (push OO-concepts into database)

Key-Value Stores (MemcacheDB, Redis, Cassandra)

Graph (neo4j)

Document Oriented (Mongo, Couch, etc...)


Where Mongo Fits

“The Best Features of Document Databases, Key-Value Stores, and RDBMSes.”


What is Mongo

Document-Oriented Database

Produced by 10gen / Implemented in C++

Source Code Available

Runs on Linux, Mac, Windows, Solaris

Database: GNU AGPL v3.0 License

Drivers: Apache License v2.0


Mongo Advantages

json-style documents (dynamic schemas)

flexible indexing (B-Tree)

replication and high-availability (HA)

automatic sharding support (v1.6)*

easy-to-use API

fast queries (auto-tuning planner)

fast insert & deletes (sometimes trade-offs)


Mongo Language Bindings

C, C++, Java

Python, Ruby, Perl

PHP, JavaScript

(many more community supported ones)


Mongo Disadvantages

No Relational Model / SQL

No Explicit Transactions / ACID

Limited Query API


When to use Mongo

Rich semistructured records (Documents)

Transaction isolation not essential

Humongous amounts of data

Need for extreme speed

You hate schema migrations


Part II: Mongo Basic Usage


Installing Mongo

Use a 64-bit OS (Linux, Mac, Windows)

Get Binaries: www.mongodb.org

Run “mongod” process


Installing PyMongo

Download: http://pypi.python.org/pypi/pymongo/1.7

Build with setuptools

(includes C extension for speed)

# python setup.py install

# python setup.py --no-ext install


Mongo Anatomy

Mongo Server > Database > Collection > Document
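A Mongo server holds databases, a database holds collections, and a collection holds documents. One way to picture that nesting is with plain Python dictionaries. This is only a mental model, not how the server stores data, and the names `server`, `mydatabase`, and `blog` are illustrative.

```python
# Mental model only: server -> database -> collection -> document.
server = {
    "mydatabase": {                       # a database
        "blog": [                         # a collection
            {"title": "Mongo Tutorial"},  # a document
            {"title": "Another Post", "tags": ["Mongo", "Power"]},
        ],
    },
}

# Walk the hierarchy from the server down to a single document.
first_doc = server["mydatabase"]["blog"][0]
print(first_doc["title"])
```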


Getting a Connection

Connection required for using Mongo

>>> import pymongo
>>> connection = pymongo.Connection("localhost")


Finding a Database

Databases = logically separate stores

Navigation using properties

Will create DB if not found

>>> db = connection.mydatabase


Using a Collection

Collection is analogous to Table

Contains documents

Will create collection if not found

>>> blog = db.blog


Inserting

collection.insert(document) => document_id

>>> entry1 = {"title": "Mongo Tutorial",
...           "body": "Here's a document to insert."}
>>> blog.insert(entry1)
ObjectId('4c3a12eb1d41c82762000001')


Inserting (contd.)

Documents must have ‘_id’ field

Automatically generated unless assigned

12-byte unique binary value

>>> entry1
{'_id': ObjectId('4c3a12eb1d41c82762000001'),
 'body': "Here's a document to insert.",
 'title': 'Mongo Tutorial'}
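As a rough illustration of what those 12 bytes hold: in the ObjectId layout documented around the time of this talk, the first 4 bytes are a big-endian Unix timestamp, followed by 3 bytes of machine id, 2 bytes of process id, and a 3-byte counter. The helper `split_object_id` below is hypothetical (it is not part of PyMongo) and only slices the hex string of the id shown above.

```python
import datetime

def split_object_id(hex_id):
    """Split a 24-character hex ObjectId into its four classic parts."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes")
    seconds = int.from_bytes(raw[0:4], "big")  # creation time, Unix epoch
    return {
        "created": datetime.datetime.fromtimestamp(seconds, datetime.timezone.utc),
        "machine": raw[4:7].hex(),             # kept as hex, not interpreted
        "pid": raw[7:9].hex(),                 # kept as hex, not interpreted
        "counter": int.from_bytes(raw[9:12], "big"),
    }

parts = split_object_id("4c3a12eb1d41c82762000001")
print(parts["created"].date(), parts["counter"])
```

Because the timestamp comes first, ids generated later sort after ids generated earlier, at one-second resolution.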


Inserting (contd.)

Documents may have different properties

Properties may be atomic, lists, dictionaries

>>> entry2 = {"title": "Another Post",
...           "body": "Mongo is powerful",
...           "author": "David",
...           "tags": ["Mongo", "Power"]}
>>> blog.insert(entry2)
ObjectId('4c3a1a501d41c82762000002')


Indexing

May create index on any field

If field is list => index associates all values

>>> blog.ensure_index("author")   # index by single value
>>> blog.ensure_index("tags")     # by multiple values
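To see what "index associates all values" means for a list field, here is a minimal pure-Python sketch. Mongo's real indexes are B-Trees on the server; the dictionary-based `build_index` helper below is hypothetical and only shows that each element of a list gets its own index entry.

```python
from collections import defaultdict

def build_index(documents, field):
    """Map each indexed value to the ids of the documents that hold it."""
    index = defaultdict(set)
    for doc in documents:
        value = doc.get(field)
        # A list field contributes one index entry per element.
        values = value if isinstance(value, list) else [value]
        for v in values:
            index[v].add(doc["_id"])
    return index

docs = [
    {"_id": 1, "author": "David", "tags": ["Mongo", "Power"]},
    {"_id": 2, "author": "Robot", "tags": ["bulk", "Green"]},
    {"_id": 3, "author": "David", "tags": ["bulk", "Red"]},
]
author_index = build_index(docs, "author")  # one entry per document
tags_index = build_index(docs, "tags")      # one entry per tag per document
print(sorted(author_index["David"]))
print(sorted(tags_index["bulk"]))
```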


Bulk Insert

Let’s produce 100,000 fake posts

import random

bulk_entries = []
for i in range(100000):
    entry = {
        "title": "Bulk Entry #%i" % (i + 1),
        "body": "What Content!",
        "author": random.choice(["David", "Robot"]),
        "tags": ["bulk", random.choice(["Red", "Blue", "Green"])],
    }
    bulk_entries.append(entry)


Bulk Insert (contd.)

collection.insert(list_of_documents)

Inserts 100,000 entries into blog

Returns in 2.11 seconds

>>> blog.insert(bulk_entries)
[ObjectId(...), ObjectId(...), ...]


Bulk Insert (contd.)

driver returns early; DB is still working...

unless you specify “safe=True”

>>> blog.remove()   # clear everything
>>> blog.insert(bulk_entries, safe=True)

returns in 7.90 seconds (vs. 2.11 seconds)


Querying

collection.find_one(spec) => document

spec = document of query parameters

>>> blog.find_one({"title": "Bulk Entry #12253"})
{u'_id': ObjectId('4c3a1e411d41c82762018a89'),
 u'author': u'Robot',
 u'body': u'What Content!',
 u'tags': [u'bulk', u'Green'],
 u'title': u'Bulk Entry #12253'}


Querying (Specs)

Multiple conditions on document => “AND”

Value for tags is an “ANY” match

>>> blog.find_one({"title": "Bulk Entry #12253", "tags": "Green"})
{u'_id': ObjectId('4c3a1e411d41c82762018a89'),
 u'author': u'Robot',
 u'body': u'What Content!',
 u'tags': [u'bulk', u'Green'],
 u'title': u'Bulk Entry #12253'}
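The two matching rules can be sketched in a few lines of pure Python. The `matches` function here is a hypothetical stand-in for what the server does with a simple equality spec: every key must match (AND), and a list-valued field matches if any one element equals the requested value (ANY).

```python
def matches(document, spec):
    """Return True if the document satisfies every key/value in spec."""
    for key, wanted in spec.items():
        actual = document.get(key)
        if isinstance(actual, list) and not isinstance(wanted, list):
            # List field: ANY element equal to the wanted value is a match.
            if wanted not in actual:
                return False
        elif actual != wanted:
            return False
    return True  # every condition held, i.e. an implicit AND

doc = {"title": "Bulk Entry #12253", "author": "Robot",
       "tags": ["bulk", "Green"]}
print(matches(doc, {"title": "Bulk Entry #12253", "tags": "Green"}))
print(matches(doc, {"title": "Bulk Entry #12253", "tags": "Red"}))
```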


Querying (Multiple)

collection.find(spec) => cursor

new items are fetched in bulk (behind the scenes)

>>> green_items = []
>>> for item in blog.find({"tags": "Green"}):
...     green_items.append(item)

- or -

>>> green_items = list(blog.find({"tags": "Green"}))
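The "fetched in bulk" behaviour of a cursor can be imitated with a generator. This is a sketch under simplifying assumptions: `batched_cursor` and `fetch_batch` are hypothetical names, and the real driver asks the server for further batches on an open cursor rather than re-querying by offset.

```python
def batched_cursor(fetch_batch, batch_size=100):
    """Yield items one at a time while fetching them in batches."""
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)  # one simulated round trip
        if not batch:
            return
        for item in batch:
            yield item
        offset += len(batch)

# Stand-in for the server: 250 fake documents plus a round-trip log.
data = [{"n": i} for i in range(250)]
round_trips = []

def fetch_batch(offset, size):
    round_trips.append(offset)
    return data[offset:offset + size]

items = list(batched_cursor(fetch_batch))
print(len(items), len(round_trips))
```

The caller iterates item by item, yet only a handful of round trips happen, which is why looping over a cursor stays cheap.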


Querying (Counting)

Use the find() method + count()

Returns number of matches found

>>> blog.find({"tags": "Green"}).count()
16646


Updating

collection.update(spec, document)

updates single document matching spec

“multi=True” => updates all matching docs

>>> item = blog.find_one({"title": "Bulk Entry #12253"})
>>> item["tags"].append("New")   # documents are dicts, so index by key
>>> blog.update({"_id": item["_id"]}, item)


Deleting

use remove(...)

it works like find(...)

>>> blog.remove({"author": "Robot"}, safe=True)


Part III: Advanced Features


Advanced Querying

Regular Expressions

{"tags": re.compile(r"^Green|Blue$")}

Nested Values

{"foo.bar.x": 3}

$where Clause (JavaScript)
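One detail of the regular expression above is worth checking locally. Alternation binds loosest, so `^Green|Blue$` means "starts with Green" or "ends with Blue", not "is exactly Green or Blue". The sketch below uses Python's `re` module to show this; the sample strings are made up for illustration.

```python
import re

pattern = re.compile(r"^Green|Blue$")

# Reads as (^Green) OR (Blue$).
print(bool(pattern.search("Greenish")))    # starts with Green
print(bool(pattern.search("Navy Blue")))   # ends with Blue
print(bool(pattern.search("Blue Green")))  # neither branch applies

# Grouping anchors both alternatives to the whole value.
exact = re.compile(r"^(Green|Blue)$")
print(bool(exact.search("Greenish")))
```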


Advanced Querying

$lt, $gt, $lte, $gte, $ne

$in, $nin, $mod, $all, $size, $exists, $type

$or, $not

$elemMatch

>>> blog.find({"$or": [{"tags": "Green"}, {"tags": "Blue"}]})
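To make the operator syntax concrete, here is a small pure-Python evaluator for a few of these operators. It is a sketch, not the server's implementation: `OPERATORS`, `field_matches`, and `matches` are hypothetical names, only a subset of operators is covered, and missing fields and list-valued fields are not handled.

```python
import operator

# Hypothetical subset of the query operators, evaluated in Python.
OPERATORS = {
    "$lt": operator.lt, "$lte": operator.le,
    "$gt": operator.gt, "$gte": operator.ge,
    "$ne": operator.ne,
    "$in": lambda actual, choices: actual in choices,
    "$nin": lambda actual, choices: actual not in choices,
}

def field_matches(actual, condition):
    """Check one field against a plain value or an operator document."""
    if isinstance(condition, dict):
        return all(OPERATORS[op](actual, arg) for op, arg in condition.items())
    return actual == condition

def matches(document, spec):
    """AND together all conditions; "$or" needs at least one branch to match."""
    for key, condition in spec.items():
        if key == "$or":
            if not any(matches(document, branch) for branch in condition):
                return False
        elif not field_matches(document.get(key), condition):
            return False
    return True

doc = {"title": "Bulk Entry #7", "views": 42, "author": "Robot"}
print(matches(doc, {"views": {"$gt": 10, "$lte": 42}}))
print(matches(doc, {"$or": [{"author": "David"}, {"views": {"$in": [41, 42]}}]}))
```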


Advanced Querying

collection.find(...)

sort(“name”) - sorting

limit(...) & skip(...) [like LIMIT & OFFSET]

distinct(...) [like SQL’s DISTINCT]

collection.group(...) - like SQL’s GROUP BY

>>> blog.find().limit(50)                 # find 50 articles
>>> blog.find().sort("title").limit(30)   # 30 titles
>>> blog.find().distinct("author")        # unique author names
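For readers who think in Python lists rather than SQL, the same operations look roughly like this on an in-memory list. The `posts` data is invented for the example; on a real collection these steps run on the server through the cursor methods listed above.

```python
posts = [
    {"title": "C post", "author": "David"},
    {"title": "A post", "author": "Robot"},
    {"title": "D post", "author": "David"},
    {"title": "B post", "author": "Robot"},
]

# sort("title"): order by one field.
by_title = sorted(posts, key=lambda p: p["title"])

# skip(1) followed by limit(2): like SQL's OFFSET 1 LIMIT 2.
page = by_title[1:1 + 2]

# distinct("author"): the unique values of one field.
authors = sorted({p["author"] for p in posts})

print([p["title"] for p in page])
print(authors)
```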


Map/Reduce

collection.map_reduce(mapper, reducer)

ultimate in querying power

distribute across multiple nodes
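The shape of a map/reduce job can be sketched in single-process Python. In Mongo the mapper and reducer are JavaScript functions that run on the server, and the reducer may be called repeatedly on partial results. The `map_reduce`, `count_tags`, and `total` functions below are hypothetical and only show the map, group-by-key, reduce flow.

```python
from collections import defaultdict

def map_reduce(documents, mapper, reducer):
    """Map every document, group emitted values by key, reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):  # the mapper yields (key, value) pairs
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def count_tags(doc):
    # Emit one (tag, 1) pair per tag, like emit(tag, 1) in JavaScript.
    for tag in doc["tags"]:
        yield tag, 1

def total(key, values):
    return sum(values)

posts = [
    {"tags": ["bulk", "Green"]},
    {"tags": ["bulk", "Red"]},
    {"tags": ["bulk", "Green"]},
]
tag_counts = map_reduce(posts, count_tags, total)
print(tag_counts)
```

Because `total` is associative, it would give the same answer if applied to partial sums first, which is the property that lets the work be spread across nodes.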


Map/Reduce Visualized

Diagram Credit: by Tom White; O’Reilly Books, Chapter 2, page 20

also see: Map/Reduce : A Visual Explanation


MongoDB:

db.runCommand({
  mapreduce: "DenormAggCollection",
  query: {
    filter1: { '$in': [ 'A', 'B' ] },
    filter2: 'C',
    filter3: { '$gt': 123 }
  },
  map: function () {
    emit(
      { d1: this.Dim1, d2: this.Dim2 },
      { msum: this.measure1, recs: 1, mmin: this.measure1,
        mmax: this.measure2 < 100 ? this.measure2 : 0 }
    );
  },
  reduce: function (key, vals) {
    var ret = { msum: 0, recs: 0, mmin: 0, mmax: 0 };
    for (var i = 0; i < vals.length; i++) {
      ret.msum += vals[i].msum;
      ret.recs += vals[i].recs;
      if (vals[i].mmin < ret.mmin) ret.mmin = vals[i].mmin;
      if ((vals[i].mmax < 100) && (vals[i].mmax > ret.mmax))
        ret.mmax = vals[i].mmax;
    }
    return ret;
  },
  finalize: function (key, val) {
    val.mavg = val.msum / val.recs;
    return val;
  },
  out: 'result1',
  verbose: true
});

db.result1.
  find({ mmin: { '$gt': 0 } }).
  sort({ recs: -1 }).
  skip(4).
  limit(8);

MySQL:

SELECT
  Dim1, Dim2,
  SUM(Measure1) AS MSum,
  COUNT(*) AS RecordCount,
  AVG(Measure2) AS MAvg,
  MIN(Measure1) AS MMin,
  MAX(CASE WHEN Measure2 < 100 THEN Measure2 END) AS MMax
FROM DenormAggTable
WHERE (Filter1 IN ('A', 'B')) AND (Filter2 = 'C') AND (Filter3 > 123)
GROUP BY Dim1, Dim2
HAVING (MMin > 0)
ORDER BY RecordCount DESC
LIMIT 4, 8

Chart annotations:

Grouped dimension columns are pulled out as keys in the map function, reducing the size of the working set.

Measures must be manually aggregated.

Aggregates depending on record counts must wait until finalization.

Measures can use procedural logic.

Filters have an ORM/ActiveRecord-looking style.

Aggregate filtering must be applied to the result set, not in the map/reduce.

Ascending: 1; Descending: -1

Chart credit: Rick Osborne, rickosborne.org (Revision 4, Created 2010-03-07)

http://rickosborne.org/download/SQL-to-MongoDB.pdf


Map/Reduce Examples


Health Clinic Example

Person registers with the Clinic

Weighs in on the scale

1 year => comes in 100 times


Health Clinic Example

person = {
    "name": "Bob",
    "weighings": [
        {"date": date(2009, 1, 15), "weight": 165.0},
        {"date": date(2009, 2, 12), "weight": 163.2},
        ...
    ]
}


Map/Reduce Insert Script

import datetime
import random

N = 1000          # assumed value; the timing slides use 1k, 10k and 100k
all_people = []

for i in range(N):
    person = {'name': 'person%04i' % i}
    weighings = person['weighings'] = []
    std_weight = random.uniform(100, 200)
    for w in range(100):
        # random day within 2009
        date = (datetime.datetime(2009, 1, 1) +
                datetime.timedelta(days=random.randint(0, 365)))
        weight = random.normalvariate(std_weight, 5.0)
        weighings.append({'date': date, 'weight': weight})
    weighings.sort(key=lambda x: x['date'])   # keep weighings in date order
    all_people.append(person)


Insert Data Performance

N       Insert
1k      3.14s
10k     29.5s
100k    292s


Map/Reduce: Total Weight by Day

from pymongo.code import Code   # PyMongo 1.7 location; later releases use bson.code

map_fn = Code("""function () {
    this.weighings.forEach(function(z) {
        emit(z.date, z.weight);
    });
}""")

reduce_fn = Code("""function (key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
}""")

result = people.map_reduce(map_fn, reduce_fn)


Map/Reduce: Total Weight by Day

>>> for doc in result.find():
...     print doc
{u'_id': datetime.datetime(2009, 1, 1, 0, 0), u'value': 39136.600753163315}
{u'_id': datetime.datetime(2009, 1, 2, 0, 0), u'value': 41685.341024046182}
{u'_id': datetime.datetime(2009, 1, 3, 0, 0), u'value': 38232.326554504165}
... lots more ...


Total Weight by Day Performance

N       MapReduce
1k      4.29s
10k     38.8s
100k    384s


Map/Reduce: Weight on Day

map_fn = Code("""function () {
    var target_date = new Date(2009, 9, 5);
    var pos = bsearch(this.weighings, "date", target_date);
    var recent = this.weighings[pos];
    emit(this._id, { name: this.name,
                     date: recent.date,
                     weight: recent.weight });
};""")

reduce_fn = Code("""function (key, values) {
    return values[0];
};""")

result = people.map_reduce(map_fn, reduce_fn,
                           scope={"bsearch": bsearch})


Map/Reduce: bsearch() function

bsearch = Code("""function(array, prop, value) {
    var min, max, mid, midval;
    for (min = 0, max = array.length - 1; min <= max; ) {
        mid = min + Math.floor((max - min) / 2);
        midval = array[mid][prop];
        if (value === midval) {
            break;
        } else if (value > midval) {
            min = mid + 1;
        } else {
            max = mid - 1;
        }
    }
    return (midval > value) ? mid - 1 : mid;
};""")
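For readers more comfortable in Python, here is a line-by-line port of the same binary search, offered as a sketch for experimenting outside the database. It returns the index of the last element whose `prop` is less than or equal to `value`, or -1 if there is none. The empty-array guard and the sample `weighings` list are additions for this example.

```python
def bsearch(array, prop, value):
    """Index of the last item with item[prop] <= value, or -1 if none."""
    if not array:
        return -1                      # guard added for the Python port
    lo, hi = 0, len(array) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        midval = array[mid][prop]
        if value == midval:
            break
        elif value > midval:
            lo = mid + 1
        else:
            hi = mid - 1
    # If the last probe overshot, step back one position.
    return mid - 1 if midval > value else mid

weighings = [{"date": d} for d in (1, 3, 5, 7)]   # sorted, unique keys
print(bsearch(weighings, "date", 4))
```

With sorted, unique keys this gives the same index as `bisect.bisect_right(keys, value) - 1` from the standard library.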


Weight on Day Performance

N       MapReduce
1k      1.23s
10k     10s
100k    108s


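Weight on Day (Python Version)

import bisect
import datetime

target_date = datetime.datetime(2009, 10, 5)

for person in people.find():
    dates = [w['date'] for w in person['weighings']]
    # bisect_right gives the insertion point, so step back one to get the
    # most recent weighing on or before target_date (as bsearch() does)
    pos = bisect.bisect_right(dates, target_date) - 1
    val = person['weighings'][pos]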


Map/Reduce Performance

N       MapReduce   Python
1k      1.23s       0.37s
10k     10s         2.2s
100k    108s        26s


Summary


Resources

www.10gen.com

www.mongodb.org

MongoDB: The Definitive Guide (O’Reilly)

PyMongo: api.mongodb.org/python


END OF SLIDES


Chalkboard is not Comic Sans

This is Chalkboard, not Comic Sans.

This isn’t Chalkboard, it’s Comic Sans.

does it matter, anyway?