distilled mongo db by boris trofimov

distilled Boris Trofimov Team Lead@Sigma Ukraine @b0ris_1 [email protected]

Upload: alex-tumanoff

Post on 01-Nov-2014


Page 1: Distilled mongo db by Boris Trofimov

distilled

Boris Trofimov Team Lead@Sigma Ukraine

@b0ris_1 [email protected]

Page 2: Distilled mongo db by Boris Trofimov

Agenda

● Part 1. Why NoSQL
– SQL benefits and criticism
– NoSQL challenge

● Part 2. MongoDB
– Overview
– Console and query examples
– Java integration
– Data consistency
– Scaling
– Tips

Page 3: Distilled mongo db by Boris Trofimov

Part 1. Why NoSQL

Page 4: Distilled mongo db by Boris Trofimov

Relational DBMS Benefits

Page 5: Distilled mongo db by Boris Trofimov

SQL

● Simplicity
● Uniform representation
● Runtime schema modifications

SELECT DISTINCT p.LastName, p.FirstName
FROM Person.Person AS p
JOIN HumanResources.Employee AS e
  ON e.BusinessEntityID = p.BusinessEntityID
WHERE 5000.00 IN (
  SELECT Bonus
  FROM Sales.SalesPerson AS sp
  WHERE e.BusinessEntityID = sp.BusinessEntityID
);

Page 6: Distilled mongo db by Boris Trofimov

Strong schema definition

Page 7: Distilled mongo db by Boris Trofimov

Strong consistency

SQL features like Foreign and Primary Keys, Unique fields

ACID (atomicity, consistency, isolation, durability) transactions

Business transactions ~ system transactions

Page 8: Distilled mongo db by Boris Trofimov

RDBMS Criticism

Page 9: Distilled mongo db by Boris Trofimov

Big gap between domain and relational model

Page 10: Distilled mongo db by Boris Trofimov

Performance Issues

● Minimizing JOINs
● Choosing the right transaction strategy
● Query optimization

Consistency costs too much

● Normalization impact
● Performance issues

Page 11: Distilled mongo db by Boris Trofimov

Schema migration issues

● Consistency issues
● Reinventing the wheel
● Involving external tools like DBDeploy

Scaling options

● Consistency issues
● Poor scaling options

Page 12: Distilled mongo db by Boris Trofimov

SQL Opposition

● Object Databases by OMG
● ORM
● ?

Page 13: Distilled mongo db by Boris Trofimov

No SQL Yes

● Transactionless in the usual sense
● Schemaless, no migration
● Closer to the domain
● Focused on aggregates
● Truly scalable

Page 14: Distilled mongo db by Boris Trofimov

NoSQL Umbrella

Page 15: Distilled mongo db by Boris Trofimov

Key-Value Databases

Page 16: Distilled mongo db by Boris Trofimov

Column-Family Databases

Page 17: Distilled mongo db by Boris Trofimov

Document-oriented Databases

Page 18: Distilled mongo db by Boris Trofimov

Graph-oriented Databases

Page 19: Distilled mongo db by Boris Trofimov

Aggregate oriented Databases

● Document databases implement the idea of an aggregate-oriented database.
● An aggregate is the atomic unit of storage.
● Aggregate-oriented databases are closer to the application domain.
● Operations on a single aggregate are atomic.
● An aggregate can be replicated or sharded efficiently.
● Major question: to embed or not to embed.

Page 20: Distilled mongo db by Boris Trofimov

Relations vs Aggregates

Page 21: Distilled mongo db by Boris Trofimov

// in customers
{
  "id": 1,
  "name": "Medvedev",
  "billingAddress": [ { "city": "Moscow" } ]
}

// in orders
{
  "id": 99,
  "customerId": 1,
  "orderItems": [
    { "productId": 47, "price": 444.45, "productName": "iPhone 5" }
  ],
  "shippingAddress": [ { "city": "Moscow" } ],
  "orderPayment": [
    {
      "ccinfo": "1000-1000-1000-1000",
      "txnId": "abelif879rft",
      "billingAddress": { "city": "Moscow" }
    }
  ]
}

Relational Model Document Model

Page 22: Distilled mongo db by Boris Trofimov

Part 2. MongoDB

Page 23: Distilled mongo db by Boris Trofimov

MongoDB Basics

MongoDB is a document-oriented DBMS

MongoDB is a client-server DBMS

A MongoDB database = collections + indexes

JSON/JavaScript is the main language for accessing it

Page 24: Distilled mongo db by Boris Trofimov

Collections

A collection is created implicitly (on the first insert).

Two documents from the same collection might be completely different.

[Diagram: a collection has a name, documents and indexes]

Page 25: Distilled mongo db by Boris Trofimov

Document

{
  "fullName": "Fedor Buhankin",
  "course": 5,
  "univercity": "ONPU",
  "faculty": "IKS",
  "_id": { "$oid": "5071c043cc93742e0d0e9cc7" },
  "homeAddress": "Ukraine, Odessa 23/34",
  "averageAssessment": 5,
  "subjects": [ "math", "literature", "drawing", "psychology" ]
}

● Identifier (_id)
● Body in JSON (internally BSON)
● Any part of the document can be indexed
● Max document size is 16 MB
● Main building blocks: scalar values, maps and lists

Page 26: Distilled mongo db by Boris Trofimov

MongoDB Console

Page 27: Distilled mongo db by Boris Trofimov

Query Examples

Page 28: Distilled mongo db by Boris Trofimov

Simple Select

SELECT * FROM ORDERS;

db.orders.find()

Page 29: Distilled mongo db by Boris Trofimov

Simple Condition

SELECT * FROM ORDERS WHERE customerId = 1;

db.orders.find( {"customerId":1} )

Page 30: Distilled mongo db by Boris Trofimov

Simple Comparison

SELECT * FROM orders WHERE customerId > 1

db.orders.find({ "customerId" : { $gt: 1 } } );

Page 31: Distilled mongo db by Boris Trofimov

AND Condition

SELECT * FROM orders WHERE customerId = 1 AND orderDate is not NULL

db.orders.find( { customerId:1, orderDate : { $exists : true } } );

Page 32: Distilled mongo db by Boris Trofimov

OR Condition

SELECT * FROM orders WHERE customerId = 100 OR orderDate is not NULL

db.orders.find( { $or:[ {customerId:100}, {orderDate : { $exists : true }} ] } );

Page 33: Distilled mongo db by Boris Trofimov

Select fields

SELECT orderId, orderDate FROM orders WHERE customerId = 1

db.orders.find({customerId:1},{orderId:1,orderDate:1})

Page 34: Distilled mongo db by Boris Trofimov

Inner select

SELECT * FROM Orders WHERE Orders.id IN (
  SELECT id FROM orderItem WHERE productName LIKE '%iPhone%')

db.orders.find( {"orderItems.productName":/.*iPhone.*/} )

Page 35: Distilled mongo db by Boris Trofimov

NULL checks

SELECT * FROM orders WHERE orderDate is NULL

db.orders.find( { orderDate : { $exists : false } } );
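The query documents on the preceding slides can be read as predicates over a single document. The following is a minimal sketch in plain JavaScript, assuming a toy `matches` helper (hypothetical, not MongoDB's query engine) that supports only equality, `$gt`, `$exists` and `$or` on top-level fields. The `orderDate` values are made-up sample data.

```javascript
// Toy matcher: equality, $gt, $exists and $or on top-level fields only.
// An illustration of the query semantics, not MongoDB's implementation.
function matches(doc, query) {
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$or") {
      // $or holds when at least one sub-query matches
      return cond.some((sub) => matches(doc, sub));
    }
    if (cond !== null && typeof cond === "object") {
      // Operator document such as { $gt: 1 } or { $exists: true }
      return Object.entries(cond).every(([op, arg]) => {
        if (op === "$gt") return doc[key] > arg;
        if (op === "$exists") return (key in doc) === arg;
        throw new Error("unsupported operator: " + op);
      });
    }
    return doc[key] === cond; // plain equality
  });
}

// Made-up sample orders (only the fields the queries touch)
const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 2, orderDate: "2012-10-01" },
  { id: 101, customerId: 100 },
];

// Returns the ids of the matching orders
const find = (query) => orders.filter((o) => matches(o, query)).map((o) => o.id);

console.log(find({ customerId: 1 }));                                // ids: 99
console.log(find({ customerId: { $gt: 1 } }));                       // ids: 100, 101
console.log(find({ customerId: 1, orderDate: { $exists: true } }));  // no match
console.log(find({ $or: [{ customerId: 100 }, { orderDate: { $exists: true } }] })); // ids: 100, 101
```

Several fields in one query document are combined with AND, which is why the third query returns nothing: order 99 belongs to customer 1 but has no `orderDate`.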

Page 36: Distilled mongo db by Boris Trofimov

More examples

• db.orders.find().sort({ _id: 1 }).skip(20).limit(10)

• db.orders.count({ "orderItems.price" : { $gt: 444 } })

• db.orders.find( { orderItems: { "productId":47, "price": 444.45, "productName": "iPhone 5" } } );

• db.orders.find()._addSpecial( "$comment" , "this is tagged query" )

Page 37: Distilled mongo db by Boris Trofimov

Queries between collections

● Remember, MongoDB = no JOINs
● Approach 1: perform multiple queries (lazy loading)
● Approach 2: use the MapReduce framework
● Approach 3: use the Aggregation Framework
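The first approach (multiple queries) amounts to joining in application code. A minimal sketch, assuming two in-memory arrays stand in for the `customers` and `orders` collections; `ordersWithCustomers` is a hypothetical helper, and in a real application each array would come from its own `find()` call.

```javascript
// Stand-ins for the results of two separate queries.
const customers = [{ id: 1, name: "Medvedev" }];
const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 1 },
];

// Attach the referenced customer to every order.
function ordersWithCustomers(orderDocs, customerDocs) {
  // Collect the distinct customer ids the orders refer to
  const wantedIds = new Set(orderDocs.map((o) => o.customerId));
  // Index the matching customers by id for constant-time lookup
  const byId = new Map(
    customerDocs.filter((c) => wantedIds.has(c.id)).map((c) => [c.id, c])
  );
  return orderDocs.map((o) => ({ ...o, customer: byId.get(o.customerId) }));
}

const joined = ordersWithCustomers(orders, customers);
console.log(joined.map((o) => o.id + " -> " + o.customer.name));
```

Fetching all referenced customers in one batch, rather than one query per order, keeps the number of round trips at two regardless of how many orders there are.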

Page 38: Distilled mongo db by Boris Trofimov

Map Reduce Framework

● Is used to perform complex grouping over collection documents
● Is able to work across multiple collections
● Uses the MapReduce pattern
● Uses the JavaScript language
● Supports sharded environments
● The result is similar to a materialized view

Page 39: Distilled mongo db by Boris Trofimov

Map Reduce Concept

[Diagram: map is launched for every input element a1, a2, ..., an and produces b1, b2, ..., bn; reduce is then launched over the mapped elements and produces c.]

f_map : A → B

f_reduce : B[] → C
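The two signatures above can be written down directly. A minimal sketch in plain JavaScript; `mapReduce`, `fMap` and `fReduce` are names chosen here for illustration, and the example functions (string length and sum) are arbitrary.

```javascript
// fMap: A -> B is applied to every element; fReduce: B[] -> C folds the results.
function mapReduce(inputs, fMap, fReduce) {
  const mapped = inputs.map(fMap); // launch map for every element
  return fReduce(mapped);          // launch reduce once over all mapped values
}

// A = string, B = number (its length), C = number (the total length)
const total = mapReduce(
  ["a1", "a2", "a3"],
  (s) => s.length,
  (lengths) => lengths.reduce((sum, n) => sum + n, 0)
);
console.log(total); // 6
```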

Page 40: Distilled mongo db by Boris Trofimov

How it works

● Implement the MAP function
● Implement the REDUCE function
● Execute the MAP function: mark each document with a specific color
● Execute the REDUCE function: merge each colored set into a single element

[Diagram labels: Input, MAP, REDUCE, Output, Collection X]

Page 41: Distilled mongo db by Boris Trofimov

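Count the orders for each customer

db.customers_orders.remove({});

// Emit one counter per order, keyed by the customer it belongs to
mapUsers = function() {
    emit( this.customerId, {count: 1, customerId: this.customerId} );
};

// Sum the counters emitted for one customer
reduce = function(key, values) {
    var result = {count: 0, customerId: key};
    values.forEach(function(value) {
        result.count += value.count;
    });
    return result;
};

// Orders carry the customerId field, so the job runs over the orders collection
db.orders.mapReduce(mapUsers, reduce, {"out": {"replace": "customers_orders"}});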

Output: [ {count:123, customerId:1}, {count:33, customerId:2} ]
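To see what the server does with these two functions, the job can be imitated in memory. This is a rough sketch only: `runMapReduce` is a hypothetical stand-in written for this example, `emit` is passed as an argument instead of being a global, and the three sample orders are made up.

```javascript
// Toy stand-in for mapReduce: group emitted values by key, then reduce each group.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // The map function sees the current document as `this`
  docs.forEach((doc) => map.call(doc, emit));
  return [...groups].map(([key, values]) => ({
    _id: key,
    // As in MongoDB, reduce is skipped for keys that received a single value
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 1 },
  { id: 101, customerId: 2 },
];

const mapOrders = function (emit) {
  emit(this.customerId, { count: 1, customerId: this.customerId });
};

const reduceCounts = function (key, values) {
  const result = { count: 0, customerId: key };
  values.forEach((value) => { result.count += value.count; });
  return result;
};

const result = runMapReduce(orders, mapOrders, reduceCounts);
console.log(JSON.stringify(result));
```

Because reduce may be skipped or called repeatedly, the emitted value and the reduced value need the same shape, which is why both are `{ count, customerId }` objects.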

Page 42: Distilled mongo db by Boris Trofimov

Aggregation and Aggregation Framework

● Simplifies the most common map-reduce operations, such as grouping by a criterion
● Pipeline output is limited to 16 MB
● Supports sharded environments (Aggregation Framework only)
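As an illustration, the per-customer order count from the map-reduce slide might be expressed as a one-stage pipeline with `$group` and `$sum`. The pipeline below is ordinary Aggregation Framework syntax; the `runGroup` evaluator is a toy written for this sketch (it handles only a `"$field"` group key and constant `$sum` accumulators), and the sample orders are made up.

```javascript
// In the shell this pipeline would be passed to db.orders.aggregate(pipeline).
const pipeline = [
  { $group: { _id: "$customerId", count: { $sum: 1 } } },
];

// Toy evaluator for a single $group stage; not the server's implementation.
function runGroup(docs, stage) {
  const spec = stage.$group;
  const keyField = spec._id.slice(1); // "$customerId" -> "customerId"
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    if (!groups.has(key)) groups.set(key, { _id: key });
    const acc = groups.get(key);
    for (const [name, expr] of Object.entries(spec)) {
      if (name === "_id") continue;
      acc[name] = (acc[name] || 0) + expr.$sum; // add the constant per document
    }
  }
  return [...groups.values()];
}

const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 1 },
  { id: 101, customerId: 2 },
];

const counts = runGroup(orders, pipeline[0]);
console.log(JSON.stringify(counts));
```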

Page 43: Distilled mongo db by Boris Trofimov

Indexes

● Anything might be indexed
● Indexes improve performance
● The implementation uses B-trees

Page 44: Distilled mongo db by Boris Trofimov
Page 45: Distilled mongo db by Boris Trofimov

Access via API

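Use the official MongoDB Java driver (just include mongo.jar).

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.Mongo;
import com.mongodb.ServerAddress;

// Connect to a local server on the default port
Mongo m = new Mongo();
// or: Mongo m = new Mongo("localhost");
// or: Mongo m = new Mongo("localhost", 27017);
// or, to connect to a replica set, supply a seed list of members:
// Mongo m = new Mongo(Arrays.asList(
//         new ServerAddress("localhost", 27017),
//         new ServerAddress("localhost", 27018),
//         new ServerAddress("localhost", 27019)));

DB db = m.getDB("mydb");
DBCollection coll = db.getCollection("customers");

// Build { "name": "Kaktus", "billingAddress": [ { "city": "Odessa" } ] } and insert it
List<BasicDBObject> list = new ArrayList<BasicDBObject>();
list.add(new BasicDBObject("city", "Odessa"));
BasicDBObject doc = new BasicDBObject();
doc.put("name", "Kaktus");
doc.put("billingAddress", list);
coll.insert(doc);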

Page 46: Distilled mongo db by Boris Trofimov

Closer to Domain model

● Morphia http://code.google.com/p/morphia/
● Spring Data for MongoDB http://www.springsource.org/spring-data/mongodb

Major features:

● Type-safe, POJO-centric model
● Annotation-based mapping behavior
● Good performance
● DAO templates
● Simple criteria

Page 47: Distilled mongo db by Boris Trofimov

Example with Morphia

@Entity("Customers")
class Customer {
    @Id ObjectId id;               // auto-generated if not set (see ObjectId)
    @Indexed String name;          // value types are persisted automatically
    List<Address> billingAddress;  // by default, fields are @Embedded
    Key<Customer> bestFriend;      // reference to an external document
    @Reference List<Customer> partners = new ArrayList<Customer>(); // refs are stored and loaded automatically
    // ... getters and setters

    // Lifecycle methods: Pre/PostLoad, Pre/PostPersist...
    @PostLoad void postLoad(DBObject dbObj) { ... }
}

Morphia morphia = new Morphia();
morphia.map(Customer.class);
Datastore ds = morphia.createDatastore(new Mongo(), "tempDB");
Key<Customer> newCustomer = ds.save(new Customer("Kaktus",...));
Customer customer = ds.find(Customer.class).field("name").equal("Medvedev").get();

Page 48: Distilled mongo db by Boris Trofimov

To embed or not to embed

● Separate collections are good if you need to select individual documents, need more control over querying, or have huge documents.
● Embedded documents are good when you want the entire document and its size is predictable. Embedding gives the best read performance, because one query returns the whole aggregate.
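The trade-off is easier to see with both shapes side by side. A minimal sketch in plain JavaScript using the order from the earlier slides; the `orderId` back-reference and the two helper functions are assumptions made for this example.

```javascript
// Embedded: the items live inside the order, so one read returns everything.
const embeddedOrder = {
  id: 99,
  orderItems: [{ productId: 47, price: 444.45, productName: "iPhone 5" }],
};

// Referenced: the items live in their own collection and point back at the order
// through a (hypothetical) orderId field, so a second lookup is needed.
const referencedOrder = { id: 99 };
const orderItemsCollection = [
  { orderId: 99, productId: 47, price: 444.45, productName: "iPhone 5" },
  { orderId: 100, productId: 12, price: 10, productName: "Cable" },
];

function totalEmbedded(order) {
  return order.orderItems.reduce((sum, item) => sum + item.price, 0);
}

function totalReferenced(order, items) {
  return items
    .filter((item) => item.orderId === order.id) // the extra "query"
    .reduce((sum, item) => sum + item.price, 0);
}

console.log(totalEmbedded(embeddedOrder));
console.log(totalReferenced(referencedOrder, orderItemsCollection));
```

Both calls produce the same total; the difference is that the referenced shape lets items be queried on their own, at the cost of a second lookup per order.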

Page 49: Distilled mongo db by Boris Trofimov

Schema migration

● Schemaless
● The main question is how the application behaves once a new field has been added
● Incremental migration technique (version field)

Use cases:
– removing a field
– renaming fields
– refactoring an aggregate
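One way to read "incremental migration with a version field" is to upgrade each document lazily, whenever it is loaded. A minimal sketch under that assumption; the field names (`name`, `fullName`, `fax`, `version`) and the two upgrade steps are invented for illustration and cover the "renaming" and "removing" use cases.

```javascript
const CURRENT_VERSION = 2;

// One upgrade step per stored schema version.
const upgrades = {
  // v0 -> v1: rename "name" to "fullName"
  0: (doc) => {
    const { name, ...rest } = doc;
    return { ...rest, fullName: name, version: 1 };
  },
  // v1 -> v2: remove the obsolete "fax" field
  1: (doc) => {
    const { fax, ...rest } = doc;
    return { ...rest, version: 2 };
  },
};

// Apply the missing steps in order; documents without a version count as v0.
function migrate(doc) {
  let current = { ...doc, version: doc.version || 0 };
  while (current.version < CURRENT_VERSION) {
    current = upgrades[current.version](current);
  }
  return current;
}

const oldDoc = { _id: 1, name: "Kaktus", fax: "000" };
const migrated = migrate(oldDoc);
console.log(JSON.stringify(migrated));
```

The application would save the migrated document back on its next write, so the collection converges to the current version without a bulk migration.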

Page 50: Distilled mongo db by Boris Trofimov

Data Consistency

● Transactional consistency
– domain design should take aggregate atomicity into account
● Replication consistency
– take the inconsistency window into account (sticky sessions)
● Eventual consistency
● Accept the CAP theorem
– it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability and partition tolerance.

Page 51: Distilled mongo db by Boris Trofimov

Scaling

Page 52: Distilled mongo db by Boris Trofimov

Scaling options

● Autosharding
● Master-Slave replication
● Replica Set clustering
● Sharding + Replica Set

Page 53: Distilled mongo db by Boris Trofimov

Sharding

● MongoDB supports autosharding
● Just specify the shard key and pattern
● Sharding increases write throughput
● The main way to scale the system
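The role of the shard key can be illustrated with a toy router. This is only a sketch: it assigns documents by a simple string hash modulo the shard count, which is an assumption made for brevity and not the chunk-based placement MongoDB itself uses; `hashKey` and `shardFor` are hypothetical helpers.

```javascript
// Simple deterministic string hash (illustrative only).
function hashKey(value) {
  const text = String(value);
  let h = 0;
  for (let i = 0; i < text.length; i++) {
    h = (h * 31 + text.charCodeAt(i)) >>> 0; // keep an unsigned 32-bit integer
  }
  return h;
}

// Pick the shard that owns a document, based on the value of its shard key.
function shardFor(doc, shardKey, shardCount) {
  return hashKey(doc[shardKey]) % shardCount;
}

const orders = [
  { id: 99, customerId: 1 },
  { id: 100, customerId: 1 },
  { id: 101, customerId: 2 },
];

// Orders of the same customer always land on the same shard.
for (const order of orders) {
  console.log("order " + order.id + " -> shard " + shardFor(order, "customerId", 3));
}
```

Writes for different key values spread over the shards, which is where the extra write throughput comes from; a query that includes the shard key can be sent to a single shard.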

Page 54: Distilled mongo db by Boris Trofimov

Master-Slave replication

● One master, many slaves
● Slaves might be hidden or can be used for reads
● Master-slave replication increases read capacity and provides reliability

Page 55: Distilled mongo db by Boris Trofimov

Replica Set clustering

● The replica set automatically elects a primary (master)
● The primary propagates the same state to all replicas
● Limit: 12 nodes per replica set
● WriteConcern option
● Benefits:
– Failover and reliability
– Distributing read load
– Maintenance without downtime
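The WriteConcern option controls how many replica set members must confirm a write before it is reported as successful. A simplified sketch of that decision, assuming every member counts toward the majority; `isWriteAcknowledged` is a hypothetical helper, while a numeric `w` and `w: "majority"` are the usual write-concern values.

```javascript
// Decide whether a write has enough acknowledgements for the requested w.
// acks: members that confirmed the write; memberCount: size of the replica set.
function isWriteAcknowledged(acks, memberCount, w) {
  const required = w === "majority" ? Math.floor(memberCount / 2) + 1 : w;
  return acks >= required;
}

// Three-member replica set
console.log(isWriteAcknowledged(1, 3, 1));          // primary only is enough for w: 1
console.log(isWriteAcknowledged(1, 3, "majority")); // one of three is not a majority
console.log(isWriteAcknowledged(2, 3, "majority")); // two of three is a majority
```

A higher `w` narrows the window in which an acknowledged write can be lost during failover, at the cost of slower writes.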

Page 56: Distilled mongo db by Boris Trofimov

Sharding + ReplicaSet

● Allows building a huge, scalable database with failover

Page 57: Distilled mongo db by Boris Trofimov

MongoDB Criticism

● Data loss reports on heavy-write configurations
● Atomic operations over multiple documents

When not to use

● Heavy cross-document atomic operations
● Queries against varying aggregate structure

Page 58: Distilled mongo db by Boris Trofimov

Tips

● Do not use autoincrement ids
● Short field names are preferred
● By default DAO methods are async
● Think twice about collection design
● Use atomic modifications for a document
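"Atomic modifications" refers to update operators such as `$set` and `$inc`, which describe a change instead of sending back a whole modified document. The sketch below imitates their effect on an in-memory object; `applyUpdate` is a toy written for this example and the `views` and `status` fields are made up. On the server, an update like this is applied to a single document as one atomic step.

```javascript
// Toy interpreter for two update operators; returns a new document.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value; // $set overwrites or creates the field
  }
  for (const [field, delta] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + delta; // $inc treats a missing field as 0
  }
  return result;
}

const order = { id: 99, views: 1 };
// Shell equivalent (with the same hypothetical fields):
//   db.orders.update({ id: 99 }, { $inc: { views: 1 }, $set: { status: "paid" } })
const updated = applyUpdate(order, { $inc: { views: 1 }, $set: { status: "paid" } });
console.log(JSON.stringify(updated));
```

Expressing the change as `$inc` avoids the read-modify-write race in which two clients both read `views: 1` and both write back `views: 2`.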

Page 59: Distilled mongo db by Boris Trofimov

Out of scope

● MapReduce options
● Indexes
● Capped collections

Page 60: Distilled mongo db by Boris Trofimov

Further reading

http://www.mongodb.org

Kyle Banker, MongoDB in Action

Pramod J. Sadalage and Martin Fowler, NoSQL Distilled

Page 61: Distilled mongo db by Boris Trofimov

Thank you!