Webinar: MongoDB and Polyglot Persistence Architecture

Post on 10-Aug-2015

Category: Technology

Polyglot Persistence

{ Name: 'Bryan Reinero',
  Title: 'Developer Advocate',
  Twitter: '@blimpyacht',
  Email: 'bryan@mongodb.com' }

What is the Polyglots?

• Using multiple Database Technologies in a Given Application

• Using the right tool for the right job

The term is derived from “polyglot programming”: applications written in a mix of languages.

Why is the Polyglots?

• Relational has been the dominant model

• Higher performance requirements

• Increasingly large datasets

• Use of IaaS and commodity hardware

Vertical Scaling

Horizontal Scaling

Availability

http://avstop.com/ac/flighttrainghandbook/imagel4b.jpg

Requirements

• Maximize uptime

• Minimize time to recover

Hardware failures

Network partitions

Data center failures

Maintenance Operations

Business critical systems require automatic fault detection and fail over

Variant Data Models

Key-Value Store

ID      Name
58842   Eratosthenes
45647   Democritus
52320   Hypatia
88237   Shemp
78932   Euripides

Graph Databases

[Figure: graph with nodes Eratosthenes, Democritus, Hypatia, Shemp, Euripides]

Document Databases

{
  maker: "Agusta",
  type: "sportbike",
  rake: 7,
  trail: 3.93,
  engine: {
    type: "internal combustion",
    layout: "inline",
    cylinders: 4,
    displacement: 750
  },
  transmission: {
    type: "cassette",
    speeds: 6,
    pattern: "sequential",
    ratios: [ 2.7, 1.94, 1.34, 1, 0.83, 0.64 ]
  }
}

The Goals of Normalization

• Model data in an understandable form

• Reduce fact redundancy and data inconsistency

• Enforce integrity constraints

Polyglot Persistence

Application Servers connect to:

• MongoDB: Product Catalog, User Accounts, Domain Objects

• RDBMS: Payment Systems, Reporting

• Key / Value: Session Data, Shopping Carts

• Graph: Social Data, Recommendations
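
A minimal sketch of how an application layer might dispatch each kind of data to a different store. It assumes the diagram pairs session data and shopping carts with the key/value store, catalog and account data with MongoDB, payments with the RDBMS, and recommendations with the graph store. The `stores`, `routing`, and `save` names are hypothetical, and the in-memory maps only stand in for real database clients.

```javascript
// In-memory stand-ins for the four kinds of store; real clients would replace these.
const stores = {
  keyValue: new Map(),
  document: new Map(),
  relational: new Map(),
  graph: new Map(),
};

// Which store owns which kind of data (assumed reading of the diagram).
const routing = {
  session: "keyValue",
  cart: "keyValue",
  product: "document",
  account: "document",
  payment: "relational",
  recommendation: "graph",
};

// Write a value to whichever store is configured for its kind.
function save(kind, id, value) {
  const storeName = routing[kind];
  if (!storeName) throw new Error(`no store configured for "${kind}"`);
  stores[storeName].set(`${kind}:${id}`, value);
  return storeName;
}

console.log(save("cart", 1, { items: ["helmet"] }));
console.log(save("payment", 7, { amount: 120 }));
```

Keeping the routing in one table makes the added operational surface visible: every entry is another system to run and monitor.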

What are your requirements?

• Availability

• Scalability

• Performance

• Access Patterns

• Data Model

Key Value Stores

Used for

• Session data

• Cookies

• Shopping carts

• Fast, if in memory

• Single access pattern

• Complex data parsed in client

Key Value Store

"{ maker: 'Agusta', type: 'sportbike', rake: 7, trail: 3.93, engine: { type: 'internal combustion', layout: 'inline', cylinders: 4, displacement: 750 }, transmission: { type: 'cassette', speeds: 6, pattern: 'sequential', ratios: [ 2.7, 1.94, 1.34, 1, 0.83, 0.64 ] } }"
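
To illustrate "complex data parsed in client", here is a minimal sketch of a key-value store that holds the whole vehicle as one opaque string. The `Map` stands in for a real store, and the key `"78234974"` and the helper name `getEngineDisplacement` are assumptions made for the example.

```javascript
// Hypothetical in-memory key-value store: values are opaque strings.
const store = new Map();

// The client serializes the whole document before writing it.
const vehicle = {
  maker: "Agusta",
  type: "sportbike",
  engine: { type: "internal combustion", layout: "inline", cylinders: 4, displacement: 750 },
};
store.set("78234974", JSON.stringify(vehicle));

// The only access pattern is lookup by key. The store cannot filter on fields
// inside the value, so the client fetches the blob and parses it itself.
function getEngineDisplacement(key) {
  const blob = store.get(key);
  if (blob === undefined) return undefined;
  return JSON.parse(blob).engine.displacement;
}

console.log(getEngineDisplacement("78234974")); // 750
```

Reading one nested field still transfers and parses the entire value, which is the trade-off the bullet points describe.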

MongoDB

{
  _id: 78234974,
  maker: "Agusta",
  type: "sportbike",
  rake: 7,
  trail: 3.93,
  engine: {
    type: "internal combustion",
    layout: "inline",
    cylinders: 4,
    displacement: 750
  },
  transmission: {
    type: "cassette",
    speeds: 6,
    pattern: "sequential",
    ratios: [ 2.7, 1.94, 1.34, 1, 0.83, 0.64 ]
  }
}

Self Defining Schema

Nested Objects

Array types

Primary Key, Auto indexed

Secondary indexes
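
As a rough picture of what a secondary index provides, the sketch below builds a lookup from a non-key field (`maker`) to primary keys. It shows the general idea in plain JavaScript, not how MongoDB implements indexes. The extra documents (including the "Ducati" entry and the second and third `_id` values) and the helpers `buildIndex` and `findByMaker` are invented for illustration.

```javascript
// Hypothetical in-memory collection keyed by _id (the primary key).
const vehicles = new Map([
  [78234974, { _id: 78234974, maker: "Agusta", type: "sportbike" }],
  [78234975, { _id: 78234975, maker: "Agusta", type: "helicopter" }],
  [78234976, { _id: 78234976, maker: "Ducati", type: "sportbike" }],
]);

// Build a secondary index: field value -> list of _id values.
function buildIndex(collection, field) {
  const index = new Map();
  for (const doc of collection.values()) {
    const ids = index.get(doc[field]) || [];
    ids.push(doc._id);
    index.set(doc[field], ids);
  }
  return index;
}

const byMaker = buildIndex(vehicles, "maker");

// Look up by maker through the index instead of scanning every document.
function findByMaker(maker) {
  return (byMaker.get(maker) || []).map((id) => vehicles.get(id));
}

console.log(findByMaker("Agusta").map((doc) => doc._id));
```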

Projections

db.vehicles.find( { _id: 78234974 }, { engine: 1, _id: 0 } )
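
The effect of the projection `{ engine: 1, _id: 0 }` can be sketched in plain JavaScript. The `project` function below is a hypothetical helper that mimics inclusion projections for top-level fields only. It is not MongoDB's implementation and ignores nested paths and exclusion projections.

```javascript
// Sketch of inclusion-projection semantics for top-level fields only.
function project(doc, spec) {
  const out = {};
  // _id is returned by default unless the projection sets _id: 0.
  if (spec._id !== 0 && "_id" in doc) out._id = doc._id;
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && flag === 1 && field in doc) out[field] = doc[field];
  }
  return out;
}

const vehicle = {
  _id: 78234974,
  maker: "Agusta",
  type: "sportbike",
  engine: { type: "internal combustion", layout: "inline", cylinders: 4, displacement: 750 },
};

// Same projection as the find() call: only the engine sub-document comes back.
const result = project(vehicle, { engine: 1, _id: 0 });
console.log(JSON.stringify(result));
```

Because the server returns only the requested fields, the client avoids transferring and parsing the rest of the document.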

Data Model

RDBMS ➜ MongoDB

Table, View ➜ Collection

Row ➜ Document

Index ➜ Index

Join ➜ Embedded Document

Foreign Key ➜ Reference

Partition ➜ Shard
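
The "Join ➜ Embedded Document" row can be sketched as follows. The row layouts (`vehicleRows`, `engineRows`, the `vehicle_id` foreign key) and the `toDocument` helper are hypothetical. They show one way two related rows might collapse into a single document.

```javascript
// Hypothetical relational rows: the engine row references its vehicle by foreign key.
const vehicleRows = [{ id: 78234974, maker: "Agusta", type: "sportbike" }];
const engineRows = [
  { vehicle_id: 78234974, type: "internal combustion", layout: "inline", cylinders: 4, displacement: 750 },
];

// Document model: resolve the relationship once and embed the engine,
// so later reads need no join.
function toDocument(vehicleRow) {
  const engineRow = engineRows.find((row) => row.vehicle_id === vehicleRow.id);
  const { vehicle_id, ...engine } = engineRow; // the foreign key is no longer needed
  const { id, ...rest } = vehicleRow;
  return { _id: id, ...rest, engine };
}

const doc = toDocument(vehicleRows[0]);
console.log(JSON.stringify(doc, null, 2));
```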

Flexible Schemas

{
  maker: "M.V. Agusta",
  type: "sportsbike",
  engine: {
    type: "internal combustion",
    cylinders: 4,
    displacement: 750
  },
  rake: 7,
  trail: 3.93
}

{
  maker: "M.V. Agusta",
  type: "Helicopter",
  engine: {
    type: "turboshaft",
    layout: "axial",
    massflow: 1318
  },
  Blades: 4,
  undercarriage: "fixed"
}

Discriminator column

Shared indexing strategy

Polymorphic Attributes

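
A small sketch of how application code might work with documents of different shapes in one collection. It assumes the shared `type` field serves as the discriminator. The `fleet` array and `describe` function are hypothetical names.

```javascript
// Two documents with different shapes, as they might sit in one collection.
const fleet = [
  {
    maker: "M.V. Agusta",
    type: "sportsbike",
    engine: { type: "internal combustion", cylinders: 4, displacement: 750 },
    rake: 7,
    trail: 3.93,
  },
  {
    maker: "M.V. Agusta",
    type: "Helicopter",
    engine: { type: "turboshaft", layout: "axial", massflow: 1318 },
    Blades: 4,
    undercarriage: "fixed",
  },
];

// The shared "type" field tells the code which attributes to expect.
function describe(doc) {
  switch (doc.type) {
    case "sportsbike":
      return `${doc.maker} sportsbike, displacement ${doc.engine.displacement}, rake ${doc.rake}`;
    case "Helicopter":
      return `${doc.maker} helicopter, ${doc.Blades} blades, ${doc.undercarriage} undercarriage`;
    default:
      return `${doc.maker} (unknown type)`;
  }
}

// A filter on a shared field works across both shapes.
const agustas = fleet.filter((doc) => doc.maker === "M.V. Agusta");
console.log(agustas.map(describe));
```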

Tao of MongoDB

• Model data for use, not storage

• Avoid ad-hoc queries

• Index effectively, index efficiently

Strong Consistency vs. Eventual Consistency

Availability

Fail-over

Strong vs. Eventual Consistency

[Diagram sequence: Client 1 and Client 2 issue a Write and a Read against a cluster of five nodes, Node A through Node E]
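
One way to picture the difference is a toy replica set in which writes land on a primary and reach the other nodes later. This is a simplified, single-process sketch under that assumption. The `ReplicaSet` class, its methods, and the `cart:42` key are hypothetical and model no particular database.

```javascript
// Hypothetical replica set: one primary, plus secondaries that apply writes later.
class ReplicaSet {
  constructor(secondaryCount) {
    this.primary = new Map();
    this.secondaries = Array.from({ length: secondaryCount }, () => new Map());
    this.pending = []; // writes accepted by the primary but not yet replicated
  }

  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }

  // Replication catching up: apply all pending writes to every secondary.
  replicate() {
    for (const [key, value] of this.pending) {
      for (const node of this.secondaries) node.set(key, value);
    }
    this.pending = [];
  }

  readFromPrimary(key) {
    return this.primary.get(key);
  }

  readFromSecondary(index, key) {
    return this.secondaries[index].get(key);
  }
}

const cluster = new ReplicaSet(2);
cluster.write("cart:42", "1 item");
cluster.replicate();

cluster.write("cart:42", "2 items"); // one client writes

const strong = cluster.readFromPrimary("cart:42"); // sees the latest write
const stale = cluster.readFromSecondary(0, "cart:42"); // may lag behind
cluster.replicate();
const converged = cluster.readFromSecondary(0, "cart:42"); // eventually catches up

console.log({ strong, stale, converged });
```

Reading from the node that accepted the write gives the latest value at once, while reading from a replica trades freshness for spreading the read load.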

Analytics

Hadoop

A framework for distributed processing of large data sets

• Terabyte and petabyte datasets

• Data warehousing

• Advanced analytics

• Not a database

• No indexes

• Batch processing
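
The batch style of processing can be sketched as a map, shuffle, and reduce pass over every record. This single-process example only illustrates the pattern. It does not use Hadoop's API, and the `events` data is invented.

```javascript
// Hypothetical sales events; a batch job scans all of them (no index lookups).
const events = [
  { region: "EU", amount: 120 },
  { region: "US", amount: 80 },
  { region: "EU", amount: 30 },
];

// Map phase: emit one [key, value] pair per input record.
const mapped = events.map((event) => [event.region, event.amount]);

// Shuffle phase: group values by key.
const grouped = new Map();
for (const [key, value] of mapped) {
  if (!grouped.has(key)) grouped.set(key, []);
  grouped.get(key).push(value);
}

// Reduce phase: fold each group into a single result.
const totals = {};
for (const [key, values] of grouped) {
  totals[key] = values.reduce((sum, v) => sum + v, 0);
}

console.log(totals); // { EU: 150, US: 80 }
```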

Use Cases

• Behavioral analytics

• Segmentation

• Fraud detection

• Prediction

• Pricing analytics

• Sales analytics

Data Management

Hadoop: Offline Processing, Analytics, Data Warehousing

MongoDB: Online Operations, Application, Operational

Typical Implementations

Application Server

MongoDB as an Operational Store

Application Server

Data Flows

Hadoop Connector

BSON Files

MapReduce & HDFS

Cluster

MONGOS

SHARD A

SHARD B

SHARD C

SHARD D

MONGOS Client
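
A rough sketch of what a query router does in a sharded cluster: pick the shard that owns a given shard-key value. It assumes range-based partitioning on a numeric key. The boundaries in `chunks` and the `route` function are invented for illustration and do not reflect the actual metadata format a router uses.

```javascript
// Hypothetical range boundaries on a numeric shard key; each shard owns [min, max).
const chunks = [
  { shard: "SHARD A", min: 0, max: 25000 },
  { shard: "SHARD B", min: 25000, max: 50000 },
  { shard: "SHARD C", min: 50000, max: 75000 },
  { shard: "SHARD D", min: 75000, max: Infinity },
];

// Find the single shard whose range contains the key.
function route(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new RangeError(`no shard owns key ${key}`);
  return chunk.shard;
}

for (const id of [58842, 45647, 88237]) {
  console.log(id, "->", route(id));
}
```

A query that includes the shard key can be sent to one shard, while a query without it has to be broadcast to all of them.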

Hadoop / Spark Trade-offs

Plus

• Access to analytics libraries

• Processes unstructured data

• Handles petabyte data sets

Minus

• Overhead of a separate distributed system

• Writing MapReduce is not for the faint of heart

• Designed for batch-oriented processing

Relational for Reporting & Business Intelligence

Plus

• Existing ecosystem of BI tools

• Lower overhead than Hadoop clusters

• Large pool of expertise and talent

Integrations & ETL

[Diagram labels: Primary, Oplog, Replication, ETL, RDBMS]
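
A minimal sketch of an oplog-driven ETL step: replay change entries in order, flatten each document, and load it into a relational-style table. The entry format here is simplified and hypothetical (only `op`, `id`, and `doc`), as are the `toRow` and `applyOplog` helpers. A real oplog carries more fields.

```javascript
// Simplified, hypothetical change entries modeled on an operations log:
// op is "i" (insert), "u" (update) or "d" (delete).
const oplog = [
  { op: "i", id: 1, doc: { maker: "Agusta", engine: { cylinders: 4 } } },
  { op: "u", id: 1, doc: { maker: "M.V. Agusta", engine: { cylinders: 4 } } },
  { op: "i", id: 2, doc: { maker: "Agusta", engine: { cylinders: 2 } } },
  { op: "d", id: 2 },
];

// Transform: flatten the nested document into one relational-style row.
function toRow(id, doc) {
  return { id, maker: doc.maker, engine_cylinders: doc.engine.cylinders };
}

// Load: replay entries in order against a table keyed by id.
function applyOplog(entries, table = new Map()) {
  for (const entry of entries) {
    if (entry.op === "d") table.delete(entry.id);
    else table.set(entry.id, toRow(entry.id, entry.doc));
  }
  return table;
}

const reportingTable = applyOplog(oplog);
console.log([...reportingTable.values()]);
```

Replaying changes in log order keeps the reporting copy consistent with the operational store without querying it directly.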

Integrations with Search Solutions

[Diagram labels: Primary, Oplog, Replication, Mongo Connector, Lucene, RDBMS]

Considerations

• Increased system complexity

• Operations overhead

• Increased expertise

Thanks!

{ Name: 'Bryan Reinero',
  Title: 'Developer Advocate',
  Twitter: '@blimpyacht',
  Email: 'bryan@mongodb.com' }
