Webinar: MongoDB and Polyglot Persistence Architecture
TRANSCRIPT
Polyglot Persistence
{ Name: 'Bryan Reinero',
  Title: 'Developer Advocate',
  Twitter: '@blimpyacht',
  Email: '[email protected]' }
What is Polyglot Persistence?
• Using multiple database technologies in a given application
• Using the right tool for the right job
Derived from "polyglot programming": applications written in a mix of languages.
Why Polyglot Persistence?
• Relational has been the dominant model
• Higher performance requirements
• Increasingly large datasets
• Use of IaaS and commodity hardware
Vertical Scaling
Horizontal Scaling
Availability
http://avstop.com/ac/flighttrainghandbook/imagel4b.jpg
Requirements
• Maximize uptime
• Minimize time to recover
Hardware failures
Network partitions
Data center failures
Maintenance Operations
Business-critical systems require automatic fault detection and failover
Variant Data Models
Key-Value Store
ID      Name
58842   Eratosthenes
45647   Democritus
52320   Hypatia
88237   Shemp
78932   Euripides
Graph Databases
[Diagram: graph whose nodes are Eratosthenes, Democritus, Hypatia, Shemp, Euripides]
Document Databases
{
  maker : "Agusta",
  type : "sportbike",
  rake : 7,
  trail : 3.93,
  engine : {
    type : "internal combustion",
    layout : "inline",
    cylinders : 4,
    displacement : 750
  },
  transmission : {
    type : "cassette",
    speeds : 6,
    pattern : "sequential",
    ratios : [ 2.7, 1.94, 1.34, 1, 0.83, 0.64 ]
  }
}
The Goals of Normalization
• Model data in an understandable form
• Reduce fact redundancy and data inconsistency
• Enforce integrity constraints
Polyglot Persistence
Application Servers connect to:
• MongoDB: Product Catalog, User Accounts, Domain Objects
• RDBMS: Payment Systems, Reporting
• Key / Value: Session Data, Shopping Carts
• Graph: Social Data, Recommendations
What are your requirements?
• Availability
• Scalability
• Performance
• Access Patterns
• Data Model
Key Value Stores
Used for
• Session data
• Cookies
• Shopping carts
• Fast, if in memory
• Single access pattern
• Complex data parsed in client
Key Value Store
"{ maker : 'Agusta', type : 'sportbike', rake : 7, trail : 3.93, engine : { type : 'internal combustion', layout : 'inline', cylinders : 4, displacement : 750 }, transmission : { type : 'cassette', speeds : 6, pattern : 'sequential', ratios : [ 2.7, 1.94, 1.34, 1, 0.83, 0.64 ] } }"
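What "complex data parsed in client" means in practice can be sketched in a few lines. This is only an illustration: a JavaScript Map stands in for a key-value store, and the key "vehicle:78234974" and the helper getDisplacement are hypothetical names, not any product's API.

```javascript
// A Map stands in for a key-value store: it only supports get/set by key.
const kvStore = new Map();

// The store treats the value as an opaque string; it knows nothing of its structure.
kvStore.set("vehicle:78234974", JSON.stringify({
  maker: "Agusta",
  type: "sportbike",
  engine: { type: "internal combustion", layout: "inline", cylinders: 4, displacement: 750 },
  transmission: { type: "cassette", speeds: 6, pattern: "sequential",
                  ratios: [2.7, 1.94, 1.34, 1, 0.83, 0.64] }
}));

// Hypothetical helper: to read one field, the client fetches and parses the whole value.
function getDisplacement(store, key) {
  const raw = store.get(key);          // single access pattern: lookup by key
  if (raw === undefined) return null;  // key not present
  return JSON.parse(raw).engine.displacement;
}

console.log(getDisplacement(kvStore, "vehicle:78234974")); // 750
```

The store cannot filter or index on engine.displacement itself; any such logic has to live in the application.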
MongoDB
{
  _id : 78234974,
  maker : "Agusta",
  type : "sportbike",
  rake : 7,
  trail : 3.93,
  engine : {
    type : "internal combustion",
    layout : "inline",
    cylinders : 4,
    displacement : 750
  },
  transmission : {
    type : "cassette",
    speeds : 6,
    pattern : "sequential",
    ratios : [ 2.7, 1.94, 1.34, 1, 0.83, 0.64 ]
  }
}
Self Defining Schema
Nested Objects
Array types
Primary Key, Auto indexed
Secondary indexes
Projections
db.vehicles.find( { _id : 78234974 }, { engine : 1, _id : 0 } )
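To make the projection concrete, here is a small sketch of what the second argument to find asks for. It does not call MongoDB. The helper project is hypothetical and mimics only the simplest inclusion-projection behaviour: top-level fields, with _id returned unless it is set to 0.

```javascript
// Hypothetical helper mimicking a simple inclusion projection on one document.
// Only top-level field names are handled; real MongoDB projections do much more.
function project(doc, spec) {
  const result = {};
  // _id is included by default unless the spec explicitly excludes it.
  if (spec._id !== 0 && "_id" in doc) result._id = doc._id;
  for (const [field, flag] of Object.entries(spec)) {
    if (field !== "_id" && flag === 1 && field in doc) result[field] = doc[field];
  }
  return result;
}

const vehicle = {
  _id: 78234974, maker: "Agusta", type: "sportbike", rake: 7, trail: 3.93,
  engine: { type: "internal combustion", layout: "inline", cylinders: 4, displacement: 750 }
};

// Same shape as the slide's projection argument: keep engine, drop _id.
console.log(project(vehicle, { engine: 1, _id: 0 }));
```

Only the engine sub-document comes back, so the client receives less data than the full vehicle document.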
Data Model
RDBMS        ➜ MongoDB
Table, View  ➜ Collection
Row          ➜ Document
Index        ➜ Index
Join         ➜ Embedded Document
Foreign Key  ➜ Reference
Partition    ➜ Shard
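The "Join ➜ Embedded Document" and "Foreign Key ➜ Reference" rows can be sketched with plain objects. This is an assumed illustration only: the engines map, the engine_id field, and the id 501 are hypothetical and do not come from the slides.

```javascript
// Embedded document: data an RDBMS would join in is stored inside the parent.
const vehicleEmbedded = {
  _id: 78234974,
  maker: "Agusta",
  engine: { type: "internal combustion", cylinders: 4, displacement: 750 }
};

// Reference: the parent stores only the key of a document held elsewhere,
// much like a foreign key. The application resolves it with a second lookup.
const engines = new Map([
  [501, { _id: 501, type: "internal combustion", cylinders: 4, displacement: 750 }]
]);
const vehicleReferenced = { _id: 78234974, maker: "Agusta", engine_id: 501 };

// One read returns everything in the embedded case...
const embeddedCylinders = vehicleEmbedded.engine.cylinders;
// ...while the referenced case needs a follow-up lookup by key.
const referencedCylinders = engines.get(vehicleReferenced.engine_id).cylinders;

console.log(embeddedCylinders, referencedCylinders); // 4 4
```

Embedding suits data that is read together with its parent; a reference suits data that is shared or updated independently.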
Flexible Schemas
{
  maker : "M.V. Agusta",
  type : "sportsbike",
  engine : {
    type : "internal combustion",
    cylinders : 4,
    displacement : 750
  },
  rake : 7,
  trail : 3.93
}
{
  maker : "M.V. Agusta",
  type : "Helicopter",
  engine : {
    type : "turboshaft",
    layout : "axial",
    massflow : 1318
  },
  Blades : 4,
  undercarriage : "fixed"
}
Discriminator column
Shared indexing strategy
Polymorphic Attributes
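How two differently shaped documents can share one collection is easier to see in code. The sketch below uses an in-memory array instead of a MongoDB collection, so the filters only approximate what a query would do. The variable names are illustrative.

```javascript
// Two documents with different shapes living in the same (in-memory) collection.
const vehicles = [
  { maker: "M.V. Agusta", type: "sportsbike",
    engine: { type: "internal combustion", cylinders: 4, displacement: 750 },
    rake: 7, trail: 3.93 },
  { maker: "M.V. Agusta", type: "Helicopter",
    engine: { type: "turboshaft", layout: "axial", massflow: 1318 },
    Blades: 4, undercarriage: "fixed" }
];

// The shared "type" field acts as a discriminator between the two shapes.
const helicopters = vehicles.filter(v => v.type === "Helicopter");

// Fields common to every shape (maker, engine.type) can be queried uniformly,
// which is what lets one index serve both kinds of document.
const engineTypes = vehicles
  .filter(v => v.maker === "M.V. Agusta")
  .map(v => v.engine.type);

// Shape-specific attributes are simply absent on the other shape.
const rakes = vehicles.map(v => ("rake" in v ? v.rake : null));

console.log(helicopters.length, engineTypes, rakes);
```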
Tao of MongoDB
• Model data for use, not storage• Avoid ad-hoc queries• Index effectively, index efficiently
Strong Consistency vs. Eventual Consistency
Availability
Fail-over
Strong vs. Eventual Consistency
[Diagram sequence: Nodes A to E, with Client 1 and Client 2 issuing a Write and a Read]
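The difference between the two models can be sketched with a toy simulation. This is a deliberately simplified model, not MongoDB's replication protocol: the class ToyReplicaSet, its method names, and the key "cart:42" are all hypothetical, and replication is triggered by hand so the outcome is deterministic.

```javascript
// Toy replica set: one primary, one secondary, replication applied on demand.
class ToyReplicaSet {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.pending = [];            // writes not yet applied to the secondary
  }
  write(key, value) {
    this.primary.set(key, value); // acknowledged by the primary immediately
    this.pending.push([key, value]);
  }
  readFromPrimary(key) {          // always reflects the latest acknowledged write
    return this.primary.get(key);
  }
  readFromSecondary(key) {        // may lag behind the primary
    return this.secondary.get(key);
  }
  replicate() {                   // stands in for asynchronous replication catching up
    for (const [key, value] of this.pending) this.secondary.set(key, value);
    this.pending = [];
  }
}

const cluster = new ToyReplicaSet();
cluster.write("cart:42", "1 item");
cluster.replicate();
cluster.write("cart:42", "2 items");                      // Client 1 writes

const strongRead = cluster.readFromPrimary("cart:42");    // Client 2 reads the writing node
const staleRead = cluster.readFromSecondary("cart:42");   // Client 2 reads a lagging node
cluster.replicate();
const convergedRead = cluster.readFromSecondary("cart:42");

console.log(strongRead, staleRead, convergedRead); // 2 items 1 item 2 items
```

Reading from the node that accepted the write always returns the latest value. Reading from another node can return an older value until replication catches up, after which all nodes agree.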
Analytics
Hadoop
A framework for distributed processing of large data sets
• Terabyte and petabyte datasets
• Data warehousing
• Advanced analytics
• Not a database
• No indexes
• Batch processing
Use Cases
• Behavioral analytics
• Segmentation
• Fraud detection
• Prediction
• Pricing analytics
• Sales analytics
Data Management
Hadoop: Offline Processing, Analytics, Data Warehousing
MongoDB: Online Operations, Application, Operational
Typical Implementations
Application Server
MongoDB as an Operational Store
Application Server
Data Flows
Hadoop Connector
BSON Files
MapReduce & HDFS
Cluster
MONGOS
SHARD A
SHARD B
SHARD C
SHARD D
MONGOS Client
Hadoop / Spark Trade-offs
Plus
• Access to analytics libraries
• Processes unstructured data
• Handles petabyte data sets
Minus
• Overhead of a separate distributed system
• Writing MapReduce not for the faint of heart
• Designed for batch oriented processing
Relational for Reporting & Business Intelligence
Plus
• Existing ecosystem of BI tools
• Lower overhead than Hadoop clusters
• Large pool of expertise and talent
Integrations & ETL
[Diagram labels: Primary, Oplog Replication, ETL, RDBMS]
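The idea of feeding a reporting database from a stream of replicated operations can be sketched as follows. The entry shape used here (op, id, doc) is an assumption made for this sketch and is not the real oplog schema. The helper applyChanges and the Map standing in for a reporting table are likewise hypothetical.

```javascript
// Simplified change entries, loosely modeled on an operation log.
// The field names (op, id, doc) are assumptions for this sketch, not the real oplog schema.
const changeLog = [
  { op: "insert", id: 1, doc: { maker: "Agusta", type: "sportbike" } },
  { op: "update", id: 1, doc: { type: "superbike" } },
  { op: "insert", id: 2, doc: { maker: "Agusta", type: "helicopter" } },
  { op: "delete", id: 2 }
];

// Hypothetical target: a Map standing in for a reporting table keyed by primary key.
function applyChanges(log, table) {
  for (const entry of log) {
    if (entry.op === "insert") {
      table.set(entry.id, { ...entry.doc });
    } else if (entry.op === "update") {
      table.set(entry.id, { ...table.get(entry.id), ...entry.doc }); // merge changed fields
    } else if (entry.op === "delete") {
      table.delete(entry.id);
    }
  }
  return table;
}

const reportingTable = applyChanges(changeLog, new Map());
console.log([...reportingTable.entries()]);
```

Replaying the entries in order leaves the downstream table with the same final state as the source: one row for id 1, with its type updated.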
Integrations with Search Solutions
[Diagram labels: Primary, Oplog Replication, Mongo Connector, Lucene, RDBMS]
Considerations
• Increased system complexity
• Operations overhead
• Increased expertise required
Thanks!
{ Name: 'Bryan Reinero',
  Title: 'Developer Advocate',
  Twitter: '@blimpyacht',
  Email: '[email protected]' }