NoSQL DistilledA guide to polyglot persistence
#NoSQLDistilled@pramodsadalageThoughtWorks Inc.
Tuesday, June 11, 13
Why RDBMS?
Tuesday, June 11, 13
ACID TransactionsAtomicityConsistencyIsolationDurability
Tuesday, June 11, 13
Standard Query Interface
Tuesday, June 11, 13
Interact with many languages
Tuesday, June 11, 13
Everyone knows SQL
Tuesday, June 11, 13
Everyone knows SQL
Tuesday, June 11, 13
Limit less indexing
Tuesday, June 11, 13
Handles many data models
Tuesday, June 11, 13
Why NoSQL
Tuesday, June 11, 13
Schema changes are hard
Tuesday, June 11, 13
line items:
customer: Ann
$4820321293533
$3910321601912
$5110131495054
$96
$39
$51
payment details:
Card: AmexCC Number: 12345expiry: 04/2001
ID: 1001orders
customers
order lines
credit cards
Impedance mismatchTuesday, June 11, 13
Application vs Integration databases
Billing
Inventory
Billing
Inventory
Integration Database
Application Database web service
Tuesday, June 11, 13
Running on clustersTuesday, June 11, 13
Un-Structured Data
Tuesday, June 11, 13
Un-Even rate of data growth
Tuesday, June 11, 13
Domain Models
Tuesday, June 11, 13
Domain driven data models
Tuesday, June 11, 13
RDBMS dataTuesday, June 11, 13
Aggregate model (Embedding objects)
Tuesday, June 11, 13
Aggregate Data
// in customers{ "customer": { "id": 1, "name": "Martin", "billingAddress": [{"city": "Chicago"}], "orders": [ { "id":99, "orderItems":[ { "productId":27, "price": 32.45, "productName": "NoSQL Distilled" } ], "shippingAddress":[{"city":"Chicago"}] "orderPayment":[ { "ccinfo":"1000-1000-1000-1000", "txnId":"abelif879rft", "billingAddress": {"city": "Chicago"} } ], } ] }}
Tuesday, June 11, 13
Aggregate model (Referencing Objects)
Tuesday, June 11, 13
Aggregate data
// in Customers{ "id":1, "name":"Martin", "billingAddress":[{"city":"Chicago"}]}// in Orders{ "id":99, "customerId":1, "orderItems":[ { "productId":27, "price": 32.45, "productName": "NoSQL Distilled" } ], "shippingAddress":[{"city":"Chicago"}] "orderPayment":[ { "ccinfo":"1000-1000-1000-1000", "txnId":"abelif879rft", "billingAddress": {"city": "Chicago"} } ],}
Tuesday, June 11, 13
Aggregate Orientation
Tuesday, June 11, 13
RDBMS’s have no concept of aggregates
Tuesday, June 11, 13
Aggregates reduce the need for ACID
Tuesday, June 11, 13
Better for clusters, can be distributed easily
Tuesday, June 11, 13
Key-ValueDocument
Column-Family
Tuesday, June 11, 13
Key Value Databases
Tuesday, June 11, 13
Key-ValueDatabase
•One Key-One Value•Value is opaque to database•Like a Hash•Some are distributed
Oracle Riak
instance cluster
table bucket
row key-value
row-id key
Tuesday, June 11, 13
“key” (VIN) “value” (car facts)
JTTDR… …
make#Ford model#Mustang
year#2011 ….
Tuesday, June 11, 13
Document Databases
Tuesday, June 11, 13
Document Database
•One Key-One Value•Value is visible to database•Value can be queries•JSON/XML documents
Oracle MongoDB
instance mongod
schema database
table collection
row document
row_id _id
Tuesday, June 11, 13
“id” (VIN) “document” (car facts)
JTTDR…
{… “make”: “Ford”, “model”: “Mustang”, “year”: 2011, … }
Tuesday, June 11, 13
Column-Family Databases
Tuesday, June 11, 13
Column-Family Database
•Data organized as columns•Each row has row key•Columns have versioned data
•Row data is sorted by column name
Oracle Cassandra
instance cluster
database keyspace
table column-family
row rowcolumns same for
every row
columns can be different for each row
Tuesday, June 11, 13
“id” (VIN) “column families”(car facts)
JTTDR…
{… “car”:{“make”: “Ford”, “model”: “focus”…} “service”:{…} }
Tuesday, June 11, 13
Key-Points Aggregate Databases
Tuesday, June 11, 13
Inter-aggregate relations are hard to maintain
Tuesday, June 11, 13
Schema-less means implicit schema
Tuesday, June 11, 13
Graph Databases
Tuesday, June 11, 13
Tuesday, June 11, 13
Graph Databases
•Is multi-relational graph•Relationships are first-class citizens
•Traversal algorithms•Nodes and Edges can have data (key-value pairs)
Tuesday, June 11, 13
Graph databases work best for data with complex
relations
Tuesday, June 11, 13
Key-Value Database Usage
Tuesday, June 11, 13
Session Storage
Tuesday, June 11, 13
User Profiles/Preferences
Tuesday, June 11, 13
Shopping Cart
Tuesday, June 11, 13
Single user analytics
Tuesday, June 11, 13
Document Database Usage
Tuesday, June 11, 13
Event Logging
Tuesday, June 11, 13
Prototype development
Tuesday, June 11, 13
eCommerce Application
Tuesday, June 11, 13
Content Management Applications
Tuesday, June 11, 13
Column-Family Database Usage
Tuesday, June 11, 13
Large write volume
Tuesday, June 11, 13
Content Management
Tuesday, June 11, 13
eCommerce Application
Tuesday, June 11, 13
Graph Database Usage
Tuesday, June 11, 13
Connected Data
Tuesday, June 11, 13
Routing things/money
Tuesday, June 11, 13
Location Services
Tuesday, June 11, 13
Recommendation engines
Tuesday, June 11, 13
Schema-less really?
Tuesday, June 11, 13
Schema-free does not mean no schema-migration
Tuesday, June 11, 13
Schema is implicit in code
Tuesday, June 11, 13
Data must be migrated, when schema in code is
changed
Tuesday, June 11, 13
All data need not be migrated at the same time
(lazy migration)
Tuesday, June 11, 13
Polyglot Persistence
Tuesday, June 11, 13
Use different data storage technology for varying
needs
Tuesday, June 11, 13
Can be across the enterprise or in single
application
Tuesday, June 11, 13
Encapsulate data access through services
Tuesday, June 11, 13
Order persistence service
Document store
e-commerce platform
Session/Cart storage service
Key-Value store
Inventory and Price service
RDBMS(Legacy DB)
Nodes and Relations service
Graph store
Shopping cart and session data
Completed Orders
Inventoryand
Item Price Customer social graph
Tuesday, June 11, 13
RDBMS
e-commerce platform
Shopping cart data
Completed Orders
Session dataSOLR
Searchrequests
Update Indexed Data
Update indexed data, batch or realtime
Tuesday, June 11, 13
martinfowler.com/bliki/PolyglotPersistence.html
Session Storage
Financial Data
Shopping Cart
Recommendationengine
Product Catalog
Reporting Analytics User Activity Logs
Speculative Retail Web Application
Redis RDBMS Riak Neo4J
MongoDB RDBMS Cassandra Cassandra
Tuesday, June 11, 13
Experience
Tuesday, June 11, 13
e-learning (before)
Tuesday, June 11, 13
e-learning (after)
Tuesday, June 11, 13
e-commerce (before)
Tuesday, June 11, 13
e-commerce (after)
Tuesday, June 11, 13
How do I choose?
Tuesday, June 11, 13
Choose for programmer productivity
Tuesday, June 11, 13
Choose for data access performance
Tuesday, June 11, 13
Choose to stick with the default
Tuesday, June 11, 13
Choose by testing your expectations
Tuesday, June 11, 13
Try the databases, they are all open-source
Tuesday, June 11, 13
Thanks#NoSQLDistilled
@pramodsadalagesadalage.com
Tuesday, June 11, 13