the rise of nosql and polyglot persistence

Abdelmonaim Remani | Just.me Inc. The Rise of NoSQL and Polyglot Persistence

Upload: abdelmonaim-remani

Post on 05-Dec-2014



DESCRIPTION

The rise of NoSQL is characterized by confusion and ambiguity, very much like any fast-emerging organic movement in the absence of well-defined standards and adequate software solutions. Whether you are a developer or an architect, many questions come to mind when faced with the decision of where your data should be stored and how it should be managed. The following are some of these questions: What does the rise of all these NoSQL technologies mean to my enterprise? What is NoSQL to begin with? Does it mean "No SQL"? Could this be just another fad? Is it a good idea to bet the future of my enterprise on these new exotic technologies and simply abandon proven mature Relational DataBase Management Systems (RDBMS)? How scalable is scalable? Assuming that I am sold, how do I choose the one that fits my needs best? Is there a middle ground somewhere? What is this Polyglot Persistence I hear about? The answers to these questions and many more are the subject of this talk, along with a survey of the most popular NoSQL technologies. Be there or be square.

TRANSCRIPT

Page 1: The Rise of NoSQL and Polyglot Persistence

Abdelmonaim Remani | Just.me Inc.

The Rise of NoSQL and Polyglot Persistence

Page 2: The Rise of NoSQL and Polyglot Persistence

About Me

• Software Architect at Just.me Inc.
• Interested in technology evangelism and enterprise software development and architecture
• Frequent speaker (JavaOne, JAX, OSCON, ORDEV, etc.)
• Open-source advocate
• President and founder of a number of user groups
  – NorCal Java User Group
  – The Silicon Valley Spring User Group
  – The Silicon Valley Dart Meetup
• Bio: http://about.me/PolymathicCoder
• Twitter: @PolymathicCoder
• Email: [email protected]

Page 3: The Rise of NoSQL and Polyglot Persistence

License

• Creative Commons Attribution Non-Commercial 3.0 Unported
  – http://creativecommons.org/licenses/by-nc/3.0

• Disclaimer: The graphics and the logo in the presentation belong to their rightful owners

Page 4: The Rise of NoSQL and Polyglot Persistence

The Golden Age of Relational Databases

Page 5: The Rise of NoSQL and Polyglot Persistence

Relational Data Stores

• Relational data stores have been the predominant choice for storing data
  – The existence of mature solutions
    • Oracle, MySQL, MS SQL Server, etc.
  – Wide adoption and familiarity
    • Developers and even advanced business users
  – An abundance of tools
  – Etc.
• They became the de facto standard

Page 6: The Rise of NoSQL and Polyglot Persistence

The Relational Model

• Data
  – Stored in
    • Two-dimensional tables (relations)
    • Rows (tuples) and columns (attributes)
• Has a well-defined, enforced schema
  – Relations themselves
  – Integrity constraints
• Normalization
  – Smaller tables with well-defined relationships between them
  – Why?
    • Minimized redundancy
    • No modification anomalies
  – Modification propagation or cascading
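A minimal sketch of the points above, using Python's built-in sqlite3 module as a stand-in for any RDBMS. The customer and purchase tables are hypothetical and not from the talk. It shows an enforced schema, an integrity constraint rejecting an orphan row, and a delete cascading across a relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Normalized: purchases reference the customer instead of repeating the name.
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id) ON DELETE CASCADE,
        amount      REAL NOT NULL CHECK (amount > 0)
    );
""")
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (1, 25.0)")


def try_orphan_purchase():
    """Attempt to insert a purchase for a customer that does not exist."""
    try:
        conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (99, 10.0)")
        return True
    except sqlite3.IntegrityError:
        return False


orphan_accepted = try_orphan_purchase()

# Deleting the customer propagates (cascades) to the dependent purchases.
conn.execute("DELETE FROM customer WHERE id = 1")
remaining_purchases = conn.execute("SELECT COUNT(*) FROM purchase").fetchone()[0]
```

Here the database, not the application, refuses the inconsistent row and cleans up dependent rows.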

Page 7: The Rise of NoSQL and Polyglot Persistence

The Relational Model

• Supported by SQL (Structured Query Language)
  – A somewhat standardized query language
  – Very flexible
  – Many operations
    • Across multiple relations, such as JOIN
    • Aggregations, such as GROUP BY
    • Etc.
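To illustrate that flexibility, here is a small sketch with Python's sqlite3 module and two made-up tables (customer and purchase are assumptions for illustration). A single ad hoc query joins the relations and aggregates per customer, without the schema having been designed for that question in advance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO purchase VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);
""")

# JOIN across two relations, then aggregate with GROUP BY.
totals = conn.execute("""
    SELECT c.name, SUM(p.amount)
    FROM customer AS c
    JOIN purchase AS p ON p.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```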

Page 8: The Rise of NoSQL and Polyglot Persistence

The Relational Model

• Transactional
• ACID
  – Atomicity
    » All or nothing
  – Consistency
    » From one valid state to another
  – Isolation
    » Concurrent transactions still result in a valid state
  – Durability
    » Once committed, it’s forever
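A small sketch of atomicity and consistency, again with Python's sqlite3 module. The account table and the transfer helper are hypothetical. The transfer credits one account and debits another; if the debit violates the CHECK constraint, the whole transaction rolls back and the credit disappears with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        name    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO account VALUES ('me', 50), ('friend', 0);
""")


def transfer(src, dst, amount):
    """Move money atomically; return True on commit, False on rollback."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


overdraw_committed = transfer("me", "friend", 100)  # debit would go negative
after_overdraw = balances()                          # credit was rolled back too
valid_committed = transfer("me", "friend", 30)
after_valid = balances()
```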

Page 9: The Rise of NoSQL and Polyglot Persistence

The Relational Model

• Designed with the assumptions that

– The end user will interact directly with the database

» It makes sense that the RDBMS should manage concurrency and integrity

» Access Patterns are unknown

» A flexible query language that is close to English

» Data structure with no bias towards a particular pattern of querying

– The database runs on a single machine

» The only way to promise true ACID

Page 10: The Rise of NoSQL and Polyglot Persistence

Road Bumps

• We started building more complex applications on top of relational databases

– Business logic moved out of the RDBMS

» Fewer triggers and stored procedures, replaced by equivalent application-layer code

– The applications themselves evolved beyond the procedural paradigm to a more OOP approach

» The Object-Relational impedance mismatch

» ORM frameworks to the rescue

Page 11: The Rise of NoSQL and Polyglot Persistence

Scalability

Page 12: The Rise of NoSQL and Polyglot Persistence

We became data hoarders!

• Our datasets grow out of control
• Performance degrades steeply
  – We buy a beefier machine
    • Larry Ellison’s most expensive RAC, and make him even richer
• This puts off the problem for a little while

Page 13: The Rise of NoSQL and Polyglot Persistence

Optimization

• We hire a guy
  – Indexes half of the database
    • Makes those queries a little faster
  – Creates materialized views for complex joins
    • A nightmare to maintain, they get stale, etc.
  – He de-normalizes
    • Anything but a smooth transition!
    • Redundancy
  – He introduces caching
    • Data too stale
    • More redundancy

Page 14: The Rise of NoSQL and Polyglot Persistence

Clustering

• We hire another guy
  – Tells us that we hit the limit of the one machine
  – We need to scale out (horizontally)
• Master/Slave
  – Assuming you read more than you write
  – Write to the master and read from the slaves
  – The master needs to replicate data across the slaves
    » Risk of incorrect reads
  – How’s that consistent?!!
• Sharding
  – Improves reads as much as writes
  – Can’t join across partitions
  – No referential integrity
  – Requires modification of client applications
  – Introduces a single point of failure
  – How’s that consistent?!!
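A rough sketch of why sharding requires changes in client applications: the client itself has to decide which partition owns a key. ShardedStore is a hypothetical in-memory stand-in where each dict plays the role of one database server, and hash-modulo routing is just one possible scheme.

```python
import hashlib


class ShardedStore:
    """Client-side router over several independent partitions (shards)."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate database server.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash keeps a given key on the same shard across processes.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)


store = ShardedStore(shard_count=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i})
```

Each read or write touches exactly one shard, which is why both scale. A query that needs rows from several shards, such as a join, has no single place to run.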

Page 15: The Rise of NoSQL and Polyglot Persistence

What’s the Point?

• We horizontally scale our relational database
  – We’re no longer consistent
  – No ACIDity?
  – We lose query flexibility

• Are we doing something wrong?

Page 16: The Rise of NoSQL and Polyglot Persistence

The CAP Theorem

Page 17: The Rise of NoSQL and Polyglot Persistence

The CAP Theorem

• Eric Brewer on distributed systems
  – Pick two out of
    • Consistency
    • Availability
    • Partition Tolerance
• There is no Fast, Cheap, and Good service
  – Cheap Good service won’t be Fast
  – Fast Good service won’t be Cheap
  – Fast Cheap service won’t be Good

Page 18: The Rise of NoSQL and Polyglot Persistence

Relational Model & CAP

• Relational data stores happen to favor
  – Consistency and Availability
  – For historical reasons
• They are key to certain types of applications
• The bank example
  – I deposit $100 in my friend’s bank account
  – Blah blah blah…
• According to CAP, a CA system cannot also be partition tolerant, meaning that horizontal scaling is impossible

Page 19: The Rise of NoSQL and Polyglot Persistence

Scheiße!

• We’re in a pickle
  – Too much data in a CA model
  – Vertical scaling
    • Too expensive
    • Not sustainable
• Forced to explore other alternatives in light of CAP

Page 20: The Rise of NoSQL and Polyglot Persistence

What AP Looks Like

• Partition Tolerance
  – Since we reached the limit of the one machine, we have no choice but to scale horizontally
  – Which means being partition tolerant
• Availability
  – Most of the time, nobody is willing to give it up
  – It gets even better with distribution
  – In a cluster of servers
    • An individual node might be unreliable by itself
    • But the cluster as a whole is inherently reliable

Page 21: The Rise of NoSQL and Polyglot Persistence

What AP Looks Like

• According to CAP, we simply cannot have C
• Consistency
  – I make an update and all subsequent reads return the most recent value
  – Unfortunately this is impossible, as it takes time for the change to be replicated to each node of the cluster
• What a bummer!
• Let’s look at an AP system
  – DNS (Domain Name System)
    • Not all the nodes have the most recent records (you register a domain name and wait a few days to guarantee that every DNS server knows about it)

Page 22: The Rise of NoSQL and Polyglot Persistence

Eventual Consistency

• This is not so bad
  – It means that we just settled for a lesser degree of consistency
• So what if
  – Mohammad in Morocco updated his relationship status to single on some edge node
  – His cousin who lives in Spain saw it immediately because they happen to be on the same edge node
  – His secret admirer Sara, who lives in the United States, could not see it until an hour later
  – His brother in Japan got the update the next day
  – They all got it eventually!
• Eventual Consistency as opposed to Immediate Consistency
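The relationship-status story can be simulated in a few lines. This is a toy model under loud assumptions: Replica and Cluster are hypothetical classes, replication is a per-node queue, and "time passing" is an explicit sync call. It is not how any particular product replicates data.

```python
from collections import deque


class Replica:
    """One edge node: applies local writes at once, remote writes later."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.inbox = deque()  # replicated updates received but not yet applied

    def read(self, key):
        return self.data.get(key)

    def apply_pending(self):
        while self.inbox:
            key, value = self.inbox.popleft()
            self.data[key] = value


class Cluster:
    def __init__(self, names):
        self.replicas = {name: Replica(name) for name in names}

    def write(self, via, key, value):
        """Apply on the node that received the write; queue it for the others."""
        for name, replica in self.replicas.items():
            if name == via:
                replica.data[key] = value
            else:
                replica.inbox.append((key, value))

    def sync(self, name):
        """Simulate replication finally reaching one node."""
        self.replicas[name].apply_pending()


cluster = Cluster(["edge-morocco-spain", "edge-us", "edge-japan"])
cluster.write("edge-morocco-spain", "mohammad.status", "single")

seen_by_cousin = cluster.replicas["edge-morocco-spain"].read("mohammad.status")
seen_by_sara_at_first = cluster.replicas["edge-us"].read("mohammad.status")
cluster.sync("edge-us")  # an hour later
seen_by_sara_later = cluster.replicas["edge-us"].read("mohammad.status")
cluster.sync("edge-japan")  # the next day
seen_by_brother = cluster.replicas["edge-japan"].read("mohammad.status")
```

Every node stays available for reads the whole time; the price is that some reads return stale data until replication catches up.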

Page 23: The Rise of NoSQL and Polyglot Persistence

The Compromise

• We settle for a weaker consistency model
  – BASE
    • Basically Available
    • Soft state
    • Eventual consistency
• ACID on the individual node, BASE on the cluster

Page 24: The Rise of NoSQL and Polyglot Persistence

The Slippery Slope of the Faithless

Page 25: The Rise of NoSQL and Polyglot Persistence

You might as well Question…

• Schema
  – Logical
    • Well-defined and rigid in relational databases
    • Why not a flexible one, or even no schema?
  – Physical
    • B-trees in most relational databases
    • Why not use some other underlying data structure?

Page 26: The Rise of NoSQL and Polyglot Persistence

You might as well Question…

• Integrity Constraints
  – Who cares?
• A Query Language
  – Anything would do…
• Security
  – None
• Name it…

Page 27: The Rise of NoSQL and Polyglot Persistence

NoSQL: Going Rogue…

Page 28: The Rise of NoSQL and Polyglot Persistence

NoSQL

• A wide range of specialized data stores with the goal of addressing the challenges of the relational model

• Eric Evans
  – The whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for
• Let me make it easier
  – It is not anti-SQL or anti-relational
  – Any data store that is non-relational
• “Not Only SQL” instead of “NO SQL”

Page 29: The Rise of NoSQL and Polyglot Persistence

SQL vs. NoSQL

SQL                  NoSQL
A single machine     A cluster
CA                   AP/CA/CP
Scale vertically     Scale horizontally
SQL                  Custom APIs
ACID                 BASE
Full indexes         Indexes mostly on keys

There are outliers of course

Page 30: The Rise of NoSQL and Polyglot Persistence

SQL vs. NoSQL

SQL                  NoSQL
Rigid schema         Schema-less
Flexible queries     Pre-defined queries

There are outliers of course

• SQL (Relational)
  – Concerned with what the data consists of
• NoSQL (Non-Relational)
  – Concerned with how the data is queried

Page 31: The Rise of NoSQL and Polyglot Persistence
Page 32: The Rise of NoSQL and Polyglot Persistence

The Zoo

Page 33: The Rise of NoSQL and Polyglot Persistence

Key-Value Data Stores

• Basically a big hash map (associative array)
  – Very simple
  – Very fast reads and writes
  – No secondary indexes
• Use When
  – Your data is not highly related
  – All you need is basic CRUD
• Challenges
  – Complex queries
• Check out the Amazon Dynamo Paper
  – http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
• Featured Projects
  – DynamoDB http://aws.amazon.com/dynamodb/
  – Riak http://wiki.basho.com/
  – Redis http://redis.io/
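A toy illustration of the key-value model, assuming nothing about any real product's API. KeyValueStore is a hypothetical wrapper around a Python dict. CRUD by key is a single hash lookup, while any question about the values has to scan everything because there are no secondary indexes.

```python
class KeyValueStore:
    """A big hash map: fast access by key, nothing else."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):       # create / update
        self._data[key] = value

    def get(self, key, default=None):  # read
        return self._data.get(key, default)

    def delete(self, key):           # delete; returns True if the key existed
        return self._data.pop(key, None) is not None

    def find(self, predicate):
        """No secondary index: a 'complex query' is a full scan over all values."""
        return sorted(k for k, v in self._data.items() if predicate(v))


cart_store = KeyValueStore()
cart_store.put("cart:alice", {"items": ["book"], "total": 12})
cart_store.put("cart:bob", {"items": ["laptop", "mouse"], "total": 950})

alice_cart = cart_store.get("cart:alice")
big_carts = cart_store.find(lambda cart: cart["total"] > 100)
deleted = cart_store.delete("cart:alice")
```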

Page 34: The Rise of NoSQL and Polyglot Persistence

Columnar Stores

• In a table, data of the same column is stored together
  – Storage is not wasted on null values as in row-based stores (RDBMS)
  – Great for sparse tables
  – Very fast column operations, including aggregation
• Use When
  – Big Data (excellent leverage of MapReduce)
  – Need compression or versioning
• Challenges
  – You had better know your access patterns beforehand
  – Key design is not trivial
• Check out Google’s BigTable Paper
  – http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
• Featured Projects
  – HBase http://hbase.apache.org/
  – Cassandra http://cassandra.apache.org/
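The storage idea can be sketched with plain Python structures. This is an illustration of column-oriented layout only, with made-up sample rows; it is not the actual on-disk format of HBase or Cassandra.

```python
# Row-oriented view: every row carries every attribute, including the nulls.
rows = [
    {"id": 1, "name": "Alice", "age": 30, "fax": None},
    {"id": 2, "name": "Bob", "age": 40, "fax": "555-0100"},
    {"id": 3, "name": "Carol", "age": None, "fax": None},
]


def to_columns(rows):
    """Group values by column, keyed by row id; nulls are simply not stored."""
    columns = {}
    for row in rows:
        row_id = row["id"]
        for field, value in row.items():
            if field != "id" and value is not None:
                columns.setdefault(field, {})[row_id] = value
    return columns


columns = to_columns(rows)

# An aggregation reads one column and never touches the others.
ages = columns["age"].values()
average_age = sum(ages) / len(ages)

# The sparse 'fax' column holds one cell instead of three.
fax_cells = len(columns["fax"])
```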

Page 35: The Rise of NoSQL and Polyglot Persistence

Document Data Stores

• Nested structures of hashes and their values
  – A document can be
    • Simply a hash and its value
    • A hash with another document as its value
    • No limit in depth
  – Very flexible schema
  – Well-indexed data
  – Works well with OOP (no impedance mismatch)
  – De-normalize as a best practice
• Use When
  – You don’t know much about the schema
  – The schema is very likely to change
• Challenges
  – Complex join-like queries
  – Self-referencing documents and circular dependencies
• Projects
  – MongoDB http://www.mongodb.org/
  – CouchDB http://couchdb.apache.org/
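A document is easy to picture as a nested Python dict that serializes to JSON. The product document and the get_path helper below are hypothetical; the dotted-path lookup only mimics the style of query that document stores tend to offer and is not any product's API.

```python
import json

# Two documents in the same collection need not share a schema.
laptop = {
    "_id": "sku-1",
    "name": "Laptop",
    "specs": {"cpu": {"cores": 4}, "ram_gb": 16},  # documents nested in documents
    "tags": ["electronics", "computers"],
}
ebook = {"_id": "sku-2", "name": "E-book", "pages": 320}


def get_path(document, dotted_path, default=None):
    """Walk nested keys such as 'specs.cpu.cores'; return default if absent."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


cores = get_path(laptop, "specs.cpu.cores")
missing = get_path(ebook, "specs.cpu.cores")

# The nested structure maps directly to JSON and back, with no ORM layer.
round_tripped = json.loads(json.dumps(laptop))
```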

Page 36: The Rise of NoSQL and Polyglot Persistence

Graph Data Stores

• A graph
  – Perfect for highly interconnected data
  – Allows for explicit relationships
  – Fine-grained graph traversal
  – Very flexible
  – Works well with OOP (no impedance mismatch)
• Use When
  – Your data looks like a graph and requires graph queries
  – You are smart enough not to try this on another data store
• Challenges
  – Doesn’t scale well horizontally
• Featured Projects
  – Neo4j http://neo4j.org/
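A typical graph question is "who is within two hops of this person?". The sketch below answers it with a breadth-first traversal over an adjacency map. The knows graph and the within_hops function are made up for illustration; a graph database would express this in its own query language.

```python
from collections import deque

# Explicit relationships: each person maps to the people they know.
knows = {
    "mohammad": {"sara", "youssef"},
    "sara": {"lina"},
    "youssef": {"lina"},
    "lina": {"omar"},
    "omar": set(),
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal; returns {person: hop distance} up to max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the requested depth
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


network = within_hops(knows, "mohammad", max_hops=2)
```

In a relational schema the same question needs one self-join per hop, which is why the slide advises against trying this on another kind of data store.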

Page 37: The Rise of NoSQL and Polyglot Persistence

Relational Data Stores

• Use When
  – Your data is highly relational
  – There is a need to break data into small pieces and assemble it in different ways
  – Consistency is king
  – Access patterns are unknown
  – Reporting
• Challenges
  – Doesn’t scale well horizontally
• Featured Projects
  – Oracle http://www.oracle.com/index.html
  – Postgres http://www.postgresql.org/
  – MS SQL Server http://www.microsoft.com/sqlserver/
  – MySQL http://www.mysql.com/

Page 38: The Rise of NoSQL and Polyglot Persistence

How do you choose?

Page 39: The Rise of NoSQL and Polyglot Persistence

If It Doesn’t Fit, You Must Acquit!

• Data
  – Does it have a natural structure?
  – How are its pieces connected to each other?
  – How is it distributed?
  – How much is there?
• Access Patterns
  – Read/write ratio?
  – Uniform or random?
• CAP

Page 40: The Rise of NoSQL and Polyglot Persistence

Other Considerations

• Maturity
• Stability
• Maintainability
• Durability
• Cost
• Tools
• Familiarity

Page 41: The Rise of NoSQL and Polyglot Persistence

For Fairness’ Sake!

Page 42: The Rise of NoSQL and Polyglot Persistence

For Fairness’ Sake!

• Relational data stores did not fail us
  – They actually perform very well
• We failed ourselves
  – By using them as solutions for problems they weren’t designed to solve to begin with
• Misuse any data store the same way and you’ll get just as much trouble

Page 43: The Rise of NoSQL and Polyglot Persistence

For Fairness’ Sake!

• You can’t expect
  – A flathead screwdriver to work on a Phillips screw as well as one with the matching Phillips blade
  – A crosshead screwdriver to work on a flathead screw

Page 44: The Rise of NoSQL and Polyglot Persistence

Polyglot Persistence

Page 45: The Rise of NoSQL and Polyglot Persistence

Polyglot Persistence

• Enterprise applications are complex and combine complex problems
  – The assumption that we should use one data store is absurd
  – You can’t try to fit everything into one model and expect no problems
• Polyglot Persistence
  – Leveraging multiple data stores, based on the way data is used by the application
    • Associated with a learning curve
    • A long-term investment (more productive in the long run)
  – Leverage the strengths of multiple data stores

Page 46: The Rise of NoSQL and Polyglot Persistence

Polyglot Persistence

• Example
  – MongoDB for the product catalog
  – Redis for the shopping cart
  – DynamoDB for social profile info
  – Neo4j for the social graph
  – HBase for inbox and public feed messages
  – MySQL for payment and account info
  – Cassandra for audit and activity log

• Disclaimer: I’m not making any recommendation here.
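In application code, polyglot persistence often ends up as a thin routing layer that sends each kind of data to the store chosen for it. The sketch below is an assumption-heavy illustration: InMemoryStore stands in for real database clients, and PolyglotRepository and the kind names are hypothetical. Only the store-per-use-case mapping comes from the example above.

```python
class InMemoryStore:
    """Stand-in for a real client (MongoDB, Redis, ...); keeps data in a dict."""

    def __init__(self, label):
        self.label = label
        self.records = {}

    def save(self, key, value):
        self.records[key] = value

    def load(self, key):
        return self.records.get(key)


class PolyglotRepository:
    """Routes each kind of data to the store picked for that access pattern."""

    def __init__(self, routes):
        self._routes = routes

    def save(self, kind, key, value):
        self._routes[kind].save(key, value)

    def load(self, kind, key):
        return self._routes[kind].load(key)

    def store_for(self, kind):
        return self._routes[kind].label


repository = PolyglotRepository({
    "catalog": InMemoryStore("MongoDB"),
    "cart": InMemoryStore("Redis"),
    "social_graph": InMemoryStore("Neo4j"),
    "payment": InMemoryStore("MySQL"),
    "audit": InMemoryStore("Cassandra"),
})

repository.save("cart", "alice", ["sku-1", "sku-2"])
repository.save("payment", "alice", {"card_last4": "4242"})
cart = repository.load("cart", "alice")
```

The application code depends on the kind of data, not on the product behind it, so a store can be swapped without touching callers. The learning curve mentioned above shows up in operating several different systems at once.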

Page 47: The Rise of NoSQL and Polyglot Persistence

NoSQL in the Cloud

Page 48: The Rise of NoSQL and Polyglot Persistence

NoSQL in the Cloud

• NoSQL as a commodity
  – Fully managed data stores (no maintenance)
  – Elastic scaling
  – Cheap storage
• Featured:
  – Amazon AWS
  – Heroku Add-ons
  – CloudFoundry

Page 49: The Rise of NoSQL and Polyglot Persistence

As Promised!

Page 50: The Rise of NoSQL and Polyglot Persistence

The A’s to the Q’s in the Abstract

• What does the rise of all these NoSQL technologies mean to my enterprise?
  – I’m guessing a lot
• What is NoSQL to begin with?
  – Any non-relational data store
• Does it mean “NO SQL”?
  – No
• Could this be just another fad?
  – I don’t think so

Page 51: The Rise of NoSQL and Polyglot Persistence

The A’s to the Q’s in the Abstract

• Is it a good idea to bet the future of my enterprise on these new exotic technologies and simply abandon proven mature RDBMS?
  – It’s up to you. I will say “No guts, no glory!”
• How scalable is scalable?
  – However much you need it to be

Page 52: The Rise of NoSQL and Polyglot Persistence

The A’s to the Q’s in the Abstract

• Assuming that I am sold, how do I choose the one that fits my needs the best?
  – I’ll tell you if you hire me
• Is there a middle ground somewhere?
  – Polyglot Persistence
• What is this Polyglot Persistence I hear about?
  – It’s the middle ground

Page 53: The Rise of NoSQL and Polyglot Persistence

Any Other Questions?

Page 54: The Rise of NoSQL and Polyglot Persistence

Thank You All!

@PolymathicCoder