Sergio Bossa – [email protected] IV – Roma – 30 January 2010
SCALABLE DATABASES
From Relational Databases To Polyglot Persistence
Sergio Bossa
[email protected]
http://twitter.com/sbtourist
About Me
● Software architect and engineer
  ● Gioco Digitale (online gambling and casinos)
● Open Source enthusiast
  ● Terracotta Messaging (http://forge.terracotta.org)
  ● Terrastore (http://code.google.com/p/terrastore)
  ● Actorom (http://code.google.com/p/actorom)
● (Micro-)Blogger
  ● http://twitter.com/sbtourist
  ● http://sbtourist.blogspot.com
Five fallacies of data-centric systems
Data model is static.
Data volume is predictable.
Data access load is predictable.
Database topology doesn't change.
Database never fails.
Scalable databases in action
● Scale your database to address the fallacies above.
  ● Scale to handle heterogeneous data.
  ● Scale to handle more data.
  ● Scale to handle more load.
  ● Scale to handle topology changes due to:
    – Unplanned growth.
    – Unpredictable failures.
Scaling Relational Databases
Master-Slave replication
● Master-Slave replication.
  ● One (and only one) master database.
  ● One or more slaves.
● All writes go to the master.
  ● Replicated to slaves.
● Reads are balanced among master and slaves.
● Major issues:
  ● Single point of failure.
  ● Single point of bottleneck.
  ● Static topology.
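The write and read paths described on this slide can be sketched roughly as follows. This is only an illustration: `ReplicatedStore` is a hypothetical name, plain dicts stand in for database nodes, and replication is done synchronously for simplicity, which real products may not do.

```python
import itertools


class ReplicatedStore:
    """Toy master-slave setup; dicts stand in for database nodes (assumption)."""

    def __init__(self, slave_count):
        self.master = {}
        self.slaves = [{} for _ in range(slave_count)]
        # Reads rotate over the master and every slave.
        self._readers = itertools.cycle([self.master] + self.slaves)

    def write(self, key, value):
        # All writes go to the single master ...
        self.master[key] = value
        # ... and are then replicated to each slave (synchronously here).
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        # Round-robin balancing among master and slaves.
        return next(self._readers).get(key)
```

Every write passes through `self.master`, which is why the master shows up as both the single point of failure and the single point of bottleneck.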
Master-Master replication
● Master-Master replication.
  ● Two or more masters.
● Writes and reads can go to any master node.
● Writes are replicated among masters.
● Major issues:
  ● Limited performance and scalability (typically due to 2PC, two-phase commit).
  ● Complexity.
  ● Static topology.
Vertical partitioning
● Vertical partitioning.
  ● Put tables belonging to different functional areas on different database nodes.
● Scale your data and load by function.
● Move joins to the application level.
● Major issues:
  ● No longer truly relational.
  ● What if a functional area grows too much?
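"Move joins to the application level" might look like the sketch below once two tables live on different nodes. The data and the names (`users_node`, `orders_node`, `orders_with_user_names`) are made up for illustration; in practice each lookup would be a query against a separate database.

```python
# Hypothetical data: each structure stands in for a table on a separate node.
users_node = {1: {"name": "Ada"}, 2: {"name": "Linus"}}
orders_node = [
    {"order_id": 10, "user_id": 1, "item": "book"},
    {"order_id": 11, "user_id": 2, "item": "laptop"},
    {"order_id": 12, "user_id": 1, "item": "pen"},
]


def orders_with_user_names(users, orders):
    """Join orders to users in application code instead of in SQL."""
    joined = []
    for order in orders:
        user = users.get(order["user_id"])
        if user is not None:  # inner-join semantics: skip orphaned orders
            joined.append({"item": order["item"], "name": user["name"]})
    return joined
```

The database no longer enforces the relationship between the two tables, which is one reading of the "no longer truly relational" issue.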
Horizontal partitioning
● Horizontal partitioning.
  ● Split tables by key and put partitions (shards) on different nodes.
● Scale your data and load by key.
● Move joins to the application level.
● Needs some kind of routing.
● Major issues:
  ● No longer truly relational.
  ● What if your partition grows too much?
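The "some kind of routing" that sharding needs can be as simple as hashing the key. The following is a minimal sketch, assuming hash-modulo routing; `ShardRouter` is a hypothetical name and dicts stand in for shard nodes.

```python
import hashlib


class ShardRouter:
    """Routes each key to one of a fixed number of shards (dicts, by assumption)."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def shard_for(self, key):
        # A stable hash, so the same key always maps to the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, row):
        self.shards[self.shard_for(key)][key] = row

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)
```

With modulo routing, changing the number of shards remaps most keys, so growing a partition that has become too large means moving data around.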
Caching
● Put a cache in front of your database.
  ● Distribute it.
  ● Write-through for scaling reads.
  ● Write-behind for scaling reads and writes.
● Saves you a lot of pain, but ...
  ● “Only” scales read/write load.
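The difference between write-through and write-behind can be illustrated as below. This is a sketch under simplifying assumptions: the class names are hypothetical, the database is a dict, and write-behind is flushed by an explicit `flush()` call rather than by a background task.

```python
class WriteThroughCache:
    """Every write hits the cache and the database before returning."""

    def __init__(self, database):
        self.database = database
        self.entries = {}

    def put(self, key, value):
        self.entries[key] = value
        self.database[key] = value  # synchronous write to the backing store

    def get(self, key):
        # Load from the database on a cache miss, then serve from the cache.
        if key not in self.entries and key in self.database:
            self.entries[key] = self.database[key]
        return self.entries.get(key)


class WriteBehindCache(WriteThroughCache):
    """Writes are acknowledged from the cache and persisted later."""

    def __init__(self, database):
        super().__init__(database)
        self.dirty = set()

    def put(self, key, value):
        self.entries[key] = value
        self.dirty.add(key)  # remember what still has to reach the database

    def flush(self):
        for key in self.dirty:
            self.database[key] = self.entries[key]
        self.dirty.clear()
```

Write-through only takes reads off the database, while write-behind also absorbs writes, at the cost of a window in which the database lags behind the cache.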
Did we solve our fallacies?
● We tried, but ...
  ● Still bound to the relational model.
  ● Replication only covers a few use cases.
  ● Partitioning is hard.
  ● Caching is good, but not definitive.
  ● ...
● Can we do any better?
It's Not Only SQL
NOSQL Characteristics
● Main distinguishing traits:
  ● Data Model.
  ● Data Processing.
  ● Consistency Model.
  ● Scale Out.
Data Model (1)
● Column-family based.
● Structure:
  ● Key-identified rows with a sparse set of columns.
  ● Columns grouped in families.
  ● Multiple families for the same key.
● Highlights:
  ● Dynamically add and remove columns.
  ● Efficiently access columns in the same group (column family).
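One way to picture the column-family structure is as nested maps: row key, then family, then column. The sketch below is only a mental model with hypothetical names (`ColumnFamilyTable` and its methods), not how any particular product stores data.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Row key -> family name -> column name -> value."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        # Columns are created on first write; rows need not share columns.
        self.rows[row_key][family][column] = value

    def delete(self, row_key, family, column):
        self.rows[row_key][family].pop(column, None)

    def get_family(self, row_key, family):
        # All columns of one family are fetched together as a unit.
        return dict(self.rows.get(row_key, {}).get(family, {}))
```

For example, `put("user:1", "profile", "name", "Ada")` followed by `get_family("user:1", "profile")` returns the whole profile family without touching the row's other families.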
Data Model (2)
● Document based.
● Structure:
  ● Key-identified documents.
  ● Schema-less (but optionally constrained).
    – JSON, XML ...
● Highlights:
  ● Dynamically change the inner structure of documents.
  ● Efficiently access documents as a unit.
Data Model (3)
● Graph based.
● Structure:
  ● Nodes to represent your data.
  ● Relations as meaningful links between nodes.
  ● Properties to enrich both.
● Highlights:
  ● Rich data model.
  ● Efficient, fast traversal of nodes and relations.
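Nodes, typed relations and properties on both can be modelled in a few lines. The sketch below is an in-memory illustration with hypothetical names (`Graph`, `relate`, `traverse`); it uses a breadth-first walk and does not reflect the API of any graph database.

```python
from collections import deque


class Graph:
    def __init__(self):
        self.nodes = {}      # node id -> properties
        self.relations = {}  # node id -> list of (type, target id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.relations.setdefault(node_id, [])

    def relate(self, source, relation_type, target, **properties):
        self.relations[source].append((relation_type, target, properties))

    def traverse(self, start, relation_type):
        """Breadth-first walk following only relations of the given type."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            order.append(current)
            for kind, target, _ in self.relations[current]:
                if kind == relation_type and target not in seen:
                    seen.add(target)
                    queue.append(target)
        return order
```

A traversal follows links directly from node to node, so its cost depends on the part of the graph it visits and not on the total amount of stored data.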
Data Model (4)
● Key-Value based.
● Structure:
  ● Key-identified opaque values.
● Highlights:
  ● Great flexibility.
  ● Fast reads/writes for single entries.
Data Processing
● Several options:
  ● Map/Reduce.
  ● Predicates.
  ● Range Queries.
  ● ...
● One common principle:
  ● Move processing toward related data.
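As an illustration of Map/Reduce over partitioned data, the sketch below applies the map function to each partition's records and then reduces the grouped results. Everything runs in one process here and all names (`map_reduce`, `count_by_city`, `total`) are hypothetical; in a real store the map phase would run on the node that holds each partition.

```python
from collections import defaultdict


def map_reduce(partitions, mapper, reducer):
    """Run mapper over every record of every partition, then reduce per key."""
    grouped = defaultdict(list)
    for partition in partitions:      # conceptually: one node per partition
        for record in partition:
            for key, value in mapper(record):
                grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


def count_by_city(document):
    # Map step: emit one (city, 1) pair per document.
    yield document["city"], 1


def total(_key, values):
    # Reduce step: sum the counts emitted for one city.
    return sum(values)
```

Calling `map_reduce(partitions, count_by_city, total)` on lists of documents that each carry a `"city"` field yields a per-city count, with only the small intermediate pairs leaving each partition.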
Consistency Model (1)
● Strict Consistency.
  ● All nodes ...
  ● At every point in time ...
  ● See a consistent view of the stored data.
    – Per-key consistency.
    – Multi-key consistency.
Consistency Model (2)
● Eventual Consistency.
  ● Only a subset of all nodes ...
  ● At a specific point in time ...
  ● See a consistent view of the stored data.
    – Other nodes will serve stale data.
    – Other nodes will eventually get the updates.
Scale Out (1)
● Master-based.
  ● Membership managed and broadcast by masters.
  ● Data consistency guaranteed by masters.
  ● No SPOF (single point of failure) with active/passive masters.
  ● No SPOB (single point of bottleneck) with active/active masters or cluster-cluster replication.
  ● Prone to partitioning failures.
Scale Out (2)
● Peer-to-peer.
  ● Membership is maintained through multicast or gossip-based protocols.
  ● Data consistency is maintained through quorum protocols.
  ● Easier to scale.
  ● Harder to maintain consistency.
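A quorum protocol can be sketched as follows, assuming the common scheme with N replicas, a write quorum W and a read quorum R. The names are hypothetical, versions are supplied by the caller, and which replicas answer is fixed on purpose so that the overlap (or lack of it) is easy to see.

```python
class QuorumStore:
    """N replicas (dicts); writes reach W of them, reads ask R of them."""

    def __init__(self, replica_count, write_quorum, read_quorum):
        self.replicas = [{} for _ in range(replica_count)]
        self.write_quorum = write_quorum
        self.read_quorum = read_quorum

    def write(self, key, value, version):
        # Only the first W replicas acknowledge; the others lag behind.
        for replica in self.replicas[:self.write_quorum]:
            replica[key] = (version, value)

    def read(self, key):
        # Ask the last R replicas and keep the answer with the highest version.
        answers = [r[key] for r in self.replicas[-self.read_quorum:] if key in r]
        if not answers:
            return None
        return max(answers, key=lambda answer: answer[0])[1]
```

When R + W > N, every read set overlaps every write set, so a read always sees the latest acknowledged write. With smaller quorums a read may miss it and return stale data, which is the eventual consistency case.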
NOSQL Use Cases
● Use cases revolve around the following kinds of data:
  ● Rich.
  ● Runtime.
  ● Hot Spot.
  ● Massive.
  ● Computational.
● Do not use the same product for all cases.
● Pick multiple products for different use cases.
NOSQL Products - Cassandra
● Cassandra (http://incubator.apache.org/cassandra)
● Data Model:
  ● Column-family based.
● Data Processing:
  ● Range queries, Predicates.
● Consistency:
  ● Eventual consistency.
● Scalability:
  ● Peer-to-peer, gossip based.
NOSQL Products - Mongo DB
● Mongo DB (http://www.mongodb.org)
● Data Model:
  ● Document based (JSON).
● Data Processing:
  ● Map/Reduce, SQL-like queries.
● Consistency:
  ● Per-document strict consistency.
● Scalability:
  ● Replication, partitioning (alpha).
NOSQL Products - Neo4j
● Neo4j (http://neo4j.org)
● Data Model:
  ● Graph based.
● Data Processing:
  ● Path traversal, Index-based search.
● Consistency:
  ● Strict consistency.
● Scalability:
  ● Replication.
NOSQL Products - Riak
● Riak (http://riak.basho.com)
● Data Model:
  ● Document based (JSON).
● Data Processing:
  ● Map/Reduce.
● Consistency:
  ● Eventual consistency.
● Scalability:
  ● Peer-to-peer, gossip based.
NOSQL Products - Terrastore
● Terrastore (http://code.google.com/p/terrastore)
● Data Model:
  ● Document based (JSON).
● Data Processing:
  ● Range queries, Predicates.
● Consistency:
  ● Per-document strict consistency.
● Scalability:
  ● Master-based.
NOSQL Products - Voldemort
● Voldemort (http://project-voldemort.com)
● Data Model:
  ● Key-Value.
● Data Processing:
  ● None.
● Consistency:
  ● Eventual consistency.
● Scalability:
  ● Peer-to-peer, gossip based.
NOSQL Products and Use Cases
Final words
● A New World.
  ● New paradigms.
  ● New use cases.
  ● New products.
● Don't dismiss the old stuff.
  ● Relational databases still have their place.
● Embrace change.
  ● May the NOSQL power be with you.
● Let the Polyglot Persistence era begin!