cassandra core concepts
TRANSCRIPT
![Page 1: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/1.jpg)
©2013 DataStax Confidential. Do not distribute without consent.
Jon Haddad, Luke Tillman Technical Evangelists, DataStax @rustyrazorblade, @LukeTillman
Cassandra Core Concepts
1
![Page 2: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/2.jpg)
Small Data• 100's of MB to low GB, single user • sed, awk, grep are great • sqlite • Limitations: • bad for multiple concurrent users (file sharing!)
![Page 3: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/3.jpg)
Medium Data• Fits on 1 machine • RDBMS is fine • postgres • mysql
• Supports hundreds of concurrent users • ACID makes us feel good • Scales vertically
![Page 4: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/4.jpg)
Can RDBMS work for big data?
![Page 5: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/5.jpg)
Replication: ACID is a lie
Master
![Page 6: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/6.jpg)
Replication: ACID is a lie
Client
Master
![Page 7: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/7.jpg)
Replication: ACID is a lie
Client
Master
![Page 8: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/8.jpg)
Replication: ACID is a lie
Client
Master Slave
![Page 9: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/9.jpg)
Replication: ACID is a lie
Client
Master Slave
![Page 10: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/10.jpg)
Replication: ACID is a lie
Client
Master Slave
replication lag
![Page 11: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/11.jpg)
Replication: ACID is a lie
Client
Master Slave
replication lag
![Page 12: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/12.jpg)
Replication: ACID is a lie
Client
Master Slave
replication lag
Consistent results? Nope!
![Page 13: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/13.jpg)
Third Normal Form Doesn't Scale• Queries are unpredictable • Users are impatient • Data must be denormalized • If data > memory, you = history • Disk seeks are the worst
(SELECT CONCAT(city_name,', ',region) value, latitude, longitude, id, population, ( 3959 * acos( cos( radians($latitude) ) * cos( radians( latitude ) ) * cos( radians( longitude ) - radians($longitude) ) + sin( radians($latitude) ) * sin( radians( latitude ) ) ) ) AS distance, CASE region WHEN '$region' THEN 1 ELSE 0 END AS region_match FROM `cities` $where and foo_count > 5 ORDER BY region_match desc, foo_count desc limit 0, 11) UNION (SELECT CONCAT(city_name,', ',region) value, latitude, longitude, id, population, ( 3959 * acos( cos( radians($latitude) ) * cos( radians( latitude ) ) * cos( radians( longitude ) - radians($longitude) ) + sin( radians($latitude) ) * sin( radians( latitude ) ) ) ) AS distance,
![Page 14: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/14.jpg)
Sharding is a Nightmare• Data is all over the place •No more joins •No more aggregations • Denormalize all the things • Querying secondary indexes
requires hitting every shard • Adding shards requires manually
moving data • Schema changes
![Page 15: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/15.jpg)
Sharding is a Nightmare• Data is all over the place •No more joins •No more aggregations • Denormalize all the things • Querying secondary indexes
requires hitting every shard • Adding shards requires manually
moving data • Schema changes
![Page 16: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/16.jpg)
Sharding is a Nightmare• Data is all over the place •No more joins •No more aggregations • Denormalize all the things • Querying secondary indexes
requires hitting every shard • Adding shards requires manually
moving data • Schema changes
![Page 17: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/17.jpg)
Sharding is a Nightmare• Data is all over the place •No more joins •No more aggregations • Denormalize all the things • Querying secondary indexes
requires hitting every shard • Adding shards requires manually
moving data • Schema changes
![Page 18: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/18.jpg)
High Availability.. not really•Master failover… who's responsible? • Another moving part… • Bolted on hack
•Multi-DC is a mess • Downtime is frequent • Change database settings (innodb buffer
pool, etc) • Drive, power supply failures • OS updates
![Page 19: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/19.jpg)
Summary of Failure• Scaling is a pain • ACID is naive at best • You aren't consistent
• Re-sharding is a manual process •We're going to denormalize for
performance •High availability is complicated,
requires additional operational overhead
![Page 20: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/20.jpg)
Lessons Learned
![Page 21: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/21.jpg)
Lessons Learned• Consistency is not practical
![Page 22: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/22.jpg)
Lessons Learned• Consistency is not practical• So we give it up
![Page 23: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/23.jpg)
Lessons Learned• Consistency is not practical• So we give it up
•Manual sharding & rebalancing is hard
![Page 24: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/24.jpg)
Lessons Learned• Consistency is not practical• So we give it up
•Manual sharding & rebalancing is hard• So let's build in
![Page 25: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/25.jpg)
Lessons Learned• Consistency is not practical• So we give it up
•Manual sharding & rebalancing is hard• So let's build in
• Every moving part makes systems more complex
![Page 26: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/26.jpg)
Lessons Learned• Consistency is not practical• So we give it up
•Manual sharding & rebalancing is hard• So let's build in
• Every moving part makes systems more complex• So let's simplify our architecture - no more master / slave
![Page 27: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/27.jpg)
Lessons Learned• Consistency is not practical• So we give it up
•Manual sharding & rebalancing is hard• So let's build in
• Every moving part makes systems more complex• So let's simplify our architecture - no more master / slave
• Scaling up is expensive
![Page 28: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/28.jpg)
Lessons Learned• Consistency is not practical• So we give it up
•Manual sharding & rebalancing is hard• So let's build in
• Every moving part makes systems more complex• So let's simplify our architecture - no more master / slave
• Scaling up is expensive• We want commodity hardware
![Page 29: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/29.jpg)
Lessons Learned• Consistency is not practical• So we give it up
•Manual sharding & rebalancing is hard• So let's build in
• Every moving part makes systems more complex• So let's simplify our architecture - no more master / slave
• Scaling up is expensive• We want commodity hardware
• Scatter / gather no good
![Page 30: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/30.jpg)
Lessons Learned• Consistency is not practical• So we give it up
•Manual sharding & rebalancing is hard• So let's build in
• Every moving part makes systems more complex• So let's simplify our architecture - no more master / slave
• Scaling up is expensive• We want commodity hardware
• Scatter / gather no good• We denormalize for real time query performance
![Page 31: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/31.jpg)
Lessons Learned• Consistency is not practical• So we give it up
•Manual sharding & rebalancing is hard• So let's build in
• Every moving part makes systems more complex• So let's simplify our architecture - no more master / slave
• Scaling up is expensive• We want commodity hardware
• Scatter / gather no good• We denormalize for real time query performance• Goal is to always hit 1 machine
![Page 32: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/32.jpg)
What is Apache Cassandra?• Fast Distributed Database •High Availability • Linear Scalability • Predictable Performance •No SPOF •Multi-DC • Commodity Hardware • Easy to manage operationally •Not a drop in replacement for
RDBMS
![Page 33: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/33.jpg)
Hash Ring•No master / slave / replica sets •No config servers, zookeeper • Data is partitioned around the ring • Data is replicated to RF=N servers • All nodes hold data and can answer
queries (both reads & writes) • Location of data on ring is
determined by partition key
![Page 34: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/34.jpg)
CAP Tradeoffs• Impossible to be both consistent and
highly available during a network partition • Latency between data centers also
makes consistency impractical • Cassandra chooses Availability &
Partition Tolerance over Consistency
![Page 35: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/35.jpg)
Replication• Data is replicated automatically • You pick number of servers • Called “replication factor” or RF • Data is ALWAYS replicated to each
replica • If a machine is down, missing data
is replayed via hinted handoff
![Page 36: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/36.jpg)
Consistency Levels• Per query consistency • ALL, QUORUM, ONE •How many replicas for query to respond OK
![Page 37: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/37.jpg)
Multi DC• Typical usage: clients write to local
DC, replicates async to other DCs • Replication factor per keyspace per
datacenter • Datacenters can be physical or logical
![Page 38: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/38.jpg)
Reads & Writes
![Page 39: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/39.jpg)
The Write Path
•Writes are written to any node in the cluster (coordinator) •Writes are written to commit log, then to
memtable • Every write includes a timestamp •Memtable flushed to disk periodically
(sstable) •New memtable is created in memory • Deletes are a special write case, called a
“tombstone”
![Page 40: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/40.jpg)
Compaction• sstables are immutable • updates are written to new sstables • eventually we have too many files on disk • Partition are spread across multiple
SSTables • Same column can be in multiple SSTables •Merged through compaction, only latest
timestamp is kept
sstable sstable sstable
sstable
![Page 41: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/41.jpg)
The Read Path• Any server may be queried, it acts as the
coordinator • Contacts nodes with the requested key • On each node, data is pulled from
SSTables and merged • Consistency< ALL performs read repair
in background (read_repair_chance)
![Page 42: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/42.jpg)
Picking a distribution
![Page 43: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/43.jpg)
Open Source• Latest, bleeding edge features • File JIRAs • Support via mailing list & IRC • Fix bugs • cassandra.apache.org • Perfect for hacking
![Page 44: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/44.jpg)
DataStax Enterprise• Integrated Multi-DC Search • Integrated Spark for Analytics • Free Startup Program • <3MM rev & <$30M funding
• Extended support • Additional QA • Focused on stable releases for enterprise • Included on USB
![Page 45: Cassandra Core Concepts](https://reader030.vdocuments.net/reader030/viewer/2022021422/58ec058e1a28ab780f8b46cd/html5/thumbnails/45.jpg)
©2013 DataStax Confidential. Do not distribute without consent. 24