cassandra - lesson learned
![Page 1: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/1.jpg)
Cassandra - lesson learned
Andrzej Ludwikowski
![Page 2: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/2.jpg)
About me?
- http://aludwikowski.blogspot.com/
- https://github.com/aludwiko
- @aludwikowski
- SoftwareMill
![Page 3: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/3.jpg)
Why Cassandra?
- BigData!!!
  - Volume (petabytes of data, trillions of entities)
  - Velocity (real-time, streams, millions of transactions per second)
  - Variety (un-, semi-, structured)
- Near-linear horizontal scaling (in proper use cases)
- Fully distributed, with no single point of failure
- Data replication
  - By default
![Page 4: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/4.jpg)
What is Cassandra vs CAP?
- CAP Theorem - pick two
![Page 7: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/7.jpg)
Origins?
2010
![Page 8: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/8.jpg)
Name?
![Page 10: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/10.jpg)
Write path
[Diagram: Client (driver) connected to a ring of four nodes, Node 1 to Node 4]
![Page 13: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/13.jpg)
Write path
- Any node can coordinate any request (NSPOF, no single point of failure)
- Replication Factor
- Consistency Level
[Diagram: Client and a ring of four nodes, Node 1 to Node 4, with RF=3 and CL=2]
![Page 14: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/14.jpg)
Write path - consistent hashing
- Token ring from -2^63 to 2^63 - 1
[Diagram: ring of four nodes, Node 1 to Node 4, with the token range simplified to 0-100]
![Page 15: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/15.jpg)
Write path - consistent hashing
- Token ring from -2^63 to 2^63 - 1
- Partitioner: partition key -> token
[Diagram: Client, Partitioner and a ring of four nodes, Node 1 to Node 4, owning token ranges 0-25, 26-50, 51-75 and 76-100; the key hashes to token 77]
![Page 18: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/18.jpg)
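Write path - consistent hashing
- Token ring from -2^63 to 2^63 - 1
- Partitioner: partition key -> token
[Diagram: Client, Partitioner and a ring of four nodes, Node 1 to Node 4, owning token ranges 0-25, 26-50, 51-75 and 76-100; token 77 falls in range 76-100 and is shown on several nodes]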
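The token lookup on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the simplified 0-100 ring from the diagram, the mapping of ranges to Node 1 through Node 4 is hypothetical, and replicas are chosen by simply walking clockwise around the ring (function and variable names are made up for this sketch).

```python
from bisect import bisect_left

# Upper bound of the token range owned by each node, mirroring the slide's
# simplified 0-100 ring. The node-to-range assignment is an assumption.
RING = [(25, "Node 1"), (50, "Node 2"), (75, "Node 3"), (100, "Node 4")]


def replicas_for(token: int, replication_factor: int) -> list[str]:
    """Return the owner of `token` plus the next nodes clockwise on the ring."""
    bounds = [upper for upper, _ in RING]
    # First range whose upper bound is >= token; wrap around past the last one.
    start = bisect_left(bounds, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]


print(replicas_for(77, 3))  # ['Node 4', 'Node 1', 'Node 2']
```

With RF=3, token 77 lands on the owner of range 76-100 and the two nodes that follow it on the ring.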
![Page 19: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/19.jpg)
DEMO
![Page 20: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/20.jpg)
Write path - problems?
[Diagram: Client and a ring of four nodes, Node 1 to Node 4, owning token ranges 0-25, 26-50, 51-75 and 76-100; the write for token 77 is shown on several nodes]
![Page 24: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/24.jpg)
Write path - problems?
- Hinted handoff
- Retry idempotent inserts
  - built-in policies
- Lightweight transactions (Paxos)
- Batches
[Diagram: Client and a ring of four nodes, Node 1 to Node 4, owning token ranges 0-25, 26-50, 51-75 and 76-100; the write for token 77 is shown on several nodes]
![Page 25: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/25.jpg)
Write path - node level
![Page 26: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/26.jpg)
Write path - why so fast?
- Commit log - append only
![Page 28: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/28.jpg)
Write path - why so fast?
50,000 t/s = 50 t/ms = 5 t/100 µs = 1 t/20 µs
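The conversion on this slide is plain unit arithmetic. A small sketch, assuming "t" stands for transactions (writes) and that the 50,000 per second figure is taken as given; the function name is hypothetical:

```python
def time_budget_us(transactions_per_second: int) -> float:
    """Average time available per transaction, in microseconds."""
    return 1_000_000 / transactions_per_second


rate = 50_000                      # transactions per second, from the slide
print(rate / 1_000)                # 50.0 per millisecond
print(rate / 10_000)               # 5.0 per 100 microseconds
print(time_budget_us(rate))        # 20.0 microseconds per transaction
```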
![Page 29: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/29.jpg)
Write path - why so fast?
- Commit log - append only
- Periodic (10s) or batch sync to disk
[Diagram: Client and a ring of four nodes, Node 1 to Node 4, with RF=2 and CL=2]
![Page 30: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/30.jpg)
Write path - why so fast?
- Commit log - append only
- Periodic or batch sync to disk
- Network topology aware
[Diagram: Client and four nodes, Node 1 to Node 4, split across Rack 1 and Rack 2, with RF=2 and CL=2]
![Page 31: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/31.jpg)
Write path - why so fast?
- Commit log - append only
- Periodic or batch sync to disk
- Network topology aware
[Diagram: Client writing to a cluster that spans an Asia DC and a Europe DC]
![Page 32: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/32.jpg)
Read path
- Most recent wins
- Eager retries
- In-memory
  - MemTable
  - Row Cache
  - Bloom Filters
  - Key Caches
  - Partition Summaries
- On disk
  - Partition Indexes
  - SSTables
[Diagram: Client reading from a ring of four nodes, Node 1 to Node 4, with RF=3 and CL=3; the replicas answer with timestamp 67, timestamp 99 and timestamp 88]
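The "most recent wins" rule from this slide can be illustrated with a short sketch. It is not Cassandra's implementation: the class, the function name and the pairing of nodes with the three timestamps from the diagram are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaResponse:
    node: str
    value: str
    timestamp: int  # write timestamp stored with the value


def resolve(responses: list[ReplicaResponse]) -> ReplicaResponse:
    """Pick the response with the highest write timestamp."""
    return max(responses, key=lambda r: r.timestamp)


# Hypothetical answers from three replicas (RF=3, CL=3).
responses = [
    ReplicaResponse("Node 1", "old value", 67),
    ReplicaResponse("Node 2", "newest value", 99),
    ReplicaResponse("Node 3", "newer value", 88),
]
print(resolve(responses).value)  # newest value
```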
![Page 33: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/33.jpg)
Immediate vs. Eventual Consistency
- if (writeCL + readCL) > replication_factor then immediate consistency
  - writeCL=ALL, readCL=1
  - writeCL=1, readCL=ALL
  - writeCL, readCL=QUORUM
- If "stale" is measured in milliseconds, how much are those milliseconds worth?
[Diagram: Client and a ring of four nodes, Node 1 to Node 4, with RF=3]
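The rule above is easy to check numerically. A minimal sketch, assuming consistency levels are expressed as the number of replicas that must acknowledge and that QUORUM means a strict majority; the helper names are hypothetical:

```python
def quorum(replication_factor: int) -> int:
    """Smallest strict majority of the replicas."""
    return replication_factor // 2 + 1


def is_immediately_consistent(write_cl: int, read_cl: int, replication_factor: int) -> bool:
    """True when every read set must overlap every write set."""
    return write_cl + read_cl > replication_factor


RF = 3
ALL, ONE, QUORUM = RF, 1, quorum(RF)

print(is_immediately_consistent(ALL, ONE, RF))        # True
print(is_immediately_consistent(ONE, ALL, RF))        # True
print(is_immediately_consistent(QUORUM, QUORUM, RF))  # True
print(is_immediately_consistent(ONE, ONE, RF))        # False
```

The last line shows the eventually consistent case: a read at ONE may hit a replica that the write at ONE has not reached yet.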
![Page 34: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/34.jpg)
Modeling - new mindset
- QDD, Query Driven Development
- Nesting is ok
- Duplication is ok
- Writes are cheap
![Page 35: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/35.jpg)
QDD - Conceptual model
- Technology independent
- Chen notation
![Page 36: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/36.jpg)
QDD - Application workflow
![Page 37: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/37.jpg)
QDD - Logical model
- Chebotko diagram
![Page 38: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/38.jpg)
QDD - Physical model
- Technology dependent
- Analysis and validation (finding problems)
- Physical optimization (fixing problems)
- Data types
![Page 39: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/39.jpg)
Physical storage
- Primary key
- Partition key

CREATE TABLE videos (
  id int,
  title text,
  runtime int,
  year int,
  PRIMARY KEY (id)
);

 id | title               | runtime | year
----+---------------------+---------+------
  1 | dzien swira         |      93 | 2002
  2 | chlopaki nie placza |      96 | 2000
  3 | psy                 |     104 | 1992
  4 | psy 2               |      96 | 1994

Stored as one partition per id:
1: title = dzien swira, runtime = 93, year = 2002
2: title = chlopaki..., runtime = 96, year = 2000
3: title = psy, runtime = 104, year = 1992
4: title = psy 2, runtime = 96, year = 1994

-- title is not part of the primary key, so Cassandra rejects this query
-- unless a secondary index or ALLOW FILTERING is used
SELECT * FROM videos WHERE title = 'dzien swira';
![Page 40: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/40.jpg)
Physical storage
- Primary key (could be compound)
- Partition key
- Clustering column (order, uniqueness)

CREATE TABLE videos_with_clustering (
  title text,
  runtime int,
  year int,
  PRIMARY KEY ((title), year)
);

 title    | year | runtime
----------+------+---------
 godzilla | 1954 |      98
 godzilla | 1998 |     140
 godzilla | 2014 |     123
 psy      | 1992 |     104

Stored as one partition per title, rows ordered by year:
godzilla: 1954 -> runtime = 98 | 1998 -> runtime = 140 | 2014 -> runtime = 123
psy: 1992 -> runtime = 104

SELECT * FROM videos_with_clustering WHERE title = 'godzilla';
![Page 41: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/41.jpg)
Physical storage
- Primary key (could be compound)
- Partition key (could be composite)
- Clustering column (order, uniqueness)

CREATE TABLE videos_with_composite_pk (
  title text,
  runtime int,
  year int,
  PRIMARY KEY ((title, year))
);

 title    | year | runtime
----------+------+---------
 godzilla | 1954 |      98
 godzilla | 1998 |     140
 godzilla | 2014 |     123
 psy      | 1992 |     104

Stored as one partition per (title, year) pair:
godzilla:1954: runtime = 98
godzilla:1998: runtime = 140
godzilla:2014: runtime = 123
psy:1992: runtime = 104

SELECT * FROM videos_with_composite_pk WHERE title = 'godzilla' AND year = 1954;
![Page 42: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/42.jpg)
Modeling - clustering column(s)
Q: Retrieve videos an actor has appeared in (newest first).
![Page 43: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/43.jpg)
Modeling - clustering column(s)
CREATE TABLE videos_by_actor (
  actor text,
  added_date timestamp,
  video_id timeuuid,
  character_name text,
  description text,
  encoding frozen<video_encoding>,
  tags set<text>,
  title text,
  user_id uuid,
  PRIMARY KEY ( )
) WITH CLUSTERING ORDER BY ( );
Q: Retrieve videos an actor has appeared in (newest first).
![Page 44: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/44.jpg)
Modeling - clustering column(s)
CREATE TABLE videos_by_actor (
  actor text,
  added_date timestamp,
  video_id timeuuid,
  character_name text,
  description text,
  encoding frozen<video_encoding>,
  tags set<text>,
  title text,
  user_id uuid,
  PRIMARY KEY ((actor), added_date)
) WITH CLUSTERING ORDER BY (added_date desc);
Q: Retrieve videos an actor has appeared in (newest first).
![Page 45: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/45.jpg)
Modeling - clustering column(s)
CREATE TABLE videos_by_actor (
  actor text,
  added_date timestamp,
  video_id timeuuid,
  character_name text,
  description text,
  encoding frozen<video_encoding>,
  tags set<text>,
  title text,
  user_id uuid,
  PRIMARY KEY ((actor), added_date, video_id)
) WITH CLUSTERING ORDER BY (added_date desc);
Q: Retrieve videos an actor has appeared in (newest first).
![Page 46: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/46.jpg)
Modeling - clustering column(s)
CREATE TABLE videos_by_actor (
  actor text,
  added_date timestamp,
  video_id timeuuid,
  character_name text,
  description text,
  encoding frozen<video_encoding>,
  tags set<text>,
  title text,
  user_id uuid,
  PRIMARY KEY ((actor), added_date, video_id, character_name)
) WITH CLUSTERING ORDER BY (added_date desc);
Q: Retrieve videos an actor has appeared in (newest first).
![Page 47: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/47.jpg)
Modeling - compound partition key
CREATE TABLE temperature_by_day (
  weather_station_id text,
  date text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY ( )
) WITH CLUSTERING ORDER BY ( );
Q: Retrieve the last 1000 measurements from a given day.
![Page 48: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/48.jpg)
Modeling - compound partition key
CREATE TABLE temperature_by_day (
  weather_station_id text,
  date text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY ((weather_station_id), date, event_time)
) WITH CLUSTERING ORDER BY (date desc, event_time desc);
Q: Retrieve the last 1000 measurements from a given day.
Rows per partition (one partition per weather station, at one measurement per second):
- 1 day = 86 400 rows
- 1 week = 604 800 rows
- 1 month = 2 592 000 rows
- 1 year = 31 536 000 rows
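The row counts above follow from a simple calculation. A sketch, assuming one measurement per second per weather station (which is what the 86 400 rows per day figure implies), a 30-day month and a 365-day year; the function name is hypothetical:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400


def rows_in_partition(days: int, measurements_per_second: int = 1) -> int:
    """Rows accumulated in a single partition after `days` days of writes."""
    return days * SECONDS_PER_DAY * measurements_per_second


for label, days in [("1 day", 1), ("1 week", 7), ("1 month", 30), ("1 year", 365)]:
    print(f"{label} = {rows_in_partition(days):,} rows")
```

A partition keyed only by the station grows without bound, while a partition keyed by station and day stops at one day's worth of rows.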
![Page 49: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/49.jpg)
Modeling - compound partition key
CREATE TABLE temperature_by_day (
  weather_station_id text,
  date text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY ((weather_station_id, date), event_time)
) WITH CLUSTERING ORDER BY (event_time desc);
Q: Retrieve the last 1000 measurements from a given day.
![Page 50: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/50.jpg)
Modeling - TTL
CREATE TABLE temperature_by_day (
  weather_station_id text,
  date text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY ((weather_station_id, date), event_time)
) WITH CLUSTERING ORDER BY (event_time desc);
Retention policy - keep data only from last week.
INSERT INTO temperature_by_day … USING TTL 604800;
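The TTL value is one week expressed in seconds. The following sketch derives it and mimics the expiry check; `is_expired` is a hypothetical helper for illustration, not part of any Cassandra driver:

```python
from datetime import datetime, timedelta, timezone

TTL_SECONDS = int(timedelta(weeks=1).total_seconds())  # 604800


def is_expired(written_at: datetime, now: datetime, ttl_seconds: int = TTL_SECONDS) -> bool:
    """True once `ttl_seconds` have passed since the row was written."""
    return now >= written_at + timedelta(seconds=ttl_seconds)


written = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(TTL_SECONDS)                                          # 604800
print(is_expired(written, written + timedelta(days=6)))     # False
print(is_expired(written, written + timedelta(days=7)))     # True
```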
![Page 51: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/51.jpg)
Modeling - bitmap index
CREATE TABLE car (
  year text,
  model text,
  color text,
  vehicle_id int,
  // other columns
  PRIMARY KEY ((year, model, color), vehicle_id)
);
Q: Find car by year and/or model and/or color.
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', '', 'blue', 13, ...);
SELECT * FROM car WHERE year = '2000' AND model = '' AND color = 'blue';
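The seven inserts are every combination of the three attributes in which at least one is set, with an empty string standing for "any". A sketch of how an application might generate them, assuming all key values are handled as strings; `index_rows` is a hypothetical helper:

```python
from itertools import product


def index_rows(year: str, model: str, color: str) -> list[tuple[str, str, str]]:
    """All partition keys to write for one car; '' marks an unspecified attribute."""
    values = (year, model, color)
    rows = []
    for mask in product((True, False), repeat=3):
        if not any(mask):
            continue  # skip ('', '', ''), which would match every car
        rows.append(tuple(v if keep else "" for v, keep in zip(values, mask)))
    return rows


rows = index_rows("2000", "Multipla", "blue")
print(len(rows))  # 7
for row in rows:
    print(row)
```

Each extra searchable attribute doubles the number of rows written per car, which is the price of answering every and/or combination with a single-partition read.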
![Page 52: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/52.jpg)
Modeling - wide rows
CREATE TABLE user ( email text, name text, age int, PRIMARY KEY (email));
Q: Find user by email.
![Page 53: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/53.jpg)
Modeling - wide rows
CREATE TABLE user ( domain text, user text, name text, age int, PRIMARY KEY ((domain), user));
Q: Find user by email.
![Page 54: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/54.jpg)
Modeling - versioning with lightweight transactions
CREATE TABLE document ( id text, content text, version int, locked_by text, PRIMARY KEY ((id)));
INSERT INTO document (id, content , version ) VALUES ( 'my doc', 'some content', 1) IF NOT EXISTS;
UPDATE document SET locked_by = 'andrzej' WHERE id = 'my doc' IF locked_by = null;
UPDATE document SET content = 'better content', version = 2, locked_by = null WHERE id = 'my doc' IF locked_by = 'andrzej';
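The conditional statements above behave like compare-and-set operations. The sketch below imitates that flow with an in-memory dictionary. It is a simplified stand-in, not a Cassandra client: the class and method names are hypothetical, and a real lightweight transaction runs Paxos across replicas.

```python
class DocumentStore:
    """In-memory stand-in for the `document` table (illustration only)."""

    def __init__(self):
        self.rows = {}

    def insert_if_not_exists(self, doc_id, content, version):
        """Mimics INSERT ... IF NOT EXISTS; returns whether it was applied."""
        if doc_id in self.rows:
            return False
        self.rows[doc_id] = {"content": content, "version": version, "locked_by": None}
        return True

    def update_if_locked_by(self, doc_id, expected, **changes):
        """Mimics UPDATE ... IF locked_by = <expected> on an existing row."""
        row = self.rows.get(doc_id)
        if row is None or row["locked_by"] != expected:
            return False
        row.update(changes)
        return True


store = DocumentStore()
print(store.insert_if_not_exists("my doc", "some content", 1))              # True
print(store.update_if_locked_by("my doc", None, locked_by="andrzej"))       # True
print(store.update_if_locked_by("my doc", None, locked_by="someone else"))  # False
print(store.update_if_locked_by("my doc", "andrzej", content="better content",
                                version=2, locked_by=None))                 # True
```

The third call fails because the document is already locked, which is the conflict the IF clause is meant to catch.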
![Page 55: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/55.jpg)
Modeling - JSON with UDT and tuples
{
  "title": "Example Schema",
  "type": "object",
  "properties": {
    "firstName": "andrzej",
    "lastName": "ludwikowski",
    "age": {
      "description": "Age in years",
      "type": "integer",
      "minimum": 0
    }
  },
  "x_dimension": "1",
  "y_dimension": "2"
}
CREATE TYPE age (
  description text,
  type text,
  minimum int
);
CREATE TYPE prop (
  firstName text,
  lastName text,
  age frozen<age>
);
CREATE TABLE json (
  title text,
  type text,
  properties list<frozen<prop>>,
  dimensions tuple<int, int>,
  PRIMARY KEY (title)
);
![Page 56: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/56.jpg)
Common use cases
- Sensor data (Zonar)
- Fraud detection (Barracuda)
- Playlist and collections (Spotify)
- Personalization and recommendation engines (Ebay)
- Messaging (Instagram)
![Page 57: Cassandra - lesson learned](https://reader031.vdocuments.net/reader031/viewer/2022022414/587a60541a28ab520b8b76ab/html5/thumbnails/57.jpg)
Common anti use cases
- Queue
- Search engine