Cassandra advanced data modeling
Lyon Cassandra Users
Romain Hardouin
2016-05-31

$ who
Romain
$ pgrep -fl work
Cassandra architect
$ whatis teads
No.1 Video Advertising Marketplace

Data modeling
I. Introduction
II. Key principles
III. Chebotko methodology
IV. Time handling
I. Introduction

Theory
● Chebotko diagrams
● E&R
II. Key principles

Key Principles
● Know your data
● Know your queries
● Denormalize: Nest Data, Duplicate Data

Know your data
Know your domain
Conceptual Data Model, E&R
● Entities
● Relationships
● Attributes / Keys
● Cardinalities
● Constraints

Entities & relationships

Know your queries
Query-driven model
Application Workflow
New needs?
● New queries => new tables
● Alter table possible?
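
To illustrate "new queries => new tables", here is a minimal sketch. It assumes a hypothetical new need, "list the videos of a given actor". ALTER TABLE can add a regular column, but a primary key cannot be changed in place, so a query on a different key gets its own table. The names `videos_by_actor` and `release_year` are illustrative assumptions, not from the talk.

```cql
-- Table from the talk, repeated so the sketch is self-contained.
CREATE TABLE IF NOT EXISTS actors_by_video (
    video_id uuid,
    actor_name text,
    character_name text,
    PRIMARY KEY ((video_id), actor_name, character_name)
);

-- ALTER TABLE works for adding a regular column (hypothetical column).
ALTER TABLE actors_by_video ADD release_year int;

-- A query on a different key needs a new table (hypothetical name).
CREATE TABLE IF NOT EXISTS videos_by_actor (
    actor_name text,
    video_id uuid,
    character_name text,
    PRIMARY KEY ((actor_name), video_id, character_name)
);

-- The new query reads a single partition of the new table.
SELECT video_id, character_name
FROM videos_by_actor
WHERE actor_name = 'Some Actor';
```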

Goal: one partition per query
Anti-patterns:
● Table scan
● Client joins (a.k.a. multi-table)
● Secondary index
● ALLOW FILTERING
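
A short sketch of the difference, using the `actors_by_video` table that appears later in the deck. The UUID and actor name are made-up values. The first query restricts the partition key and reads one partition. The second has no partition key restriction, so Cassandra only accepts it with ALLOW FILTERING and may scan every partition.

```cql
-- Table from the talk, repeated so the sketch is self-contained.
CREATE TABLE IF NOT EXISTS actors_by_video (
    video_id uuid,
    actor_name text,
    character_name text,
    PRIMARY KEY ((video_id), actor_name, character_name)
);

-- Goal: the partition key is fully restricted, one partition is read.
SELECT actor_name, character_name
FROM actors_by_video
WHERE video_id = 123e4567-e89b-12d3-a456-426614174000;

-- Anti-pattern: no partition key, potentially a scan of the whole table.
SELECT video_id, character_name
FROM actors_by_video
WHERE actor_name = 'Some Actor'
ALLOW FILTERING;
```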

Nest Data
● Clustering columns
● Collection columns
● UDT columns
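
Clustering columns are shown with `actors_by_video` in the deck. For the other two options, a possible sketch follows. The type `video_encoding`, the table `videos` and all their columns are hypothetical names chosen for illustration.

```cql
-- UDT: a group of related fields nested in a single column (hypothetical type).
CREATE TYPE IF NOT EXISTS video_encoding (
    codec text,
    width int,
    height int
);

CREATE TABLE IF NOT EXISTS videos (
    video_id uuid,
    title text,
    tags set<text>,                   -- collection column
    encoding frozen<video_encoding>,  -- UDT column
    PRIMARY KEY ((video_id))
);

-- Nested values are written together with the row.
INSERT INTO videos (video_id, title, tags, encoding)
VALUES (123e4567-e89b-12d3-a456-426614174000,
        'Some Title',
        {'drama', 'classic'},
        {codec: 'h264', width: 1920, height: 1080});
```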

Nest Data
CREATE TABLE actors_by_video (
    video_id uuid,
    actor_name text,
    character_name text,
    PRIMARY KEY ((video_id), actor_name, character_name)
);

Duplicate data
Writes are cheap: « Joins on write »
Duplication occurs at different levels:
● Table: Materialized views
● Partition
● Rows
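
As an illustration of table-level duplication, a materialized view (available since Cassandra 3.0) can keep a second copy of `actors_by_video` keyed by actor. The view name `videos_by_actor_mv` is an assumption; check the status of the feature in your Cassandra version before relying on it.

```cql
-- Base table from the talk, repeated so the sketch is self-contained.
CREATE TABLE IF NOT EXISTS actors_by_video (
    video_id uuid,
    actor_name text,
    character_name text,
    PRIMARY KEY ((video_id), actor_name, character_name)
);

-- Cassandra maintains the duplicated rows on every write to the base table.
-- Every primary key column of the view must be restricted with IS NOT NULL.
CREATE MATERIALIZED VIEW IF NOT EXISTS videos_by_actor_mv AS
    SELECT video_id, actor_name, character_name
    FROM actors_by_video
    WHERE actor_name IS NOT NULL
      AND video_id IS NOT NULL
      AND character_name IS NOT NULL
    PRIMARY KEY ((actor_name), video_id, character_name);

-- Read by actor, one partition of the view.
SELECT video_id, character_name
FROM videos_by_actor_mv
WHERE actor_name = 'Some Actor';
```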
III. Chebotko Methodology

Application workflow
From « A Big Data Modeling Methodology for Apache Cassandra »
Query workflow
Query list

Chebotko Diagram
From « A Big Data Modeling Methodology for Apache Cassandra »

Chebotko Diagram
actors_by_video
video_id        uuid  K
actor_name      text  C↑
character_name  text  C↑
CREATE TABLE actors_by_video (
    video_id uuid,
    actor_name text,
    character_name text,
    PRIMARY KEY ((video_id), actor_name, character_name)
);

Chebotko mapping rules
MR 1: Entities & Relationships
MR 2: Equality search attributes (=)
MR 3: Inequality search attributes (<, >)
MR 4: Ordering attributes (↑↓)
MR 5: Key attributes, uniqueness
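
One way to read the rules on a concrete case, as a sketch only. It assumes a hypothetical IoT query: "readings of one sensor within a time range, newest first". The table `readings_by_sensor` and its columns are illustrative names, not taken from the paper or the demo.

```cql
CREATE TABLE IF NOT EXISTS readings_by_sensor (
    sensor_id uuid,          -- MR 2: equality search attribute => partition key (K)
    reading_time timestamp,  -- MR 3: inequality search attribute => clustering column
    temperature double,      -- MR 1: remaining attribute of the entity
    -- MR 5: sensor_id plus reading_time identifies a reading uniquely
    PRIMARY KEY ((sensor_id), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);  -- MR 4: ordering attribute (C↓)

-- The query the table was designed for: one partition, one slice.
SELECT reading_time, temperature
FROM readings_by_sensor
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
  AND reading_time >= '2016-05-01 00:00:00+0000'
  AND reading_time <  '2016-06-01 00:00:00+0000';
```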

Chebotko mapping rules
From « A Big Data Modeling Methodology for Apache Cassandra »

Demo
Internet of Things
Kashlev Data Modeler

IV. Time handling
- Tombstones
- TTL
- UPSERTs

Tombstones

Eventual consistency
No instant deletes
Deletes are writes
SSTables are immutable files
Writes are spread across many files

Goal: avoid reading too many* tombstones
* see tombstone_warn_threshold & tombstone_failure_threshold
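
A small sketch of "deletes are writes", reusing `actors_by_video` with made-up values. Each DELETE writes a tombstone instead of removing data from the immutable SSTables. Tombstones are kept for at least `gc_grace_seconds` (864000 seconds, i.e. 10 days, by default) before compaction may purge them.

```cql
-- Table from the talk, repeated so the sketch is self-contained.
CREATE TABLE IF NOT EXISTS actors_by_video (
    video_id uuid,
    actor_name text,
    character_name text,
    PRIMARY KEY ((video_id), actor_name, character_name)
);

-- Row-level delete: writes one tombstone for this row.
DELETE FROM actors_by_video
WHERE video_id = 123e4567-e89b-12d3-a456-426614174000
  AND actor_name = 'Some Actor'
  AND character_name = 'Some Character';

-- Partition-level delete: a single tombstone covers the whole partition.
DELETE FROM actors_by_video
WHERE video_id = 123e4567-e89b-12d3-a456-426614174000;

-- How long tombstones are retained is a table option (default value shown).
ALTER TABLE actors_by_video WITH gc_grace_seconds = 864000;
```

Many row-level deletes inside one partition are what later reads have to skip over, which is where the two thresholds come into play.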

TTLs
Data must be designed to be TTL'ed
tombstones
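
For reference, a sketch of the two usual ways to set a TTL in CQL: per table and per write. The table `alerts_by_sensor` and its columns are hypothetical. Expired cells turn into tombstones, so the points made about tombstones apply to TTL'ed data as well.

```cql
-- Table-level default: every write expires after 30 days unless overridden.
CREATE TABLE IF NOT EXISTS alerts_by_sensor (
    sensor_id uuid,
    alert_time timestamp,
    message text,
    PRIMARY KEY ((sensor_id), alert_time)
) WITH CLUSTERING ORDER BY (alert_time DESC)
  AND default_time_to_live = 2592000;

-- Per-write TTL: this row expires after one day.
INSERT INTO alerts_by_sensor (sensor_id, alert_time, message)
VALUES (123e4567-e89b-12d3-a456-426614174000,
        '2016-05-31 10:00:00+0000',
        'temperature too high')
USING TTL 86400;

-- Remaining lifetime, in seconds, of a non-key column.
SELECT message, TTL(message)
FROM alerts_by_sensor
WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000;
```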

Why?
What do we add?
TIME dimension

UPSERTs
Same INSERT over and over again?
UPSERTs hide this behavior
What if… one day you want to add time
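
A sketch of what this can look like, with hypothetical tables `status_by_sensor` and `status_history_by_sensor`. In the first table, two INSERTs with the same primary key leave a single row, and the earlier value is silently overwritten. Putting time in the primary key turns each write into a new row, which requires a new table since a primary key cannot be altered.

```cql
-- Without time: the primary key is the sensor only.
CREATE TABLE IF NOT EXISTS status_by_sensor (
    sensor_id uuid,
    status text,
    PRIMARY KEY ((sensor_id))
);

-- Same key twice: the second INSERT acts as an update, one row remains.
INSERT INTO status_by_sensor (sensor_id, status)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'OK');
INSERT INTO status_by_sensor (sensor_id, status)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'FAILED');

-- With time as a clustering column: every write is kept as its own row.
CREATE TABLE IF NOT EXISTS status_history_by_sensor (
    sensor_id uuid,
    changed_at timestamp,
    status text,
    PRIMARY KEY ((sensor_id), changed_at)
) WITH CLUSTERING ORDER BY (changed_at DESC);

INSERT INTO status_history_by_sensor (sensor_id, changed_at, status)
VALUES (123e4567-e89b-12d3-a456-426614174000, '2016-05-31 10:00:00+0000', 'OK');
INSERT INTO status_history_by_sensor (sensor_id, changed_at, status)
VALUES (123e4567-e89b-12d3-a456-426614174000, '2016-05-31 10:05:00+0000', 'FAILED');
```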
Questions?

Resources
« A Big Data Modeling Methodology for Apache Cassandra »
- Artem Chebotko, Andrey Kashlev & Shiyong Lu
- www.cs.wayne.edu/andrey/papers/TR-BIGDATA-05-2015-CKL.pdf
KDM
- Andrey Kashlev
- kdm.dataview.org