Data Modeling Basics for the Cloud with DataStax

Data Modeling Basics for the Cloud
Robert Stupp, Solutions Architect @ DataStax, Committer to Apache Cassandra

Upload: datastax

Posted on 06-Jan-2017


TRANSCRIPT

Page 1: Data Modeling Basics for the Cloud with DataStax

Data Modeling Basics for the Cloud
Robert Stupp
Solutions Architect @ DataStax, Committer to Apache Cassandra

Page 2

© 2016 DataStax, All Rights Reserved.

Data Modeling for the Cloud
DSE is the database for the cloud

1. Always On
2. Instantaneously Responsive
3. Numerous Endpoints
4. Geographically Distributed
5. Predictively Scalable

(Image credit: CC BY 2.0, by Blake Patterson on Flickr)

Diagram: scaling from 100,000 transactions per second to 200,000 transactions per second

Page 3

Eventual Consistency… is not "hopefully consistent"

Diagram: Application, Replication Factor 3, three replicas each holding "Some data", Consistency Level: ONE
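A toy, in-memory sketch (not a driver API) of what consistency level ONE means with replication factor 3: the write is acknowledged as soon as one replica has applied it, and a read at ONE may be answered by a replica that has not received the write yet. The `Replica` class and the "deliver later" step are assumptions made only to model asynchronous replication; the point is that the replicas do converge, which is what "eventual" promises.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One replica of a partition; holds (value, write_timestamp) per key."""
    name: str
    data: dict = field(default_factory=dict)


def write_one(replicas, key, value, ts):
    """Consistency level ONE: acknowledge once the first replica applied the write.

    The other replicas are returned as 'pending' so the caller can deliver
    the mutation later (a stand-in for asynchronous replication).
    """
    first, *rest = replicas
    first.data[key] = (value, ts)
    return rest


def read_one(replica, key):
    """Consistency level ONE: answer from a single replica, whatever it holds."""
    entry = replica.data.get(key)
    return entry[0] if entry else None


replicas = [Replica("node1"), Replica("node2"), Replica("node3")]  # RF = 3
pending = write_one(replicas, "k", "Some data", ts=1)

print(read_one(replicas[0], "k"))  # prints: Some data
print(read_one(replicas[2], "k"))  # prints: None (write not delivered yet)

# Eventually the mutation reaches every replica and they converge.
for r in pending:
    r.data["k"] = ("Some data", 1)
print(all(read_one(r, "k") == "Some data" for r in replicas))  # prints: True
```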

Page 4

Quorum Consistency

Diagram: Application, Replication Factor 3, three replicas each holding "Some data" (nodes marked UP and DOWN), Consistency Level: QUORUM
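The arithmetic behind QUORUM fits in a few lines. For a single data center, a quorum is a strict majority of the replicas (2 of 3 for RF 3), and when the replicas acknowledging a write plus the replicas answering a read exceed RF, the two sets must overlap, so the read sees the latest acknowledged write. The function names below are hypothetical helpers for this sketch.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM: a strict majority of the replication factor."""
    return replication_factor // 2 + 1


def overlap_guaranteed(rf: int, write_acks: int, read_acks: int) -> bool:
    """Read and write replica sets must intersect when W + R > RF."""
    return write_acks + read_acks > rf


def can_serve(rf: int, nodes_down: int, required_acks: int) -> bool:
    """A request succeeds if enough replicas are still UP to give the acks."""
    return rf - nodes_down >= required_acks


rf = 3
q = quorum(rf)
print(q)                                   # prints: 2
print(overlap_guaranteed(rf, q, q))        # prints: True  (2 + 2 > 3)
print(overlap_guaranteed(rf, 1, 1))        # prints: False (ONE/ONE can miss a write)
print(can_serve(rf, nodes_down=1, required_acks=q))  # prints: True  (one node DOWN)
print(can_serve(rf, nodes_down=2, required_acks=q))  # prints: False
```

This is why QUORUM reads and writes with RF 3 stay consistent while tolerating one unavailable replica.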

Page 5

Write Path

Diagram: an application sends "Some data" to a DSE / Cassandra node; inside the node the data goes to the memtable and the commit log files, and the node's data is stored on disk in several SSTables.
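A deliberately simplified model of the write path: each write is appended to a commit log for durability, applied to the in-memory memtable, and a full memtable is flushed to a new, immutable, key-sorted SSTable. The `ToyNode` class, the row-count flush threshold, and clearing the whole commit log on flush are assumptions for illustration; a real node flushes by memory size and recycles commit log segments.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy write path: commit log + memtable, flushed to immutable SSTables."""
    memtable_limit: int = 3
    commit_log: list = field(default_factory=list)
    memtable: dict = field(default_factory=dict)
    sstables: list = field(default_factory=list)

    def write(self, key, value):
        # 1. Append to the commit log first, so the write survives a crash.
        self.commit_log.append((key, value))
        # 2. Apply the write to the in-memory memtable.
        self.memtable[key] = value
        # 3. A "full" memtable is flushed to a new SSTable.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            # SSTables are written once, sorted by key, and never modified.
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable.clear()
            # Flushed data is now on disk, so its commit log entries can go.
            self.commit_log.clear()


node = ToyNode()
for i in range(7):
    node.write(f"key{i}", "Some data")
print(len(node.sstables), len(node.memtable))  # prints: 2 1
```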

Page 6

Compaction

Diagram: four SSTables are compacted into a single SSTable.
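Because SSTables are immutable, several of them can contain versions of the same key; compaction merges them and keeps only the newest version. The sketch below models an SSTable as a list of `(key, write_timestamp, value)` tuples, which is an assumption; real compaction reconciles individual cells and also handles tombstones and TTLs, all ignored here.

```python
def compact(sstables):
    """Merge several SSTables into one, keeping the newest entry per key."""
    newest = {}
    for table in sstables:
        for key, ts, value in table:
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    # The output SSTable is sorted by key, like its inputs.
    return [(key, ts, value) for key, (ts, value) in sorted(newest.items())]


sstables = [
    [("a", 1, "v1"), ("b", 1, "v1")],
    [("a", 2, "v2")],
    [("c", 3, "v3")],
    [("b", 4, "v4")],
]
print(compact(sstables))
# prints: [('a', 2, 'v2'), ('b', 4, 'v4'), ('c', 3, 'v3')]
```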

Page 7

Compaction Strategies

• Size Tiered
• Leveled
• Date Tiered
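To make "size tiered" concrete, here is a simplified sketch of the idea: SSTables of similar size are grouped into buckets, and a bucket with enough files becomes a compaction candidate. The values 0.5, 1.5 and 4 are what we believe Cassandra documents as the defaults for `bucket_low`, `bucket_high` and `min_threshold`, but treat them as assumptions and check the documentation for your version; options such as `min_sstable_size` and `max_threshold` are left out.

```python
def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of similar size (simplified STCS idea).

    A size joins a bucket if it lies within
    [bucket_low * avg, bucket_high * avg] of that bucket's average size.
    """
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Buckets holding at least min_threshold SSTables are worth compacting."""
    return [b for b in size_tiered_buckets(sizes) if len(b) >= min_threshold]


sizes_mb = [10, 11, 12, 12, 100, 110, 1000]
print(size_tiered_buckets(sizes_mb))    # prints: [[10, 11, 12, 12], [100, 110], [1000]]
print(compaction_candidates(sizes_mb))  # prints: [[10, 11, 12, 12]]
```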

Page 8

Data Organization in DSE / Cassandra: Partition

Device ID        Timestamp          Temperature   Humidity
01-32483-17383   2016-04-19 14:00   22            70
01-32483-17383   2016-04-19 15:00   21.5          65
01-32483-17383   2016-04-19 16:00   23.0          70

Partition key: Device ID (all three rows share it, so they live in one partition)
Clustering key column: Timestamp
Primary key: Device ID + Timestamp
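The table above can be pictured as a map of maps: the partition key selects a partition, and inside it rows are kept ordered by the clustering key. This sketch uses plain dicts and the three rows from the slide; sorting the timestamp strings works here only because they use a fixed `YYYY-MM-DD HH:MM` format.

```python
from collections import defaultdict

# Rows from the slide: (device_id, timestamp, temperature, humidity),
# deliberately inserted out of order.
rows = [
    ("01-32483-17383", "2016-04-19 16:00", 23.0, 70),
    ("01-32483-17383", "2016-04-19 14:00", 22, 70),
    ("01-32483-17383", "2016-04-19 15:00", 21.5, 65),
]

# Partition key -> {clustering key -> regular columns}.
partitions = defaultdict(dict)
for device_id, ts, temperature, humidity in rows:
    # The primary key (device_id, ts) identifies exactly one row; writing
    # the same key again would overwrite that row (an upsert).
    partitions[device_id][ts] = {"temperature": temperature, "humidity": humidity}


def read_partition(device_id):
    """Return one partition's rows in clustering order (timestamps ascending)."""
    part = partitions.get(device_id, {})
    return [(ts, part[ts]) for ts in sorted(part)]


for ts, cols in read_partition("01-32483-17383"):
    print(ts, cols["temperature"], cols["humidity"])
```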

Page 9

Data Modeling 101

1. Understand your data (conceptual data modeling)
2. Collect queries (understand your application)
3. Model according to queries (logical data modeling)
4. Apply optimizations (physical data modeling)

Page 10

Query-driven modeling

1. Collect your use cases
2. Extract queries
3. Model your tables

Page 11

Queries, yes

SELECT timestamp, temperature, humidity
FROM sensor_data
WHERE sensor_id = '01-32483-17383';

Always include the partition key
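Why the partition key matters: the coordinator hashes it to a token, and the token determines which node owns the data, so a query that includes it can go straight to the owner. Without it, every node would have to be asked. The four-node ring, its tokens, and the use of MD5 are assumptions for this sketch (Cassandra's default partitioner uses Murmur3), and replication is ignored.

```python
import bisect
import hashlib

# Hypothetical 4-node ring; each node owns the range ending at its token.
RING = [(2**125, "node1"), (2**126, "node2"), (3 * 2**126, "node3"), (2**128 - 1, "node4")]
TOKENS = [t for t, _ in RING]


def token(partition_key: str) -> int:
    """Stand-in token function: MD5 as a 128-bit integer (not Murmur3)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")


def owner(partition_key: str) -> str:
    """Node owning the first token >= the key's token, wrapping around the ring."""
    i = bisect.bisect_left(TOKENS, token(partition_key)) % len(RING)
    return RING[i][1]


def nodes_to_query(where: dict, partition_key_column: str = "sensor_id"):
    """With the partition key: one owner. Without it: every node."""
    if partition_key_column in where:
        return [owner(where[partition_key_column])]
    return [name for _, name in RING]


print(nodes_to_query({"sensor_id": "01-32483-17383"}))  # a single node
print(nodes_to_query({"temperature": 22}))              # all four nodes
```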

Page 12

Some standard use cases

• Customer registration
• Customer login
• Delivery addresses

Page 13

Customer registration

1. Check if customer exists: query by username

CREATE TABLE customers (
  username text PRIMARY KEY,
  password_hash text,
  first_name text,
  last_name text,
  email text
);

SELECT username FROM customers WHERE username = ?;

Page 14

Customer login by username

1. Check if user exists and password matches: query by username

CREATE TABLE customers (
  username text PRIMARY KEY,
  password_hash text,
  first_name text,
  last_name text,
  email text
);

SELECT password_hash FROM customers WHERE username = ?;

Page 15

Customer login by email

1. Check if user exists and password matches: query by email

CREATE TABLE customers (
  username text PRIMARY KEY,
  password_hash text,
  first_name text,
  last_name text,
  email text
);

SELECT password_hash FROM customers WHERE email = ?;

This fails, because email is not part of the primary key of customers:

InvalidRequest: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance.
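The rejected query is not a syntax problem; it is a cost problem. A lookup by partition key touches one row, but filtering on a regular column such as email means reading every row and testing it. The sketch counts rows examined in both cases on 10,000 hypothetical customers; the data and helper names are illustrative assumptions, not Cassandra internals.

```python
# Hypothetical sample data: the customers table as a dict keyed by its
# partition key (username). Email is only a regular column.
customers = {
    f"user{i}": {"email": f"user{i}@example.com", "password_hash": f"hash{i}"}
    for i in range(10_000)
}


def password_hash_by_username(username):
    """Partition key lookup: one direct access, however big the table is."""
    row = customers.get(username)
    return (row["password_hash"] if row else None), 1  # (result, rows examined)


def password_hash_by_email_filtering(email):
    """What filtering on a non-key column amounts to: read and test every row.

    Nothing tells the table that email is unique, so the scan cannot stop early.
    """
    matches, examined = [], 0
    for row in customers.values():
        examined += 1
        if row["email"] == email:
            matches.append(row["password_hash"])
    return matches, examined


print(password_hash_by_username("user42"))                     # prints: ('hash42', 1)
print(password_hash_by_email_filtering("user42@example.com"))  # prints: (['hash42'], 10000)
```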

Page 16

Customer login by email

1. Check if user exists and password matches: query by email

CREATE TABLE customers_by_email (
  email text PRIMARY KEY,
  password_hash text,
  first_name text,
  last_name text,
  username text
);

SELECT password_hash FROM customers_by_email WHERE email = ?;

This works.
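The price of a second query table is that the application writes the same customer twice, once per partition key. A minimal sketch with dicts standing in for the two tables; `register` and `login_by_email` are hypothetical helpers. In CQL the two INSERTs are commonly grouped in a logged BATCH so that, if one is applied, the other eventually is too.

```python
customers = {}           # table customers, partition key: username
customers_by_email = {}  # table customers_by_email, partition key: email


def register(username, email, password_hash, first_name, last_name):
    """Write the same customer into both query tables (denormalization)."""
    row = {"username": username, "email": email, "password_hash": password_hash,
           "first_name": first_name, "last_name": last_name}
    customers[username] = dict(row)
    customers_by_email[email] = dict(row)


def login_by_email(email, password_hash):
    """Single-partition read on customers_by_email; no filtering needed.

    password_hash stands in for a properly salted hash computed by the app.
    """
    row = customers_by_email.get(email)
    return row is not None and row["password_hash"] == password_hash


register("snazy", "snazy@example.com", "h1", "Robert", "Stupp")
print(login_by_email("snazy@example.com", "h1"))   # prints: True
print(login_by_email("snazy@example.com", "bad"))  # prints: False
```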

Page 17

Modeling delivery addresses

CREATE TABLE customer_addresses (
  username text,
  address_type text,
  street text,
  zip text,
  city text,
  PRIMARY KEY (username, address_type)
);

SELECT street, zip, city FROM customer_addresses WHERE username = ?;

SELECT street, zip, city FROM customer_addresses WHERE username = ? AND address_type = ?;
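With `PRIMARY KEY (username, address_type)`, username is the partition key and address_type the clustering key, so the two SELECTs map to "the whole partition" and "one row of it". A dict-of-dicts sketch; the helper names and address data are hypothetical.

```python
from collections import defaultdict

# customer_addresses: partition key username, clustering key address_type.
customer_addresses = defaultdict(dict)


def upsert_address(username, address_type, street, zip_code, city):
    customer_addresses[username][address_type] = {"street": street, "zip": zip_code, "city": city}


def addresses_for(username):
    """WHERE username = ?  ->  every row of the partition, in clustering order."""
    part = customer_addresses.get(username, {})
    return [(t, part[t]) for t in sorted(part)]


def address_for(username, address_type):
    """WHERE username = ? AND address_type = ?  ->  at most one row."""
    return customer_addresses.get(username, {}).get(address_type)


upsert_address("snazy", "work", "Example Road 2", "54321", "Sampleville")
upsert_address("snazy", "home", "Example Street 1", "12345", "Sampletown")
print([t for t, _ in addresses_for("snazy")])  # prints: ['home', 'work']
print(address_for("snazy", "home")["city"])    # prints: Sampletown
```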

Page 18

Modeling delivery addresses

1. Print delivery address label: query the customer by username, then query the delivery address by username and address type

SELECT first_name, last_name FROM customers WHERE username = ?;

SELECT street, zip, city FROM customer_addresses WHERE username = ? AND address_type = ?;

This works, but it's not great: printing one label takes two reads.

Page 19

Modeling delivery addresses

CREATE TYPE delivery_address (
  street text,
  zip text,
  city text
);

CREATE TABLE customers (
  username text PRIMARY KEY,
  password_hash text,
  first_name text,
  last_name text,
  email text,
  delivery_addrs map<text, frozen<delivery_address>>
);

SELECT first_name, last_name, delivery_addrs FROM customers WHERE username = ?;

Just 1 read
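Embedding the addresses in the customer row means the label needs a single read. In this sketch a frozen dataclass plays the role of `frozen<delivery_address>`: like a frozen UDT value, it is replaced as a whole rather than updated field by field. The sample address data and `address_label` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryAddress:
    """Mirrors the delivery_address UDT; immutable like frozen<delivery_address>."""
    street: str
    zip: str
    city: str


# One customers row with a map<text, frozen<delivery_address>> column.
customers = {
    "snazy": {
        "first_name": "Robert",
        "last_name": "Stupp",
        "delivery_addrs": {
            "home": DeliveryAddress("Example Street 1", "12345", "Sampletown"),
        },
    },
}


def address_label(username, address_type):
    """Everything the label needs comes from one row read."""
    row = customers[username]  # SELECT first_name, last_name, delivery_addrs ...
    addr = row["delivery_addrs"][address_type]
    return f"{row['first_name']} {row['last_name']}\n{addr.street}\n{addr.zip} {addr.city}"


print(address_label("snazy", "home"))
```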

Page 20

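Customer registration: the problem

Two registrations for the same username run at the same time. Both check first, and both find nothing:

SELECT username FROM customers WHERE username = ?    (no results)
SELECT username FROM customers WHERE username = ?    (no results)

Both then insert, and both inserts succeed:

INSERT INTO customers (username, first_name, last_name) VALUES ('snazy', 'Robert', 'Stupp')    (success)
INSERT INTO customers (username, first_name, last_name) VALUES ('snazy', 'Not', 'Robert')    (success)

One of them wins (the write with the later timestamp), and the other one gets overwritten.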
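Why both inserts "succeed": a Cassandra INSERT is an upsert, and conflicting writes are resolved by write timestamp (last write wins), so nothing ever reports a conflict. This sketch simplifies by comparing whole rows; Cassandra reconciles each cell separately. The counter used as a clock and the helper names are assumptions.

```python
import itertools

clock = itertools.count(1)  # stand-in for write timestamps
table = {}                  # username -> (write_timestamp, row)


def upsert(username, row, ts):
    """INSERT as upsert: always 'succeeds'; the highest timestamp wins."""
    current = table.get(username)
    if current is None or ts > current[0]:
        table[username] = (ts, row)
    return "success"


def exists(username):
    return username in table


# Both registrations check first, and both see no existing row.
print(exists("snazy"), exists("snazy"))  # prints: False False
ts1, ts2 = next(clock), next(clock)
print(upsert("snazy", {"first_name": "Robert", "last_name": "Stupp"}, ts1))  # prints: success
print(upsert("snazy", {"first_name": "Not", "last_name": "Robert"}, ts2))    # prints: success
print(table["snazy"][1])  # prints: {'first_name': 'Not', 'last_name': 'Robert'}
```

Even if the writes arrived in the opposite order, the row with the later timestamp would still be the one that survives.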

Page 21

Customer registration: the solution

SELECT username FROM customers WHERE username = ?    (no results)
SELECT username FROM customers WHERE username = ?    (no results)

INSERT INTO customers … IF NOT EXISTS    [applied] = true: OK
INSERT INTO customers … IF NOT EXISTS    [applied] = false: Sorry, dude
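`IF NOT EXISTS` turns the insert into a lightweight transaction, a compare-and-set: it is applied only if no row exists, and the result's `[applied]` column says which case happened (on `false`, Cassandra also returns the existing row). In Cassandra this is coordinated with Paxos across replicas; in the sketch below a single lock plays that role, which is an assumption for illustration only.

```python
import threading


class ToyCustomers:
    """In-memory stand-in for INSERT ... IF NOT EXISTS (lock instead of Paxos)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, username, row):
        """Return (applied, existing_row) like an LWT result."""
        with self._lock:
            if username in self._rows:
                return False, self._rows[username]  # [applied] = false
            self._rows[username] = row
            return True, None                       # [applied] = true


customers = ToyCustomers()

applied, existing = customers.insert_if_not_exists("snazy", {"first_name": "Robert"})
print("OK" if applied else "Sorry, dude")  # prints: OK

applied, existing = customers.insert_if_not_exists("snazy", {"first_name": "Not"})
print("OK" if applied else "Sorry, dude")  # prints: Sorry, dude
print(existing)                            # prints: {'first_name': 'Robert'}
```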

Page 22

Customer registration: the even better solution

Skip the SELECT; the conditional insert alone performs the existence check:

INSERT INTO customers … IF NOT EXISTS    [applied] = true: OK

INSERT INTO customers … IF NOT EXISTS    [applied] = false: Sorry, dude

Page 23

Customer login by email, with DSE 5.0

1. Check if user exists and password matches: query by email

CREATE TABLE customers (
  username text PRIMARY KEY,
  password_hash text,
  first_name text,
  last_name text,
  email text
);

CREATE MATERIALIZED VIEW customers_by_email AS
  SELECT email, username, first_name, last_name, password_hash
  FROM customers
  WHERE email IS NOT NULL AND username IS NOT NULL
  PRIMARY KEY (email, username);

SELECT password_hash FROM customers_by_email WHERE email = ?;
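With a materialized view, the server takes over the bookkeeping that the application did by hand for the customers_by_email table: every base-table write updates the view, and when a customer's email changes, the view row under the old email has to disappear. The sketch spells out that bookkeeping with dicts; it is a simplified illustration (no replication or repair concerns), and all names besides the table and view are hypothetical. Because the view's primary key is `(email, username)`, one email can map to several usernames, so the lookup returns a list.

```python
from collections import defaultdict

customers = {}                          # base table, partition key: username
customers_by_email = defaultdict(dict)  # view: email -> {username -> row}


def upsert_customer(username, **columns):
    """Write the base table, then maintain the view as the server would."""
    old = customers.get(username, {})
    new = {**old, **columns}
    customers[username] = new

    old_email, new_email = old.get("email"), new.get("email")
    # The view's partition key changed: remove the stale view row.
    if old_email is not None and old_email != new_email:
        del customers_by_email[old_email][username]
        if not customers_by_email[old_email]:
            del customers_by_email[old_email]
    # WHERE email IS NOT NULL: rows without an email stay out of the view.
    if new_email is not None:
        customers_by_email[new_email][username] = {
            k: new.get(k) for k in ("first_name", "last_name", "password_hash")
        }


def password_hashes_by_email(email):
    """SELECT password_hash FROM customers_by_email WHERE email = ?"""
    return [row["password_hash"] for row in customers_by_email.get(email, {}).values()]


upsert_customer("snazy", email="old@example.com", password_hash="h1",
                first_name="Robert", last_name="Stupp")
print(password_hashes_by_email("old@example.com"))  # prints: ['h1']
upsert_customer("snazy", email="new@example.com")
print(password_hashes_by_email("old@example.com"))  # prints: []
print(password_hashes_by_email("new@example.com"))  # prints: ['h1']
```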

Page 24

May the node be with you!

Robert Stupp
Solutions Architect @ [email protected]
Committer to Apache Cassandra
@snazy