Data Modeling Basics for the Cloud with DataStax

Post on 06-Jan-2017


Data Modeling Basics for the Cloud

Robert Stupp
Solutions Architect @ DataStax – Committer to Apache Cassandra

© 2016 DataStax, All Rights Reserved.


Data Modeling for the Cloud

DSE is the database for the cloud

1. Always On
2. Instantaneously Responsive
3. Numerous Endpoints
4. Geographically Distributed
5. Predictively Scalable

(Image: CC BY 2.0, by Blake Patterson on Flickr)

100,000 transactions per second

200,000 transactions per second


Eventual Consistency … is not hopefully consistent

[Diagram: the Application and three replicas, each holding "Some data". Replication Factor 3, Consistency Level: ONE.]


Quorum Consistency

[Diagram: the Application and three replicas, each holding "Some data"; nodes are marked UP or DOWN. Replication Factor 3, Consistency Level: QUORUM.]
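The quorum slide comes down to simple arithmetic: a quorum is a majority of the replicas, floor(RF / 2) + 1. With Replication Factor 3, two replicas must answer, so one node may be DOWN. The sketch below is plain Python, not driver code, and the function names are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def can_serve(replication_factor: int, nodes_up: int, required: int) -> bool:
    """True if enough replicas are reachable to satisfy the consistency level."""
    return min(nodes_up, replication_factor) >= required


RF = 3
print(quorum(RF))                     # replicas needed for QUORUM
print(can_serve(RF, 2, quorum(RF)))   # one replica DOWN, level QUORUM
print(can_serve(RF, 1, 1))            # two replicas DOWN, level ONE
```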


Write Path

[Diagram: the Application sends "Some data" to a DSE / Cassandra Node, which holds a Memtable, Commit Log Files, and several SSTables.]
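A toy model may help to read the write path diagram: a write is appended to the commit log, stored in the memtable, and the memtable is later flushed to an immutable SSTable. The class below is a heavily simplified sketch in plain Python. The `Node` class and the `memtable_limit` parameter are assumptions for illustration; a real node flushes based on memory thresholds, not a row count.

```python
class Node:
    """Toy model of one node's write path (hypothetical, heavily simplified)."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []    # append-only, used for crash recovery
        self.memtable = {}      # in-memory, holds the most recent writes
        self.sstables = []      # immutable, sorted snapshots "on disk"
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))   # 1. durable append
        self.memtable[key] = value             # 2. update memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # 3. flush when "full"

    def flush(self) -> None:
        # An SSTable is written once, sorted by key, and never modified.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()   # flushed data no longer needs the log


node = Node(memtable_limit=2)
for i in range(5):
    node.write(f"key{i}", "Some data")
print(len(node.sstables), len(node.memtable))
```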


Compaction

[Diagram: four SSTables are compacted into one SSTable.]
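Compaction can be pictured as a merge: several SSTables go in, one comes out, and for each key only the most recently written value survives. The following is a minimal sketch under that assumption. It ignores deletes (tombstones) and the differences between compaction strategies, and the `compact` function is hypothetical.

```python
def compact(sstables):
    """Merge several SSTables into one, keeping the newest value per key.

    Each SSTable is modeled as a dict of key -> (write_timestamp, value).
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return dict(sorted(merged.items()))


sstables = [
    {"a": (1, "old"), "b": (1, "b1")},
    {"a": (2, "new")},
    {"c": (3, "c1")},
    {"b": (4, "b2")},
]
print(compact(sstables))
```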


Compaction Strategies

• Size Tiered
• Leveled
• Date Tiered

Data Organization in DSE / Cassandra

Partition

Device ID       Timestamp         Temperature  Humidity
01-32483-17383  2016-04-19 14:00  22           70
01-32483-17383  2016-04-19 15:00  21.5         65
01-32483-17383  2016-04-19 16:00  23.0         70

Partition Key: Device ID
Clustering Key: Timestamp
Columns: Temperature, Humidity
Primary Key: Device ID + Timestamp
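One way to picture this layout is a map of maps: the partition key selects a partition, and inside it rows are ordered by the clustering key. The sketch below uses the sample rows from the slide. The `insert` and `read_partition` helpers are invented for illustration and say nothing about how the storage engine is implemented.

```python
from collections import defaultdict

# partition key (device_id) -> {clustering key (timestamp) -> other columns}
partitions = defaultdict(dict)


def insert(device_id, timestamp, temperature, humidity):
    # Writing the same primary key (device_id, timestamp) overwrites the row.
    partitions[device_id][timestamp] = (temperature, humidity)


def read_partition(device_id, start=None):
    """Return one partition's rows in clustering order, optionally from `start`."""
    rows = partitions.get(device_id, {})
    return [(ts, *rows[ts]) for ts in sorted(rows)
            if start is None or ts >= start]


insert("01-32483-17383", "2016-04-19 16:00", 23.0, 70)
insert("01-32483-17383", "2016-04-19 14:00", 22, 70)
insert("01-32483-17383", "2016-04-19 15:00", 21.5, 65)

for row in read_partition("01-32483-17383", start="2016-04-19 15:00"):
    print(row)
```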


Data Modeling 101

1. Understand your data
   Conceptual data modeling
2. Collect queries
   Understand your application
3. Model according to queries
   Logical data modeling
4. Apply optimizations
   Physical data modeling


Query driven modeling

1. Collect your use cases
2. Extract queries
3. Model your tables


Queries, yes

SELECT timestamp, temperature, humidity
FROM sensor_data
WHERE sensor_id = '01-32483-17383'

Always include the Partition Key
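Why the partition key matters can be sketched as follows: the key is hashed to a token, and the token decides which nodes hold the data, so a query that names the partition key can go straight to its replicas. This is a simplified illustration, not the real algorithm. The node names are hypothetical, MD5 stands in for the Murmur3 hash used by Cassandra's default partitioner, and a plain modulo replaces the token ranges of a real ring.

```python
import hashlib

NODES = ["node1", "node2", "node3", "node4"]   # hypothetical cluster


def token(partition_key: str) -> int:
    # Stand-in hash; Cassandra's default partitioner uses Murmur3 instead.
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def replicas(partition_key: str, replication_factor: int = 3):
    """Pick the first node by token, then the following nodes in ring order."""
    first = token(partition_key) % len(NODES)
    return [NODES[(first + i) % len(NODES)] for i in range(replication_factor)]


print(replicas("01-32483-17383"))
```

Without the partition key there is no token to compute, so every node would have to be asked.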


Some standard use-cases

• Customer registration
• Customer login
• Delivery addresses


Customer registration

1. Check if customer exists
   query by username

CREATE TABLE customers (
    username text PRIMARY KEY,
    password_hash text,
    first_name text,
    last_name text,
    email text
);

SELECT username FROM customers WHERE username = ?


Customer login by username

1. Check if user exists and password matches
   query by username

CREATE TABLE customers (
    username text PRIMARY KEY,
    password_hash text,
    first_name text,
    last_name text,
    email text
);

SELECT password_hash FROM customers WHERE username = ?


Customer login by email

1. Check if user exists and password matches
   query by email

CREATE TABLE customers (
    username text PRIMARY KEY,
    password_hash text,
    first_name text,
    last_name text,
    email text
);

SELECT password_hash FROM customers WHERE email = ?

InvalidRequest: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance.

The query is rejected because email is not part of the primary key.


Customer login by email

1. Check if user exists and password matches
   query by email

CREATE TABLE customers_by_email (
    email text PRIMARY KEY,
    password_hash text,
    first_name text,
    last_name text,
    username text
);

SELECT password_hash FROM customers_by_email WHERE email = ?

This works
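The price of the second table is that the application has to write every customer twice, once per lookup key. A minimal sketch of that pattern, with two dictionaries standing in for the two tables; the `register` helper, the email address, and the hash value are placeholders invented for illustration.

```python
customers = {}            # keyed by username
customers_by_email = {}   # keyed by email, same data duplicated


def register(username, email, password_hash, first_name, last_name):
    row = {"username": username, "email": email,
           "password_hash": password_hash,
           "first_name": first_name, "last_name": last_name}
    # The application itself keeps both copies in step.
    customers[username] = dict(row)
    customers_by_email[email] = dict(row)


def password_hash_by_username(username):
    row = customers.get(username)
    return row["password_hash"] if row else None


def password_hash_by_email(email):
    row = customers_by_email.get(email)
    return row["password_hash"] if row else None


register("snazy", "robert@example.com", "hash-placeholder", "Robert", "Stupp")
print(password_hash_by_username("snazy"))
print(password_hash_by_email("robert@example.com"))
```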


Modeling delivery addresses

CREATE TABLE customer_addresses (
    username text,
    address_type text,
    street text,
    zip text,
    city text,
    PRIMARY KEY ( username, address_type )
);

SELECT street, zip, city FROM customer_addresses WHERE username = ?;

SELECT street, zip, city FROM customer_addresses WHERE username = ? AND address_type = ?;


Modeling delivery addresses

1. Print delivery address label
   query user by username
   query delivery address by user and type

SELECT first_name, last_name FROM customers WHERE username = ?;

SELECT street, zip, city FROM customer_addresses WHERE username = ? AND address_type = ?;

This works, but it's not great: it takes two reads.


Modeling delivery addresses

CREATE TYPE delivery_address (
    street text,
    zip text,
    city text
);

CREATE TABLE customers (
    username text PRIMARY KEY,
    password_hash text,
    first_name text,
    last_name text,
    email text,
    delivery_addrs map < text, frozen < delivery_address > >
);

SELECT first_name, last_name, delivery_addrs FROM customers WHERE username = ?;

Just 1 read
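The difference between the two models can be counted. In the sketch below, every call to `read` stands for one read request. The first function needs the customer and the address separately, the second finds the address inside the customer row. All names, the address values, and the "home" address type are made up for illustration.

```python
reads = {"count": 0}


def read(table, key):
    """Count every lookup, standing in for one read request."""
    reads["count"] += 1
    return table[key]


ADDRESS = {"street": "1 Example St", "zip": "00000", "city": "Sampletown"}

# Model A: separate tables for customers and addresses
customers = {"snazy": {"first_name": "Robert", "last_name": "Stupp"}}
customer_addresses = {("snazy", "home"): ADDRESS}

# Model B: addresses embedded in the customer row as a map
customers_with_addrs = {
    "snazy": {"first_name": "Robert", "last_name": "Stupp",
              "delivery_addrs": {"home": ADDRESS}},
}


def format_label(customer, address):
    return (f"{customer['first_name']} {customer['last_name']}, "
            f"{address['street']}, {address['zip']} {address['city']}")


def label_two_tables(username, address_type):
    customer = read(customers, username)
    address = read(customer_addresses, (username, address_type))
    return format_label(customer, address)


def label_embedded(username, address_type):
    customer = read(customers_with_addrs, username)
    return format_label(customer, customer["delivery_addrs"][address_type])


print(label_two_tables("snazy", "home"), reads["count"])
reads["count"] = 0
print(label_embedded("snazy", "home"), reads["count"])
```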


Customer registration – the problem

Client A:
SELECT username FROM customers WHERE username = ?
(no results)
INSERT INTO customers (username, first_name, last_name) VALUES ('snazy', 'Robert', 'Stupp')
(success)
This one gets overwritten

Client B:
SELECT username FROM customers WHERE username = ?
(no results)
INSERT INTO customers (username, first_name, last_name) VALUES ('snazy', 'Not', 'Robert')
(success)
This one wins


Customer registration – the solution

Client A:
SELECT username FROM customers WHERE username = ?
(no results)
INSERT INTO customers … IF NOT EXISTS
[applied] = true
OK

Client B:
SELECT username FROM customers WHERE username = ?
(no results)
INSERT INTO customers … IF NOT EXISTS
[applied] = false
Sorry, dude


Customer registration – the even better solution

Client A:
INSERT INTO customers … IF NOT EXISTS
[applied] = true
OK

Client B:
INSERT INTO customers … IF NOT EXISTS
[applied] = false
Sorry, dude
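The registration slides contrast two behaviors: a plain INSERT is an upsert, so the later write silently replaces the earlier one, while INSERT … IF NOT EXISTS reports through [applied] whether the row was actually created. The sketch below imitates both with a dictionary. The lock is only a stand-in for the coordination the database performs for conditional writes, and both function names are invented for illustration.

```python
import threading

customers = {}
_lock = threading.Lock()   # stand-in for the database's coordination


def insert(username, first_name, last_name):
    """Plain INSERT: an upsert, the last write silently wins."""
    customers[username] = (first_name, last_name)
    return True


def insert_if_not_exists(username, first_name, last_name):
    """Conditional insert: returns the equivalent of [applied]."""
    with _lock:
        if username in customers:
            return False
        customers[username] = (first_name, last_name)
        return True


# Plain inserts: both "succeed", the second overwrites the first.
insert("snazy", "Robert", "Stupp")
insert("snazy", "Not", "Robert")
print(customers["snazy"])

customers.clear()

# Conditional inserts: only the first one is applied.
print(insert_if_not_exists("snazy", "Robert", "Stupp"))
print(insert_if_not_exists("snazy", "Not", "Robert"))
print(customers["snazy"])
```

Because the check and the write happen in one step, the separate SELECT before the INSERT is no longer needed.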


Customer login by email – w/ DSE 5.0

1. Check if user exists and password matches
   query by email

CREATE TABLE customers (
    username text PRIMARY KEY,
    password_hash text,
    first_name text,
    last_name text,
    email text
);

CREATE MATERIALIZED VIEW customers_by_email AS
    SELECT email, username, first_name, last_name, password_hash
    FROM customers
    WHERE email IS NOT NULL AND username IS NOT NULL
    PRIMARY KEY ( email, username );

SELECT password_hash FROM customers_by_email WHERE email = ?;
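With a materialized view the second copy is maintained by the database instead of the application. The toy class below imitates that idea: every write to the base table also updates a view keyed by (email, username), and rows without an email are left out, mirroring the IS NOT NULL condition. It is an illustration only. The `CustomersTable` class and the sample values are assumptions, and the real view maintenance works differently.

```python
class CustomersTable:
    """Toy base table with an automatically maintained by-email view."""

    def __init__(self):
        self.rows = {}       # username -> row
        self.by_email = {}   # (email, username) -> row, the "view"

    def upsert(self, username, **columns):
        old = self.rows.get(username, {})
        new = {**old, **columns, "username": username}
        # Drop the stale view entry if the row was in the view before.
        if old.get("email") is not None:
            self.by_email.pop((old["email"], username), None)
        # Only rows with an email appear in the view.
        if new.get("email") is not None:
            self.by_email[(new["email"], username)] = new
        self.rows[username] = new

    def select_by_email(self, email):
        return [row for (e, _), row in self.by_email.items() if e == email]


table = CustomersTable()
table.upsert("snazy", email="robert@example.com", password_hash="hash-placeholder")
print(table.select_by_email("robert@example.com"))
```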

May the node be with you!

Robert Stupp
Solutions Architect @ DataStax
Committer to Apache Cassandra
robert.stupp@datastax.com
@snazy
