Intro to Cassandra

Upload: tyler-hobbs

Post on 12-Jun-2015


DESCRIPTION

An introduction to Apache Cassandra, covering the clustering model and the data model. Presented by Tyler Hobbs at the October 2011 Austin NoSQL meetup.

TRANSCRIPT

Page 1: Intro to Cassandra

Intro to Cassandra

Tyler Hobbs

Page 2: Intro to Cassandra

History

– Dynamo (clustering)
– BigTable (data model)
– Cassandra combines the two

Page 3: Intro to Cassandra

Users

Page 4: Intro to Cassandra

Clustering

Every node plays the same role
– No masters, slaves, or special nodes
– No single point of failure

Page 5: Intro to Cassandra

Consistent Hashing

[Figure: ring labeled 0, 10, 20, 30, 40, 50]

Page 6: Intro to Cassandra

Consistent Hashing

[Figure: ring labeled 0, 10, 20, 30, 40, 50]

Key: "www.google.com"

Page 7: Intro to Cassandra

Consistent Hashing

[Figure: ring labeled 0, 10, 20, 30, 40, 50, with position 14 marked]

Key: "www.google.com"

md5("www.google.com") = 14

Page 8: Intro to Cassandra

Consistent Hashing

[Figure: ring labeled 0, 10, 20, 30, 40, 50, with position 14 marked]

Key: "www.google.com"

md5("www.google.com") = 14

Page 9: Intro to Cassandra

Consistent Hashing

[Figure: ring labeled 0, 10, 20, 30, 40, 50, with position 14 marked]

Key: "www.google.com"

md5("www.google.com") = 14

Page 10: Intro to Cassandra

Consistent Hashing

[Figure: ring labeled 0, 10, 20, 30, 40, 50, with position 14 marked]

Key: "www.google.com"

md5("www.google.com") = 14

Replication Factor = 3
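A rough Python sketch of the ring lookup these slides walk through, not Cassandra's actual partitioner code. It assumes the labels 0 to 50 are node tokens, that a key belongs to the first node whose token is at or past the key's position, and that the remaining replicas are the next nodes clockwise. The `Ring` class, its method names, and the ring size of 60 are made up for illustration; position 14 is taken from the slide rather than computed.

```python
import bisect
import hashlib


class Ring:
    """Toy consistent-hashing ring; tokens and ring size are illustrative."""

    def __init__(self, tokens, ring_size=60):
        self.tokens = sorted(tokens)
        self.ring_size = ring_size

    def position(self, key):
        # Hash the key with MD5 and fold the digest onto the small ring.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % self.ring_size

    def replicas(self, position, rf):
        # First replica: first node whose token is >= position, wrapping to
        # the lowest token. The other replicas are the next nodes clockwise.
        start = bisect.bisect_left(self.tokens, position) % len(self.tokens)
        count = min(rf, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)] for i in range(count)]


ring = Ring([0, 10, 20, 30, 40, 50])
print(ring.replicas(14, rf=3))  # [20, 30, 40]
print(ring.replicas(55, rf=3))  # [0, 10, 20], wrapping past the top of the ring
print(ring.replicas(ring.position("www.google.com"), rf=3))
```

Because placement depends only on the key's hash and the sorted token list, any node holding the token list can compute the replicas for any key.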

Page 11: Intro to Cassandra

Clustering

Client can talk to any node

Page 12: Intro to Cassandra

Scaling

[Figure: ring with nodes at 0, 10, 20, 30, 50]

The node at 50 owns the red portion

RF = 2

Page 13: Intro to Cassandra

Scaling

[Figure: ring with nodes at 0, 10, 20, 30, 40, 50]

Add a new node at 40

RF = 2

Page 14: Intro to Cassandra

Scaling

[Figure: ring with nodes at 0, 10, 20, 30, 40, 50]

Add a new node at 40

RF = 2
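To make the effect of adding the node at 40 concrete, here is a small Python sketch. It assumes each node is the primary owner of the range from the previous token (exclusive) up to its own token (inclusive); `primary_ranges` is a hypothetical helper, and the sketch ignores how the second replica (RF = 2) shifts.

```python
def primary_ranges(tokens):
    """Map each node token to the (start, end] range it owns as primary."""
    ordered = sorted(tokens)
    # The previous token on the ring (wrapping around) is the exclusive start.
    return {tok: (ordered[i - 1], tok) for i, tok in enumerate(ordered)}


before = primary_ranges([0, 10, 20, 30, 50])
after = primary_ranges([0, 10, 20, 30, 40, 50])

print(before[50])  # (30, 50)
print(after[40])   # (30, 40)
print(after[50])   # (40, 50)
```

Only the range that used to belong to the node at 50 is split; the ranges of the nodes at 0, 10, 20, and 30 are unchanged.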

Page 15: Intro to Cassandra

Node Failures

[Figure: ring with nodes at 0, 10, 20, 30, 40, 50, with the replicas marked]

RF = 2

Page 16: Intro to Cassandra

Node Failures

[Figure: ring with nodes at 0, 10, 20, 30, 40, 50, with the replicas marked]

RF = 2

Page 17: Intro to Cassandra

Node Failures

[Figure: ring with nodes at 0, 10, 20, 30, 40, 50]

RF = 2

Page 18: Intro to Cassandra

Consistency, Availability

Consistency
– Can I read stale data?
Availability
– Can I write/read at all?
Tunable Consistency

Page 19: Intro to Cassandra

Consistency

N = Total number of replicas
R = Number of replicas read from
– (before the response is returned)
W = Number of replicas written to
– (before the write is considered a success)

Page 20: Intro to Cassandra

Consistency

N = Total number of replicas
R = Number of replicas read from
– (before the response is returned)
W = Number of replicas written to
– (before the write is considered a success)

W + R > N gives strong consistency

Page 21: Intro to Cassandra

Consistency

W + R > N gives strong consistency

N = 3, W = 2, R = 2

2 + 2 > 3 ==> strongly consistent

Page 22: Intro to Cassandra

Consistency

W + R > N gives strong consistency

N = 3, W = 2, R = 2

2 + 2 > 3 ==> strongly consistent

Only 2 of the 3 replicas must be available.
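One way to see why W + R > N gives strong consistency: every set of R replicas that a read contacts must share at least one replica with every set of W replicas that acknowledged a write. The Python sketch below checks this by brute force; `always_overlap` is a hypothetical helper written for this illustration, not part of any Cassandra API.

```python
from itertools import combinations


def always_overlap(n, r, w):
    """Check by brute force that every R-replica read set shares at least
    one replica with every W-replica write set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


print(always_overlap(n=3, r=2, w=2))  # True: 2 + 2 > 3
print(always_overlap(n=3, r=1, w=1))  # False: 1 + 1 <= 3, a read can miss the write
```

With N = 3 and R = W = 2, each operation needs only 2 replicas, which is why one of the three can be down.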

Page 23: Intro to Cassandra

Consistency

Tunable Consistency
– Specify N (Replication Factor) per data set
– Specify R, W per operation

Page 24: Intro to Cassandra

Consistency

Tunable Consistency
– Specify N (Replication Factor) per data set
– Specify R, W per operation
– Quorum: N/2 + 1 (integer division)
  • R = W = Quorum
  • Strong consistency
  • Tolerate the loss of N – Quorum replicas
– R, W can also be 1 or N

Page 25: Intro to Cassandra

Availability

Can tolerate the loss of:
– N – R replicas for reads
– N – W replicas for writes
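The quorum rule and the loss-tolerance bounds above reduce to a few lines of arithmetic. A minimal Python sketch, assuming integer division in the quorum formula; the function names are made up for illustration.

```python
def quorum(n):
    """Quorum size from the slides: N/2 + 1, using integer division."""
    return n // 2 + 1


def tolerable_losses(n, level):
    """Replicas that can be down while an operation needing `level`
    replicas still succeeds (N - R for reads, N - W for writes)."""
    return n - level


for n in (3, 5):
    q = quorum(n)
    print(f"N={n}: quorum={q}, tolerates {tolerable_losses(n, q)} lost replica(s)")
# N=3: quorum=2, tolerates 1 lost replica(s)
# N=5: quorum=3, tolerates 2 lost replica(s)
```

At R = W = 1 the cluster tolerates N – 1 losses but gives up the overlap guarantee; at R = W = N it tolerates none.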

Page 26: Intro to Cassandra

CAP Theorem

During node or network failure:

[Figure: chart of Availability against Consistency, both axes running to 100%, with regions labeled "Possible" and "Not Possible"]

Page 27: Intro to Cassandra

CAP Theorem

During node or network failure:

[Figure: chart of Availability against Consistency, both axes running to 100%, with regions labeled "Possible" and "Not Possible" and Cassandra marked on the chart]

Page 28: Intro to Cassandra

Clustering

No single point of failure
Replication that works
Scales linearly
– 2x nodes = 2x performance
  • For both writes and reads
– Up to hundreds of nodes
Operationally simple
Multi-Datacenter Replication

Page 29: Intro to Cassandra

Data Model

Comes from Google BigTable
Goals
– Minimize disk seeks
– High throughput
– Low latency
– Durable

Page 30: Intro to Cassandra

Data Model

Keyspace
– A collection of Column Families
– Controls replication settings
Column Family
– Kinda resembles a table

Page 31: Intro to Cassandra

Column Families

Static
– Object data
– Similar to a table in a relational database
Dynamic
– Pre-calculated query results
– Materialized views

Page 32: Intro to Cassandra

Static Column Families

Users

zznate     password: *    name: Nate
driftx     password: *    name: Brandon
thobbs     password: *    name: Tyler
jbellis    password: *    name: Jonathan    site: riptano.com

Page 33: Intro to Cassandra

Dynamic Column Families

Rows
– Each row has a unique primary key
– Sorted list of (name, value) tuples
  • Like a sorted map or dictionary
– The (name, value) tuple is called a "column"

Page 34: Intro to Cassandra

Dynamic Column Families

[Figure: "Following" column family with row keys zznate, driftx, thobbs, jbellis; the column names in each row are usernames (driftx, thobbs, mdennis, zznate, pcmanus, xedin) with empty values]

Page 35: Intro to Cassandra

Dynamic Column Families

Column Timestamps
– Each column (tuple) has a timestamp
– In the case of a collision, the latest timestamp wins
– Client specifies timestamp with write
– Writes are idempotent
  • Infinite retries allowed
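A minimal Python sketch of the "latest timestamp wins" rule, showing why retries are safe. The row is modeled as a plain dict from column name to (value, timestamp); `apply_write` is a hypothetical helper, and the sketch ignores how ties between equal timestamps are broken.

```python
def apply_write(row, name, value, timestamp):
    """Store the column only if its timestamp is newer than what is there."""
    current = row.get(name)
    if current is None or timestamp > current[1]:
        row[name] = (value, timestamp)
    return row


row = {}
apply_write(row, "age", 24, timestamp=100)
apply_write(row, "age", 34, timestamp=200)
apply_write(row, "age", 24, timestamp=100)  # late retry of the first write
apply_write(row, "age", 34, timestamp=200)  # retry of the second write
print(row)  # {'age': (34, 200)}
```

Replaying a write with the same client-supplied timestamp never changes the outcome, so a client that did not get an acknowledgement can simply send the write again.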

Page 36: Intro to Cassandra

Dynamic Column Families

Other Examples:
– Timeline of tweets by a user
– Timeline of tweets by all of the people a user is following
– List of comments sorted by score
– List of friends grouped by state

Page 37: Intro to Cassandra

The Data API

Two choices
– RPC-based API
– CQL
  • Cassandra Query Language

Page 38: Intro to Cassandra

Inserting Data

INSERT INTO users (KEY, "name", "age") VALUES ('thobbs', 'Tyler', 24);

Page 39: Intro to Cassandra

Updating Data

Updates are the same as inserts:

INSERT INTO users (KEY, "age") VALUES ('thobbs', 34);

Or

UPDATE users SET "age" = 34 WHERE KEY = 'thobbs';

Page 40: Intro to Cassandra

Fetching Data

Whole row select:

SELECT * FROM users WHERE KEY = 'thobbs';

Page 41: Intro to Cassandra

Fetching Data

Explicit column select:

SELECT "name", "age" FROM users WHERE KEY = 'thobbs';

Page 42: Intro to Cassandra

Fetching Data

Get a slice of columns

UPDATE letters SET 1='a', 2='b', 3='c', 4='d', 5='e' WHERE KEY = 'key';

SELECT 1..3 FROM letters WHERE KEY = 'key';

Returns [(1, a), (2, b), (3, c)]

Page 43: Intro to Cassandra

Fetching Data

Get a slice of columns

SELECT FIRST 2 FROM letters WHERE KEY = 'key';

Returns [(1, a), (2, b)]

SELECT FIRST 2 REVERSED FROM letters WHERE KEY = 'key';

Returns [(5, e), (4, d)]

Page 44: Intro to Cassandra

Fetching Data

Get a slice of columns

SELECT 3..'' FROM letters WHERE KEY = 'key';

Returns [(3, c), (4, d), (5, e)]

SELECT FIRST 2 REVERSED 4..'' FROM letters WHERE KEY = 'key';

Returns [(4, d), (3, c)]
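The slice queries on the last three slides can be modeled in Python by treating a row as a map of columns kept in sorted order. This is only a sketch of the semantics, under the assumption that `None` stands in for the empty bound `''` and that a reversed slice starts from its upper bound; `get_slice` is a hypothetical helper, and it scans linearly where a real store would seek into the sorted columns.

```python
def get_slice(row, start=None, finish=None, first=None, reverse=False):
    """Return (name, value) columns between start and finish in sorted order.

    With reverse=True the columns are walked from high to low, so start
    is the upper bound of the slice.
    """
    def in_range(name):
        if reverse:
            return (start is None or name <= start) and (finish is None or name >= finish)
        return (start is None or name >= start) and (finish is None or name <= finish)

    names = sorted(row, reverse=reverse)
    selected = [(name, row[name]) for name in names if in_range(name)]
    return selected if first is None else selected[:first]


letters = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}

print(get_slice(letters, start=1, finish=3))              # SELECT 1..3
print(get_slice(letters, first=2))                        # SELECT FIRST 2
print(get_slice(letters, first=2, reverse=True))          # SELECT FIRST 2 REVERSED
print(get_slice(letters, start=3))                        # SELECT 3..''
print(get_slice(letters, start=4, first=2, reverse=True)) # SELECT FIRST 2 REVERSED 4..''
```

Each call yields the same columns, in the same order, as the "Returns" line shown for the corresponding query.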

Page 45: Intro to Cassandra

Deleting Data

Delete a whole row:

DELETE FROM users WHERE KEY = 'thobbs';

Delete specific columns:

DELETE "age" FROM users WHERE KEY = 'thobbs';

Page 46: Intro to Cassandra

Secondary Indexes

Built-in basic indexes

CREATE INDEX ageIndex ON users (age);

SELECT name FROM USERS WHERE age = 24 AND state = 'TX';

Page 47: Intro to Cassandra

Performance

Writes
– 10k to 30k per second per node
– Sub-millisecond latency
Reads
– 1k to 10k per second per node
– Depends on data set, caching
– Usually 0.1 to 10 ms latency

Page 48: Intro to Cassandra

Other Features

Distributed Counters
– Can support millions of high-volume counters
Excellent Multi-datacenter Support
– Disaster recovery
– Locality
Hadoop Integration
– Isolation of resources
– Hive and Pig drivers
Compression

Page 49: Intro to Cassandra

What Cassandra Can't Do

Transactions
– Unless you use a distributed lock
– Atomicity, Isolation
– These aren't needed as often as you'd think
Limited support for ad-hoc queries
– Know what you want to do with the data

Page 50: Intro to Cassandra

Not One-size-fits-all

Use alongside an RDBMS
– Use the RDBMS for highly-transactional or highly-relational data
  • Usually a small set of data
– Let Cassandra scale to handle the rest

Page 51: Intro to Cassandra

Language Support

Good:
– Java
– Python
– Ruby
– PHP
– C#
Coming Soon:
– Everything else, now that we have CQL

Page 52: Intro to Cassandra

Tyler Hobbs

@tylhobbs

[email protected]

Questions?