Using Apache Cassandra: What is this thing, and how do I use it?

©2013 DataStax. Do not distribute without consent.

@zanson
Jeremiah Jordan
Lead Software Engineer/Support

Using Apache Cassandra for Big Data
What is this thing, and how do I use it?

Monday, October 14, 13

Upload: jeremiahdjordan

Post on 15-Jan-2015






This is the presentation I gave at the Reflections | Projections conference at UIUC. It introduces the basics of Apache Cassandra: what it is and how to get it up and running on your development machine. It then covers using the DataStax Python Driver and the Cassandra Query Language (CQL) to create tables, write data to them, and read it back out.



Who I am

• Jeremiah Jordan

• Lead Software Engineer in Support at DataStax

• Previously Senior Architect at Morningstar, Inc.

• Using Cassandra since 0.6

• Before that, wrote code for the F-22


Cassandra - An introduction


Cassandra - Intro

• Based on Amazon Dynamo and Google BigTable papers

• Shared nothing

• Distributed

• Data kept as safe as possible

• Predictable scaling





Cassandra - More than one server

• All nodes participate in a cluster

• Shared nothing

• Add or remove as needed

• More capacity? Add a server

• Each node owns a number of tokens

• Tokens denote a range of keys

• 4 nodes? -> Key range/4

• Each node owns 1/4 the data
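As a rough illustration of token ownership, the sketch below splits a small token space evenly among four nodes and looks up which node owns a key. The ring size, the node names, and the use of MD5 as the hash are assumptions made for the example; a real cluster uses its configured partitioner and a much larger token range.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 16  # hypothetical token space, kept tiny for readability


def assign_tokens(nodes):
    """Give each node one evenly spaced token; returns a sorted list of (token, node)."""
    step = RING_SIZE // len(nodes)
    return [(i * step, node) for i, node in enumerate(nodes)]


def token_for(key):
    """Hash a key onto the ring (MD5 is only an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


def owner_of_token(ring, token):
    """A node owns the range ending at its token, so pick the first node token >= token."""
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token) % len(ring)  # wrap past the last token
    return ring[idx][1]


def owner(ring, key):
    """Return the node that owns the given key."""
    return owner_of_token(ring, token_for(key))


ring = assign_tokens(["node1", "node2", "node3", "node4"])
print(ring)
print(owner(ring, "key1"))
```

With four evenly spaced tokens, each node ends up owning one quarter of the token space, which is the "Key range/4" on the slide.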


Cassandra - Locally Distributed

• Client writes to any node

• Node coordinates with others

• Data replicated in parallel

• Replication factor (RF): How many copies of your data?

• RF = 3 here


Each node stores 3/4 of the cluster's total data.
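To see where the 3/4 figure comes from, here is a minimal sketch that assumes the simplest placement rule: each key range is stored on its owning node plus the next RF - 1 nodes clockwise around the ring. The node names and helper functions are hypothetical, and all ranges are assumed to be the same size.

```python
from fractions import Fraction


def replicas_for_range(nodes, owner_index, rf):
    """Owner of the range plus the next rf - 1 nodes clockwise around the ring."""
    return [nodes[(owner_index + i) % len(nodes)] for i in range(rf)]


def fraction_stored(nodes, rf):
    """Fraction of the cluster's data each node holds, assuming equal-sized ranges."""
    held = {node: 0 for node in nodes}
    for owner_index in range(len(nodes)):
        for node in replicas_for_range(nodes, owner_index, rf):
            held[node] += 1
    return {node: Fraction(count, len(nodes)) for node, count in held.items()}


nodes = ["node1", "node2", "node3", "node4"]
print(replicas_for_range(nodes, 3, rf=3))  # placement wraps around the ring
print(fraction_stored(nodes, rf=3))        # every node holds 3 of the 4 ranges
```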


Cassandra - Geographically Distributed

• Client writes to its local data center (DC)

• Data syncs across WAN

• Replication factor is set per DC


Single coordinator
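A per-data-center replication factor is set on the keyspace using the NetworkTopologyStrategy replication class. The sketch below only builds the CQL text; the helper name and the data center names DC1 and DC2 are made up for illustration, and real names have to match what the cluster itself reports for its data centers.

```python
def create_keyspace_cql(keyspace, replication_per_dc):
    """Build a CREATE KEYSPACE statement with one replication factor per data center."""
    options = {"class": "NetworkTopologyStrategy"}
    options.update(replication_per_dc)
    body = ", ".join("'%s': '%s'" % (name, value) for name, value in options.items())
    return "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = { %s }" % (keyspace, body)


# Hypothetical layout: three replicas in each of two data centers.
cql = create_keyspace_cql("testkeyspace", {"DC1": 3, "DC2": 3})
print(cql)
```

The resulting string can be passed to the driver's session.execute() once a session is connected.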


Cassandra - Consistency

• Consistency Level (CL)

• Client specifies per read or write


• ALL = All replicas ack

• QUORUM = a majority (> 50%) of replicas ack

• LOCAL_QUORUM = a majority (> 50%) of replicas in the local DC ack

• ONE = Only one replica acks
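The levels above translate into a number of acknowledgements the coordinator waits for. The following sketch computes that number for a hypothetical two-data-center keyspace; the function names and the DC1/DC2 layout are assumptions for the example, and a majority is taken as floor(replicas / 2) + 1.

```python
def quorum(replicas):
    """Smallest majority of the given number of replicas."""
    return replicas // 2 + 1


def required_acks(level, rf_per_dc, local_dc):
    """Number of replica acknowledgements each consistency level waits for."""
    total = sum(rf_per_dc.values())
    if level == "ALL":
        return total
    if level == "QUORUM":
        return quorum(total)
    if level == "LOCAL_QUORUM":
        return quorum(rf_per_dc[local_dc])
    if level == "ONE":
        return 1
    raise ValueError("unsupported consistency level: %s" % level)


rf_per_dc = {"DC1": 3, "DC2": 3}  # hypothetical layout, RF = 3 in each data center
for level in ("ALL", "QUORUM", "LOCAL_QUORUM", "ONE"):
    print(level, required_acks(level, rf_per_dc, local_dc="DC1"))
```

LOCAL_QUORUM needs fewer acks than QUORUM here because it only counts replicas in the client's own data center, so a write does not wait on the WAN.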


Cassandra - Transparent to the application

• A single node failure shouldn’t cause an application failure

• Replication Factor + Consistency Level = Success

• This example:

• RF = 3



> 50% of replicas ack, so we are good!
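The arithmetic behind this slide can be sketched in a few lines: a request succeeds as long as the replicas still alive can supply the acknowledgements the consistency level asks for. The helper names are hypothetical.

```python
def quorum(rf):
    """Smallest majority of rf replicas."""
    return rf // 2 + 1


def succeeds(rf, replicas_down, acks_needed):
    """True if enough replicas are still up to satisfy the request."""
    return rf - replicas_down >= acks_needed


rf = 3
for down in range(rf + 1):
    outcome = "ok" if succeeds(rf, down, quorum(rf)) else "fails"
    print("%d replica(s) down: QUORUM %s" % (down, outcome))
```

With RF = 3, QUORUM needs two acks, so one replica can be down without the application noticing. At consistency level ALL the same single failure would fail the request.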


Application Example - Layout

• Active-Active

• Service based DNS routing


Cassandra Replication


Application Example - Uptime


• Normal server maintenance

• Application is unaware

Cassandra Replication


Application Example - Failure


• Data center failure

• Data is safe. Route traffic.


Another happy user!


Five Years of Cassandra

[Timeline figure: releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, ... along an axis marked Jul-09, May-10, Feb-11, Dec-11, Oct-12, Jul-13]





Cassandra 2.0 - Big new features


Lightweight transactions: the problem

Session 1

    SELECT * FROM users
    WHERE username = 'jbellis'

    [empty resultset]

    INSERT INTO users (username, password)
    VALUES ('jbellis', 'xdg44hh')

Session 2

    SELECT * FROM users
    WHERE username = 'jbellis'

    [empty resultset]

    INSERT INTO users (username, password)
    VALUES ('jbellis', '8dhh43k')

It’s a Race!

Who wins?
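The race can be replayed with a toy in-memory table. This is only a stand-in for illustration, not how Cassandra stores data: the class name and the explicit timestamps are assumptions, and the one behavior it mimics is that, without a condition on the write, the write with the newer timestamp replaces the older one.

```python
class LastWriteWinsTable:
    """Toy stand-in for a table written to without lightweight transactions."""

    def __init__(self):
        self._rows = {}  # username -> (write_timestamp, password)

    def select(self, username):
        row = self._rows.get(username)
        return None if row is None else row[1]

    def insert(self, username, password, timestamp):
        current = self._rows.get(username)
        # The newer write silently replaces the older one.
        if current is None or timestamp > current[0]:
            self._rows[username] = (timestamp, password)


users = LastWriteWinsTable()

# Both sessions check first, and both see an empty result set.
seen_by_session_1 = users.select("jbellis")
seen_by_session_2 = users.select("jbellis")

# Both then insert, each believing the username is free.
if seen_by_session_1 is None:
    users.insert("jbellis", "xdg44hh", timestamp=1)
if seen_by_session_2 is None:
    users.insert("jbellis", "8dhh43k", timestamp=2)

print(users.select("jbellis"))  # session 2's password; session 1 was overwritten
```

Neither session gets an error, which is the problem: the check and the insert are two separate steps, so nothing stops another writer from slipping in between them.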


LWT: details

• 4 round trips vs 1 for normal updates

• Paxos - Paxos made easy

• Immediate consistency with no leader election or failover

• For reads, ConsistencyLevel.SERIAL



Using LWT

• Don’t overwrite an existing record

    INSERT INTO USERS (username, email, ...)
    VALUES ('jbellis', '[email protected]', ... )
    IF NOT EXISTS;

• Only update record if condition is met

    UPDATE USERS SET email = '[email protected]', ...
    WHERE username = 'jbellis'
    IF email = '[email protected]';
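From Python, a conditional insert might be wrapped as in the sketch below. The function name is hypothetical, and it assumes a users table with username and password columns plus a session object from the DataStax Python driver. It also assumes a recent driver version, where the result of a conditional statement exposes a was_applied attribute.

```python
def register_user(session, username, password):
    """Insert the user only if the username is free.

    Returns True if this call created the row, False if the row already existed.
    """
    result = session.execute(
        "INSERT INTO users (username, password) VALUES (%s, %s) IF NOT EXISTS",
        (username, password),
    )
    # For conditional (LWT) statements the driver reports whether the write took effect.
    return result.was_applied
```

With a live cluster, the session would come from Cluster([...]).connect() with the keyspace set. If two clients race to register the same username, one call returns True and the other returns False instead of silently overwriting.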


Installing Cassandra


Download Cassandra


Extract Cassandra


Set Up Data and Log Directories


Start Cassandra


Installing Cassandra Python Driver


Python Cassandra Driver


Install Python Cassandra Driver


Connect and Create a Keyspace

    from cassandra.cluster import Cluster

    # Contact point assumed to be a Cassandra node on the local development machine.
    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect()

    print("creating keyspace...")
    KEYSPACE = "testkeyspace"
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS %s
        WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '1' }
        """ % KEYSPACE)


Create a Table

    print("setting keyspace...")
    session.set_keyspace(KEYSPACE)

    print("creating table...")
    session.execute("""
        CREATE TABLE IF NOT EXISTS mytable (
            thekey text,
            col1 text,
            col2 text,
            PRIMARY KEY (thekey, col1)
        )
        """)


Insert a Row

    from cassandra import ConsistencyLevel
    from cassandra.query import SimpleStatement

    query = SimpleStatement("""
        INSERT INTO mytable (thekey, col1, col2)
        VALUES ('key1', 'a', 'b')
        """, consistency_level=ConsistencyLevel.ONE)

    print("inserting row")
    session.execute(query)


Insert Rows (Prepared Statement)

    prepared = session.prepare("""
        INSERT INTO mytable (thekey, col1, col2)
        VALUES (?, ?, ?)
        """)

    for i in range(10):
        print("inserting row %d" % i)
        bound = prepared.bind(("key%d" % i, "b%d" % i, "c%d" % i))
        session.execute(bound)


Query Results

    future = session.execute_async("""
        SELECT * FROM mytable WHERE thekey='key1'
        """)
    rows = future.result()

    print("key\tcol1\tcol2")
    print("---\t----\t----")
    for row in rows:
        print("\t".join(row))


Run It


Cassandra Applications - Drivers

• DataStax Drivers for Cassandra

• Java

• C#

• Python

• More on the way


Find Out More

Cassandra:

DataStax Drivers:


Getting Started:

Developer Blog:

Cassandra Community Site:



Cassandra Summit Talks:
