![Page 1: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/1.jpg)
introduction to cassandra
eben hewitt
september 29, 2010
web 2.0 expo
new york city
![Page 2: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/2.jpg)
• director, application architecture at a global corp
• focus on SOA, SaaS, Events
• i wrote this
@ebenhewitt
![Page 3: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/3.jpg)
agenda
• context
• features
• data model
• api
![Page 4: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/4.jpg)
“nosql” “big data”
• mongodb
• couchdb
• tokyo cabinet
• redis
• riak
• what about?
– Poet, Lotus, Xindice
– they’ve been around forever…
– rdbms was once the new kid…
![Page 5: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/5.jpg)
innovation at scale
• google bigtable (2006)
– consistency model: strong
– data model: sparse map
– clones: hbase, hypertable
• amazon dynamo (2007)
– O(1) dht
– consistency model: client tune-able
– clones: riak, voldemort
cassandra ~= bigtable + dynamo
![Page 6: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/6.jpg)
proven
• The Facebook stores 150TB of data on 150 nodes
web 2.0
• used at Twitter, Rackspace, Mahalo, Reddit, Cloudkick, Cisco, Digg, SimpleGeo, Ooyala, OpenX, others
![Page 7: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/7.jpg)
cap theorem
• consistency
– all clients have same view of data
• availability
– writeable in the face of node failure
• partition tolerance
– processing can continue in the face of network failure (crashed router, broken network)
![Page 8: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/8.jpg)
daniel abadi: pacelc
![Page 9: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/9.jpg)
write consistency

| Level | Description |
| --- | --- |
| ZERO | Good luck with that |
| ANY | 1 replica (hints count) |
| ONE | 1 replica. read repair in bkgnd |
| QUORUM (DCQ for RackAware) | (N / 2) + 1 |
| ALL | N = replication factor |

read consistency

| Level | Description |
| --- | --- |
| ZERO | Ummm… |
| ANY | Try ONE instead |
| ONE | 1 replica |
| QUORUM (DCQ for RackAware) | Return most recent TS after (N / 2) + 1 report |
| ALL | N = replication factor |
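The QUORUM arithmetic above is easy to check by hand. The sketch below is plain Java with no Cassandra dependency; the class and method names are made up for illustration. The overlap rule it uses (reads plus writes greater than N) is the usual Dynamo-style rule of thumb and is not stated on the slide.

```java
final class ConsistencyMath {
    /** Replicas that must respond at QUORUM: (N / 2) + 1, using integer division. */
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    /**
     * Assumption (not from the slide): a read is guaranteed to touch at least one
     * replica holding the latest write when readReplicas + writeReplicas > N.
     */
    static boolean readSeesLatestWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("N=" + n + " QUORUM=" + q);
        System.out.println("QUORUM read + QUORUM write overlap: " + readSeesLatestWrite(q, q, n));
        System.out.println("ONE read + ONE write overlap: " + readSeesLatestWrite(1, 1, n));
    }
}
```

With N = 3, QUORUM is 2, so a QUORUM write followed by a QUORUM read always shares a replica, while ONE plus ONE does not.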
![Page 10: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/10.jpg)
agenda
• context
• features
• data model
• api
![Page 11: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/11.jpg)
cassandra properties
• tuneably consistent
• very fast writes
• highly available
• fault tolerant
• linear, elastic scalability
• decentralized/symmetric
• ~12 client languages
– Thrift RPC API
• ~automatic provisioning of new nodes
• O(1) dht
• big data
![Page 12: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/12.jpg)
write op
![Page 13: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/13.jpg)
Staged Event-Driven Architecture
• A general-purpose framework for high concurrency & load conditioning
• Decomposes applications into stages separated by queues
• Adopt a structured approach to event-driven concurrency
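A toy version of the "stages separated by queues" idea, as a rough sketch only. It is not Cassandra's internal stage implementation; the class name, the two stage handlers, and the stop marker are all invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

final class SedaSketch {
    // Hypothetical shutdown marker; compared by identity so no real event can collide with it.
    private static final String STOP = new String("stop");

    /** Starts one stage: a thread that drains its input queue and feeds the next queue. */
    static Thread startStage(BlockingQueue<String> in, BlockingQueue<String> out,
                             Function<String, String> handler) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String event = in.take();
                    if (event == STOP) {
                        out.put(STOP); // pass the shutdown marker downstream
                        return;
                    }
                    out.put(handler.apply(event));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    /** Pushes events through two stages (trim, then upper-case) and collects the results. */
    static List<String> run(List<String> events) {
        // Bounded queues give simple load conditioning: a full queue blocks the producer.
        BlockingQueue<String> firstQueue = new ArrayBlockingQueue<>(16);
        BlockingQueue<String> secondQueue = new ArrayBlockingQueue<>(16);
        BlockingQueue<String> done = new LinkedBlockingQueue<>();
        Thread first = startStage(firstQueue, secondQueue, String::trim);
        Thread second = startStage(secondQueue, done, String::toUpperCase);
        try {
            for (String event : events) {
                firstQueue.put(event);
            }
            firstQueue.put(STOP);
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running stages", e);
        }
        List<String> results = new ArrayList<>();
        for (String result : done) {
            if (result != STOP) {
                results.add(result);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(" read ", " write ")));
    }
}
```

Each stage owns one thread and one inbound queue, so a slow stage backs up its own queue instead of stalling the whole request path.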
![Page 14: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/14.jpg)
instrumentation
![Page 15: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/15.jpg)
data replication
![Page 16: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/16.jpg)
partitioner smack-down
Random
• system will use MD5(key) to distribute data across nodes
• even distribution of keys from one CF across ranges/nodes
Order Preserving
• key distribution determined by token
• lexicographical ordering
• required for range queries
– scan over rows like cursor in index
• can specify the token for this node to use
• ‘scrabble’ distribution
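To make the random side of the comparison concrete, here is a minimal sketch of hashing a row key to an MD5 token and finding its owner on a token ring. It is an illustration under assumptions, not Cassandra's partitioner code: the class name, the node names, and the "first node whose token is at or after the key's token" rule are simplifications chosen for the example.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class RingSketch {
    // token -> node name, kept sorted so we can walk the ring in token order
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    /** MD5 of the key, read as a non-negative 128-bit integer. */
    static BigInteger md5Token(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(rowKey.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    void addNode(String nodeName, BigInteger token) {
        ring.put(token, nodeName);
    }

    /** Assumed rule: the owner is the first node at or after the key's token, wrapping around. */
    String nodeFor(String rowKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in ring");
        }
        Map.Entry<BigInteger, String> owner = ring.ceilingEntry(md5Token(rowKey));
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch();
        ring.addNode("node-a", BigInteger.ONE.shiftLeft(127));
        ring.addNode("node-b", BigInteger.ONE.shiftLeft(128));
        for (String key : new String[] {"eben", "alison", "key123"}) {
            System.out.println(key + " -> " + md5Token(key).toString(16) + " -> " + ring.nodeFor(key));
        }
    }
}
```

Because the token is a hash, neighbouring keys such as `key123` and `key124` land in unrelated places on the ring, which is why range scans need the order-preserving option instead.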
![Page 17: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/17.jpg)
agenda
• context
• features
• data model
• api
![Page 18: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/18.jpg)
structure
![Page 19: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/19.jpg)
keyspace
• ~= database
• typically one per application
• some settings are configurable only per keyspace
![Page 20: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/20.jpg)
column family
• group records of similar kind
• not same kind, because CFs are sparse tables
• ex:
– User
– Address
– Tweet
– PointOfInterest
– HotelRoom
![Page 21: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/21.jpg)
think of cassandra as
row-oriented
• each row is uniquely identifiable by key
• rows group columns and super columns
![Page 22: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/22.jpg)
column family
n=42
key123 user=eben
key456 user=alison
icon=
nickname=The Situation
![Page 23: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/23.jpg)
json-like notation
User {123 : { email: [email protected],
icon: },
456 : { email: [email protected], location: The Danger Zone}
}
![Page 24: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/24.jpg)
0.6 example
$ cassandra -f
$ bin/cassandra-cli
cassandra> connect localhost/9160
cassandra> set Keyspace1.Standard1['eben']['age']='29'
cassandra> set Keyspace1.Standard1['eben']['email']='[email protected]'
cassandra> get Keyspace1.Standard1['eben']['age']
=> (column=616765, value=29, timestamp=1282170655390000)
![Page 25: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/25.jpg)
a column has 3 parts
1. name
– byte[]
– determines sort order
– used in queries
– indexed
2. value
– byte[]
– you don’t query on column values
3. timestamp
– long (clock)
– last write wins conflict resolution
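The three parts and the "last write wins" rule fit in a few lines. This is a sketch with an invented class name, not the Thrift `Column` type; keeping the first argument on a timestamp tie is an assumption made for the example.

```java
import java.nio.charset.StandardCharsets;

final class TimestampedColumn {
    final byte[] name;     // determines sort order within the row
    final byte[] value;    // opaque bytes, never queried on
    final long timestamp;  // supplied by the client clock

    TimestampedColumn(String name, String value, long timestamp) {
        this.name = name.getBytes(StandardCharsets.UTF_8);
        this.value = value.getBytes(StandardCharsets.UTF_8);
        this.timestamp = timestamp;
    }

    String valueAsString() {
        return new String(value, StandardCharsets.UTF_8);
    }

    /** Last write wins: keep the version with the higher timestamp (ties keep 'a', by assumption). */
    static TimestampedColumn resolve(TimestampedColumn a, TimestampedColumn b) {
        return b.timestamp > a.timestamp ? b : a;
    }

    public static void main(String[] args) {
        TimestampedColumn older = new TimestampedColumn("age", "29", 1282170655390000L);
        TimestampedColumn newer = new TimestampedColumn("age", "30", 1282170655390001L);
        System.out.println(resolve(older, newer).valueAsString());
    }
}
```

Since the timestamp comes from the client, clock skew between writers decides which version survives.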
![Page 26: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/26.jpg)
column comparators
• byte
• utf8
• long
• timeuuid
• lexicaluuid
• <pluggable>
– ex: lat/long
![Page 27: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/27.jpg)
super column
super columns group columns under a common name
![Page 28: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/28.jpg)
<<SCF>> PointOfInterest (super column family)
key 10017
<<SC>> Central Park
desc=Fun to walk in.
<<SC>> Empire State Bldg
phone=212.555.11212
desc=Great view from 102nd floor!
key 85255
<<SC>> Phoenix Zoo
![Page 29: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/29.jpg)
PointOfInterest {
  key: 85255 {
    Phoenix Zoo { phone: 480-555-5555, desc: They have animals here. },
    Spring Training { phone: 623-333-3333, desc: Fun for baseball fans. },
  }, //end phx
  key: 10019 {
    Central Park { desc: Walk around. It's pretty. },
    Empire State Building { phone: 212-777-7777, desc: Great view from 102nd floor. }
  } //end nyc
}
callouts: key, super column, column, super column family, flexible schema
![Page 30: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/30.jpg)
about super column families
• sub-column names in a SCF are not indexed
– top level columns (SCF Name) are always indexed
• often used for denormalizing data from standard CFs
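One way to picture a super column family is as three nested maps: row key, then super column name, then sub-column name. The sketch below models that shape in plain Java; it is an in-memory illustration with invented names, not the storage engine or the Thrift types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class SuperColumnFamilySketch {
    // row key -> super column name -> sub-column name -> value
    private final Map<String, SortedMap<String, SortedMap<String, String>>> rows = new HashMap<>();

    void insert(String rowKey, String superColumn, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(superColumn, k -> new TreeMap<>())
            .put(column, value);
    }

    /** Super column names of one row, in sorted order. */
    List<String> superColumnNames(String rowKey) {
        SortedMap<String, SortedMap<String, String>> row = rows.get(rowKey);
        return row == null ? new ArrayList<>() : new ArrayList<>(row.keySet());
    }

    /** All sub-columns of one super column; empty if the row or super column is absent. */
    SortedMap<String, String> getSuperColumn(String rowKey, String superColumn) {
        SortedMap<String, SortedMap<String, String>> row = rows.get(rowKey);
        if (row == null || !row.containsKey(superColumn)) {
            return Collections.emptySortedMap();
        }
        return row.get(superColumn);
    }

    public static void main(String[] args) {
        SuperColumnFamilySketch poi = new SuperColumnFamilySketch();
        poi.insert("85255", "Phoenix Zoo", "phone", "480-555-5555");
        poi.insert("85255", "Phoenix Zoo", "desc", "They have animals here.");
        poi.insert("85255", "Spring Training", "desc", "Fun for baseball fans.");
        System.out.println(poi.superColumnNames("85255"));
        System.out.println(poi.getSuperColumn("85255", "Phoenix Zoo"));
    }
}
```

Rows stay sparse: "Spring Training" has no `phone` entry and nothing needs to be declared for that to be legal.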
![Page 31: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/31.jpg)
agenda
• context
• features
• data model
• api
![Page 32: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/32.jpg)
slice predicate
• data structure describing columns to return
– SliceRange
  • start column name
  • finish column name (can be empty to stop on count)
  • reverse
  • count (like LIMIT)
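The four SliceRange fields map naturally onto a sorted map. The following is a sketch of the semantics only, using a `TreeMap` in place of a row; the class and method names are invented, and treating both bounds as inclusive with an empty string meaning "unbounded" is an assumption for the example rather than a statement about the server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class SliceRangeSketch {
    /**
     * Returns up to 'count' column names between start and finish (both inclusive here).
     * An empty start or finish means unbounded on that side. When reversed, the row is
     * walked in descending order, so start should sort after finish.
     * Bounds given in the wrong order make the map views throw IllegalArgumentException.
     */
    static List<String> slice(NavigableMap<String, String> row, String start, String finish,
                              boolean reversed, int count) {
        NavigableMap<String, String> view = reversed ? row.descendingMap() : row;
        if (!start.isEmpty()) {
            view = view.tailMap(start, true);
        }
        if (!finish.isEmpty()) {
            view = view.headMap(finish, true);
        }
        List<String> names = new ArrayList<>();
        for (String name : view.keySet()) {
            if (names.size() >= count) {
                break; // count works like LIMIT
            }
            names.add(name);
        }
        return names;
    }

    /** Small sample row with column names a..e, used for the demo. */
    static NavigableMap<String, String> sampleRow() {
        NavigableMap<String, String> row = new TreeMap<>();
        for (String name : new String[] {"a", "b", "c", "d", "e"}) {
            row.put(name, "value-" + name);
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(slice(sampleRow(), "b", "d", false, 10));
        System.out.println(slice(sampleRow(), "", "", true, 2));
    }
}
```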
![Page 33: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/33.jpg)
read api
• get() : ColumnOrSuperColumn
– get the Col or SC at given ColPath
ColumnOrSuperColumn cosc = client.get(key, path, CL);
• get_slice() : List<ColumnOrSuperColumn>
– get Cols in one row, specified by SlicePredicate:
List<ColumnOrSuperColumn> results = client.get_slice(key, parent, predicate, CL);
• multiget_slice() : Map<key, List<CoSC>>
– get slices for list of keys, based on SlicePredicate
Map<byte[],List<ColumnOrSuperColumn>> results = client.multiget_slice(rowKeys, parent, predicate, CL);
• get_range_slices() : List<KeySlice>
– returns multiple Cols according to a range
– range is startkey, endkey, starttoken, endtoken:
List<KeySlice> slices = client.get_range_slices(parent, predicate, keyRange, CL);
![Page 34: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/34.jpg)
write api
client.insert(userKeyBytes, parent, new Column("band".getBytes(UTF8), "Funkadelic".getBytes(UTF8), clock), CL);
batch_mutate
– void batch_mutate(map<byte[], map<String, List<Mutation>>>, CL)
remove
– void remove(byte[], ColumnPath column_path, Clock, CL)
![Page 35: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/35.jpg)
batch_mutate
//create param
Map<byte[], Map<String, List<Mutation>>> mutationMap = new HashMap<byte[], Map<String, List<Mutation>>>();
//create Cols for Muts
Column nameCol = new Column("name".getBytes(UTF8), "Funkadelic".getBytes(UTF8), new Clock(System.nanoTime()));
//wrap the Column so the Mutation can carry it
ColumnOrSuperColumn nameCosc = new ColumnOrSuperColumn();
nameCosc.column = nameCol;
Mutation nameMut = new Mutation();
nameMut.column_or_supercolumn = nameCosc; //also phone, etc
Map<String, List<Mutation>> muts = new HashMap<String, List<Mutation>>();
List<Mutation> cols = new ArrayList<Mutation>();
cols.add(nameMut);
cols.add(phoneMut);
muts.put(CF, cols);
//outer map key is a row key; inner map key is the CF name
mutationMap.put(rowKey.getBytes(), muts);
//send to server
client.batch_mutate(mutationMap, CL);
![Page 36: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/36.jpg)
raw thrift: for masochists only
• pycassa (python)
• fauna (ruby)
• hector (java)
• pelops (java)
• kundera (JPA)
• hectorSharp (C#)
![Page 37: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/37.jpg)
what about…
SELECT WHERE
ORDER BY
JOIN ON
GROUP?
![Page 38: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/38.jpg)
rdbms: domain-based model
what answers do I have?
cassandra: query-based model
what questions do I have?
![Page 39: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/39.jpg)
SELECT WHERE
cassandra is an index factory
<<cf>> USER
Key: UserID
Cols: username, email, birth date, city, state
How to support this query?
SELECT * FROM User WHERE city = 'Scottsdale'
Create a new CF called UserCity:
<<cf>> USERCITY
Key: city
Cols: IDs of the users in that city.
Also uses the Valueless Column pattern
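A small in-memory sketch of the UserCity idea: every user write also adds a column to an index row keyed by city, where the column name is the user ID and the value is empty. The class and method names are hypothetical and the maps stand in for the two column families; a real client would issue both writes through its API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class UserCityIndexSketch {
    private static final byte[] EMPTY = new byte[0];

    // USER: user id -> columns
    private final Map<String, Map<String, String>> userCf = new HashMap<>();
    // USERCITY: city -> (user id -> empty value); the column name itself is the data
    private final Map<String, SortedMap<String, byte[]>> userCityCf = new HashMap<>();

    /** Writes the user row and the matching index column. Moving a user between cities is not handled. */
    void saveUser(String userId, String username, String city) {
        Map<String, String> columns = new HashMap<>();
        columns.put("username", username);
        columns.put("city", city);
        userCf.put(userId, columns);
        userCityCf.computeIfAbsent(city, k -> new TreeMap<>()).put(userId, EMPTY);
    }

    /** Answers "WHERE city = ?" with one index-row read plus one lookup per user id. */
    List<String> usernamesInCity(String city) {
        List<String> names = new ArrayList<>();
        SortedMap<String, byte[]> indexRow = userCityCf.get(city);
        if (indexRow == null) {
            return names;
        }
        for (String userId : indexRow.keySet()) {
            names.add(userCf.get(userId).get("username"));
        }
        return names;
    }

    public static void main(String[] args) {
        UserCityIndexSketch db = new UserCityIndexSketch();
        db.saveUser("u1", "eben", "Scottsdale");
        db.saveUser("u2", "alison", "Scottsdale");
        db.saveUser("u3", "pat", "Phoenix");
        System.out.println(db.usernamesInCity("Scottsdale"));
    }
}
```

The application owns the index, so it also owns keeping it in step with the user rows.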
![Page 40: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/40.jpg)
SELECT WHERE pt 2
• Use an aggregate key state:city: { user1, user2}
• Get rows between AZ: & AZ; for all Arizona users
• Get rows between AZ:Scottsdale & AZ:Scottsdale1 for all Scottsdale users
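The `AZ:` to `AZ;` trick works because `;` is the character right after `:` in ASCII, so the range covers every key that starts with `AZ:`. The sketch below shows the same range scan over a sorted map; it assumes keys are kept in sorted order (the order-preserving case), treats both range ends as inclusive, and uses invented class and method names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class AggregateKeySketch {
    // row key "state:city" -> user ids, with rows kept in key order
    private final TreeMap<String, SortedSet<String>> rows = new TreeMap<>();

    void addUser(String state, String city, String userId) {
        rows.computeIfAbsent(state + ":" + city, k -> new TreeSet<>()).add(userId);
    }

    /** Collects user ids from every row whose key falls in [startKey, endKey]. */
    List<String> usersBetween(String startKey, String endKey) {
        List<String> users = new ArrayList<>();
        for (SortedSet<String> ids : rows.subMap(startKey, true, endKey, true).values()) {
            users.addAll(ids);
        }
        return users;
    }

    public static void main(String[] args) {
        AggregateKeySketch db = new AggregateKeySketch();
        db.addUser("AZ", "Scottsdale", "user1");
        db.addUser("AZ", "Scottsdale", "user2");
        db.addUser("AZ", "Phoenix", "user3");
        db.addUser("CA", "Fresno", "user4");
        System.out.println("Arizona: " + db.usersBetween("AZ:", "AZ;"));
        System.out.println("Scottsdale: " + db.usersBetween("AZ:Scottsdale", "AZ:Scottsdale1"));
    }
}
```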
![Page 41: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/41.jpg)
ORDER BY
Rows
• are placed according to their Partitioner:
– Random: MD5 of key
– Order-Preserving: actual key
• are sorted by key, regardless of partitioner
Columns
• are sorted according to CompareWith or CompareSubcolumnsWith
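The comparator choice decides what "ORDER BY" you get for free on column names. A sketch using `TreeMap` with two comparators shows the difference; the two comparators below only imitate utf8 and long ordering on strings (the real long comparator works on 8-byte values), and all names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class ComparatorSketch {
    // Stand-in for a utf8-style comparator: plain string order.
    static final Comparator<String> UTF8_ORDER = Comparator.naturalOrder();
    // Stand-in for a long-style comparator: compare names as numbers.
    static final Comparator<String> LONG_ORDER =
            Comparator.comparingLong((String name) -> Long.parseLong(name));

    /** Inserts the names into a row sorted by the given comparator and returns them in row order. */
    static List<String> sortedNames(Comparator<String> order, String... names) {
        TreeMap<String, String> row = new TreeMap<>(order);
        for (String name : names) {
            row.put(name, "");
        }
        return new ArrayList<>(row.keySet());
    }

    public static void main(String[] args) {
        System.out.println("utf8 order: " + sortedNames(UTF8_ORDER, "9", "10", "2"));
        System.out.println("long order: " + sortedNames(LONG_ORDER, "9", "10", "2"));
    }
}
```

The sort order is fixed when the column family is defined, so it has to be chosen with the read queries in mind.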
![Page 42: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/42.jpg)
![Page 43: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/43.jpg)
![Page 44: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/44.jpg)
is cassandra a good fit?
• you need really fast writes
• you need durability
• you have lots of data
– > GBs
– >= three servers
• your app is evolving
– startup mode, fluid data structure
• loose domain data
– “points of interest”
• your programmers can deal
– documentation
– complexity
– consistency model
– change
– visibility tools
• your operations can deal
– hardware considerations
– can move data
– JMX monitoring
![Page 45: Scaling web applications with cassandra presentation](https://reader036.vdocuments.net/reader036/viewer/2022062511/54b7a2d94a795993718b4760/html5/thumbnails/45.jpg)
thank you!
@ebenhewitt