How to size up an Apache Cassandra cluster (Training)

How To Size Up A Cassandra Cluster Joe Chu, Technical Trainer [email protected] April 2014 ©2014 DataStax Confidential. Do not distribute without consent.

Upload: planet-cassandra

Post on 26-Jan-2015


DESCRIPTION

Learn how to size up an Apache Cassandra cluster by Joe Chu.

TRANSCRIPT

Page 1: How to size up an Apache Cassandra cluster (Training)

How To Size Up A Cassandra Cluster

Joe Chu, Technical Trainer

[email protected]

April 2014

©2014 DataStax Confidential. Do not distribute without consent.

Page 2: How to size up an Apache Cassandra cluster (Training)


What is Apache Cassandra?

• Distributed NoSQL database

• Linearly scalable

• Highly available with no single point of failure

• Fast writes and reads

• Tunable data consistency

• Rack and Datacenter awareness

Page 3: How to size up an Apache Cassandra cluster (Training)

Peer-to-peer architecture

• All nodes are the same

• No master / slave architecture

• Less operational overhead for better scalability.

• Eliminates single point of failure, increasing availability.

[Figure: a master/slave topology (one Master, two Slaves) next to a ring of five Peer nodes]

Page 4: How to size up an Apache Cassandra cluster (Training)


Linear Scalability

• Operation throughput increases linearly with the number of nodes added.

Page 5: How to size up an Apache Cassandra cluster (Training)

Data Replication

• Cassandra can write copies of data on different nodes.

• The replication factor (RF) setting determines the number of copies.

• The replication strategy can place replicas on different racks and in different datacenters.

[Figure: with RF = 3, the write below is stored on three replica nodes, R1, R2, and R3]

INSERT INTO user_table (id, first_name, last_name) VALUES (1, 'John', 'Smith');

Page 6: How to size up an Apache Cassandra cluster (Training)


Node

• Instance of a running Cassandra process.

• Usually represents a single machine or server.

Page 7: How to size up an Apache Cassandra cluster (Training)

Rack

• Logical grouping of nodes.

• Allows data to be replicated across different racks.

Page 8: How to size up an Apache Cassandra cluster (Training)

Datacenter

• Grouping of nodes and racks.

• Each datacenter can have separate replication settings.

• May be in different geographical locations, but not always.

Page 9: How to size up an Apache Cassandra cluster (Training)

Cluster

• Grouping of datacenters, racks, and nodes that communicate with each other and replicate data.

• Clusters are not aware of other clusters.

Page 10: How to size up an Apache Cassandra cluster (Training)

Consistency Models

• Immediate consistency

When a write is successful, subsequent reads are guaranteed to return that latest value.

• Eventual consistency

When a write is successful, stale data may still be read for a while, but reads will eventually return the latest value.

Page 11: How to size up an Apache Cassandra cluster (Training)

Tunable Consistency

• Cassandra offers the ability to choose between immediate and eventual consistency by setting a consistency level.

• Consistency level is set per read or write operation.

• Common consistency levels are ONE, QUORUM, and ALL.

• For multi-datacenter clusters, additional levels such as LOCAL_QUORUM and EACH_QUORUM are available to control cross-datacenter traffic.
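One way to reason about which combinations of levels give immediate consistency is the overlap rule: a read is guaranteed to see the latest successful write when the number of replicas contacted for the read plus the number that acknowledged the write exceeds the replication factor (R + W > RF). The slides do not state this rule; the sketch below applies it to the three common levels. The function names are hypothetical and this is not a driver API.

```python
def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def is_immediately_consistent(write_level: str, read_level: str, rf: int) -> bool:
    """True when the read replicas must overlap the write replicas (R + W > RF)."""
    w = replicas_required(write_level, rf)
    r = replicas_required(read_level, rf)
    return r + w > rf


if __name__ == "__main__":
    for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL"), ("ALL", "ONE")]:
        print(f"write={w:<6} read={r:<6} immediate={is_immediately_consistent(w, r, rf=3)}")
```

With RF = 3, writing and reading at ONE leaves room for stale reads, while QUORUM on both sides always overlaps in at least one replica.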

Page 12: How to size up an Apache Cassandra cluster (Training)

CL ONE

• Write: Success when at least one replica node has acknowledged the write.

• Read: Only one replica node is given the read request.

[Figure: Client, Coordinator, and replica nodes R1, R2, and R3; RF = 3]

Page 13: How to size up an Apache Cassandra cluster (Training)


CL QUORUM

• Write: Success when a majority of the replica nodes have acknowledged the write.

• Read: A majority of the replica nodes are given the read request.

• Majority = ( RF / 2 ) + 1, with the division rounded down
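The majority formula uses integer division, which is easy to get wrong for odd replication factors. A minimal sketch follows; `quorum_size` is a hypothetical helper name, not part of Cassandra.

```python
def quorum_size(rf: int) -> int:
    """Replicas needed for QUORUM: integer-divide RF by two, then add one."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    return rf // 2 + 1


if __name__ == "__main__":
    for rf in range(1, 7):
        print(f"RF={rf}: QUORUM needs {quorum_size(rf)} replica(s)")
```

For RF = 3 this gives 2, which matches the diagram on this slide.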

[Figure: Client, Coordinator, and replica nodes R1, R2, and R3; RF = 3]

Page 14: How to size up an Apache Cassandra cluster (Training)


CL ALL

• Write: Success when all of the replica nodes have acknowledged the write.

• Read: All replica nodes are given the read request.

[Figure: Client, Coordinator, and replica nodes R1, R2, and R3; RF = 3]

Page 15: How to size up an Apache Cassandra cluster (Training)


Log-Structured Storage Engine

• The Cassandra storage engine is inspired by Google Bigtable.

• Key to fast write performance on Cassandra

[Figure: Memtable, Commit Log, and three SSTables]

Page 16: How to size up an Apache Cassandra cluster (Training)


Updates and Deletes

• SSTable files are immutable and cannot be changed.

• Updates are written as new data.

• Deletes write a tombstone, which marks a row or column(s) as deleted.

• Updates and deletes are just as fast as inserts.

[Figure: three SSTables, each holding a different version of the same row]

SSTable 1: id:1, first:John, last:Smith (timestamp: …405)

SSTable 2: id:1, first:John, last:Williams (timestamp: …621)

SSTable 3: id:1, deleted (timestamp: …999)

Page 17: How to size up an Apache Cassandra cluster (Training)


Compaction

• Periodically, an operation is triggered that merges the data in several SSTables into a single SSTable.

• Helps limit the number of SSTables to read.

• Removes old data and tombstones.

• The original SSTables are deleted after compaction.

[Figure: compaction merges three SSTables into one new SSTable]

SSTable 1: id:1, first:John, last:Smith (timestamp:405)

SSTable 2: id:1, first:John, last:Williams (timestamp:621)

SSTable 3: id:1, deleted (timestamp:999)

New SSTable: id:1, deleted (timestamp:999)
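The merge in the figure can be illustrated with a toy model in which the version with the highest timestamp wins. This is only a sketch under simplifying assumptions: it reconciles whole rows rather than individual columns, models an SSTable as a plain dictionary, and keeps the newest tombstone in the output, as the figure does. None of the names below come from Cassandra itself.

```python
from typing import Dict, List, Optional, Tuple

# A row version is (timestamp, value); a value of None stands for a tombstone.
Version = Tuple[int, Optional[dict]]
# Toy SSTable: row id -> the version of that row stored in this file.
SSTable = Dict[int, Version]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge several SSTables, keeping only the newest version of each row."""
    merged: SSTable = {}
    for table in sstables:
        for row_id, (timestamp, value) in table.items():
            # Last write wins: replace only if this version is newer.
            if row_id not in merged or timestamp > merged[row_id][0]:
                merged[row_id] = (timestamp, value)
    return merged


old_sstables = [
    {1: (405, {"first": "John", "last": "Smith"})},
    {1: (621, {"first": "John", "last": "Williams"})},
    {1: (999, None)},  # tombstone: row 1 was deleted
]

print(compact(old_sstables))  # prints {1: (999, None)}
```

The two older versions of row 1 disappear from the merged output, which is where the disk space for old data is reclaimed.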

Page 18: How to size up an Apache Cassandra cluster (Training)


Cluster Sizing

Page 19: How to size up an Apache Cassandra cluster (Training)

Cluster Sizing Considerations

• Replication Factor

• Data Size

“How many nodes would I need to store my data set?”

• Data Velocity (Performance)

“How many nodes would I need to achieve my desired throughput?”

Page 20: How to size up an Apache Cassandra cluster (Training)


Choosing a Replication Factor

Page 21: How to size up an Apache Cassandra cluster (Training)

What Are You Using Replication For?

• Durability or Availability?

• Each node has local durability (Commit Log), but replication can be used for distributed durability.

• For availability, a recommended setting is RF=3.

• RF=3 is the minimum necessary to achieve both consistency and availability using QUORUM and LOCAL_QUORUM.

Page 22: How to size up an Apache Cassandra cluster (Training)

How Replication Can Affect Consistency Level

• When RF < 3, you do not have as much flexibility when choosing consistency and availability.

• With RF = 2, a majority is both replicas, so QUORUM = ALL.
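The loss of flexibility can be made concrete by counting how many replica nodes may be down while a request at a given level can still succeed. A small sketch with hypothetical helper names, assuming the majority formula from the CL QUORUM slide:

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond at consistency level ONE, QUORUM, or ALL."""
    required = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return required[level]


def tolerated_failures(level: str, rf: int) -> int:
    """Replica nodes that can be unavailable while the request still succeeds."""
    return rf - replicas_required(level, rf)


if __name__ == "__main__":
    for rf in (2, 3, 5):
        summary = ", ".join(
            f"{level}: {tolerated_failures(level, rf)}"
            for level in ("ONE", "QUORUM", "ALL")
        )
        print(f"RF={rf} -> replicas that may be down: {summary}")
```

At RF = 2, QUORUM tolerates no unavailable replicas, exactly like ALL. At RF = 3 it tolerates one, which is why RF = 3 is the smallest setting that combines QUORUM consistency with availability.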

[Figure: Client, Coordinator, and replica nodes R1 and R2; RF = 2]

Page 23: How to size up an Apache Cassandra cluster (Training)


Using A Larger Replication Factor

• When RF > 3, more disk space is used and operations requiring immediate consistency have higher latency.

• If using eventual consistency, a consistency level of ONE will have consistent performance regardless of the replication factor.

• High availability clusters may use a replication factor as high as 5.

Page 24: How to size up an Apache Cassandra cluster (Training)

Data Size

Page 25: How to size up an Apache Cassandra cluster (Training)

Disk Usage Factors

• Data Size

• Replication Setting

• Old Data

• Compaction

• Snapshots

Page 26: How to size up an Apache Cassandra cluster (Training)

Data Sizing

• Row and Column Data

• Row and Column Overhead

• Indices and Other Structures

Page 27: How to size up an Apache Cassandra cluster (Training)

Replication Overhead

• A replication factor > 1 will effectively multiply your data size by that amount.

[Figure: the same data set stored with RF = 1, RF = 2, and RF = 3]

Page 28: How to size up an Apache Cassandra cluster (Training)


Old Data

• Updates and deletes do not actually overwrite or delete data.

• Older versions of data and tombstones remain in the SSTable files until they are compacted.

• This becomes more important for heavy update and delete workloads.

Page 29: How to size up an Apache Cassandra cluster (Training)

Compaction

• Compaction needs free disk space to write the new SSTable, before the SSTables being compacted are removed.

• Leave enough free disk space on each node to allow compactions to run.

• The worst case for the Size-Tiered Compaction Strategy is 50% of the total data capacity of the node.

• For the Leveled Compaction Strategy, the free space needed is about 10% of the total data capacity.
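These percentages translate directly into how much of a disk can hold live data. The sketch below reads the figures as fractions of the node's disk to keep free, which is an assumption about how the slide intends them. The dictionary keys and function name are hypothetical.

```python
# Fraction of a node's disk to keep free for compaction, taken from the
# worst-case figures on this slide (interpretation is an assumption).
COMPACTION_HEADROOM = {
    "size_tiered": 0.50,
    "leveled": 0.10,
}


def usable_capacity_tb(disk_tb: float, strategy: str) -> float:
    """Disk space left for live data after reserving compaction headroom."""
    if strategy not in COMPACTION_HEADROOM:
        raise ValueError(f"unknown compaction strategy: {strategy}")
    return disk_tb * (1.0 - COMPACTION_HEADROOM[strategy])


if __name__ == "__main__":
    for strategy in COMPACTION_HEADROOM:
        print(f"{strategy}: {usable_capacity_tb(2.0, strategy):.2f} TB usable of 2.00 TB")
```

Under this reading, a 2 TB disk holds about 1 TB of data with size-tiered compaction and about 1.8 TB with leveled compaction.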

Page 30: How to size up an Apache Cassandra cluster (Training)

Snapshots

• Snapshots are hard-links or copies of SSTable data files.

• After SSTables are compacted, the disk space may not be reclaimed if a snapshot of those SSTables was created.

Snapshots are created when:

• Executing the nodetool snapshot command

• Dropping a keyspace or table

• Incremental backups

• During compaction

Page 31: How to size up an Apache Cassandra cluster (Training)

Recommended Disk Capacity

• For current Cassandra versions, the ideal disk capacity is approximately 1 TB per node when using spinning disks and 3-5 TB per node when using SSDs.

• Larger disk capacities are possible, but may be limited by the resulting performance.

• What works for you is still dependent on your data model design and desired data velocity.
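Combined with the replication overhead discussed earlier, a per-node capacity gives a first estimate of node count for the data-size question. This is a back-of-the-envelope sketch, not a DataStax formula. `usable_tb_per_node` is an assumed input that should already account for compaction headroom, snapshots, and old data.

```python
import math


def nodes_for_data(raw_data_tb: float, replication_factor: int,
                   usable_tb_per_node: float) -> int:
    """Nodes needed to store the data set once replication is included."""
    if usable_tb_per_node <= 0:
        raise ValueError("usable_tb_per_node must be positive")
    total_tb = raw_data_tb * replication_factor  # every row is stored RF times
    return math.ceil(total_tb / usable_tb_per_node)


if __name__ == "__main__":
    # Example: 2 TB of raw data, RF = 3, 1 TB of usable space per node.
    print(nodes_for_data(2.0, 3, 1.0))  # prints 6
```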

Page 32: How to size up an Apache Cassandra cluster (Training)

Data Velocity (Performance)

Page 33: How to size up an Apache Cassandra cluster (Training)

How to Measure Performance

• I/O Throughput

“How many reads and writes can be completed per second?”

• Read and Write Latency

“How fast should I be able to get a response for my read and write requests?”
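Because throughput is described earlier as scaling linearly with the number of nodes, a throughput target can be turned into a rough node count. The per-node figure is an assumption that has to come from benchmarking your own data model and hardware at the intended replication factor and consistency level. The function name is hypothetical.

```python
import math


def nodes_for_throughput(target_ops_per_sec: float,
                         ops_per_sec_per_node: float) -> int:
    """Nodes needed for a target throughput, assuming linear scalability."""
    if ops_per_sec_per_node <= 0:
        raise ValueError("ops_per_sec_per_node must be positive")
    return math.ceil(target_ops_per_sec / ops_per_sec_per_node)


if __name__ == "__main__":
    # Example: 50,000 operations/s wanted, 8,000 operations/s measured per node.
    print(nodes_for_throughput(50_000, 8_000))  # prints 7
```

Latency targets do not reduce to a formula like this one and still need to be checked under load.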

Page 34: How to size up an Apache Cassandra cluster (Training)

Sizing for Failure

• Cluster must be sized taking into account the performance impact caused by failure.

• When a node fails, the corresponding workload must be absorbed by the other replica nodes in the cluster.

• Performance is further impacted when recovering a node. Data must be streamed or repaired using the other replica nodes.
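The effect of a failure on the surviving nodes can be approximated with simple arithmetic. The sketch assumes the failed node's workload is spread evenly over all remaining nodes, which is a simplification: in practice it falls on the replicas of the affected data, and recovery traffic adds further load. Both function names are hypothetical.

```python
def load_multiplier(total_nodes: int, failed_nodes: int) -> float:
    """Relative load on each surviving node after some nodes fail."""
    surviving = total_nodes - failed_nodes
    if surviving <= 0:
        raise ValueError("at least one node must survive")
    return total_nodes / surviving


def nodes_with_headroom(nodes_at_full_health: int, failures_to_survive: int) -> int:
    """Node count that still leaves the required capacity after failures."""
    return nodes_at_full_health + failures_to_survive


if __name__ == "__main__":
    # Losing 1 of 6 nodes raises the load on each survivor by 20%.
    print(f"{load_multiplier(6, 1):.2f}x load per surviving node")
    # If 6 nodes are needed for the workload, 7 leave room for one failure.
    print(nodes_with_headroom(6, 1), "nodes to tolerate one failure")
```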

Page 35: How to size up an Apache Cassandra cluster (Training)

Hardware Considerations for Performance

CPU

• Operations are often CPU-intensive.

• More cores are better.

Memory

• Cassandra uses JVM heap memory.

• Additional memory is used as off-heap memory by Cassandra, or as the OS page cache.

Disk

• C* is optimized for spinning disks, but SSDs will perform better.

• Attached storage (SAN) is strongly discouraged.

Page 36: How to size up an Apache Cassandra cluster (Training)

Some Final Words…

Page 37: How to size up an Apache Cassandra cluster (Training)

Summary

• Cassandra allows flexibility when sizing your cluster, from a single node to thousands of nodes.

• Your use case will dictate how you want to size and configure your Cassandra cluster. Do you need availability? Immediate consistency?

• The minimum number of nodes needed will be determined by your data size, desired performance and replication factor.
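The last point can be read as taking the largest of three lower bounds. In the sketch below the first two inputs are node counts estimated separately for storage and for throughput, and the replication factor is a floor because RF copies need RF distinct nodes. The function name is hypothetical.

```python
def minimum_nodes(nodes_for_data: int, nodes_for_throughput: int,
                  replication_factor: int) -> int:
    """Smallest cluster that satisfies the size, performance, and RF bounds."""
    return max(nodes_for_data, nodes_for_throughput, replication_factor)


if __name__ == "__main__":
    # Example: 6 nodes needed for storage, 7 for throughput, RF = 3.
    print(minimum_nodes(6, 7, 3))  # prints 7
```

Extra nodes for failure headroom would be added on top of this minimum.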

Page 38: How to size up an Apache Cassandra cluster (Training)

Additional Resources

• DataStax Documentation
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAbout_c.html

• Planet Cassandra
http://planetcassandra.org/nosql-cassandra-education/

• Cassandra Users Mailing List [email protected]
http://mail-archives.apache.org/mod_mbox/cassandra-user/

Page 39: How to size up an Apache Cassandra cluster (Training)

Questions?


Page 40: How to size up an Apache Cassandra cluster (Training)

Thank You

We power the big data apps that transform business.
