Cassandra implementation for collecting data and presenting data
![Page 2: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/2.jpg)
Agenda
• SQL vs NOSQL
• Why Cassandra
• Cassandra introduction
• Our architecture and design
• Configuration best practice
• How we write data
• How we read data
• Demo
![Page 3: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/3.jpg)
A highly scalable, eventually consistent, distributed, structured key-value store.
Cassandra™ is a highly scalable, high-performance distributed data infrastructure. Offering distribution of data across multiple data centers and incremental scalability with no single point of failure, Cassandra is the logical choice when you need reliability without compromising performance. Cassandra is relied upon by leading companies like Netflix, Twitter, Cisco, Rackspace, Ooyala, Openwave, and many more.
![Page 4: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/4.jpg)
SQL vs NOSQL
• NOSQL
  • Not only SQL, schema free
  • Big data
  • NOSQL can service heavy read/write workloads
  • Reads are probably not consistent in real time
• SQL
  • Can support complex join relationships
  • Oracle RAC solution for big data? Too expensive
  • Typical RDBMS implementations are tuned for small but frequent read/write transactions or for large batch transactions with rare write access
  • RDBMSs (they say) have shown poor performance on data-intensive applications, including:
    • Indexing a large number of documents
    • Serving pages on high-traffic websites
    • Handling the volumes of social networking data
    • Delivering streaming media
  • Consistent in all reads
![Page 5: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/5.jpg)
Why Cassandra
• To solve our central NetApp filer storage bottleneck
• We chose Cassandra instead of HBase
  • No single point of failure
  • Fast development
• Big data and a dynamically changing environment
• Good fit for a horizontally scaled production environment
• Low total cost of ownership
  • No special hardware needed, just some x86 boxes
![Page 6: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/6.jpg)
Cassandra Design
• High availability (a wily hare has three burrows)
• Eventual consistency
  • Trades off strong consistency in favor of high availability
  • Allows you to choose strong consistency or varying degrees of more relaxed consistency
• Incremental scalability (linearly scalable), horizontal!
  • Nodes added to a Cassandra cluster (all done online) increase the throughput of your database in a predictable, linear fashion for both read and write operations
• Optimistic replication
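The choice between strong and relaxed consistency is commonly reasoned about with the quorum rule: with N replicas, a read that waits for R replicas and a write that waits for W replicas are guaranteed to overlap on at least one replica when R + W > N. The following is a minimal sketch of that arithmetic only; the function names are illustrative and are not part of any Cassandra API.

```python
def quorum(replication_factor):
    """Number of replicas in a majority quorum."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, read_replicas, write_replicas):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    n = 3
    # QUORUM reads plus QUORUM writes overlap on at least one replica.
    print(reads_see_latest_write(n, quorum(n), quorum(n)))
    # ONE/ONE favors availability and latency; a read may be stale.
    print(reads_see_latest_write(n, 1, 1))
```

With a replication factor of 3, quorum reads and writes give strong consistency, while reading and writing at a single replica trades it away for availability.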
![Page 7: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/7.jpg)
Cassandra Design II
• All nodes are identical: decentralized/symmetric
• No master or SPOF
• Adding nodes is simple
• Distributed, read/write anywhere design
• Massively scalable peer-to-peer architecture
• Based on the best of Amazon Dynamo and Google BigTable
• Minimal administration
• Multi-datacenter replication
• No caching layer required
![Page 8: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/8.jpg)
Cassandra Design III
• Very fast writes
• Fault tolerant, guaranteed data safety
• Automatic provisioning of new nodes
• Big data
• Transparent fault detection and recovery
  • Cassandra uses gossip protocols to detect machine failure and recover when a machine is brought back into the cluster, all without your application noticing.
![Page 9: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/9.jpg)
Write op
![Page 10: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/10.jpg)
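Write op (continued)
• Writes go to the log and the memory table
• Periodically the memory table is merged with the disk table

[Figure: a Cassandra node. An update is written to the log (disk) and to the Memtable (RAM); later, the Memtable is written to an SSTable file on disk.]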
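The write path on this slide can be sketched as a toy model in Python. This is not Cassandra's code: the class name `Node`, the size-based flush trigger, and the `memtable_limit` parameter are assumptions made for illustration.

```python
class Node:
    """Toy model of one node's write path; not Cassandra's real code."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # stands in for the append-only log on disk
        self.memtable = {}     # in-memory table (RAM)
        self.sstables = []     # immutable sorted tables flushed to disk
        self.memtable_limit = memtable_limit  # assumed flush trigger

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. append to the log
        self.memtable[key] = value            # 2. update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. (later) write to disk

    def flush(self):
        # An SSTable is written once, sorted by key, and never modified.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # flushed entries no longer need replaying

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            for k, v in table:
                if k == key:
                    return v
        return None
```

Because a write only appends to the log and updates memory, it needs no random disk I/O, which is why writes are fast.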
![Page 11: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/11.jpg)
Read

[Figure: the client sends a query to the Cassandra cluster. The closest replica (Replica A) returns the result; Replica B and Replica C receive a digest query and return digest responses. Read repair runs if the digests differ.]
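A simplified sketch of the read path in the figure, in Python. Replicas are modeled as plain dicts holding `(value, timestamp)` pairs; the function names and the use of MD5 for the digest are assumptions for illustration, and the real protocol has more steps than shown here.

```python
import hashlib


def digest(value):
    """Hash of a (value, timestamp) pair, cheaper to ship than the data."""
    return hashlib.md5(repr(value).encode()).hexdigest()


def read(replicas, key):
    """replicas: list of dicts mapping key -> (value, timestamp).

    replicas[0] plays the closest replica and returns the full data;
    the others return only digests.
    """
    result = replicas[0].get(key)
    digests = [digest(r.get(key)) for r in replicas[1:]]
    if any(d != digest(result) for d in digests):
        # Read repair: pick the newest version and push it to every replica.
        versions = [r[key] for r in replicas if key in r]
        result = max(versions, key=lambda v: v[1])
        for r in replicas:
            r[key] = result
    return result
```

For example, if Replica A still holds `("old", 1)` while B and C hold `("new", 2)`, the digests differ, the newest version is returned, and Replica A is updated.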
![Page 12: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/12.jpg)
Configuration best practice
• Put the data files on good-performance RAID volumes
• Start with Sun JDK 1.6+
• Configure with Java native libs
• The clocks on each node must be synchronized to maintain precision across the cluster on inserts.
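Clock synchronization matters because Cassandra resolves conflicting versions of a column by timestamp: the version with the highest timestamp wins. A minimal sketch of the effect of clock skew follows; the `reconcile` function is illustrative, and its tie-breaking rule (first argument wins) is a simplification.

```python
def reconcile(a, b):
    """Return the winning (value, timestamp) column: highest timestamp wins."""
    return a if a[1] >= b[1] else b


if __name__ == "__main__":
    # Node A's clock is 5 seconds behind. Its write happens later in real
    # time but carries a smaller timestamp, so the older value wins.
    write_from_b = ("0.46", 1318118227)      # real time t
    write_from_a = ("1.30", 1318118230 - 5)  # real time t + 3, skewed clock
    print(reconcile(write_from_b, write_from_a))
```

With synchronized clocks the later write would carry the larger timestamp and win as expected.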
![Page 13: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/13.jpg)
Data collection architecture
Web UI (Highcharts / jQuery)
ActiveMQ (message bus)
1. Collected data is sent to ActiveMQ
2. Consume the data and save it to Cassandra
3. Filter the data and show it on the plots
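The three steps can be sketched end to end in Python. To stay runnable without any services, `queue.Queue` stands in for ActiveMQ and a dict stands in for Cassandra; all function names and the message layout are hypothetical.

```python
import queue

bus = queue.Queue()  # stand-in for the ActiveMQ message bus
store = {}           # stand-in for Cassandra: (host, metric) -> {ts: value}


def collect(host, metric, ts, value):
    """Step 1: a collector publishes one sample to the bus."""
    bus.put({"host": host, "metric": metric, "ts": ts, "value": value})


def consume():
    """Step 2: a consumer drains the bus and saves samples to the store."""
    while not bus.empty():
        m = bus.get()
        store.setdefault((m["host"], m["metric"]), {})[m["ts"]] = m["value"]


def series(host, metric, stime, etime):
    """Step 3: filter one series by time range, ready for plotting."""
    row = store.get((host, metric), {})
    return sorted((ts, v) for ts, v in row.items() if stime <= ts <= etime)
```

Putting a message bus between collectors and the database lets the two sides run and scale independently.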
![Page 14: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/14.jpg)
Data structure
• keyspace: settings (e.g., partitioner)
  • column family: settings (e.g., comparator, type [Std])
    • column: name, value, clock
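The nesting on this slide can be mirrored with a few Python dataclasses. This is only a mental model, not a client API; the default settings `RandomPartitioner` and `BytesType` are example values, and the `insert` helper is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    value: str
    clock: int  # timestamp used to pick the newest version


@dataclass
class ColumnFamily:
    comparator: str = "BytesType"             # how column names are sorted
    rows: dict = field(default_factory=dict)  # row key -> {name: Column}

    def insert(self, row_key, column):
        self.rows.setdefault(row_key, {})[column.name] = column


@dataclass
class Keyspace:
    partitioner: str = "RandomPartitioner"    # placed here as on the slide
    column_families: dict = field(default_factory=dict)
```

A value is therefore addressed by four coordinates: keyspace, column family, row key, and column name.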
![Page 15: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/15.jpg)
Our Data Model
• CoreMetrics (keyspace)
  • LoadAvg1 (column family)
    • host1_131696 (row)
      • Column: 6449, value: 0.04
      • Column: 5546, value: 0.02
    • host2_131811 (row)
      • Column: 8227, value: 0.46
      • Column: 9792, value: 1.30
![Page 16: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/16.jpg)
Our Data Model
• CoreMetrics (keyspace)
  • Primary (column family)
    • host1:loadAvg1 (row)
      • Column: 1316966449, value: 0.04
      • Column: 1316965546, value: 0.02
    • host2:loadAvg1 (row)
      • Column: 1318118227, value: 0.46
      • Column: 1318119792, value: 1.30
![Page 17: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/17.jpg)
Our Meta Data Model
• CoreMetrics (keyspace)
  • PrimaryMeta (column family)
    • host1.com (row)
      • Column: loadAvg15:Total, value: 1
      • Column: loadAvg15:Total, value: 1
    • host2 (row)
      • Column: loadAvg15:Total, value: 1
      • Column: loadAvg15:Total, value: 1
![Page 18: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/18.jpg)
Our HBase Data Model
• Primary (column family)
  • host1:loadAvg1:1 (row: host:metric:instance)
    • Column: c:1316966449, value: 0.04
    • Column: c:1316965546, value: 0.02
  • host2:loadAvg1:1 (row: host:metric:instance)
    • Column: 1318118227, value: 0.46
    • Column: 1318119792, value: 1.30
![Page 19: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/19.jpg)
Our Data Model (II)
• Keyspace: CoreMetrics (database name), one per application
• Column families: one per metric
  • loadAvg1
  • loadAvg5
  • etc. (about 80 server metrics)
• Rows and columns: inspired by the design of HBase and OpenTSDB, we design our rows and columns in a similar way: the timestamp is split between the row key and the column key, which tremendously improves read performance
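The timestamp split can be sketched in Python. The bucket size of 10,000 seconds is inferred from the example keys in the data model (row `host1_131696` with column `6449` corresponds to timestamp 1316966449); the function names are hypothetical.

```python
BUCKET = 10_000  # seconds per row; assumed from keys like host1_131696


def split_timestamp(host, ts):
    """Return (row_key, column_name) for one sample."""
    return f"{host}_{ts // BUCKET}", ts % BUCKET


def row_keys(host, stime, etime):
    """All row keys that can hold samples between stime and etime."""
    first, last = stime // BUCKET, etime // BUCKET
    return [f"{host}_{b}" for b in range(first, last + 1)]
```

A time-range read then becomes a multiget over a short, predictable list of row keys, with each row holding a bounded number of columns.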
![Page 20: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/20.jpg)
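How we write to Cassandra
Multiple data loaders connect to port 9160 on the Cassandra nodes and insert data like this:
# $PROTOCOL is a Thrift protocol object for an open connection to a node on port 9160
$CLIENT = new Cassandra::CassandraClient($PROTOCOL);
$CLIENT->set_keyspace($keyspace);
# $column_parent names the column family; $column carries the name, value and timestamp
$CLIENT->insert($rowkey, $column_parent, $column, $consistency_level);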
![Page 21: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/21.jpg)
How we read data from Cassandra
We use pycassa to multiget the rows and aggregate the data points if too many are returned.
get_coremetrics(metric_name, host, stime, etime, samples = 1000):
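The aggregation step might look like the sketch below. The slides do not show how points are aggregated, so bucket averaging is an assumption, and `downsample` is a hypothetical helper. The pycassa multiget call is left out so the example runs without a cluster; it operates on `(timestamp, value)` pairs already fetched.

```python
def downsample(points, samples=1000):
    """Reduce (timestamp, value) pairs to at most `samples` averaged points."""
    points = sorted(points)
    if len(points) <= samples:
        return points
    # Ceiling division, so that no more than `samples` buckets are produced.
    size = -(-len(points) // samples)
    reduced = []
    for start in range(0, len(points), size):
        bucket = points[start:start + size]
        ts = bucket[0][0]  # label the bucket with its first timestamp
        avg = sum(v for _, v in bucket) / len(bucket)
        reduced.append((ts, avg))
    return reduced
```

Capping the number of points keeps the response small and the plot readable regardless of how wide the requested time range is.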
![Page 22: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/22.jpg)
Demo: data model view
![Page 23: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/23.jpg)
Demo: graphing the data
![Page 24: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/24.jpg)
Cassandra monitoring
1. Nagios plugin for Cassandra
2. JMX
![Page 25: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/25.jpg)
Thoughts and future
1. Migrate more applications to Cassandra
2. Livestat data (Bids/Listings…)
3. Help other teams to do data collection and graphing?
![Page 26: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/26.jpg)
Reference URLs
• Thrift (12 language bindings!)
  • http://wiki.apache.org/cassandra/ThriftInterface
  • http://thrift.apache.org/download/
• Pycassa
  • http://pycassa.github.com/pycassa/tutorial.html
![Page 27: Cassandra implementation for collecting data and presenting data](https://reader035.vdocuments.net/reader035/viewer/2022081519/554f44e0b4c90572088b559b/html5/thumbnails/27.jpg)