Time Series with Apache Cassandra - Strata
DESCRIPTION
This talk covers the basics of how Apache Cassandra stores and accesses time series data.
TRANSCRIPT
![Page 1: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/1.jpg)
©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadin, Chief Evangelist
Time Series with Apache Cassandra
![Page 2: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/2.jpg)
Quick intro to Cassandra
• Shared nothing
• Masterless peer-to-peer
• Based on Dynamo
![Page 3: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/3.jpg)
Scaling
• Add nodes to scale
• Millions Ops/s
[Chart: throughput (ops/sec) for Cassandra, HBase, Redis, MySQL]
![Page 4: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/4.jpg)
Uptime
• Built to replicate
• Resilient to failure
• Always on
NONE
![Page 5: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/5.jpg)
Easy to use
• CQL is a familiar syntax
• Friendly to programmers
• Paxos for locking
CREATE TABLE users (
    username varchar,
    firstname varchar,
    lastname varchar,
    email list<varchar>,
    password varchar,
    created_date timestamp,
    PRIMARY KEY (username)
);

INSERT INTO users (username, firstname, lastname,
    email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
    ['[email protected]'],'ba27e03fd95e507daf2937c937d499ab',
    '2011-06-20 13:50:00');

INSERT INTO users (username, firstname,
    lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
    ['[email protected]'],
    'ba27e03fd95e507daf2937c937d499ab',
    '2011-06-20 13:50:00')
IF NOT EXISTS;
![Page 6: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/6.jpg)
Time series in production
• It’s all about “What’s happening”
• Data is the new currency
“Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of financial data, ingesting into its database 2 million pieces of information a second from every major trading exchange.”*
* http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html
![Page 7: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/7.jpg)
Why Cassandra for Time Series
• Scales
• Resilient
• Good data model
• Efficient Storage Model
What about that?
![Page 8: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/8.jpg)
Data Model
• Weather Station Id and Time are unique
• Store as many as needed
CREATE TABLE temperature (
    weatherstation_id text,
    event_time timestamp,
    temperature text,
    PRIMARY KEY (weatherstation_id,event_time)
);

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
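The primary key above pairs a partition key (weatherstation_id) with a clustering column (event_time). The following is a minimal Python sketch of that idea, assuming a toy in-memory structure. The class name `TemperatureTable` and its methods are hypothetical illustrations, not part of Cassandra or any driver.

```python
from bisect import bisect_left


class TemperatureTable:
    """Toy stand-in for the temperature table (hypothetical, not Cassandra's API)."""

    def __init__(self):
        # weatherstation_id -> (sorted event_times, matching temperatures)
        self._partitions = {}

    def insert(self, weatherstation_id, event_time, temperature):
        times, temps = self._partitions.setdefault(weatherstation_id, ([], []))
        i = bisect_left(times, event_time)
        if i < len(times) and times[i] == event_time:
            # Same (weatherstation_id, event_time): the primary key is unique,
            # so the new value replaces the old one.
            temps[i] = temperature
        else:
            # Keep each partition ordered by event_time.
            times.insert(i, event_time)
            temps.insert(i, temperature)

    def select(self, weatherstation_id):
        times, temps = self._partitions.get(weatherstation_id, ([], []))
        return list(zip(times, temps))


table = TemperatureTable()
table.insert("1234ABCD", "2013-04-03 07:02:00", "73F")
table.insert("1234ABCD", "2013-04-03 07:01:00", "72F")
rows = table.select("1234ABCD")  # ordered by event_time regardless of insert order
```

The timestamps are kept as strings in the slide's format, which sort chronologically because the fields run from most to least significant.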
![Page 9: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/9.jpg)
Storage Model - Logical View
SELECT weatherstation_id,event_time,temperature FROM temperature WHERE weatherstation_id='1234ABCD';
weatherstation_id    event_time             temperature
1234ABCD             2013-04-03 07:01:00    72F
1234ABCD             2013-04-03 07:02:00    73F
1234ABCD             2013-04-03 07:03:00    73F
1234ABCD             2013-04-03 07:04:00    74F
![Page 10: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/10.jpg)
Storage Model - Disk Layout
SELECT weatherstation_id,event_time,temperature FROM temperature WHERE weatherstation_id='1234ABCD';
1234ABCD:
    2013-04-03 07:01:00    72F
    2013-04-03 07:02:00    73F
    2013-04-03 07:03:00    73F
    2013-04-03 07:04:00    74F
    2013-04-03 07:05:00    74F
    2013-04-03 07:06:00    75F
Merged, Sorted and Stored Sequentially
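"Merged, Sorted and Stored Sequentially" can be pictured as combining several runs that are each already sorted by event_time. The sketch below is an illustration only, assuming two such runs for one partition. The function name `merge_sorted_runs` and the rule that a later run wins on a duplicate timestamp are assumptions made for this example, not a description of Cassandra's internals.

```python
import heapq


def merge_sorted_runs(*runs):
    """Merge runs of (event_time, value) pairs, each sorted by event_time.

    Hypothetical helper: on a duplicate event_time, the value from the
    later run in the argument list is kept.
    """
    # Tag every cell with its run index so ties on event_time are ordered
    # by run, which lets the later run overwrite the earlier one below.
    tagged = (
        [(event_time, idx, value) for event_time, value in run]
        for idx, run in enumerate(runs)
    )
    merged = {}
    for event_time, _, value in heapq.merge(*tagged):
        merged[event_time] = value
    # dict preserves insertion order, and cells arrive in event_time order.
    return list(merged.items())


run_a = [("2013-04-03 07:01:00", "72F"), ("2013-04-03 07:03:00", "73F"),
         ("2013-04-03 07:05:00", "74F")]
run_b = [("2013-04-03 07:02:00", "73F"), ("2013-04-03 07:04:00", "74F"),
         ("2013-04-03 07:06:00", "75F")]
partition = merge_sorted_runs(run_a, run_b)  # six cells, ordered by event_time
```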
![Page 11: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/11.jpg)
Query patterns
• Range queries
• “Slice” operation on disk
SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';
1234ABCD:
    2013-04-03 07:01:00    72F
    2013-04-03 07:02:00    73F
    2013-04-03 07:03:00    73F
    2013-04-03 07:04:00    74F
    2013-04-03 07:05:00    74F
    2013-04-03 07:06:00    75F
Single seek on disk
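Because the cells of a partition are stored in event_time order, a range query amounts to finding two positions and reading the contiguous run between them. The sketch below shows this with Python's `bisect` module on an in-memory list. The function name `slice_partition` is hypothetical, and a list lookup is only an analogy for the on-disk seek.

```python
from bisect import bisect_left, bisect_right


def slice_partition(times, values, start, end):
    """Return cells with start < event_time < end (exclusive bounds,
    matching the > and < in the query). `times` must be sorted."""
    lo = bisect_right(times, start)  # first index with event_time > start
    hi = bisect_left(times, end)     # first index with event_time >= end
    return list(zip(times[lo:hi], values[lo:hi]))


times = ["2013-04-03 07:01:00", "2013-04-03 07:02:00", "2013-04-03 07:03:00",
         "2013-04-03 07:04:00", "2013-04-03 07:05:00", "2013-04-03 07:06:00"]
values = ["72F", "73F", "73F", "74F", "74F", "75F"]

result = slice_partition(times, values,
                         "2013-04-03 07:01:00", "2013-04-03 07:04:00")
# result holds the 07:02:00 and 07:03:00 cells
```

With exclusive bounds, the cells at exactly 07:01:00 and 07:04:00 are left out; using `bisect_left` for the start and `bisect_right` for the end would make the range inclusive.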
![Page 12: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/12.jpg)
Query patterns
• Range queries
• “Slice” operation on disk
SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';
weatherstation_id    event_time             temperature
1234ABCD             2013-04-03 07:01:00    72F
1234ABCD             2013-04-03 07:02:00    73F
1234ABCD             2013-04-03 07:03:00    73F
1234ABCD             2013-04-03 07:04:00    74F
Programmers like this
Sorted by event_time
![Page 13: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/13.jpg)
Ingestion models
• Apache Kafka
• Apache Flume
• Storm
• Custom Applications
[Diagram labels: Apache Kafka; Your totally killer application]
![Page 14: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/14.jpg)
Dealing with data at speed
• 1 million writes per second?
• 1 insert every microsecond
• Collisions?
• Primary Key determines node placement
• Random partitioning
• Special data type - TimeUUID
[Diagram labels: Your totally killer application; weatherstation_id='1234ABCD'; weatherstation_id='5678EFGH']
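"Primary Key determines node placement" and "Random partitioning" can be illustrated by hashing the partition key and using the hash to pick a node. This is a deliberately simplified sketch: the node names, the function `node_for`, the use of MD5 as the hash, and the modulo step are all assumptions for illustration. Cassandra assigns token ranges to nodes on a ring instead of taking a modulo.

```python
import hashlib


def node_for(partition_key, nodes):
    """Pick a node from a hash of the partition key (simplified stand-in
    for a partitioner; not Cassandra's actual token-ring logic)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest, "big")
    return nodes[token % len(nodes)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
placement = {
    key: node_for(key, nodes)
    for key in ("1234ABCD", "5678EFGH")
}
```

The same key always hashes to the same node, so every reading for one weather station lands in one partition, while different stations spread across the cluster.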
![Page 15: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/15.jpg)
TimeUUID
• Also known as a Version 1 UUID
• Sortable
• Reversible
Timestamp to Microsecond + UUID = TimeUUID
04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT
http://www.famkruithof.net/uuid/uuidgen
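"Reversible" means the timestamp can be read back out of the identifier. The sketch below uses Python's standard `uuid` module to decode the example TimeUUID from this slide. The helper name `timeuuid_to_datetime` is hypothetical. A version 1 UUID stores its timestamp as a count of 100-nanosecond intervals since 1582-10-15, which the helper converts to a UTC datetime.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100 ns intervals from this date.
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_datetime(u):
    """Recover the embedded timestamp from a version 1 UUID (hypothetical helper)."""
    if u.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # u.time is in 100 ns units; integer-divide by 10 to get microseconds.
    return UUID_EPOCH + timedelta(microseconds=u.time // 10)


example = uuid.UUID("04d580b0-9412-11e3-baa8-0800200c9a66")
when = timeuuid_to_datetime(example)  # falls on 2014-02-12 at 18:18:06 UTC

# "Sortable": order TimeUUIDs by their embedded timestamp, not by their text.
later = uuid.UUID("04d580b1-9412-11e3-baa8-0800200c9a66")
ordered = sorted([later, example], key=lambda u: u.time)
```

Sorting by the `time` attribute matters because the textual form of a version 1 UUID begins with the low-order timestamp bits, so plain string order does not follow time order in general.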
![Page 16: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/16.jpg)
Way more information
• 5 minute interviews
• Use cases
• Free training!
www.planetcassandra.org
![Page 17: Time series with apache cassandra strata](https://reader034.vdocuments.net/reader034/viewer/2022042606/54b6ca784a7959772b8b4587/html5/thumbnails/17.jpg)
Thank You!
Follow me for more updates all the time: @PatrickMcFadin