Timeli: Believing Cassandra: Our Big-Data Journey to Enlightenment under the Cassandra Paradigm

Believing Cassandra: Timeli.io’s Big-Data journey to enlightenment under the C* Paradigm. Keith Nordstrom, PhD, CTO and Co-Founder, Timeli.io

Upload: datastax-academy

Post on 14-Apr-2017


TRANSCRIPT

Page 1: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

Believing Cassandra Timeli.io’s Big-Data journey to enlightenment under the C* Paradigm

Keith Nordstrom, PhD. CTO and Co-Founder, Timeli.io

Page 2: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

Company Overview

Company
- Founded in 2013
- Based in Boulder, CO & Sunnyvale, CA

Product/Business
- Predictive asset analytics solutions
- Operational applications for connected equipment

Technology Platform
- Time series data and analytics platform
- Proprietary time series data processing layer
- Leverages “best of breed” open source software

Industry Verticals
- Oil & Gas
- Manufacturing
- Utilities – Electric, Gas & Water

Page 3: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

Who are we to talk?

Timeli.io
- Time series data ingestion engine, platform, predictive analytics
- Validation, Estimation, Regularization
- Aggregations (i.e. coarse graining)
- Based on utilities software started in Europe in 2009
- Added Cassandra to the stack in 2011

I started in late 2013 and quickly discovered something they had missed:

Cassandra can be hard to do right

Page 4: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

Timeli Architecture

Page 5: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

But first … a minor cultural digression

Cassandra:
- Sister-in-law to Helen of Troy
- More beautiful, more sought after, wiser
- Sought even by the gods themselves
- Promised a wild night to Apollo in exchange for the power of prophecy
- Reneged
- Apollo left her with prophecy, but made it so nobody believed her

Moral: Cassandra accurately predicted the Fall of Troy.

Page 6: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

Just like Cassandra of legend … real-life Cassandra is difficult to “believe”

- Selects designed beforehand
- Denormalization
- Many arcane configuration options
- Hard to find expertise
- Based on “tables” but not tabular
- CQL looks like SQL. It’s not SQL.

“No indexed columns present in by-columns clause with Equal operator”
“ORDER BY is only supported when the partition key is restricted by an EQ or an IN”
“PRIMARY KEY column ‘timestamp’ cannot be restricted”
“Cannot execute this query as it might involve data filtering and thus may have unpredictable performance.”

Page 7: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

What did this mean for Timeli? Example: Timeli ingests data, writes to raw, writes to processed, then coarse grains 1 or more series into “aggregations.”

Page 8: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

Multiple very competent RDBMS/Java/JPA architects built a time series app where the following could not be done:

SELECT * FROM aggregations
WHERE meter_id = 4bbedd76-4e9e-11e5-885d-feff819cdc9f
  AND timestamp > '2013-01-01'
  AND timestamp < '2013-03-01';

Early Warning Aggregations, the primary product:

“It’s a security feature! You have to know when your data exists to get your data!”

Cassandra isn’t crazy

Page 9: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

New Beginnings

Out of all of this, Timeli was born

What did we change?

1. Partitioner
2. Primary Keys and Row Keys
3. Performance/Missing data in Collection types
4. Batching for “Performance”
5. Double Precision vs. BigDecimal
6. QueryBuilder vs Prepared Statements
7. Row Limits

Page 12: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

1. The Partitioner

What is a partitioner in Cassandra?

[Figure: data keys (B…, S…, S…, S…, Z…, T…) placed around the Cassandra ring]

Three types:
- Byte Ordered Partitioner
- Random Partitioner
- Murmur3 Partitioner

Murmur3 is a random partitioner as well, but faster

Page 13: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

1. The Partitioner

What is a partitioner in Cassandra?

[Figure: data keys (B…, S…, S…, S…, Z…, T…) placed around the Cassandra ring]

Three types:
- Byte Ordered Partitioner
- Random Partitioner
- Murmur3 Partitioner

- Our partition keys were of the form {UUID}|{string key}
- UUID 1s are uniformly distributed but keys are not
- ByteOrderedPartitioner left big gaps:

> nodetool status ts
Datacenter: us-central1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load     Tokens  Owns (effective)  Host ID                               Rack
UN  10.82.79.110  4.27 GB  256     44.9%             29d0a723-fc1f-4f73-a864-97dc6df045f5  b
UN  10.105.185.1  2.51 GB  256     26.4%             1d236bd9-5fb1-4423-83bc-168bac924db4  b
UN  10.234.92.2   2.73 GB  256     28.7%             29e1358a-bef2-495e-80bc-3de4c4499790  b
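The skew above can be illustrated with a toy model. This is a sketch under loud assumptions, not Cassandra's implementation: the class `PartitionerSketch`, the `meter-N` keys, and the "scale the leading byte onto equal slices" placement are all hypothetical, and MD5 stands in for any well-mixing hash (Murmur3 is not in the Java standard library).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Toy model only: real Cassandra assigns token ranges to nodes. Here a key's
// leading byte (raw or hashed) is simply scaled onto `nodes` equal slices.
final class PartitionerSketch {

    // Byte-ordered placement: position follows the raw key bytes, so keys
    // that share a prefix land on the same node. Keys must be non-empty.
    static int byteOrderedNode(String key, int nodes) {
        int first = key.getBytes(StandardCharsets.UTF_8)[0] & 0xFF;
        return first * nodes / 256;
    }

    // Hashed placement: position follows a digest of the key (MD5 here, as a
    // stand-in for any well-mixing hash), so similar keys scatter.
    static int hashedNode(String key, int nodes) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return (digest[0] & 0xFF) * nodes / 256;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    // Counts how many keys each node would own under the chosen placement.
    static int[] ownership(String[] keys, int nodes, boolean hashed) {
        int[] counts = new int[nodes];
        for (String key : keys) {
            counts[hashed ? hashedNode(key, nodes) : byteOrderedNode(key, nodes)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] keys = new String[300];
        for (int i = 0; i < keys.length; i++) keys[i] = "meter-" + i; // hypothetical keys
        System.out.println("byte-ordered: " + Arrays.toString(ownership(keys, 3, false)));
        System.out.println("hashed:       " + Arrays.toString(ownership(keys, 3, true)));
    }
}
```

With byte ordering, every key that starts with the same character lands on one node; with hashing, the same keys spread over the nodes regardless of prefix.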

Page 14: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

1. The Partitioner

Moral: Read the manual. Odds are you won’t think of consequences on your own.

Page 15: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

2. Primary Keys and Row Keys

Aggregation Table

A coarse graining of a time series into measures on buckets of larger size than the original time resolution

[Figure: original series sampled at T1 … T24 (y-axis 0 to 10), coarse-grained into 8-Hour Mean, 8-Hour Max, and 8-Hour Min series]
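A minimal sketch of coarse graining, assuming a regularly spaced series held in an array. The class `CoarseGrain` and its method are hypothetical names for illustration, not Timeli's code; each bucket reports count, sum, mean, max, and min via the JDK's `DoubleSummaryStatistics`.

```java
import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.List;

// Hypothetical helper: coarse-grains a regularly spaced series into
// fixed-size buckets (e.g. 8 hourly values per 8-hour bucket).
final class CoarseGrain {

    static List<DoubleSummaryStatistics> aggregate(double[] values, int bucketSize) {
        if (bucketSize <= 0) throw new IllegalArgumentException("bucketSize must be positive");
        List<DoubleSummaryStatistics> buckets = new ArrayList<>();
        for (int start = 0; start < values.length; start += bucketSize) {
            DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
            int end = Math.min(start + bucketSize, values.length); // last bucket may be short
            for (int i = start; i < end; i++) stats.accept(values[i]);
            buckets.add(stats);
        }
        return buckets;
    }

    public static void main(String[] args) {
        double[] hourly = new double[24];
        for (int i = 0; i < hourly.length; i++) hourly[i] = i % 10; // made-up sample data
        for (DoubleSummaryStatistics s : aggregate(hourly, 8)) {
            System.out.printf("count=%d mean=%.2f max=%.1f min=%.1f%n",
                    s.getCount(), s.getAverage(), s.getMax(), s.getMin());
        }
    }
}
```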

Page 16: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

2. Primary Keys and Row Keys

Original Persistence Model

Aggregation_ID  Index  Period    Count  Sum     Average  Max     Min     Measurements
UUID            Long   DateTime  Long   Double  Double   Double  Double  Map<DateTime, Double>

- Aggregation_ID: UUID identifier associated with aggregation metadata
- Period: DateTime of start of aggregation
- Index: Offset from DateTime of fixed aggregation bucket
- Count, Sum, Average, Max, Min: values of aggregation on the bucket
- Measurements: map of all measurements included in the system

PRIMARY KEY (Aggregation_ID, Index)

Page 20: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

2. Primary Keys and Row Keys

Original Persistence Model, Storage Representation

[Figure: one partition per Aggregation_ID, with cells Index 1, Index 2, Index 3, … Index N, each holding Period, Count, etc.]

- SELECT * FROM aggregations WHERE Aggregation_ID = bdb8330e-6f02-457f-8eb7-553b4db86287 ✔
- SELECT * FROM aggregations WHERE Aggregation_ID = bdb8330e-6f02-457f-8eb7-553b4db86287 AND Index = 1 ✔
- SELECT * FROM aggregations WHERE Aggregation_ID = bdb8330e-6f02-457f-8eb7-553b4db86287 AND Index >= 1 AND Index < 3
- SELECT * FROM aggregations WHERE Aggregation_ID = bdb8330e-6f02-457f-8eb7-553b4db86287 AND Index = 1 AND Period > 2015-01-01 AND Period < 2015-02-01

Page 21: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

2. Primary Keys and Row Keys

Fixed Persistence Model

Aggregation_ID  StartDate  Count  Sum     Average  Max     Min     Measurements
UUID            Timestamp  Long   Double  Double   Double  Double  Map<Timestamp, Double>

PRIMARY KEY (Aggregation_ID, StartDate)

- Index column not required
- Primary key allows row key and clustering

[Figure: one partition per Aggregation_ID, with cells 2015-01-01, 2015-01-02, 2015-01-03, … 2015-01-31, … 2015-12-31, each holding Count, etc.]

- SELECT * FROM aggregations WHERE Aggregation_ID = bdb8330e-6f02-457f-8eb7-553b4db86287 AND StartDate > '2015-01-01' AND StartDate < '2015-02-01'

Page 22: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

2. Primary Keys and Row Keys

Moral: Consider which queries you need to make and design around them
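To see why a clustering key makes the ranged query cheap, here is an in-memory analogy, offered as a sketch rather than a description of Cassandra's storage engine. The class `WidePartitionSketch` is hypothetical: a hash map plays the role of the partition key lookup, and a sorted map plays the role of rows ordered by the StartDate clustering key.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.UUID;

// Analogy only: partition key -> rows kept sorted by clustering key.
final class WidePartitionSketch {
    private final Map<UUID, NavigableMap<Instant, Long>> partitions = new HashMap<>();

    // Stores one row (here reduced to its Count value) under its keys.
    void put(UUID aggregationId, Instant startDate, long count) {
        partitions.computeIfAbsent(aggregationId, id -> new TreeMap<>()).put(startDate, count);
    }

    // In spirit: WHERE Aggregation_ID = ? AND StartDate > ? AND StartDate < ?
    // One hash lookup finds the partition; the sorted rows give the slice.
    NavigableMap<Instant, Long> range(UUID aggregationId, Instant after, Instant before) {
        NavigableMap<Instant, Long> rows = partitions.get(aggregationId);
        if (rows == null) return new TreeMap<>();
        return rows.subMap(after, false, before, false);
    }

    public static void main(String[] args) {
        WidePartitionSketch table = new WidePartitionSketch();
        UUID id = UUID.fromString("bdb8330e-6f02-457f-8eb7-553b4db86287");
        for (int day = 1; day <= 5; day++) {
            table.put(id, Instant.parse("2015-01-0" + day + "T00:00:00Z"), day);
        }
        System.out.println(table.range(id,
                Instant.parse("2015-01-01T00:00:00Z"),
                Instant.parse("2015-01-04T00:00:00Z")));
    }
}
```

A column that is neither the partition key nor a clustering key has no such ordering to exploit, so restricting it would mean scanning every row.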

Page 24: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

3. Performance/Missing data in Collection types

Collections in C*

Rationale
- C*: supposed to denormalize data
- Measurements arrive to be included in an aggregation
- How to be sure they’re included?
- Keep a copy

Downsides
- Lots of storage space – do we really need the value?
- In < 2.1, performance implications (serialization)
- All values returned
- 64K limit! modulus => missing data

Aggregation_ID  StartDate  Count  Sum     Average  Max     Min     Measurements
UUID            Timestamp  Long   Double  Double   Double  Double  Map<Timestamp, Double>

Page 25: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

3. Performance/Missing data in Collection types

Collections in C*

Aggregation_ID  StartDate  Count  Sum     Average  Max     Min     Measurements
UUID            Timestamp  Long   Double  Double   Double  Double  Blob

Solution
- Know start date
- Know all measurement timestamps in processed data
- Keep a bit for each

Bitwise Verifier

One-minute expected timestamps, 6-minute aggregations. 2 still missing below:

Timestamp         Bit
2015-01-01T00:00  1
2015-01-01T00:01  0
2015-01-01T00:02  1
2015-01-01T00:03  1
2015-01-01T00:04  0
2015-01-01T00:05  1
2015-01-01T00:06  1
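The bit-per-expected-timestamp idea can be sketched with `java.util.BitSet`. The class `BitwiseVerifier` and its methods are hypothetical names, and the seven slots simply mirror the example above; how Timeli actually lays out the blob is not stated in the slides.

```java
import java.util.BitSet;

// Hypothetical sketch: one bit per expected timestamp in an aggregation bucket.
final class BitwiseVerifier {
    private final long startMillis;   // start of the aggregation bucket
    private final long stepMillis;    // expected spacing between measurements
    private final int expectedSlots;  // number of timestamps expected in the bucket
    private final BitSet seen;

    BitwiseVerifier(long startMillis, long stepMillis, int expectedSlots) {
        if (stepMillis <= 0 || expectedSlots <= 0) throw new IllegalArgumentException("step and slots must be positive");
        this.startMillis = startMillis;
        this.stepMillis = stepMillis;
        this.expectedSlots = expectedSlots;
        this.seen = new BitSet(expectedSlots);
    }

    // Flags the slot for a measurement timestamp as present.
    void markSeen(long timestampMillis) {
        long offset = timestampMillis - startMillis;
        if (offset < 0 || offset % stepMillis != 0 || offset / stepMillis >= expectedSlots) {
            throw new IllegalArgumentException("timestamp is not an expected slot: " + timestampMillis);
        }
        seen.set((int) (offset / stepMillis));
    }

    // Number of expected timestamps that have not arrived yet.
    int missing() {
        return expectedSlots - seen.cardinality();
    }

    // Compact form suitable for a blob column: 1 bit per slot, not 1 map entry.
    byte[] toBlob() {
        return seen.toByteArray();
    }

    public static void main(String[] args) {
        BitwiseVerifier verifier = new BitwiseVerifier(0L, 60_000L, 7);
        for (int slot : new int[]{0, 2, 3, 5, 6}) verifier.markSeen(slot * 60_000L);
        System.out.println("missing: " + verifier.missing());
        System.out.println("blob bytes: " + verifier.toBlob().length);
    }
}
```

Seven expected timestamps fit in a single byte here, where the map version stored a timestamp and a double for each one.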

Page 26: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

3. Performance/Missing data in Collection types

Moral: limits in Cassandra are important, not always enforced, and have consequences

Page 27: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

4. Batching for “Performance”

Traditional Master/Slave model

[Figure: Application Server writes data to a Master with two Slaves in a single data center; links labeled ~200ms, ~1-10ms, ~1-10ms]

- App server writes to remote DB
- Across network
- Latency! Many writes => N x 200ms
- Solution: batch multiple commands to save

Page 28: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

4. Batching for “Performance”

Peers model with atomicity

[Figure: Application Server writes data to Peer A, with Peers B and C in a single data center; links labeled ~200ms, ~1-10ms, ~1-10ms]

- Batches are atomic
- CAP: can either lock DB across all nodes or perform on just one and publish
- Cassandra chooses the latter (fast writes)
- => Batches with large numbers of writes all execute on A
- => 1/3 the processing power

Page 29: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

4. Batching for “Performance”

Moral: don’t batch for speed

Page 30: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

5. Double Precision vs. BigDecimal

Java has some quirks with floating point representations

What do the following have in common?

- double a = Math.round(1.14 * 75); // 85.5 is represented as 85.4999…, so this rounds to 85
- double b = 10.0 / 3; // = 3.3333333333333335
- for (float f = 10f; f != 0; f -= 0.1) { System.out.println(f); } // accumulated error means f may never equal exactly 0
- double x = .37; // .370000004 or .36999999998 or …

Page 31: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

5. Double Precision vs. BigDecimal

The model so far

Aggregation_ID  StartDate  Count  Sum     Average  Max     Min     Measurements
UUID            Timestamp  Long   Double  Double   Double  Double  Blob

- Cassandra is written in Java
- Java has floating point errors
- Our aggregated values are leaking!

Aggregation_ID  StartDate  Count  Measures                 Measurements
UUID            Timestamp  Long   Map<String, BigDecimal>  Blob

For good measure …

- Wrapped our measures in a Map for flexibility (add new measures on the fly)
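A small illustration of the leak, using only the JDK. The class `SumPrecision` is a hypothetical name; the point is that repeated `double` addition drifts while `BigDecimal` built from a decimal string stays exact.

```java
import java.math.BigDecimal;

// Hypothetical demo: the same running sum kept as double and as BigDecimal.
final class SumPrecision {

    // Adds `value` to itself `times` times in binary floating point.
    static double doubleSum(double value, int times) {
        double sum = 0.0;
        for (int i = 0; i < times; i++) sum += value;
        return sum;
    }

    // Same sum in decimal arithmetic; the String constructor avoids
    // inheriting the double's binary rounding error.
    static BigDecimal decimalSum(String value, int times) {
        BigDecimal addend = new BigDecimal(value);
        BigDecimal sum = BigDecimal.ZERO;
        for (int i = 0; i < times; i++) sum = sum.add(addend);
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("double:     " + doubleSum(0.1, 10));
        System.out.println("BigDecimal: " + decimalSum("0.1", 10));
    }
}
```

Note that `new BigDecimal(0.1)` (the `double` constructor) would carry the binary error along, which is why the sketch parses the value from a string.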

Page 32: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

5. Double Precision vs. BigDecimal

Moral: Law of Leaky Abstractions (a Java app is a Java app)

Bonus moral: use C* collections for good, not evil

Page 33: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

6. QueryBuilder vs Prepared Statements

CQL Driver in Java allows various types of statements

1. Regular Statement
2. Prepared Statement

Regular Statement:
- Convenient
- Readable
- QueryBuilder to help build
- Tempting!

Page 34: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

6. QueryBuilder vs Prepared Statements

Query Schematic (Regular Statement)

QueryBuilder.select().all()
    .from("table")
    .where(QueryBuilder.eq("partition_key", 5))

[Figure: App Server sends the statement to the Cassandra Cluster and receives a ResultSet]

Page 35: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

6. QueryBuilder vs Prepared Statements

Problem: Regular Statements are a lot of bytes!

Bound Statements
- Register with C* cluster
- Text of statement sent once with placeholders
- Subsequent requests are a key and params
- Avoids transfer costs

Page 36: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

6. QueryBuilder vs Prepared Statements

Query Schematic (Bound Statement)

[Figure: App Server sends “select * from table where partition_key = ?” with the bound value 5 to the Cassandra Cluster and receives a ResultSet]
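The transfer saving can be approximated with a toy byte counter. This is not the driver's wire protocol: the class `PreparedCacheSketch`, the 16-byte statement id, and the 8-byte parameter size are assumptions chosen only to make the arithmetic concrete.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy model: counts request bytes for regular vs. prepared/bound statements.
final class PreparedCacheSketch {
    private static final int ID_BYTES = 16;   // assumed size of a statement id
    private static final int PARAM_BYTES = 8; // assumed size of one bound value

    private final Map<String, Integer> preparedIds = new HashMap<>();
    private long bytesSent = 0;

    // Regular statement: the full query text travels with every request.
    void executeRegular(String cql) {
        bytesSent += cql.getBytes(StandardCharsets.UTF_8).length;
    }

    // Prepare: the text with placeholders is sent once; repeats reuse the id.
    int prepare(String cql) {
        Integer id = preparedIds.get(cql);
        if (id == null) {
            id = preparedIds.size();
            preparedIds.put(cql, id);
            bytesSent += cql.getBytes(StandardCharsets.UTF_8).length;
        }
        return id;
    }

    // Bound statement: only the id and the parameter values are sent.
    void executeBound(int id, int paramCount) {
        if (id < 0 || id >= preparedIds.size()) throw new IllegalArgumentException("unknown statement id");
        bytesSent += ID_BYTES + (long) PARAM_BYTES * paramCount;
    }

    long bytesSent() {
        return bytesSent;
    }

    public static void main(String[] args) {
        PreparedCacheSketch regular = new PreparedCacheSketch();
        PreparedCacheSketch bound = new PreparedCacheSketch();
        int id = bound.prepare("select * from table where partition_key = ?");
        for (int i = 0; i < 1000; i++) {
            regular.executeRegular("select * from table where partition_key = 5");
            bound.executeBound(id, 1);
        }
        System.out.println("regular bytes: " + regular.bytesSent());
        System.out.println("bound bytes:   " + bound.bytesSent());
    }
}
```

The gap widens with longer statements and more repetitions, which is why the saving matters most for queries run many times.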

Page 37: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

6. QueryBuilder vs Prepared Statements

Moral: Caching is your friend. Cache queries on C*, particularly ones being done many times.

Page 38: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

7. Row Limits

The model so far: “Wide Rows”

- Unique ID for partition
- StartDate clustering key allows ranged queries
- Count of measurements included
- Map of measures with precise storage
- Binary representation of measurements included

Aggregation_ID  StartDate  Count  Measures                 Measurements
UUID            Timestamp  Long   Map<String, BigDecimal>  Blob

Page 39: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

7. Row Limits

Practical storage limits

- Cassandra row limit => 2 billion items per row
- Best results (eBay): “a few hundred million per row” (~500 million)

How much time does this represent?

Time Resolution  500 million timestamps
1 day            ~1.37E6 years
1 hour           57,077 years
1 minute         951 years
1 second         15.85 years
1 millisecond    5.78 days

Page 40: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

7. Row Limits

No business case has yet used aggregations at a resolution finer than 1 minute

For aggregations we’re probably fine

But we collect raw/processed measurements as well
At millisecond resolution, < 6 days is not OK

Can constrain row size using a compound PK
- Have the resolution on the channel, Rc (milliseconds)
- Have the number of items per row, K (e.g. 500 million)
- Get a baseline on the epoch (Jan 1, 1970 12:00 AM)
- => The batch index can be calculated

long batchInd = date.getMillis() / (K * Rc); // K and Rc as long; integer division floors for dates on or after the epoch
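A runnable version of the batch index calculation, together with the "how much time fits in a row" arithmetic. The class `BatchIndex` and its method names are hypothetical; `Math.floorDiv` is used so that timestamps before the epoch also floor consistently, and the year length of 365 days is an assumption that matches the figures quoted in the table.

```java
// Hypothetical helper for bucketing timestamps into bounded-size rows.
final class BatchIndex {

    // Index of the row (batch) a timestamp falls into, given K items per row
    // and a channel resolution Rc in milliseconds: floor(t / (K * Rc)).
    static long batchIndex(long epochMillis, long itemsPerRow, long resolutionMillis) {
        if (itemsPerRow <= 0 || resolutionMillis <= 0) {
            throw new IllegalArgumentException("itemsPerRow and resolutionMillis must be positive");
        }
        long rowSpanMillis = Math.multiplyExact(itemsPerRow, resolutionMillis); // fails loudly on overflow
        return Math.floorDiv(epochMillis, rowSpanMillis);
    }

    // Years of data that `items` timestamps cover at the given resolution,
    // assuming 365-day years.
    static double yearsCovered(long items, long resolutionMillis) {
        double millisPerYear = 365.0 * 24 * 60 * 60 * 1000;
        return items * (double) resolutionMillis / millisPerYear;
    }

    public static void main(String[] args) {
        long k = 500_000_000L; // items per row
        long rc = 1L;          // 1 ms resolution
        System.out.println("batch index now: " + batchIndex(System.currentTimeMillis(), k, rc));
        System.out.printf("1 s resolution covers %.2f years per row%n", yearsCovered(k, 1000L));
    }
}
```

If the index is stored in an `int` column, as in the model that follows, the value would need a checked narrowing such as `Math.toIntExact`.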

Page 41: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

7. Row Limits

Leads to model

Aggregation_ID  BatchIndex  StartDate  Count  Measures                 Measurements
UUID            Int         Timestamp  Long   Map<String, BigDecimal>  Blob

CQL:

CREATE TABLE aggregations (
    Aggregation_ID varchar,
    BatchIndex int,
    StartDate timestamp,
    Count bigint,
    Measures map<text, blob>,
    Measurements blob,
    PRIMARY KEY ((Aggregation_ID, BatchIndex), StartDate)
);

Page 42: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

7. Row Limits

Moral: Haven’t we had enough morals for one story?

Page 43: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

Wrapup

Initial model

Aggregation_ID  Index  Period     Count  Sum     Average  Max     Min     Measurements
UUID            Long   Timestamp  Long   Double  Double   Double  Double  Map<Timestamp, Double>

Evolution
1. Couldn’t do ranged queries in time
2. Ran out of space in measurement map
3. Columnar approach to measures => less flexibility
4. Rows not very wide

Final model

Aggregation_ID  BatchIndex  StartDate  Count  Measures                 Measurements
UUID            Int         Timestamp  Long   Map<String, BigDecimal>  Blob

Page 44: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

Wrapup

Lessons Learned

1. Read the manual. Partitioners are important. Other configuration options as well.
2. Consider which queries you need to make and design around them.
3. Limits in Cassandra are important, not always enforced, and have consequences. Exceeding collection limits will lose you data.
4. Don’t batch for speed, only for atomicity.
5. C* is a Java app and subject to floating point errors.
6. C* collections are useful for avoiding multitable queries without joins.
7. Cache queries on C* using Prepared/Bound statements, particularly ones being done many times.
8. Pay attention to row limits.

Page 45: Timeli: Believing Cassandra: Our Big-Data Journey To Enlightenment under the Cassandra Paradigm

Wrapup

By understanding Cassandra (& how she differs from SQL), we avoid our servers (our business) meeting this fate. Sorry, Brad.