need for time series database
TRANSCRIPT
![Page 1: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/1.jpg)
Need For Time Series Database
Pramit Choudhary, ML Engineer @eHarmony
![Page 2: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/2.jpg)
Motivation
Speed Matters
• We want to know what’s happening NOW
• Users access data through different mobile platforms and have no patience
Data is scattered around
• MongoDB, Voldemort, Netezza, Hive, Whisper, maybe more
• For cross-platform analytical work, data is still moved around (a cause of worry)
• Need to simplify the database tech stack
• Complexity increases as we start tracking more metrics for mobile devices
Data-Analytics Use-cases
• Most of the time we study data patterns over a period of time
e.g.
1. What are the probable times for the user to get matches? => need to start tracking the amount of time the user spends during the day
2. Feature exploration and extraction: what other features could we possibly use? => probably more t/f/z/p statistical tests?
![Page 3: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/3.jpg)
Re-CAP
Consistency: Data remains consistent after the execution of an operation, e.g. after an update all clients see the same state of the data.
Availability: Always on (no downtime)
Partition Tolerance: The system continues to function even when its nodes cannot communicate with one another
![Page 4: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/4.jpg)
Different Combinations
CA : Single-site cluster, all nodes are always in contact, e.g. SQL-type RDBMS
CP : Some data may not be accessible, but the rest is consistent and accurate, e.g. MongoDB, HBase, Redis
AP : Available under partitioning, but no guarantee on consistency, e.g. Cassandra, Riak, DynamoDB
![Page 5: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/5.jpg)
NoSQL World
• Key-Value Store (Redis, Riak)
• Document Store (MongoDB, Couchbase)
• Column Store (Cassandra, HBase, OpenTSDB)
• Graph Store (Neo4j)
![Page 6: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/6.jpg)
Introducing a new DB
OpenTSDB
Author: Benoit Sigoure @ StumbleUpon
![Page 7: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/7.jpg)
What is OpenTSDB?
Open Source Time Series Database
Store trillions of data points
Sucks up all data and keeps going
Never loses precision
Scales using HBase
Note: OpenTSDB is used here as an example; KairosDB or InfluxDB may give better results. They work on similar principles.
Author: Benoit Sigoure and Chris Larsen
![Page 8: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/8.jpg)
Use-Cases
MongoDB and Couchbase : user profiles, product catalogs, geospatial, financial products, social media, digital content, gaming, metadata, events, bills and invoices
HBase and Cassandra : structured, semi-structured and unstructured data, full table scans, read-intensive operations, time series interval data, geospatial data
![Page 9: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/9.jpg)
Other Options
Author: Oliver Hankeln
![Page 10: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/10.jpg)
What are Time Series?
Time Series: Data points for an identity over time
Typical Identity:
• Dotted string: web01.sys.cpu.user.0 (no concept of filters)
OpenTSDB Identity:
• Metric: sys.cpu.user
• Tags (name/value pairs), which act as filters: host=web01 cpu=0
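To make the contrast with a dotted string concrete, here is a minimal Python sketch of a metric-plus-tags identity. It is not OpenTSDB code; the `Series` class and the `make_series` and `select` helpers are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Series:
    """Hypothetical identity of one time series: a metric name plus tags."""
    metric: str
    tags: tuple  # sorted (name, value) pairs, so the identity is hashable


def make_series(metric, **tags):
    """Build a Series from a metric name and keyword tags."""
    return Series(metric, tuple(sorted(tags.items())))


def select(series_list, metric, **filters):
    """Return the series of `metric` whose tags match every given filter."""
    return [
        s for s in series_list
        if s.metric == metric
        and all(dict(s.tags).get(k) == v for k, v in filters.items())
    ]


all_series = [
    make_series("sys.cpu.user", host="web01", cpu="0"),
    make_series("sys.cpu.user", host="web01", cpu="1"),
    make_series("sys.cpu.user", host="web02", cpu="0"),
]

# Filter on one tag without knowing the values of the others.
web01 = select(all_series, "sys.cpu.user", host="web01")
```

With a dotted string such as web01.sys.cpu.user.0, the same question ("all CPUs on web01") would need string pattern matching on the name instead of a filter on a named tag.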
Author: Benoit Sigoure and Chris Larsen
![Page 11: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/11.jpg)
What are Time Series?
Data Point:
Metric + Tags
+ Value: 42
+ Timestamp: 123
"sys.cpu.user 1234567890 42 host=web01 cpu=0"
Fields, in order: Metric Name, Timestamp, Metric value, Filter1, Filter2
Author: Benoit Sigoure and Chris Larsen
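The line format shown on this slide can be split into its parts with a few lines of Python. This is a sketch for illustration only: `parse_data_point` is a hypothetical helper, not part of OpenTSDB, and it assumes whitespace-separated fields in the order metric, timestamp, value, then `name=value` tags.

```python
def parse_data_point(line):
    """Split 'metric timestamp value tag=val ...' into a dictionary."""
    parts = line.split()
    if len(parts) < 3:
        raise ValueError("expected at least: metric timestamp value")
    metric, timestamp, value = parts[0], int(parts[1]), float(parts[2])
    # Every remaining field is treated as a name=value tag.
    tags = dict(p.split("=", 1) for p in parts[3:])
    return {"metric": metric, "timestamp": timestamp, "value": value, "tags": tags}


point = parse_data_point("sys.cpu.user 1234567890 42 host=web01 cpu=0")
```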
![Page 12: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/12.jpg)
Architecture
Author: Benoit Sigoure and Chris Larsen
![Page 13: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/13.jpg)
Another View
Author: slideshare
![Page 14: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/14.jpg)
About TSDs
Write throughput
• Are CPU bound
• Worst case: can handle 2000 points/sec on an old 2006 dual-core CPU
Read throughput
• Depends on the cardinality of a metric
• Depends on the timespan and number of data points retrieved
Reliability
• No single point of failure, no concept of a master daemon
• Dependency: needs HBase with ZooKeeper
• Has a single point of failure if running over HDFS, but none with respect to the database
More info on the Wiki : http://opentsdb.net/faq.html
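One way to read "cardinality of a metric" is the number of distinct tag combinations stored under that metric; that reading is an assumption here, not something the slide spells out. Under it, a rough Python sketch for estimating cardinality from a sample of points could look like this (`metric_cardinality` is a hypothetical helper):

```python
from collections import defaultdict


def metric_cardinality(points):
    """Count distinct tag combinations seen for each metric."""
    seen = defaultdict(set)
    for metric, tags in points:
        # frozenset makes the tag combination hashable and order-independent.
        seen[metric].add(frozenset(tags.items()))
    return {metric: len(combos) for metric, combos in seen.items()}


points = [
    ("sys.cpu.user", {"host": "web01", "cpu": "0"}),
    ("sys.cpu.user", {"host": "web01", "cpu": "1"}),
    ("sys.cpu.user", {"host": "web01", "cpu": "0"}),  # same series again
    ("sys.mem.free", {"host": "web01"}),
]
cardinality = metric_cardinality(points)
```

The more tag combinations a metric has, the more series a query on that metric may have to touch.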
![Page 15: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/15.jpg)
Simplistic View of the Table
HBase table representation without OpenTSDB
Author: Oliver Hankeln
![Page 16: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/16.jpg)
OpenTSDB Magic
“Compact columns by concatenation”
• Tags are put at the end of the row key
• Timestamp is normalized on 1hr boundaries
Author: Oliver Hankeln
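The two points above can be sketched in Python as follows. This is an illustration under stated assumptions, not OpenTSDB's actual implementation: the UID tables, the 3-byte UID width, the 4-byte timestamp and the ordering of tags by name are all choices made for this example.

```python
import struct

# Hypothetical UID lookup tables mapping names to small integers.
METRIC_UIDS = {"sys.cpu.user": 1}
TAGK_UIDS = {"host": 1, "cpu": 2}
TAGV_UIDS = {"web01": 1, "0": 2}

UID_WIDTH = 3    # bytes per UID (assumed for this sketch)
ROW_SPAN = 3600  # seconds per row: the 1hr boundary from the slide


def uid(table, name):
    """Encode a name's integer UID as fixed-width big-endian bytes."""
    return table[name].to_bytes(UID_WIDTH, "big")


def row_key_and_offset(metric, timestamp, tags):
    """Return (row key, seconds offset of the point within its hour)."""
    base = timestamp - (timestamp % ROW_SPAN)  # normalize to the hour
    key = uid(METRIC_UIDS, metric) + struct.pack(">I", base)
    # Tags go at the end of the row key, in a fixed order.
    for name, value in sorted(tags.items()):
        key += uid(TAGK_UIDS, name) + uid(TAGV_UIDS, value)
    return key, timestamp - base


key, offset = row_key_and_offset(
    "sys.cpu.user", 1234567890, {"host": "web01", "cpu": "0"}
)
```

All points of one series that fall in the same hour share a row key and differ only in the offset, which is what makes it possible to keep them side by side in one row.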
![Page 17: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/17.jpg)
Row Key Size
Author: Oliver Hankeln
![Page 18: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/18.jpg)
Benchmarks
Load Phase
![Page 19: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/19.jpg)
Heavy Read
![Page 20: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/20.jpg)
Heavy Read
![Page 21: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/21.jpg)
Heavy Range Scan
![Page 22: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/22.jpg)
Heavy Inserts
![Page 23: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/23.jpg)
Is it being extensively used?
OVH: #3 largest cloud/hosting provider. Monitors everything, including network performance, resource utilization, application performance and customer-facing metrics
• 35 servers, 100k writes/s, 25 TB raw data
• 5-day moving window of HBase snapshots
• Redis cache on top for customer-facing data
![Page 24: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/24.jpg)
Yahoo: Monitoring application performance and statistics (15 servers, 280k writes/s)
Arista Networks: High-performance network monitoring
5k writes/s, uses Varnish for caching
MapR
“OpenTSDB is a widely used database intended to store and analyze time-series data. Originally designed for only data center monitoring, poor ingest performance had limited the expansion of its use. This benchmark demonstrates a viable option for new applications, such as IoT and other real-time data-analysis applications, using OpenTSDB running on MapR.”
Ted Dunning, Chief Application Architect
![Page 25: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/25.jpg)
Others
![Page 26: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/26.jpg)
Some References
Book: Time Series Databases – Ted Dunning and Ellen Friedman ( https://www.dropbox.com/s/c1zj0l0q0qmfvo8/Time_Series_Databases.pdf?dl=0 )
Benchmarks: https://www.dropbox.com/s/g67yoxwabwb5s0g/PerformanceBenchMark.pdf?dl=0
Lessons learned: http://www.slideshare.net/cloudera/4-opentsdb-hbasecon
Some Comparisons: http://prometheus.io/docs/introduction/comparison/
![Page 27: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/27.jpg)
Demo
![Page 28: Need for Time series Database](https://reader036.vdocuments.net/reader036/viewer/2022062522/587f4eb91a28ab0d378b4d25/html5/thumbnails/28.jpg)
Questions?