webinar: data modeling and shortcuts to success in scaling time series applications
TRANSCRIPT
![Page 1: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/1.jpg)
Basho Technologies | 1
Scaling Time Series Applications
Basho
Dorothy Pults – Product Evangelist @deepults
Tom Sigler – Solution Architect @tom_sigler
Databricks
Peyman Mohajerian - Solution Architect @mohajeri
![Page 2: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/2.jpg)
BASHO TECHNOLOGIES
Distributed Systems Software for Big Data and IoT applications
2011 - Creators of Riak
• Riak KV: NoSQL Key Value database
• Riak TS: NoSQL Time Series database
• Integrations: Spark, Redis caching, Solr, Mesos, Riak S2
120+ employees
Global Offices • Seattle (HQ), Washington DC, London, Paris
1/3 of the Fortune 50
![Page 3: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/3.jpg)
$1.3 Trillion market spend Internet of Things in 2019
30 Billion Installed base of IoT endpoints in 2020
*Source IDC
![Page 4: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/4.jpg)
56% have integrated IOT data
IoT is 24% of the average IT budget
20% decrease in downtime
21% increase in revenue
*Vodafone IOT Barometer
![Page 5: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/5.jpg)
CRITICAL SUCCESS FACTORS FOR IOT
• Explore new business models
• Address Key IoT challenges like Edge Analytics
• Provide comprehensive solutions
• Engage with a broader ecosystem
![Page 6: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/6.jpg)
100TB DAILY – IOT AND WEATHER DATA
530M personal weather station reports each day
9M webcam uploads
2M crowd reports
> 20M IoT barometric reports
![Page 7: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/7.jpg)
WEATHER FORECAST PREDICTS SALES
Ideal BERRY purchasing weather turns out to be low wind with temperatures below 80 degrees.
People are more likely to eat STEAK when it's warm out with higher winds but no rain, but not if it gets too hot.
![Page 8: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/8.jpg)
EDGE ANALYTICS
• Edge Analytics
• Fog Computing
• Inverted Web
• Reverse CDN
![Page 9: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/9.jpg)
NEW ECOSYSTEM – DATA PIPELINE
![Page 10: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/10.jpg)
WHAT’S NEEDED TO SCALE FOR IoT
• A database optimized for IoT data
• Review your data life cycle
• Summations and aggregation
• Data expiration
• Data cleansing
• Processing close to devices
• Scale for unstructured metadata
![Page 11: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/11.jpg)
TIME SERIES (TS) DATA
• Consists of successive observations made over a time interval
• Structured
• Time + State/Measurement
• Metadata/Context
• Frequency
![Page 12: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/12.jpg)
TIME SERIES CHALLENGES AT SCALE
• Ingestion Velocity
• Data Volume
• Post Ingestion Workloads
– Real time
– Batch
• Lifecycle/Expiry
![Page 13: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/13.jpg)
Riak TS Overview & Architecture
![Page 14: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/14.jpg)
WHAT IS RIAK TS?
Riak TS is a distributed NoSQL key/value store optimized for time series data.
It provides a time series database solution that is extensible and scalable.
Riak TS is derived from Riak KV and adds the ability to co-locate data by composite primary key, including quanta, for efficient sequential read i/o operations.
![Page 15: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/15.jpg)
Why Riak TS?
• Highly available
• Fault Tolerant
• Geo data locality
• Scalability
– Operations
– Real-time range query performance
![Page 16: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/16.jpg)
RIAK TS MASTERLESS ARCHITECTURE
Riak has a masterless architecture. Every node is:
• homogeneous
• capable of serving all read and write requests
• responsible for a subset of data
![Page 17: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/17.jpg)
RIAK TS: DISTRIBUTION AND CO-LOCATION
• Variation of Dynamo
• Composite key drives grouping on disk
– Partition Key
– Local Key (sort)
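As a rough illustration of how a composite partition key can drive placement, the sketch below hashes a (device, quantum start) pair onto a fixed number of partitions. The ring size, the key encoding, and the modulo step are simplifying assumptions made for this example; they are not taken from the slides and do not reproduce Riak's actual hashing.

```python
import hashlib

RING_SIZE = 64  # hypothetical number of partitions in the ring


def partition_for(device_id: str, quantum_start_ms: int, ring_size: int = RING_SIZE) -> int:
    """Map a (series, quantum) partition key to a partition index."""
    # Assumed encoding: the real key encoding is not shown in the slides.
    key = f"{device_id}|{quantum_start_ms}".encode("utf-8")
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest, "big") % ring_size


# Rows sharing the same device and quantum share a partition key,
# so they are placed together regardless of their exact timestamp.
a = partition_for("abc-xxx-001-001", 1453224600000)
b = partition_for("abc-xxx-001-001", 1453224600000)
c = partition_for("abc-xxx-001-001", 1453225500000)  # next 15-minute quantum
print(a, b, c)
```

Within a partition, the local key would then determine the sort order of the co-located rows.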
![Page 18: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/18.jpg)
RIAK: REPLICATION OF DATA
• Intra-cluster replication
• Multi-cluster replication
put(“bucket/key”)
![Page 19: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/19.jpg)
RIAK: HIGH AVAILABILITY
Hinted handoff allows Riak nodes to temporarily take over storage operations for a failed node and update that node with changes when it comes back online.
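The idea can be sketched as a toy, in-memory simulation. The `Node` class and the `put` and `handoff` helpers below are hypothetical names invented for illustration and say nothing about how Riak implements the mechanism internally.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> value owned by this node
        self.hints = {}   # owner name -> {key: value} held on its behalf


def put(owner, fallback, key, value):
    """Write to the owner, or leave a hint on the fallback if the owner is down."""
    if owner.up:
        owner.data[key] = value
    else:
        fallback.hints.setdefault(owner.name, {})[key] = value


def handoff(fallback, owner):
    """Replay hinted writes once the owner is back online."""
    if owner.up:
        owner.data.update(fallback.hints.pop(owner.name, {}))


n1, n2 = Node("n1"), Node("n2")
n1.up = False
put(n1, n2, "sensor-1", 21.5)   # n1 is down, so n2 keeps a hint
n1.up = True
handoff(n2, n1)                 # n2 hands the write back to n1
print(n1.data, n2.hints)
```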
![Page 20: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/20.jpg)
RIAK TS: SCALABILITY
Riak TS scales in a near-linear fashion, so increasing the number of nodes in a cluster increases the number of reads and writes the cluster can handle in a predictable fashion.
Rebalancing of the cluster is a non-blocking operation, which doesn’t require downtime to perform.
If 10 nodes can serve 40,000 writes/second, then 20 nodes should serve 72,000+ writes/second.
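To make "near-linear" concrete, the figures above can be compared with perfectly linear scaling. The helper name below is made up for this sketch; only the node counts and write rates come from the slide.

```python
def scaling_efficiency(base_nodes, base_rate, new_nodes, new_rate):
    """Ratio of observed throughput to perfectly linear throughput."""
    linear_rate = base_rate * new_nodes / base_nodes
    return new_rate / linear_rate


eff = scaling_efficiency(10, 40_000, 20, 72_000)
print(f"{eff:.0%} of linear")
```

Doubling the cluster would give 80,000 writes/second under perfectly linear scaling, so 72,000 corresponds to 90% of that.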
> riak-admin cluster join [email protected]
> riak-admin cluster plan
> riak-admin cluster commit
Adding a node
![Page 21: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/21.jpg)
RIAK TS: QUERY
Range:
select * from GeoCheckin where time > 1453224610000 and time < 1453225490000 and deviceId = 'abc-xxx-001-001'
Aggregate:
select MIN(temperature), AVG(temperature), MAX(temperature) from GeoCheckin where time > 1453224610000 and time < 1453225490000 and deviceId = 'abc-xxx-001-001'
Arithmetic:
select (temperature * 2), (pressure - 1) from GeoCheckin where time > 1453224610000 and time < 1453225490000 and deviceId = 'abc-xxx-001-001'
• SQL Interface
• Arithmetic Support
• Aggregate
– Count()
– Sum()
– Mean() & Avg()
– Min() & Max()
– STDDEV()
• Group By
• Expanded capabilities in future releases
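An application would typically build queries like these from parameters. The helper below is a minimal sketch that reproduces the range query shown on this slide; the function name and its validation rules are assumptions made for illustration, and the resulting string would still need to be passed to whichever client library is in use.

```python
def range_query(table, device_id, start_ms, end_ms, columns="*"):
    """Build a bounded time-range query for a single device."""
    if start_ms >= end_ms:
        raise ValueError("start_ms must be earlier than end_ms")
    if "'" in device_id:
        # Refuse rather than guess at quoting rules for embedded quotes.
        raise ValueError("device_id must not contain single quotes")
    return (
        f"select {columns} from {table} "
        f"where time > {start_ms} and time < {end_ms} "
        f"and deviceId = '{device_id}'"
    )


query = range_query("GeoCheckin", "abc-xxx-001-001", 1453224610000, 1453225490000)
print(query)
```

Passing `columns="MIN(temperature), AVG(temperature), MAX(temperature)"` yields the aggregate form of the same query.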
![Page 22: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/22.jpg)
BATCH PROCESSING
• Real-time vs. Batch
• Spark Connector
• Parallel Extract
![Page 23: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/23.jpg)
DATA LIFECYCLE
• Global expiry
• Per table expiry coming soon
• Spark batch for rollups/aggregation
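A rollup replaces many raw readings with a few summary rows, which is what allows the raw data to expire. The slide proposes Spark for this step; the plain-Python sketch below only illustrates the aggregation logic itself, with a made-up `hourly_rollup` helper and sample readings.

```python
from collections import defaultdict

HOUR_MS = 60 * 60 * 1000


def hourly_rollup(rows):
    """Collapse (timestamp_ms, value) rows into per-hour min/avg/max."""
    buckets = defaultdict(list)
    for ts_ms, value in rows:
        buckets[ts_ms - ts_ms % HOUR_MS].append(value)
    return {
        hour: {"min": min(v), "avg": sum(v) / len(v), "max": max(v)}
        for hour, v in sorted(buckets.items())
    }


raw = [
    (1453224610000, 20.0),
    (1453224910000, 22.0),
    (1453228300000, 25.0),  # falls in the following hour
]
rollup = hourly_rollup(raw)
print(rollup)
```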
![Page 24: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/24.jpg)
Time Series Data Modeling
![Page 25: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/25.jpg)
SUPPORTED DDL DATA TYPES
• VARCHAR - Any string content is valid, including Unicode. Can only be compared using strict equality, and will not be typecast (e.g., to an integer) for comparison purposes. Use single quotes to delimit varchar strings.
• BOOLEAN - true or false (any case)
• TIMESTAMP - Timestamps are integer values expressing UNIX epoch time in UTC in milliseconds. Zero is not a valid timestamp.
• SINT64 - Signed 64-bit integer
• DOUBLE - This type does not comply with its IEEE specification: NaN (not a number) and INF (infinity) cannot be used.
![Page 26: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/26.jpg)
THE KEY
Consists of:
• Partition Key (node/partition)
• Quantum (optional)
• Local Key (sort order)
![Page 27: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/27.jpg)
RIAK TS: CREATE TABLE
CREATE TABLE GeoCheckin (
  deviceID varchar not null,
  time timestamp not null,
  weather varchar not null,
  temperature double,
  PRIMARY KEY (
    (deviceID, quantum(time, 15, 'm')),
    deviceID, time
  )
)
Partition Key: (deviceID, quantum(time, 15, 'm'))
Local Key: deviceID, time
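The effect of the 15-minute quantum can be sketched as rounding each timestamp down to a quantum boundary. The snippet assumes quanta are aligned to the Unix epoch, which the slides do not state, and uses the two timestamps from the sample queries earlier in the deck.

```python
QUANTUM_MS = 15 * 60 * 1000  # quantum(time, 15, 'm') expressed in milliseconds


def quantum_start(ts_ms, quantum_ms=QUANTUM_MS):
    """Round a timestamp down to the start of its quantum."""
    return ts_ms - ts_ms % quantum_ms


start = quantum_start(1453224610000)
end = quantum_start(1453225490000)
print(start, end, start == end)
```

Under that assumption both bounds fall in the same quantum, so a query over that range for one device would read a single co-located block of rows.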
![Page 28: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/28.jpg)
MODELING THE KEY
Methodology:
• What questions does your application ask?
• How is the data presented?
![Page 29: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/29.jpg)
USE CASE: PEDOMETER
• Questions
– How many steps today (distance) for user?
– How many steps per day this week for user?
– Daily average?
– Change in elevation?
• Key
– Partition: UserID
– Local: timestamp
– Optimized for reads: quantum of 1 week
– Optimized for writes: quantum of 1 day
• Fields
– timestamp
– steps
– device_id
– elevation
– geohash
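One way to see the read-side trade-off between the two quantum choices is to count how many quanta a one-week query touches under each. The timestamps below are hypothetical and deliberately aligned to a quantum boundary; epoch-aligned quanta are again an assumption.

```python
DAY_MS = 24 * 60 * 60 * 1000


def quanta_touched(start_ms, end_ms, quantum_ms):
    """Count the quanta overlapped by the half-open range [start_ms, end_ms)."""
    first = start_ms // quantum_ms
    last = (end_ms - 1) // quantum_ms
    return last - first + 1


# One week of steps, starting at a day boundary (hypothetical timestamps).
week_start = 16_800 * DAY_MS
week_end = week_start + 7 * DAY_MS

per_day = quanta_touched(week_start, week_end, DAY_MS)
per_week = quanta_touched(week_start, week_end, 7 * DAY_MS)
print(per_day, per_week)
```

With a 1-day quantum the weekly question reads from seven quanta, while a 1-week quantum serves it from one. The smaller quantum instead spreads a user's writes across more partition keys over time. A week that is not aligned to the quantum boundary would touch two 1-week quanta.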
![Page 30: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/30.jpg)
DEMO
• Riak TS
• Python client
• Jupyter Notebook
• Pandas
• Matplotlib
![Page 31: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/31.jpg)
THE DATA
| Description | Field | Type |
| --- | --- | --- |
| Sensor Status | status | varchar |
| Exit ID | exitid | varchar |
| Timestamp | ts | timestamp |
| Average Measured Time | avgMeasuredTime | sint64 |
| Average Speed | avgSpeed | sint64 |
| Median Measured Time | medianMeasuredTime | sint64 |
| Number of Vehicles | vehicleCount | sint64 |
| Sensor ID | id | sint64 |
| Report ID | report_id | sint64 |

• Vehicle traffic data
• City of Aarhus, Denmark
• Two sensors placed at each exit
• 5 min intervals
![Page 32: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/32.jpg)
Spark and Riak: In-situ analytics beyond Hadoop
![Page 33: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/33.jpg)
Who is Databricks
Why Us
• Creators of Apache Spark. Contribute 75% of the code - 10x more than others
• Trained 20K Spark users
• Largest number of customers deploying Spark (200+)
Our Product
• Just-in-Time Data Platform – powered by Apache Spark.
• Empower your organization to swiftly build and deploy advanced analytics with Spark.
![Page 34: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/34.jpg)
open source data processing engine built around speed, ease of use, and sophisticated analytics
largest open source data project with 1000+ contributors
![Page 35: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/35.jpg)
UNIFIED ENGINE ACROSS DIVERSE WORKLOADS & ENVIRONMENTS
Scale out, fault tolerant
Python, Java, Scala, and R APIs
Standard libraries
APACHE SPARK ENGINE
![Page 36: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/36.jpg)
ANALOGY: EVOLUTION OF CONSUMER ELECTRONICS
First Cellular Phones
Specialized Devices
Unified Device
![Page 37: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/37.jpg)
HISTORY REPEATS: FASTER, EASIER TO USE, UNIFIED
First Distributed Processing Engine
Specialized Data Processing Engines
Unified Data Processing Engine
![Page 38: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/38.jpg)
Google Trends: Hadoop vs. Spark
![Page 39: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/39.jpg)
Analytics in-situ
SQL: Enable SQL analytics over Riak
Streaming: Use Riak to store streaming data
ML: Use Riak to serve results generated by Spark
![Page 40: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/40.jpg)
Riak Spark Connector
The user application contacts the coordinating node, which returns the locations of the data using cluster replication and availability information. Then “N” Spark workers open “N” parallel connections to different nodes, which allows the application to retrieve the desired dataset up to “N” times faster, without generating “hot spots”.
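The parallel-extract pattern can be sketched with a thread pool standing in for the Spark workers. Everything named here (`COVERAGE_PLAN`, `fetch_from_node`, `parallel_extract`, the node names and the sample rows) is hypothetical; the real connector's API is not shown in the slides.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical coverage plan: node name -> rows that node would return.
COVERAGE_PLAN = {
    "node1": [("abc-001", 1453224610000, 20.1)],
    "node2": [("abc-002", 1453224615000, 19.8)],
    "node3": [("abc-003", 1453224620000, 21.4)],
}


def fetch_from_node(node):
    """Stand-in for one worker reading its share of the data from one node."""
    return COVERAGE_PLAN[node]


def parallel_extract(plan):
    """Open one connection per node and merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        parts = pool.map(fetch_from_node, plan)  # results keep the plan's order
        return [row for part in parts for row in part]


rows = parallel_extract(COVERAGE_PLAN)
print(len(rows))
```

Because each worker talks to a different node, no single node has to serve the whole dataset.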
![Page 41: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/41.jpg)
Demo
![Page 42: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/42.jpg)
Build a PoC on Databricks today.
Professional services and training also available.
Contact [email protected]
or
Sign up for a trial at https://databricks.com/try-databricks
![Page 43: Webinar: Data Modeling and Shortcuts to Success in Scaling Time Series Applications](https://reader031.vdocuments.net/reader031/viewer/2022030213/589ad2f21a28abc93a8b58f5/html5/thumbnails/43.jpg)
Thank You!
If you have any questions, please reach out to us at basho.com/contact