BA4all Sweden: State of the Art in In-Memory Analytics
TRANSCRIPT
What is the Present State of the Art Of In-Memory Analytics?
Timo Elliott, Innovation Evangelist timoelliott.com
Disclaimer
“I think you’ll find it’s a bit more complicated than that.”
A Bit of History
LEO: Lyons Electronic Office, 1951
Sixty-four 5ft-long mercury tubes, each weighing half a ton, were used to provide a massive 8.75 Kb of memory (i.e. one hundred-thousandth of today’s entry-level iPhone).
1980s – first in-memory BI tools
Usefulness limited by high cost of memory and limitations of 16-bit memory addressing
640KB max memory
1995: Windows 95 & 32-bit Architectures
QlikView, TimesTen, and others take advantage of new 32-bit memory addressing to provide in-memory analytics
Complex Event Processing
Sensor readings: tens of thousands per second
Virtually no useful information in a single isolated event
History: e.g. compare the variance of trends across multiple sensors against historical norms
Event window – e.g. 30 min
Alert
Extracting insight from events
Complex Event Processing
Traditional BI: “How many fraudulent credit card transactions occurred last week in Madrid?”
(Diagram: events 1–9 along a time axis)
Complex Event Processing: “When three credit card authorizations for the same card occur in any five-second window, deny the requests and check for fraud.”
Continuous Queries
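The continuous-query rule above can be sketched as a sliding-window check. This is a minimal illustration in plain Python, not any particular CEP engine’s API; the function name and threshold defaults are my own:

```python
from collections import defaultdict, deque

def check_auth(windows, card, ts, window_secs=5.0, threshold=3):
    """Continuous-query check: deny when `threshold` authorizations
    for the same card fall inside any `window_secs` sliding window."""
    q = windows[card]
    q.append(ts)
    # Evict events that have fallen out of the sliding window
    while q and ts - q[0] > window_secs:
        q.popleft()
    return "deny" if len(q) >= threshold else "approve"

# One event queue per card, created on first use
windows = defaultdict(deque)
```

Unlike the traditional BI query, which runs once against stored history, this check runs on every incoming event; old events age out of the window automatically.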
In-Memory and The Internet of Things
CEP Engine
Studio
Input Streams
Sensors
Messages
Transactions
Market data
Clicks
…
Other data storage
Alerts
Dashboards
Applications
Adapters
Reporting
“Traditional” Business Intelligence
Slow. Painful. Expensive.
Business Applications → Data (copy) → ETL
Operational Data Store
Data Warehouse
Data Marts
Indexes
Aggregates
Calculation Engine
Business Intelligence: Query → Query Results
It’s Like An Onion…
The more layers there are, the more it makes you cry…
What Was The Problem?
Slow Disks & CPUs: I/O Bottleneck
Expensive Memory
Optimized for Transactions: BI is an Afterthought
30 Year-Old Database Design Principles
Why Talk About In-Memory?
Analysts Recommend In-Memory
“An in-memory data platform offers more than performance benefits”
“Recommendations: Invest in an in-memory data platform to gain competitive edge”
“In-Memory Database Is Gaining Momentum Across All Use Cases”
“In-Memory Delivers Extreme Performance And Scalability”
“In-Memory Data Platform Is No Longer An Option — It’s A Necessity!”
Companies Like Yours Are Implementing In-Memory
32% run in-memory databases at their location today
75% expect to expand their in-memory use in the next 3 years
Top Benefits:
Faster response times / less latency (88%)
Greater flexibility (25%)
More rapid deployments (21%)
Top Uses:
Analytics (58%)
Core business functions (42%)
IT operations (25%)
Source: 2014 DBTA survey of IT and data managers
Database vendors are investing in in-memory
The Forrester Wave: In-Memory Database Platforms, Q3 ‘15
All Analytics Vendors Now Support In-Memory To Some Extent
Oracle Database In-Memory Option: “The Oracle Database In-Memory option dramatically accelerates the performance of analytic queries by storing data in a highly optimized columnar in-memory format.”
Microsoft SQL Server In-Memory OLTP: “When data lives totally in memory, we can use much, much simpler data structures. When a table is declared memory-optimized, all of its records live in memory.”
DB2 with BLU Acceleration: “IBM DB2 with BLU Acceleration speeds analytics and reporting using dynamic in-memory columnar technologies. In-memory columnar technologies provide an extremely efficient way to scan and find relevant data.”
Qlik: “In-memory indexing automatically builds and maintains all data relationships from multiple sources for unrestricted exploration.”
SAP HANA: “A good example of a modern in-memory database technology is SAP’s HANA platform.”
Teradata: “Teradata uses a hybrid approach to in-memory that intelligently puts the right data in memory to deliver high-speed in-memory performance at a fraction of the cost of putting all data in memory.”
Tableau: “The Data Engine is a high-performing analytics database on your PC. It has the speed benefits of traditional in-memory solutions without the limitations that your data must fit in memory.”
Spark: “Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.”
What Is In-Memory?And why now?
What Is In-Memory?
Data access times of various storage types relative to RAM (logarithmic scale)
RAM is 300,000 times faster than hard disks
CPU register is 61 million times faster than hard disks
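Ratios like these come from dividing typical access latencies, and the exact multiples depend heavily on which latency figures you assume. A quick sketch with rough, assumed numbers (illustrative only, not benchmarks; with these assumptions RAM comes out ~100,000x faster than disk):

```python
# Rough, assumed access latencies in nanoseconds (illustrative only;
# real figures vary widely by hardware generation)
latency_ns = {
    "cpu_register": 0.3,
    "ram": 100.0,
    "ssd": 100_000.0,          # ~0.1 ms
    "hard_disk": 10_000_000.0,  # ~10 ms seek
}

def speedup(slower, faster):
    """How many times faster one storage tier is than another."""
    return latency_ns[slower] / latency_ns[faster]
```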
In-Memory Databases vs. Caching
“Much of the work that is done by a conventional, disk-optimized RDBMS is done under the assumption that data primarily resides on disk. Even when a disk-based RDBMS has been configured to hold all of its data in main memory, its performance is hobbled by assumptions of disk-based data residency. When the assumption of disk-residency is removed, complexity is dramatically reduced.”
- Oracle TimesTen Overview
In-Memory Computing Costs have Plummeted
Turning Torso: 190m
Cost of 1 MB of memory in 2000: ≈ $1
Cost of 1 MB of memory today: ≈ ½ cent (75 cm: the height of an IKEA MICKE desk, 399 kr)
And shrinking, and shrinking, and shrinking….
Prices Continue to Slide
DRAM production costs drop by 30% every 12 months
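At a steady 30% annual decline, cost follows simple compounding. A quick sketch (the $1/MB starting point is taken from the earlier slide; the 15-year horizon is an assumption for illustration):

```python
def cost_after(years, cost_now=1.0, annual_drop=0.30):
    """Unit cost after `years`, if production cost falls
    `annual_drop` (e.g. 0.30 = 30%) every 12 months."""
    return cost_now * (1 - annual_drop) ** years

# $1/MB in 2000 compounds down to under half a cent after ~15 years,
# consistent with the slide's "half a cent today"
cost_2015 = cost_after(15, cost_now=1.0)
```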
In-Memory Computing
Operational Data Store
Data Warehouse
Indexes
Aggregates
Business Applications → Data (copy)
ETL
Calculation Engine → Business Intelligence
Query → Query Results
Data Marts
Up to 1,000x faster; no optimizations required
Row vs. Column Databases
My Filing System vs. My Wife’s Filing System
Row-based vs. column-based data warehouse
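The layout difference can be shown with plain Python data structures: a row store keeps each record together, a column store keeps each attribute together. The table contents are made up for illustration:

```python
# Row store: each record kept together (good for OLTP lookups/updates)
rows = [
    {"id": 1, "city": "Madrid", "amount": 120},
    {"id": 2, "city": "Malmo",  "amount": 80},
    {"id": 3, "city": "Madrid", "amount": 45},
]

# Column store: each attribute kept together (good for scans/aggregates)
columns = {
    "id":     [1, 2, 3],
    "city":   ["Madrid", "Malmo", "Madrid"],
    "amount": [120, 80, 45],
}

# An aggregate over one attribute reads a single contiguous column...
total = sum(columns["amount"])
# ...instead of pulling every field of every row
total_row_store = sum(r["amount"] for r in rows)
```

Columnar layout also compresses well: repeated values like "Madrid" sit next to each other, which is part of why column stores fit more data in memory.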
Column Databases
Operational Data Store
Data Warehouse
Business Applications → Data (copy)
ETL
Calculation Engine → Business Intelligence
Query → Query Results
Up to 1,000x faster; more data in less space
Massively Parallel Systems
E.g. Netezza technology now part of IBM PureSystems
E.g. Greenplum, now part of EMC
Column Stores, Compression, and Parallel Processing
E.g. DB2 with BLU acceleration
“In-Chip” Processing
E.g. SiSense
Vector-based instructions; cache-optimized; decompression
Close collaboration between in-memory software vendors and chip developers (e.g. SAP & Intel Haswell)
Data Warehouse
Massively Parallel Hardware
Operational Data Store
Business Applications → Data (copy)
ETL
Business Intelligence: Query → Query Results
Up to 1,000x faster; optimized for hardware
Calculation Engine
In-Database Processing
E.g. SAS & Teradata
Move Processing to the Data
Operational (OLTP)
Analytics (OLAP)
Planning
Predictive
Text Search
Spatial
Processing Engines
Relational Stores
Row-based / Columnar
ETL / Data Quality
Document Store
Object / Graph Store
Data Warehouse
In-Database Analytics
Operational Data Store
Business Applications → Data (copy)
ETL
Business Intelligence: Query → Query Results
Up to 1,000x faster; push processing down to dedicated hardware, less traffic
Analytic Appliance
Calculation Engine
Real-Time Data
Operational Data Store (copy)
ETL
Real-time replication — why have a separate operational data store?
Business Applications → Data
Analytic Appliance → Business Intelligence
Transactions
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.
ACID compliance
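Atomicity (the “A” in ACID) is easy to demonstrate with Python’s built-in sqlite3 module: inside a transaction, a transfer either fully commits or fully rolls back. The table, account names, and balance rule are invented for illustration:

```python
import sqlite3

# An in-memory SQLite database with two toy accounts
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
con.executemany("INSERT INTO accounts VALUES (?, ?)",
                [("alice", 100), ("bob", 0)])
con.commit()

def transfer(con, src, dst, amount):
    """Move funds atomically: both updates commit, or neither does."""
    try:
        with con:  # the connection context manager wraps a transaction
            con.execute("UPDATE accounts SET balance = balance - ?"
                        " WHERE name = ?", (amount, src))
            bal = con.execute("SELECT balance FROM accounts"
                              " WHERE name = ?", (src,)).fetchone()[0]
            if bal < 0:
                raise ValueError("insufficient funds")  # triggers rollback
            con.execute("UPDATE accounts SET balance = balance + ?"
                        " WHERE name = ?", (amount, dst))
        return True
    except ValueError:
        return False

transfer(con, "alice", "bob", 60)   # commits
transfer(con, "alice", "bob", 500)  # rolls back, balances unchanged
```

The point of the slide is that in-memory databases must still provide these guarantees (typically via logging/persistence) even though the working data never touches disk during a query.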
In-Memory Enterprise Applications
E.g. Microsoft SQL Server In-Memory OLTP
In-Memory Enterprise Applications
E.g. SAP S/4 HANA
Hybrid Transactional Analytical Processing
Business Applications (copy)
Analytic Appliance → Business Intelligence
Use a single platform for both analytics and applications
Data
Virtuous Circle of Technology
In-Memory
Columnar Databases
Hardware Acceleration
Calculation Engine
Columnar storage increases the amount of data that can be stored in limited memory (compared to disk)
Column databases enable easier parallelization of queries
In-memory processing gives more time for relatively slow updates to column data
In-memory allows sophisticated calculations in real-time
Hardware acceleration makes sophisticated calculations possible
Each technology works well on its own, but combining them all is the real opportunity — provides all of the upside benefits while mitigating the downsides
Apache Spark
Hadoop V1: Map → Reduce → HDFS → Map → Reduce
Spark: map() → join() (with Data Source 2) → cache() → transform
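Much of Spark’s speedup over Hadoop V1 comes from keeping intermediate datasets in memory (cache()) instead of writing every MapReduce stage back to HDFS. A toy pure-Python sketch of that idea follows; it is not the Spark API, and real Spark caches lazily on the first action, whereas this sketch materializes eagerly for brevity:

```python
class Dataset:
    """Toy lazy dataset: chains transformations and can pin an
    intermediate result in memory (the idea behind Spark's cache())."""
    def __init__(self, compute):
        self._compute = compute
        self._cached = None

    def map(self, fn):
        # Build a new dataset whose computation pulls from this one
        return Dataset(lambda: [fn(x) for x in self.collect()])

    def cache(self):
        self._cached = self.collect()  # materialize once, keep in RAM
        return self

    def collect(self):
        return self._cached if self._cached is not None else self._compute()

source = Dataset(lambda: list(range(5)))
squares = source.map(lambda x: x * x).cache()
# Two downstream jobs reuse the cached result instead of recomputing
job1 = squares.map(lambda x: x + 1).collect()
job2 = squares.collect()
```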
Lots of Support for Spark
YARN
HDFS
Other Apps
Files Files Files
HANA-Spark Adapter for improved performance between distributed systems
Compiled queries enable applications & data analysis to work more efficiently across nodes
Familiar OLAP experience on Hadoop to derive business insights from big data such as drill-down into HDFS data
Compiled Queries
Spark Adapter
Drill Downs
SAP HANA in-memory platform
SAP HANA in-memory platform: Application Services, Database Services, Integration Services, Processing Services
Vora: Spark + in-memory store
HANA-Spark Adapter; HANA Smart Data Access, UDFs, others
Extensive programming support for Scala, Python, C, C++, R, and Java allows data scientists to use their tool of choice.
Enable data scientists and developers who prefer SparkR or Spark ML to mash up corporate data with Hadoop/Spark data easily
Optionally, leverage HANA’s multiple data processing engines for developing new insights from business and contextual data.
Spark Extensions
SAP HANA Vora
Persistence & Failover
Next-Generation Chips Are On Their Way
NVM: non-volatile memory
Scale Up
Directly addressable memory:
16-bit: 64 kilobytes
32-bit: 4 gigabytes (65,536x)
64-bit: 16 exabytes (4,294,967,296x)
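The address-space limits above are just powers of two, which a few lines of arithmetic confirm:

```python
def addressable_bytes(address_bits):
    """Maximum directly addressable memory for a flat address space."""
    return 2 ** address_bits

KB, GB, EB = 1024, 1024**3, 1024**6
# 16-bit: 64 KB; 32-bit: 4 GB; 64-bit: 16 EB
sixteen, thirty_two, sixty_four = (addressable_bytes(b) for b in (16, 32, 64))
```

Each doubling of address width squares the reachable space: 32-bit addressing reaches 65,536x more memory than 16-bit, and 64-bit reaches 4,294,967,296x more than 32-bit.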
What About Scale?
There are now systems with more than half a petabyte of memory, and growing…
Balancing Data Temperature and Costs
Hot: data is accessed frequently
Warm: data is not accessed frequently
Cold: data is only accessed sporadically
(Chart axes: volume of data vs. performance and direct cost)
Many different solutions possible
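One of those solutions is a simple access-frequency policy that routes hot data to RAM and colder data to cheaper tiers. The thresholds below are illustrative assumptions, not values from any particular product:

```python
def classify_temperature(accesses_per_day, hot_threshold=100, warm_threshold=1):
    """Toy tiering policy: map access frequency to a data temperature.
    Thresholds are illustrative assumptions."""
    if accesses_per_day >= hot_threshold:
        return "hot"    # keep in memory for performance
    if accesses_per_day >= warm_threshold:
        return "warm"   # e.g. SSD: slower but cheaper
    return "cold"       # e.g. disk or archive storage

tiers = {temp: [] for temp in ("hot", "warm", "cold")}
for table, freq in [("orders", 5000), ("last_quarter", 10), ("archive_2009", 0)]:
    tiers[classify_temperature(freq)].append(table)
```

Real systems refine this with recency, data size, and SLA requirements, but the cost/performance trade-off is the same as in the chart above.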
What Type of In-Memory Is The Right One?
Complex ROI calculations
Data volumes
Relative costs (?)
Cost of storage
Value of speed
Value of agility
Fast-Moving Market
Hybrid vs. Pure In-Memory Tradeoffs: data duplication vs. single source
Legacy + In Memory Approach
ad hoc: made or done without planning because of an immediate need. (Merriam-Webster dictionary)
Query → DISPATCH/MERGE → Results, combining current data with stale, duplicated data
Unpredictable response times; responses based on obsolete data
Pure In-Memory Approach
Query → select all data from one memory store → Results, all on current data
Real-time responses on current data
Replicated vs. real-time; unpredictable vs. consistent response times
Top Benefits
Speed
“If things seem under control, you’re just not going fast enough.”
- Mario Andretti
Real-Time Operations
Instead of analyzing the shards of glass after the accident, what if you could catch the vase BEFORE it hit the ground?
Agility (Speed of Change)
Simplification = Lower Costs
“In-memory changes the cost equation through simplification.
It can help save costs on hardware and software, as well as reduce labor required for administration and development needs.
Based on a composite cost model, an in-memory platform can save an organization 37% across hardware, software, and labor costs, depending on various factors.”
Lower Costs
“Don’t let somebody say to you we can’t go in-memory because it’s so much more money. Acquisition costs may be higher. If you calculate out a TCO, it’s going to be less.”
Donald Feinberg, Gartner
The price of light… …is less than the cost of darkness
ROI = Return On Ignorance?
New, Simpler Infrastructures and Business Models
Weissbeerger Beverage Analytics
Conclusion
Myths & Facts
Myths:
It’s a niche technology to run analytics faster
The main users of in-memory analytics are SMBs
Small number of in-memory vendors
Only for deep-pocketed organizations
New and unproven
Facts:
It has been around since the late 1990s
Entire industries (SaaS, social networks, financial trading, online gaming) would not exist as we know them today without in-memory computing
More than 50 software vendors deliver in-memory technology
Business Impact of In-Memory Computing
• Reducing application running costs via database/legacy application offloading
• Improving transactional application performance
• Enabling horizontal, elastic scalability (scale up/down)
• Boosting response time in analytical applications
• Low-latency (<1 microsecond) application messaging
• Dramatically shortening batch process execution times
• Enabling real-time, “self-service” business intelligence and unconstrained data exploration
• Detecting correlations/patterns across millions of events in “a blink of an eye”
• Supporting “big data” (big data needs big memory)
• Running transactional and analytical applications on the same physical dataset
Run the business
Grow the business
Transform the business
Opportunities, by business impact
In-Memory Changes Everything
“In-memory computing will have a long-term, disruptive impact by radically changing users’ expectations, application design principles, products’ architecture and vendors’ strategy.”
— Gartner