ba4all sweden: state of the art in in-memory analytics

What is the Present State of the Art Of In-Memory Analytics? Timo Elliott, Innovation Evangelist, timoelliott.com

Upload: timo-elliott

Post on 08-Apr-2017

TRANSCRIPT

Page 1: BA4All Sweden: State of the Art in In-Memory Analytics

What is the Present State of the Art Of In-Memory Analytics?

Timo Elliott, Innovation Evangelist timoelliott.com

Page 2: BA4All Sweden: State of the Art in In-Memory Analytics

Disclaimer

“I think you’ll find it’s a bit more complicated than that.”

Page 3: BA4All Sweden: State of the Art in In-Memory Analytics

A Bit of History

Page 4: BA4All Sweden: State of the Art in In-Memory Analytics

LEO: Lyons Electronic Office, 1951

Sixty-four 5ft-long mercury tubes, each weighing half a ton, were used to provide a massive 8.75 KB of memory (i.e. one hundred-thousandth of today’s entry-level iPhone).

Page 5: BA4All Sweden: State of the Art in In-Memory Analytics

1980s – first in-memory BI tools

Usefulness limited by the high cost of memory and the limitations of 16-bit memory addressing

640KB max memory

Page 6: BA4All Sweden: State of the Art in In-Memory Analytics

1995: Windows 95 & 32-bit Architectures

QlikView, TimesTen, and others take advantage of new 32-bit memory addressing to provide in-memory analytics

Page 7: BA4All Sweden: State of the Art in In-Memory Analytics

Complex Event Processing

Sensor readings – tens of thousands per second

Virtually no useful information in a single isolated event history

e.g. Compare variance of trends across multiple sensors against historical norms

Event window – e.g. 30 min

Alert

Extracting insight from events
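
The event-window idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not any CEP product's API: `make_window_monitor`, `baseline_variance`, and `ratio_threshold` are hypothetical names, the historical norm is assumed to be a single known variance, and only one sensor is tracked for brevity.

```python
from collections import deque
from statistics import pvariance


def make_window_monitor(window_seconds, baseline_variance, ratio_threshold):
    """Return an ingest(timestamp, value) function for one sensor.

    ingest() keeps only the readings inside the sliding event window and
    returns True (alert) when their variance exceeds ratio_threshold times
    the historical baseline. Timestamps are assumed to arrive in order.
    """
    window = deque()  # (timestamp, value) pairs, oldest first

    def ingest(timestamp, value):
        window.append((timestamp, value))
        # Evict readings that have fallen out of the event window.
        while window and window[0][0] <= timestamp - window_seconds:
            window.popleft()
        if len(window) < 2:
            return False  # a single isolated event says nothing about a trend
        current = pvariance([v for _, v in window])
        return current > ratio_threshold * baseline_variance

    return ingest


# Usage: a 30-minute window, alerting at 4x the (assumed) historical variance.
monitor = make_window_monitor(30 * 60, baseline_variance=1.0, ratio_threshold=4.0)
for ts, reading in [(0, 20.0), (60, 20.5), (120, 30.0)]:
    if monitor(ts, reading):
        print(f"Alert at t={ts}s")
```

The window lives entirely in memory, which is what makes evaluating it on every incoming reading practical at tens of thousands of events per second.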

Page 8: BA4All Sweden: State of the Art in In-Memory Analytics

Complex Event Processing

Traditional BI: “How many fraudulent credit card transactions occurred last week in Madrid?”

1 2 3 4 5 6 7 8 9

time

Complex Event Processing: “When three credit card authorizations for the same card occur in any five-second window, deny the requests and check for fraud.”

Continuous Queries
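
The fraud rule quoted on this slide can be written as a continuous query over a stream. The sketch below is an assumption-laden illustration in plain Python: `FraudRule` and its method names are hypothetical, timestamps are assumed to arrive in order, and only the request that trips the rule is denied (a real engine would also flag the earlier authorizations).

```python
from collections import defaultdict, deque


class FraudRule:
    """Deny when three authorizations for one card fall in a 5-second window."""

    def __init__(self, window_seconds=5.0, max_authorizations=3):
        self.window_seconds = window_seconds
        self.max_authorizations = max_authorizations
        self.recent = defaultdict(deque)  # card_id -> recent timestamps

    def authorize(self, card_id, timestamp):
        times = self.recent[card_id]
        times.append(timestamp)
        # Drop authorizations that are older than the window.
        while timestamp - times[0] > self.window_seconds:
            times.popleft()
        if len(times) >= self.max_authorizations:
            return "deny"  # the caller would also trigger a fraud check here
        return "approve"


# Usage: the rule is evaluated on every event as it arrives.
rule = FraudRule()
for card, ts in [("A", 0.0), ("A", 1.0), ("B", 1.5), ("A", 2.0), ("A", 10.0)]:
    print(card, ts, rule.authorize(card, ts))
```

Unlike the traditional BI question, nothing is queried after the fact: the state for each card is kept in memory and the decision is made while the transaction is still in flight.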

Page 9: BA4All Sweden: State of the Art in In-Memory Analytics

In-Memory and The Internet of Things

CEP Engine

Studio

Input Streams

Sensors

Messages

Transactions

Market data

Clicks

Other data storage

Alerts

Dashboards

Applications

adapters

Page 10: BA4All Sweden: State of the Art in In-Memory Analytics

“Traditional” Business Intelligence

Slow

Painful

Expensive

Business Applications

Data

Copy

Operational Data Store

ETL

Data Warehouse

Indexes

Aggregates

Data Marts

Calculation Engine

Business Intelligence

Reporting

Query

Query Results

Page 11: BA4All Sweden: State of the Art in In-Memory Analytics

It’s Like An Onion…

The more layers there are, the more it makes you cry…

Page 12: BA4All Sweden: State of the Art in In-Memory Analytics

What Was The Problem?

Slow Disks & CPUs

I/O Bottleneck

Expensive Memory

Optimized for Transactions

BI is an Afterthought

30 Year-Old Database Design Principles

Page 13: BA4All Sweden: State of the Art in In-Memory Analytics

Why Talk About In-Memory?

Page 14: BA4All Sweden: State of the Art in In-Memory Analytics

Analysts Recommend In-Memory

“An in-memory data platform offers more than performance benefits”

“Recommendations: Invest in an in-memory data platform to gain competitive edge”

“In-Memory Database Is Gaining Momentum Across All Use Cases”

“In-Memory Delivers Extreme Performance And Scalability”

“In-Memory Data Platform Is No Longer An Option — It’s A Necessity!”

Page 15: BA4All Sweden: State of the Art in In-Memory Analytics

Companies Like Yours Are Implementing In-Memory

32% run in-memory databases at their location today

75% expect to expand their in-memory use in the next 3 years

More rapid deployments

Greater flexibility

Faster response times / less latency

21%

25%

88%

IT operations

Core business functions

Analytics

25%

42%

58%

Source: 2014 DBTA survey of IT and data managers

Top Uses

Top Benefits

Page 16: BA4All Sweden: State of the Art in In-Memory Analytics

Database vendors are investing in in-memory

The Forrester Wave: In-Memory Database Platforms, Q3 ‘15

Page 17: BA4All Sweden: State of the Art in In-Memory Analytics

All Analytics Vendors Now Support In-Memory To Some Extent

Oracle Database In-Memory Option: “The Oracle Database In-Memory option dramatically accelerates the performance of analytic queries by storing data in a highly optimized columnar in-memory format.”

Microsoft SQL Server In-Memory OLTP: “When data lives totally in memory, we can use much, much simpler data structures. When a table is declared memory-optimized, all of its records live in memory.”

DB2 with BLU Acceleration: “IBM DB2 with BLU Acceleration speeds analytics and reporting using dynamic in-memory columnar technologies. In-memory columnar technologies provide an extremely efficient way to scan and find relevant data.”

Qlik: “In-memory indexing automatically builds and maintains all data relationships from multiple sources for unrestricted exploration”

SAP HANA: “A good example of a modern in-memory database technology is SAP's HANA platform.”

Teradata: “Teradata uses a hybrid approach to in-memory that intelligently puts the right data in memory to deliver high-speed in-memory performance at a fraction of the cost of putting all data in memory.”

Tableau: “The Data Engine is a high-performing analytics database on your PC. It has the speed benefits of traditional in-memory solutions without the limitations that your data must fit in memory.”

Spark: “Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.”

Page 18: BA4All Sweden: State of the Art in In-Memory Analytics

What Is In-Memory? And why now?

Page 19: BA4All Sweden: State of the Art in In-Memory Analytics

What Is In-Memory?

Data access times of various storage types relative to RAM (logarithmic scale)

RAM is 300,000 times faster than hard disks

CPU register is 61 million times faster than hard disks

Page 20: BA4All Sweden: State of the Art in In-Memory Analytics

In-Memory Databases vs. Caching

“Much of the work that is done by a conventional, disk-optimized RDBMS is done under the assumption that data primarily resides on disk. Even when a disk-based RDBMS has been configured to hold all of its data in main memory, its performance is hobbled by assumptions of disk-based data residency. When the assumption of disk-residency is removed, complexity is dramatically reduced.”

- Oracle TimesTen Overview

Page 21: BA4All Sweden: State of the Art in In-Memory Analytics

In-Memory Computing Costs have Plummeted

Turning Torso: 190m

Cost of 1 MB of memory in 2000: ≈$1

Page 22: BA4All Sweden: State of the Art in In-Memory Analytics

In-Memory Computing Costs have Plummeted

Cost of 1 MB of memory today: ≈ ½ cent

75cm

And shrinking, and shrinking, and shrinking…

IKEA MICKE skrivbord (desk), 399 kr

Page 23: BA4All Sweden: State of the Art in In-Memory Analytics

Prices Continue to Slide

DRAM production costs drop by 30% every 12 months
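
As a rough consistency check, the 30% annual decline can be compounded against the figures from the earlier slides (about $1 per megabyte in 2000, about half a cent today). The sketch assumes, for illustration only, that prices fall at the same rate as production costs; the function names are hypothetical.

```python
import math


def price_after(start_price, annual_drop, years):
    """Price after compounding a fixed fractional drop for a number of years."""
    return start_price * (1.0 - annual_drop) ** years


def years_to_reach(start_price, target_price, annual_drop):
    """Years of compounding decline needed to fall from start to target price."""
    return math.log(target_price / start_price) / math.log(1.0 - annual_drop)


# From $1 per megabyte to half a cent at 30% per year.
print(round(years_to_reach(1.00, 0.005, 0.30), 1))
# Price per megabyte after 15 years of 30% annual declines.
print(round(price_after(1.00, 0.30, 15), 4))
```

Under that assumption a 200-fold price drop takes a little under 15 years, which is in line with the two price points on the slides.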

Page 24: BA4All Sweden: State of the Art in In-Memory Analytics

In-Memory Computing

Operational Data Store

Data Warehouse

Indexes

Aggregates

Data

Business Applications

Copy

ETL

Calculation Engine

Business Intelligence

Query Results

Query

Data Marts

Up to 1,000x faster

No optimizations required

Page 25: BA4All Sweden: State of the Art in In-Memory Analytics

Row vs. Column Databases

My Filing SystemMy Wife’s Filing System

Row-based Column-based
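
The difference between the two layouts can be shown with a toy table. This is only an illustration with made-up data and hypothetical names (`rows`, `to_columns`); real column stores use typed, contiguous arrays rather than Python lists.

```python
# Row-based layout: each record is stored together.
rows = [
    {"order_id": 1, "city": "Madrid", "amount": 120.0},
    {"order_id": 2, "city": "Stockholm", "amount": 80.0},
    {"order_id": 3, "city": "Madrid", "amount": 50.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-based layout: each attribute is stored together.
columns = to_columns(rows)

# Row store: every record is visited in full although only "amount" is needed.
total_from_rows = sum(record["amount"] for record in rows)

# Column store: the query touches the "amount" column and nothing else.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Analytic queries typically aggregate a few columns over many rows, which is why the column-based layout suits them; transactional workloads that read or write whole records favor the row-based one.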

Page 26: BA4All Sweden: State of the Art in In-Memory Analytics

Column Databases

Operational Data Store

Data Warehouse

Data

Business Applications

Copy

ETL

Calculation Engine

Business Intelligence

Query Results

Query

Up to 1,000x faster

More data in less space
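
One reason a column store fits more data in less space is that the values within a column tend to repeat, so they compress well. The sketch below shows two common techniques, dictionary encoding and run-length encoding, on a made-up column; the function names are hypothetical and no particular product's storage format is implied.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code.

    Returns (dictionary, codes), where dictionary[code] is the original value.
    """
    dictionary = []
    index = {}
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def run_length_encode(codes):
    """Collapse consecutive repeats into (code, run_length) pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return [tuple(run) for run in runs]


city_column = ["Madrid", "Madrid", "Madrid", "Stockholm", "Stockholm", "Madrid"]
dictionary, codes = dictionary_encode(city_column)
print(dictionary)                # ['Madrid', 'Stockholm']
print(run_length_encode(codes))  # [(0, 3), (1, 2), (0, 1)]
```

Six strings become a two-entry dictionary and three short runs. The same trick is far less effective on row-based storage, where neighboring values belong to different attributes.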

Page 27: BA4All Sweden: State of the Art in In-Memory Analytics

Massively Parallel Systems

E.g. Netezza technology now part of IBM PureSystems

E.g. Greenplum, now part of EMC

Page 28: BA4All Sweden: State of the Art in In-Memory Analytics

Column Stores, Compression, and Parallel Processing

E.g. DB2 with BLU acceleration

Page 29: BA4All Sweden: State of the Art in In-Memory Analytics

“In-Chip” Processing

E.g. SiSense

Vector-based instructions

Cache-optimized

Decompression

Close collaboration between in-memory software vendors and chip developers (e.g. SAP & Intel Haswell)

Page 30: BA4All Sweden: State of the Art in In-Memory Analytics

Data Warehouse

Massively Parallel Hardware

Operational Data Store

Data

Business Applications

Copy

ETL

Business Intelligence

Query Results

Query

Up to 1,000x faster

Optimized for hardware

Calculation Engine

Page 31: BA4All Sweden: State of the Art in In-Memory Analytics

In-Database Processing

E.g. SAS & Teradata

Page 32: BA4All Sweden: State of the Art in In-Memory Analytics

Move Processing to the Data

Operational (OLTP)

Analytics (OLAP)

Planning Predictive

Text Search

Spatial

Processing Engines

Relational Stores

Row based Columnar

ETL

Data Quality

Document Store

Object Graph Store

Page 33: BA4All Sweden: State of the Art in In-Memory Analytics

Data Warehouse

In-Database Analytics

Operational Data Store

Data

Business Applications

Copy

ETL

Business Intelligence

Query Results

Query

Up to 1,000x faster

Push processing down to dedicated hardware, less traffic

Analytic Appliance

Calculation Engine

Page 34: BA4All Sweden: State of the Art in In-Memory Analytics

Real-Time Data

Operational Data Store

Copy

ETL

Real-time replication — why have a separate operational data store?

Data

Business Applications

Analytic Appliance

Business Intelligence

Page 35: BA4All Sweden: State of the Art in In-Memory Analytics

Transactions

ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.

ACID compliance
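
Atomicity and consistency can be demonstrated with Python's built-in `sqlite3` module and an in-memory database. This is a generic sketch, not how any of the vendors in this talk implement transactions; the `accounts` table and `transfer` helper are hypothetical. Note that a `:memory:` database disappears when the connection closes, so durability is exactly the property an in-memory system has to add through logging or persistence.

```python
import sqlite3


def transfer(conn, source, target, amount):
    """Move amount between two accounts as one atomic transaction.

    Returns True if committed, False if the transaction was rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected a negative balance
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

print(transfer(conn, "alice", "bob", 150))  # overdraft: whole transfer undone
print(transfer(conn, "alice", "bob", 40))   # valid: both updates committed
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
```

In the failed transfer the credit to the target account has already been applied when the debit violates the constraint, and the rollback removes it again: either both updates happen or neither does.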

Page 36: BA4All Sweden: State of the Art in In-Memory Analytics

In-Memory Enterprise Applications

E.g. Microsoft SQL Server In-Memory OLTP

Page 37: BA4All Sweden: State of the Art in In-Memory Analytics

In-Memory Enterprise Applications

E.g. SAP S/4 HANA

Page 38: BA4All Sweden: State of the Art in In-Memory Analytics

Hybrid Transactional Analytical Processing

Copy

Business Applications

Analytic Appliance

Business Intelligence

Use a single platform for both analytics and applications

Data

Page 39: BA4All Sweden: State of the Art in In-Memory Analytics

Virtuous Circle of Technology

In-Memory

Columnar Databases

Hardware Acceleration

Calculation Engine

Columnar storage increases the amount of data that can be stored in limited memory (compared to disk)

Column databases enable easier parallelization of queries

In-memory processing gives more time for relatively slow updates to column data

In-memory allows sophisticated calculations in real-time

Hardware acceleration makes sophisticated calculations possible

Each technology works well on its own, but combining them all is the real opportunity — provides all of the upside benefits while mitigating the downsides

Page 40: BA4All Sweden: State of the Art in In-Memory Analytics

Apache Spark

MAP

Reduce

HDFS

MAP

Reduce

Data Source 2

map()

join()

cache()

transform

Hadoop V1 Spark

Page 41: BA4All Sweden: State of the Art in In-Memory Analytics

Lots of Support for Spark

Page 42: BA4All Sweden: State of the Art in In-Memory Analytics

YARN

HDFS

Other Apps

Files Files Files

HANA-Spark Adapter for improved performance between distributed systems

Compiled queries enable applications & data analysis to work more efficiently across nodes

Familiar OLAP experience on Hadoop to derive business insights from big data such as drill-down into HDFS data

Compiled Queries

Spark Adapter

Drill Downs

SAP HANA in-memory platform

Vora

Spark

Vora

Spark

In-Memory Store

Application Services

Database Services

Integration Services

Processing Services

Vora

Spark

HANA-Spark Adapter

HANA Smart Data Access, UDFs, Others

Extensive programming support for Scala, Python, C, C++, R, and Java allows data scientists to use their tool of choice.

Enable data scientists and developers who prefer SparkR or Spark ML to mash up corporate data with Hadoop/Spark data easily

Optionally, leverage HANA’s multiple data processing engines for developing new insights from business and contextual data.

Spark Extensions

SAP HANA Vora

Page 43: BA4All Sweden: State of the Art in In-Memory Analytics

Persistence & Failover

Page 44: BA4All Sweden: State of the Art in In-Memory Analytics

Next-Generation Chips Are On Their Way

NVM: non-volatile memory

Page 45: BA4All Sweden: State of the Art in In-Memory Analytics

Scale Up

4,294,967,296x

256x

16 bit: 64 kilobytes

32 bit: 4 gigabytes

64 bit: 16 exabytes

Directly addressable memory
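
The memory sizes on this slide follow directly from the address width: n address bits can distinguish 2^n bytes. A short sketch with hypothetical helper names (`addressable_bytes`, `human`), using binary units of 1,024:

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


def human(n_bytes):
    """Format a power-of-two byte count using binary (1,024-based) units."""
    units = ["bytes", "kilobytes", "megabytes", "gigabytes",
             "terabytes", "petabytes", "exabytes"]
    i = 0
    while n_bytes >= 1024 and i < len(units) - 1:
        n_bytes //= 1024
        i += 1
    return f"{n_bytes} {units[i]}"


for bits in (16, 32, 64):
    print(f"{bits} bit: {human(addressable_bytes(bits))}")

# The step from 32-bit to 64-bit addressing multiplies the space by 2**32.
print(f"{addressable_bytes(64) // addressable_bytes(32):,}x")
```

The last line reproduces the 4,294,967,296x multiplier shown on the slide.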

Page 46: BA4All Sweden: State of the Art in In-Memory Analytics

What About Scale?

There are now systems with more than half a petabyte of memory, and growing…

Page 47: BA4All Sweden: State of the Art in In-Memory Analytics

Balancing Data Temperature and Costs

Hot

Warm

Cold

Data is accessed frequently

Data is not accessed frequently

Data is only accessed sporadically

Volume of data

Performance (and direct cost)

Many different solutions possible

Page 48: BA4All Sweden: State of the Art in In-Memory Analytics

What Type of In-Memory Is The Right One?

 

Complex ROI calculations

Data volumes

Relative costs (?)

Cost of storage

Value of speed

Value of agility

Page 49: BA4All Sweden: State of the Art in In-Memory Analytics

Fast-Moving Market

Page 50: BA4All Sweden: State of the Art in In-Memory Analytics

Hybrid vs. Pure In-Memory Tradeoffs

data duplication vs single source

Legacy + In Memory Approach

ad hoc: made or done without planning because of an immediate need. (Merriam-Webster dictionary)

DISPATCH/MERGE

Results

Query

Current Data

Stale, Duplicated Data

Current Data

Query

Select all data from one memory store

Results

Pure In-Memory Approach

Unpredictable Response Times Responses based on Obsolete Data

Real-time Responses on Current Data

replicated vs real-time

unpredictable response times vs consistent response times

Page 51: BA4All Sweden: State of the Art in In-Memory Analytics

Top Benefits

Page 52: BA4All Sweden: State of the Art in In-Memory Analytics

Speed

“If things seem under control, you’re just not going fast enough.”

- Mario Andretti

Page 53: BA4All Sweden: State of the Art in In-Memory Analytics

Real-Time Operations

Instead of analyzing the shards of glass after the accident, what if you could catch the vase BEFORE it hit the ground?

Page 54: BA4All Sweden: State of the Art in In-Memory Analytics

Agility (Speed of Change)

Page 55: BA4All Sweden: State of the Art in In-Memory Analytics

Simplification = Lower Costs

“In-memory changes the cost equation through simplification.

It can help save costs on hardware and software, as well as reduce labor required for administration and development needs.

Based on a composite cost model, an in-memory platform can save an organization 37% across hardware, software, and labor costs, depending on various factors.”

Page 56: BA4All Sweden: State of the Art in In-Memory Analytics

Lower Costs

“Don’t let somebody say to you we can’t go in-memory because it’s so much more money. Acquisition costs may be higher. If you calculate out a TCO, it’s going to be less.”

Donald Feinberg, Gartner

Page 57: BA4All Sweden: State of the Art in In-Memory Analytics

The price of light… …is less than the cost of darkness

ROI = Return On Ignorance?

Page 58: BA4All Sweden: State of the Art in In-Memory Analytics

New, Simpler Infrastructures and Business Models

Weissbeerger Beverage Analytics

Page 59: BA4All Sweden: State of the Art in In-Memory Analytics

Conclusion

Page 60: BA4All Sweden: State of the Art in In-Memory Analytics

Myths & Facts

Myth: New and unproven
Fact: It has been around since the late 1990s

Myth: It’s a niche technology to run analytics faster
Fact: Entire industries (SaaS, social networks, financial trading, online gaming) would not exist as we know them today without in-memory computing

Myth: Only for deep-pocketed organizations
Fact: The main users of in-memory analytics are SMBs

Myth: Small number of in-memory vendors
Fact: More than 50 software vendors deliver in-memory technology

Page 61: BA4All Sweden: State of the Art in In-Memory Analytics

Business Impact of In-Memory Computing

Opportunities:

• Reducing application running costs via database/legacy application offloading
• Improving transactional application performance
• Enabling horizontal, elastic scalability (scale up/down)
• Boosting response time in analytical applications
• Low-latency (<1 microsecond) application messaging
• Dramatically shortening batch process execution time
• Enabling real-time, "self-service" business intelligence and unconstrained data exploration
• Detecting correlations/patterns across millions of events in "a blink of an eye"
• Supporting "big data" (big data needs big memory)
• Running transactional and analytical applications on the same physical dataset

Business Impact

Run the business

Grow the business

Transform the business

Page 62: BA4All Sweden: State of the Art in In-Memory Analytics

In-Memory Changes Everything

“In-memory computing will have a long-term, disruptive impact by radically changing users’ expectations, application design principles, products’ architecture and vendors’ strategy.”

— Gartner

Page 63: BA4All Sweden: State of the Art in In-Memory Analytics

Thank you!
