Innovations in Business Intelligence Database Technology · 2015-07-21


Whitepaper

Innovations in Business Intelligence Database Technology

www.sisense.com


The State of Database Technology in 2015

Database technology has developed rapidly over the past two decades. Online Analytical Processing (OLAP) and its derivatives (MOLAP, ROLAP and HOLAP), which gained prominence in the 1990s, gradually lost ground to in-memory databases at the start of the 21st century.

However, the requirements of modern business intelligence pose a challenge that in-memory databases will find very difficult to meet. This, in turn, has given rise to the next generation of databases and querying: in-chip analytics. This newly developed technology uses the CPU, RAM and disk storage in innovative ways to tackle the complexity and size of the data sets that current BI software must handle in order to deliver effective insights to end users within a reasonable timeframe.

This guide will cover:

OLAP cubes: history and overview

In-memory databases: advantages and shortcomings

In-chip technology: development, overview and promise


OLAP Cubes Summary

Leading Provider: Oracle

Pros

Centralized data integration

Fast data retrieval for specific queries

Cons

Resource intensive

Inflexibility, limited support for ad-hoc queries

Long build times

OLAP technology provided a great basis for business intelligence 20 years ago, but it suffers from several limitations that make it a less than ideal fit for most modern BI projects. It allows users to get quick answers to specific, predefined queries, but it is resource intensive and problematic when it comes to larger data sets and ad-hoc querying.


Overview

History

OLAP is a database technology that was first developed in the late 1960s, but it only gained widespread commercial use in the 1990s with Microsoft’s first release of its OLAP Services product (now Analysis Services), based on technology acquired from Panorama Software.

At that point in time, when computer hardware wasn’t nearly as powerful as it is today, OLAP was groundbreaking. It introduced a spectacular way for business users (typically analysts) to easily perform multidimensional analysis of large volumes of business data.

When Microsoft’s Multidimensional Expressions language (MDX) came closer to becoming a standard, more and more client tools (e.g., Panorama NovaView, ProClarity) started popping up to provide even more power to these users.

How it Works

An OLAP database converts table-based datasets into multidimensional arrays called cubes in order to optimize querying and data retrieval. Users can then access specific dimensions of the data for analysis purposes.

For a simplified example, let’s think of a chain of pet stores that tracks sales of various items across cities and over time. It might track these figures in a series of spreadsheets such as these:


Whereas in an OLAP cube, the same information would be stored multidimensionally:

Note that this illustration is somewhat oversimplified. In reality, there can be a virtually endless number of dimensions, which are not necessarily symmetrical.
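The original illustration is not reproduced here, so the following minimal Python sketch makes the pet-store example concrete by converting a table into a cube. The records and the `build_cube` and `axes` helpers are hypothetical. A real OLAP engine uses dense or compressed multidimensional arrays rather than a Python dictionary, but the principle is the same: each measure is stored at the intersection of its dimension values.

```python
from collections import defaultdict

# Hypothetical pet-store sales records: (city, item, year, units_sold).
rows = [
    ("Boston", "hamster", 2014, 12),
    ("Boston", "hamster", 2015, 15),
    ("Boston", "parrot", 2015, 4),
    ("Denver", "hamster", 2015, 9),
    ("Denver", "parrot", 2014, 7),
]

def build_cube(rows):
    """Turn flat rows into a sparse cube keyed by (city, item, year) coordinates."""
    cube = defaultdict(int)
    for city, item, year, units in rows:
        cube[(city, item, year)] += units  # one cell per combination of dimension values
    return dict(cube)

def axes(rows):
    """List the members of each dimension (the edges of the cube)."""
    return {
        "city": sorted({r[0] for r in rows}),
        "item": sorted({r[1] for r in rows}),
        "year": sorted({r[2] for r in rows}),
    }

cube = build_cube(rows)
print(axes(rows))
print(cube[("Boston", "hamster", 2015)])        # 15
print(cube.get(("Denver", "parrot", 2015), 0))  # 0: empty cells are simply absent
```

Only five of the eight possible (city, item, year) cells are populated. This is one reason real cubes are usually stored sparsely.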

To answer queries, an OLAP cube typically includes roll-up cells that contain aggregated data according to certain parameters (in our example, sales over time, or item sales by location). These aggregations are pre-calculated when the system is “at rest” (i.e., not being used by end users).
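The roll-up cells can be sketched in the same way. The hypothetical `precompute_rollups` function below aggregates the measure for every subset of the dimensions. That is one common way to materialize roll-ups, although actual OLAP products usually pre-compute only a subset chosen by the cube designer. The expensive loop runs once while the system is at rest, and every later query is a dictionary lookup.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("city", "item", "year")

# Hypothetical pet-store sales records: (city, item, year, units_sold).
rows = [
    ("Boston", "hamster", 2014, 12),
    ("Boston", "hamster", 2015, 15),
    ("Boston", "parrot", 2015, 4),
    ("Denver", "hamster", 2015, 9),
    ("Denver", "parrot", 2014, 7),
]

def precompute_rollups(rows, dimensions):
    """Aggregate the measure (last field) for every subset of dimensions.

    Returns {dimension_names: {key_values: total}}; the empty subset holds the grand total.
    """
    rollups = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), k):
            totals = defaultdict(int)
            for row in rows:  # a full pass over the data for every group-by
                totals[tuple(row[i] for i in dims)] += row[-1]
            rollups[tuple(dimensions[i] for i in dims)] = dict(totals)
    return rollups

rollups = precompute_rollups(rows, DIMENSIONS)  # done "at rest"

# Query time: answers are already in the cube.
print(rollups[("city",)][("Boston",)])               # 31
print(rollups[("item", "year")][("hamster", 2015)])  # 24
print(rollups[()][()])                               # 47
print(len(rollups))                                  # 8
```

With d dimensions there are 2^d possible group-bys, which suggests why cube storage and build times grow so quickly.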


Thus, once a query is made, the answer is already within the data cube and is retrieved instantaneously. However, OLAP cubes have their drawbacks, the main ones being:

Each additional query requires a new dimension to be added to the cube, which means duplicating the entire cube in terms of data storage. As a result, OLAP databases quickly become resource intensive when it comes to data storage and management.

Aggregating data requires the CPU to process every cell of the data, which means that each new build (such as when additional data is added) takes a relatively long time.

OLAP cubes are very fast when it comes to specific, pre-designed queries. However, if a user wants to make a NEW query (e.g., average hamster sales per year), that data is not pre-calculated and will require additional dimensions to be added to the cube, which is a lengthy process.
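The third drawback can be illustrated with a small hypothetical model in which a cube is a set of named, pre-aggregated views. The view names, the `build` and `query` helpers and the data are all assumptions made for illustration. The point is that a question outside the original design cannot be answered until the cube is redesigned and every row is scanned again.

```python
from collections import defaultdict

# Hypothetical pet-store sales records: (city, item, year, units_sold).
rows = [
    ("Boston", "hamster", 2014, 12),
    ("Boston", "hamster", 2015, 15),
    ("Boston", "parrot", 2015, 4),
    ("Denver", "hamster", 2015, 9),
    ("Denver", "parrot", 2014, 7),
]

# Hypothetical cube design: only these two views are pre-aggregated.
DESIGNED_VIEWS = {
    "sales_by_year": lambda r: (r[2],),            # sales over time
    "sales_by_city_item": lambda r: (r[0], r[1]),  # item sales by location
}

def build(rows, views):
    """Full build: every row is visited once per view (the slow "at rest" step)."""
    cube = {}
    for name, key_of in views.items():
        totals = defaultdict(int)
        for row in rows:
            totals[key_of(row)] += row[3]
        cube[name] = dict(totals)
    return cube

def query(cube, view, key):
    """Answer instantly from a pre-aggregated view, or fail if it was never built."""
    if view not in cube:
        raise KeyError(f"{view!r} is not pre-calculated; redesign and rebuild the cube")
    return cube[view].get(key, 0)

cube = build(rows, DESIGNED_VIEWS)
print(query(cube, "sales_by_year", (2015,)))  # 28

# Ad-hoc question: average hamster sales per year. No view covers it yet.
try:
    query(cube, "sales_by_item_year", ("hamster", 2015))
except KeyError as err:
    print("miss:", err)

# The fix is a redesign plus a full rebuild that re-scans every row.
views = dict(DESIGNED_VIEWS, sales_by_item_year=lambda r: (r[1], r[2]))
cube = build(rows, views)
hamster = [v for (item, _), v in cube["sales_by_item_year"].items() if item == "hamster"]
avg = sum(hamster) / len(hamster)
print(avg)  # 18.0
```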


In-Memory Databases Summary

Leading Provider: Qlik

Pros

Fast data retrieval

Support for ad-hoc queries

Cons

Expensive to implement and maintain

Scalability issues

In-memory technology (i.e., loading the entire database into RAM and from there transferring it to the CPU to perform calculations) has become a leading solution for business intelligence. It gives users fast answers to their queries without the need for lengthy builds and pre-calculations. However, the size and complexity of modern data are forcing in-memory databases to face their limitations.

Overview

History

In-memory databases became popular at the start of the 21st century with the proliferation of cheap and widely available 64-bit PCs and the adoption of columnar databases as an alternative to the row-based systems which were the basis for OLAP cubes.

More RAM on a PC meant that more data could be queried quickly. If crunching a million rows of data on a machine with only 2GB of RAM was a drag, users could now add more gigabytes of RAM to their PCs and store data in relational databases that could be queried much faster than before.

In-memory databases have become much more prominent in recent years. However, OLAP-based solutions can still be found in massive organization-wide implementations.

How it Works

Generally speaking, a computer has two types of data storage mechanisms: disk (often called a hard disk) and RAM (random access memory).

The important differences between them are outlined in the following table:

Feature       DISK        RAM
Capacity      Abundant    Scarce
Speed         Slow        Fast
Cost          Cheap       Expensive
Persistence   Long term   Short term

Most modern computers have 15 to 100 times more available disk storage than RAM.


However, reading data from disk is much slower than reading the same data from RAM. This is one of the reasons why 1GB of RAM costs approximately 320 times as much as 1GB of disk space.

In a disk-based RDBMS, two things cause heavy disk operations and therefore poor performance:

1. Table Scans: loading an entire table from disk to RAM (for calculations)
2. Complex Data: querying data scattered across many tables and/or fields (joins)
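A back-of-the-envelope sketch shows why table scans are expensive. It also shows why storing data by column, as the columnar databases mentioned earlier do, helps when a query touches only two fields. The `bytes_read` function and the table dimensions are hypothetical, and the model ignores compression, page sizes and indexes.

```python
def bytes_read(n_rows, n_fields, field_bytes, fields_queried, layout):
    """Rough estimate of bytes read from disk by a full scan.

    Simplified model: fixed-width fields, no compression, no indexes.
    """
    if layout == "row":
        # Records are stored whole, so the scan pulls in every field of every row.
        return n_rows * n_fields * field_bytes
    if layout == "column":
        # Each field is stored contiguously, so only the queried columns are read.
        return n_rows * fields_queried * field_bytes
    raise ValueError(f"unknown layout: {layout!r}")

# Hypothetical table: 1 million rows, 20 fields of 8 bytes each; the query needs 2 fields.
row_scan = bytes_read(1_000_000, 20, 8, 2, "row")
col_scan = bytes_read(1_000_000, 20, 8, 2, "column")
print(row_scan, col_scan, row_scan // col_scan)  # 160000000 16000000 10
```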

In-memory technology aims to address both of these issues by preloading the entire database into RAM and loading data from RAM to the CPU to perform calculations and data retrieval.
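Python's built-in `sqlite3` module can stand in for the preloading step. It copies an on-disk database into RAM with `Connection.backup` and then runs a join and an aggregation without touching the file. SQLite is only an illustrative substitute here, not the engine used by any BI product discussed in this paper, and the `stores` and `sales` tables are hypothetical.

```python
import os
import sqlite3
import tempfile

def load_into_memory(db_path):
    """Copy an on-disk SQLite database into RAM and return the in-memory connection."""
    disk = sqlite3.connect(db_path)
    mem = sqlite3.connect(":memory:")
    disk.backup(mem)  # copies every database page from the file into memory
    disk.close()
    return mem

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "petstore.db")
    disk = sqlite3.connect(path)
    disk.executescript("""
        CREATE TABLE stores (store_id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE sales (store_id INTEGER, item TEXT, units INTEGER);
        INSERT INTO stores VALUES (1, 'Boston'), (2, 'Denver');
        INSERT INTO sales VALUES (1, 'hamster', 12), (1, 'hamster', 15),
                                 (1, 'parrot', 4), (2, 'hamster', 9), (2, 'parrot', 7);
    """)
    disk.commit()
    disk.close()
    mem = load_into_memory(path)

# The file has been deleted; the join and aggregation below run purely in RAM.
result = mem.execute("""
    SELECT st.city, SUM(sa.units)
    FROM sales AS sa JOIN stores AS st ON sa.store_id = st.store_id
    GROUP BY st.city
    ORDER BY st.city
""").fetchall()
print(result)  # [('Boston', 31), ('Denver', 16)]
```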

All in-memory technologies share the same premise: it is simply much faster to perform calculations over data stored in RAM than over the same data stored in a table on disk. These technologies also benefit from the fact that 64-bit computers are now considered commodity hardware. Additionally, it is cheaper to add more RAM to both commodity and proprietary hardware today than it used to be.


Illustration: Disk/RAM utilization when querying 2 fields

This technology enables a much faster time to value and significantly less effort and money invested in developing, setting up and maintaining analytics infrastructure.

The problem

In-memory technology performs beautifully at small scales. When datasets are simple and small, it enables faster development than a solution built on top of an RDBMS.

However, its main barrier to wide enterprise adoption has been scalability. The challenge it continues to face is that RAM, when used to store and analyze raw business data, tends to run out quickly and unexpectedly. As storage sizes go, RAM is tiny, and many data sets these days are too large to fit. Moreover, each query to the database uses up additional RAM for intermediate calculations.
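The scalability problem is essentially arithmetic. The sizing rule below is a hypothetical simplification: the whole dataset stays resident, and each concurrent query needs scratch space proportional to the dataset. Real engines behave differently, but the shape of the problem is the same, because RAM demand grows with both data size and concurrency.

```python
def ram_needed_gb(dataset_gb, concurrent_queries, scratch_fraction):
    """Hypothetical sizing rule for a fully in-memory engine.

    Assumes the whole dataset stays resident and every running query needs
    scratch space equal to `scratch_fraction` of the dataset.
    """
    return dataset_gb * (1 + concurrent_queries * scratch_fraction)

def fits(dataset_gb, concurrent_queries, scratch_fraction, ram_gb):
    """True if the estimated working set fits in the available RAM."""
    return ram_needed_gb(dataset_gb, concurrent_queries, scratch_fraction) <= ram_gb

# A 64 GB dataset on a hypothetical 96 GB server:
print(fits(64, 1, 0.1, 96))   # True  (about 70.4 GB needed)
print(fits(64, 10, 0.1, 96))  # False (about 128 GB needed)
```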

Complex scenarios still require data to be extensively modified, or even loaded into an RDBMS data warehouse, before being loaded into memory-based storage. This can happen when data sets are complex and/or when many users query the database simultaneously and repeatedly. In such cases, the added value of the technology is debatable and its cost-saving benefits become less significant.

The fact of the matter is that data sets are getting bigger and bigger, with companies generating more information than ever, both from internal sources and from external ones that business executives look to in order to gain a competitive advantage. This exponential growth in the size of data has not been mirrored by a similar reduction in RAM prices. While RAM is indeed cheaper than it was 15 years ago, it is still relatively expensive storage that cannot be scaled indefinitely without incurring significant costs.

At this point, it seems that in-memory technology may have hit its glass ceiling and can no longer promise reasonable performance given the amount and complexity of the data currently being gathered, aggregated and analyzed by modern businesses.


ElastiCubes and In-Chip Analytics Summary

Leading Provider: Sisense

Advantages

Fastest data retrieval

Does not require proprietary hardware or extensive RAM

Full support for ad-hoc queries

In-Chip Technology is the latest development in database technology. It combines the flexibility of in-memory based querying with the speed and robustness of OLAP cubes, without the hardware costs and difficult implementation of traditional solutions. Although only recently developed and released, In-Chip is quickly gaining popularity due to its increased performance and ability to tackle complex and large data sets.

Overview

History

You might not have heard of ElastiCube In-Chip Technology yet, as it was released for commercial use only a few years ago. However, it has already become the data analytics platform of choice for companies such as eBay, Samsung and NASA, and it is growing rapidly as an alternative and solution to the limitations imposed by traditional OLAP database technologies.

ElastiCube is a unique form of database developed by Sisense. It is the result of thoroughly analyzing the strengths and weaknesses of both OLAP and in-memory technologies, while taking into consideration the off-the-shelf hardware of today and tomorrow.

The vision was to provide a true alternative to OLAP technology without compromising the fast development cycles and query response times for which in-memory technologies are lauded. This would allow a single technology to be used in BI solutions of any scale, in any industry.

How it Works

In-Chip Analytics is the latest generation of in-memory technology for business analytics, and it sets itself apart by being both fast and scalable. The name ElastiCube comes from the database’s ability to stretch beyond the hard limitations imposed by older-generation technologies.

This technology employs a disk-based columnar database for storage to provide fast disk reads, and it is able to load data from disk to RAM (and vice versa) when needed. The queries themselves are processed entirely in memory, without any disk reads along the way.

Most importantly, only a subset of the data is physically stored in RAM at any given time, leaving more space for other operations to run in parallel. In other words, RAM limitations are not as big an issue as they are with previous in-memory technologies, because there is no need to keep all of the data in RAM permanently.
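One generic way to keep only a subset of the data in RAM is a least-recently-used (LRU) cache of columns layered over column files on disk. The `ColumnCache` class below is a hypothetical sketch of that pattern built on Python's `array` module and `collections.OrderedDict`. It is not Sisense's implementation, whose internals this paper does not describe.

```python
import array
import os
import tempfile
from collections import OrderedDict

class ColumnCache:
    """Hypothetical disk-backed column store that keeps at most `budget_bytes` in RAM.

    Each column is a file of signed 64-bit integers. Columns are loaded on first
    use, and the least recently used ones are evicted when the budget is exceeded.
    """

    def __init__(self, directory, budget_bytes):
        self.directory = directory
        self.budget = budget_bytes
        self.resident = OrderedDict()  # column name -> array, oldest use first
        self.used = 0                  # bytes currently held in RAM

    def _path(self, name):
        return os.path.join(self.directory, name + ".col")

    def write_column(self, name, values):
        with open(self._path(name), "wb") as f:
            array.array("q", values).tofile(f)

    def get(self, name):
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as most recently used
            return self.resident[name]
        col = array.array("q")
        with open(self._path(name), "rb") as f:
            col.frombytes(f.read())
        size = len(col) * col.itemsize
        # Evict least recently used columns until the new one fits
        # (a column larger than the whole budget is still loaded, alone).
        while self.resident and self.used + size > self.budget:
            _, evicted = self.resident.popitem(last=False)
            self.used -= len(evicted) * evicted.itemsize
        self.resident[name] = col
        self.used += size
        return col

with tempfile.TemporaryDirectory() as tmp:
    cache = ColumnCache(tmp, budget_bytes=2 * 1000 * 8)  # room for two 1,000-row columns
    for name in ("units", "price", "store_id"):
        cache.write_column(name, range(1000))
    total = sum(cache.get("units")) + sum(cache.get("price"))
    cache.get("store_id")  # evicts "units", the least recently used column
    print(total, list(cache.resident))  # 999000 ['price', 'store_id']
```

Eviction happens per column, so a query over two fields needs only those two columns in RAM, however wide the table is.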


This is achieved through advanced compression, as well as by identifying the parts of the dataset that are not used regularly and can be left “at rest”. Typically, this is around 80 percent of the data businesses collect.
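The paper does not say which compression schemes ElastiCube uses. Dictionary encoding followed by run-length encoding is a common technique in columnar stores, and it is shown here only as an example of how a repetitive column can shrink. The function names and data are hypothetical.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = {}
    codes = []
    for v in values:
        codes.append(dictionary.setdefault(v, len(dictionary)))  # new values get the next code
    return list(dictionary), codes

def run_length_encode(codes):
    """Collapse runs of identical codes into [code, run_length] pairs."""
    runs = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1][1] += 1
        else:
            runs.append([c, 1])
    return runs

def decode(dictionary, runs):
    """Expand the runs and map codes back to the original values."""
    return [dictionary[c] for c, n in runs for _ in range(n)]

city = ["Boston"] * 4 + ["Denver"] * 3 + ["Boston"] * 2
dictionary, codes = dictionary_encode(city)
runs = run_length_encode(codes)
print(dictionary)  # ['Boston', 'Denver']
print(runs)        # [[0, 4], [1, 3], [0, 2]]
```

Nine strings become two dictionary entries and three runs. Columns with few distinct values and long runs, which are common in business data, compress especially well this way.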

In-Chip Technology also handles joins in a distinctive way. Instead of joining tables, it uses columnar algebra to merge fields. This way, the join operation can be processed entirely in the CPU cache.
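This paper does not define “columnar algebra” further, so the sketch below shows a generic column-at-a-time join instead. It reads only the two key columns to find, for each fact row, the position of its matching dimension row, and then fetches other fields by position. The column names and data are hypothetical, and Python cannot demonstrate the CPU-cache behavior the paper describes.

```python
from array import array

# Fact table, stored as one column per field (hypothetical data).
sales_store_id = array("q", [1, 1, 1, 2, 2])
sales_units = array("q", [12, 15, 4, 9, 7])

# Dimension table, also stored column-wise.
stores_store_id = array("q", [1, 2])
stores_city = ["Boston", "Denver"]

def join_positions(fact_keys, dim_keys):
    """For each fact row, return the position of its matching dimension row.

    Only the two key columns are touched; no full rows are materialized.
    Assumes every fact key exists in the dimension (referential integrity).
    """
    position = {key: i for i, key in enumerate(dim_keys)}
    return array("q", (position[k] for k in fact_keys))

def gather(column, positions):
    """Fetch a column's values at the given positions (a column-wise merge)."""
    return [column[p] for p in positions]

pos = join_positions(sales_store_id, stores_store_id)
city_per_sale = gather(stores_city, pos)

totals = {}
for city, units in zip(city_per_sale, sales_units):
    totals[city] = totals.get(city, 0) + units
print(totals)  # {'Boston': 31, 'Denver': 16}
```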

Illustration: Disk/RAM utilization when querying 2 fields

The table below compares RDBMS technology, in-memory technology and Sisense’s In-Chip Technology across several technical aspects:

Columnar Storage: whether the technology stores data by column rather than by row.

In-Memory Query Processing: whether queries are executed in memory, without reads from disk during query execution.

Performance Upon Installation: fast responses to queries involving joining, grouping and aggregating data, without lengthy preparation work or specialized configuration.

Data Capacity: whether data capacity is capped below what can be stored on a single hard disk (terabytes of data).

Scalability Level: the ability of the technology to support growing data volumes and concurrent usage without having to significantly modify or rebuild the solution.

Feature                         RDBMS         In-Memory Associative       In-Chip Technology
Columnar Storage                Some          No                          Yes
In-Memory Query Processing      No            Yes                         Yes
Performance Upon Installation   Slow          Fast                        Fast
Data Capacity                   Unlimited     Limited (by size and RAM)   Unlimited
Scalability Level               Large scale   Small scale                 Small / Large scale

In-Chip technology further optimizes data processing by making the most of the built-in components of today’s 64-bit commodity hardware. Using algorithms that run beneath the OS and replace its set of instructions, In-Chip manages to utilize the CPU to its fullest. This achieves unparalleled performance, even on huge, complex data sets that would previously have required massive hardware upgrades to handle at all.


Illustration: Latencies of CPU cache, RAM and disk storage
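The latency illustration is not reproduced here, so the sketch below uses widely quoted order-of-magnitude figures instead: about 0.5 ns for an L1 cache reference, 100 ns for a main-memory reference and 10 ms for a spinning-disk seek. These are approximate assumptions, not measurements from this paper, and real values vary by hardware. The point is the gap of several orders of magnitude between the levels.

```python
# Approximate, widely quoted latencies in nanoseconds (assumptions; real values vary by hardware).
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Main memory (RAM) reference": 100,
    "Disk seek (spinning hard disk)": 10_000_000,
}

def total_seconds(ns_per_access, accesses):
    """Time spent waiting if every access pays the full latency (no overlap or prefetching)."""
    return ns_per_access * accesses / 1e9

ACCESSES = 1_000_000  # hypothetical number of random reads

for level, ns in LATENCY_NS.items():
    ratio = ns / LATENCY_NS["L1 cache reference"]
    print(f"{level:32s} {total_seconds(ns, ACCESSES):>12.4f} s  ({ratio:,.0f}x L1)")
```

Under these assumptions, a million random reads take about half a millisecond from L1 cache, a tenth of a second from RAM, and close to three hours from a spinning disk.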

Summary: The Future of Databases?

We’ve reviewed three major database technologies employed by BI software in the past few decades: OLAP cubes, in-memory databases, and up-and-coming In-Chip Analytics.

As we have seen, both OLAP and in-memory technology suffer from scalability issues, and there are significant doubts as to their ability to provide a reasonable solution for the requirements of 21st-century business intelligence in terms of data size, complexity, and cost to implement.

In-Chip Technology is currently the most advanced way to store and query data in rapidly changing business environments, and it is expected to be adopted by more and more companies in the coming years.

Want to learn more about In-Chip technology?

Visit sisense.com

Join a Sisense Analytics Expert for a Weekly Live Demo of In-Chip technology at work

Questions, notes, or comments on the contents of this document? We’d love to hear them! Contact us