
Upload: talk2parimi

Post on 07-Nov-2014



SAP HANA In Memory Appliance

TABLE OF CONTENTS

1. In Memory Computing

1.1. Move from After-Event Analysis to Real-Time Decision Making

2. SAP In-Memory Appliance (SAP HANA)

2.1. SAP HANA In Memory Computing Engine & Surroundings

2.2. Modelling & Data Loading into SAP HANA

3. SAP High-Performance Analytic Appliance 1.0 Overview

3.1. Technical Overview – Request Processing and Execution Control

3.2. Calc Engine

3.3. SAP HANA Replication Technologies

3.4. Row Store

3.4.1. Row store Architecture/Block diagram

3.4.2. Row Store Architecture Operations flow

3.4.3. Indexes for Row Store tables

3.5. Column Store

3.5.1. Column Store Operations flow

3.5.2. Delta management in Column store

3.6. Persistence Layer

3.7. Modeling

3.8. SAP HANA In memory Computing Studio

3.8.1. SLT integration in SAP HANA

3.8.2. Information Modeler Terminology

3.9. HANA Modeling Process Flow

4. SAP HANA Backup and Recovery

5. HANA Proof of Concept – Oil & Gas Industry


SAP HANA – High Performance Analytical Appliance

1. In Memory Computing:

In-memory computing is technology that processes massive quantities of real-time data in the main memory of the server, delivering immediate results from analyses and transactions. It leverages multicore architectures equipped with large volumes of directly accessible memory to crunch through very large data volumes in seconds. SAP has designed special CPU-cache-conscious data structures and parallelized algorithms to fully exploit this hardware and deliver extreme performance for a new generation of information-rich enterprise applications. In-memory computing is delivered via the SAP HANA appliance, which incorporates the SAP HANA database.

Advantages:

Load real-time data alongside historical data with high throughput, enabling flexible real-time analytics, improved business performance and competitive advantage.

Make Better Decisions Faster - Instant access to relevant information allows new ways to look at the business and quick reaction based on real-time data, with less reliance on IT to gain the insight needed.

Enable Innovative New Applications - Combine high-volume transactions with analytics for improved BI, accelerate transactional and operational systems for real-time access and better decision making, and enable planning and forecasting applications that combine real-time operational data with analytics.

Reduce IT Burden and Mitigate Risks - SAP HANA dramatically reduces hardware and maintenance costs; in-memory solutions are based on proven, mature technology and are non-disruptive and fast to implement.

In-memory computing can also improve profitability analysis (CO-PA) by significantly reducing report run times, and can be used for agile data marts, giving business departments more flexibility when creating ad-hoc data marts for specific business problems.

Note: Dramatically improved hardware economics and technology innovations in software have now made it possible for SAP to deliver on its vision of the Real-Time Enterprise with in-memory business applications.


1.1. Move from After-Event Analysis to Real-Time Decision Making – with SAP ICE (In Memory Computing Engine)

The persistently increasing quantity of data from enterprise applications and the web is a great opportunity and a challenge at the same time. Comprehensive data from different sources, such as operational systems, data warehouses and the web, enables intensive analysis, but it is difficult to manage and often causes unacceptable response times. And time is money! Slow access can even prevent businesses from analyzing the data that would allow them to make informed decisions. Some interesting queries would take not just hours but days, and once the result is there, it is too late for an immediate reaction, as the underlying data has already changed.

SAP ICE helps to overcome such hurdles: huge amounts of real-time data can be processed in the main memory of a server, dramatically accelerating data access for analysis. From the business point of view this enables faster decisions based on in-depth data analysis.

SAP HANA, based on this innovative in-memory technology, does not simply accelerate data access; it provides a quantum leap in data analysis by putting transactional data at your fingertips, and it completely changes the way in which data can be used.

2. SAP In-Memory Appliance (SAP HANA)

SAP HANA is a modern platform for real-time analytics and applications. It enables organizations to analyze business operations based on large volumes and varieties of detailed data in real time, as it happens. SAP in-memory computing is the core technology underlying the SAP HANA platform.

SAP HANA is a combination of hardware and software specifically made to process massive amounts of real-time data using in-memory computing. It is an in-memory engine with database, data integration and aggregation capabilities for analyzing operational and transactional databases.

SAP HANA is a flexible, data-source-agnostic appliance that allows customers to analyze large volumes of SAP ERP data in real time, avoiding the need to materialize transformations.

SAP HANA integrates a number of SAP components including the SAP In-Memory Database, Sybase replication technology and the SAP LT (Landscape Transformation) Replicator.


SAP In-Memory Database - The SAP In-Memory Database is a hybrid in-memory database that combines row-based, column-based and object-based database technology. It is optimized to exploit the parallel processing capabilities of modern multi-core CPU architectures, so SAP applications can benefit from current hardware technologies.

The SAP In-Memory Database is at the heart of SAP offerings like SAP HANA that help customers to improve their operational

efficiency, agility, and flexibility.

The appliance is designed to facilitate the integration into existing compute centers. It uses standard communication protocols

such as ODBC and JDBC to communicate with other systems.

In addition to real-time analytics, SAP is also delivering a new class of real-time applications powered by the SAP HANA platform. The platform can be deployed as an appliance or delivered via the cloud.

As an analogy, suppose that HANA is the engine of a car and BW is the body of the car: HANA would sit inside the OLAP BW system to serve business operations, needs and analysis. HANA is currently in its ramp-up phase, so it still has to be tested against all sorts of databases to bring SAP HANA customers from proof of concept (POC) to go-live. For this reason, the shape of the car's body is not yet known.


2.1. SAP HANA In Memory Computing Engine & Surroundings:

SAP HANA is a preconfigured, out-of-the-box appliance:

In-memory computing engine

In-memory computing studio as a frontend for modelling and administration

HANA is connected to ERP systems; the frontend modelling studio can be used for load control and replication server management

Two types of relational data stores in HANA: row store and column store

SAP BOBJ tools can report directly on HANA

Data from HANA can also be used in MS Excel

Row Store - a traditional relational store; the difference is that in HANA all rows are held in memory, whereas traditional databases keep them on a hard drive

Column Store - data is stored in columns, as in SAP BWA

Persistency Layer - in-memory is great, but it is volatile and data can be lost through a power outage or hardware failure. To avoid this, HANA's persistency layer makes sure that all data in memory is also stored on a non-volatile hard drive

Session Management - this component takes care of logon services

Two processing engines - the data is in memory, but how do you extract and report on it? HANA has two processing engines: one accepts SQL queries, the other is based on MDX

Sybase Replication Server support - Sybase Replication Server can be used for real-time synchronization of data between ERP and HANA
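The row store / column store contrast above can be sketched in a few lines of Python (an illustration, not SAP code): the same table is held once as a list of rows and once as one list per attribute, and a single-column aggregate such as a price total only has to touch one contiguous list in the columnar layout.

```python
# Illustrative sketch of row-wise vs column-wise table layout.
# Table contents are invented example data.

rows = [
    {"id": 1, "product": "Fuel", "price": 100},
    {"id": 2, "product": "Lubricant", "price": 40},
    {"id": 3, "product": "Bitumen", "price": 60},
]

# Column layout: one list per attribute; positions align across lists.
columns = {
    "id": [1, 2, 3],
    "product": ["Fuel", "Lubricant", "Bitumen"],
    "price": [100, 40, 60],
}

def sum_price_rowwise(table):
    # Must visit every full row even though only one field is needed.
    return sum(r["price"] for r in table)

def sum_price_columnwise(table):
    # Reads a single contiguous array - CPU-cache friendly.
    return sum(table["price"])
```

Both functions return the same total; the difference is how much memory each has to walk through, which is why column scans dominate analytic workloads.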


2.2. Modelling & Data Loading into SAP HANA:

Modelling in HANA can be done in the following ways:

Specify which tables are stored in HANA: first the metadata is fetched, then data replication jobs are scheduled and the data is loaded into HANA using the Replication Server.

Use Data Services to model and load the data from SAP BW and other 3rd-party systems.

Manage connections to ERP instances (the current release does not support connecting to several ERP instances).

Model in the HANA in-memory studio itself (this is independent of Data Services).

Modelling can also be done in Business Objects Universes, which is essentially joining fact and dimension tables.

Reporting:

Client tools such as MS Excel, the SAP BI 4.0 reporting tools and the Dashboard Design tool (Xcelsius) can access HANA directly.

Third-party reporting tools can leverage HANA's ODBC, JDBC and ODBO (for MDX requests) drivers for reporting.

HANA supports the BICS interface.

Note: Administration, including memory issues, can be managed from the SAP HANA studio.

3. SAP High-Performance Analytic Appliance 1.0 Overview.

SAP HANA in memory Appliance:

An appliance for processing high volumes of transactional data in real time

Includes tools for data modeling, data and lifecycle management, security, operations

Provides support for multiple interfaces based on industry standards


Features:

In-memory software bundled with hardware delivered by the hardware partners (HP, IBM, Cisco and Fujitsu).

In-Memory Computing Engine.

Tools for data modeling, data and life cycle management, security, operations, etc.

Real-time Data replication via Sybase Replication Server.

Support for multiple interfaces.

Content Packages (Extractors and Data Models) introduced over time.

Analyze information in real-time at unprecedented speeds on large volumes of non-aggregated data.

Create flexible analytic models based on real-time and historic business data.

Foundation for a new category of applications (e.g., planning, simulation) that significantly outperform current applications in their category.

3.1. Technical Overview – Request Processing and Execution Control

In SAP HANA every query execution follows a few steps that depend on the query type. In-memory computing uses calculation models, which give extreme performance and the flexibility of calculations on the fly.


A calc model can be generated on the fly based on an input script; it also defines a parameterized calculation schema for highly optimized, reusable queries, because the calculation model supports all types of scripted operations.

Once SQL or MDX statements are passed to calculation models, the optimizer included in the calculation engine optimizes the input statements for better performance.

The architecture provides multiple interfaces (SQLScript, MDX and the planning engine interface) for multiple query types. All these domain-specific programming languages and models are converted into calculation models, while standard SQL is processed directly by the database engine.

After a calculation model has been defined, the calculation engine creates a logical execution plan for it, which defines the priorities for the operation steps and executes user-defined functions.

The relational engine is the in-memory database component responsible for physical execution: the database optimizer, which is part of the relational engine, produces the physical execution plan, taking performance and turnaround time into account for query execution.

3.2. Calc Engine:

The query execution flow is defined here: the operations in the query are executed depending on the priorities of the instructions in the query. Regardless of those priorities, the system uses maximum resources to achieve maximum throughput.

The easiest way to think of calculation models is as dataflow graphs, where the modeler defines data sources as inputs and different operations (join, aggregation, projection, ...) on top of them for data manipulation.

The calculation engine breaks up a model, for example some SQLScript, into operations that can be processed in parallel (rule-based model optimizer). These operations are then passed to the database optimizer, which determines the best plan for accessing the row or column stores (algebraic transformations and cost-based optimizations based on database statistics).
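As a rough illustration of how independent operations in such a dataflow graph could be grouped for parallel execution, the following Python sketch (with an invented five-node model, not the HANA optimizer) computes execution "levels" whose members have no dependencies on each other:

```python
# Illustrative dataflow-graph sketch: nodes are operations, edges point
# from an operation to the operations that consume its output.
from collections import defaultdict

graph = {
    "scan_sales": ["project"],
    "scan_customers": ["join"],
    "project": ["join"],
    "join": ["aggregate"],
    "aggregate": [],
}

def parallel_levels(graph):
    # Group operations into levels; every operation in a level has all of
    # its inputs produced by earlier levels, so a level can run in parallel.
    indegree = defaultdict(int)
    for src, dsts in graph.items():
        indegree.setdefault(src, 0)
        for d in dsts:
            indegree[d] += 1
    levels = []
    ready = sorted(n for n, deg in indegree.items() if deg == 0)
    while ready:
        levels.append(ready)
        nxt = []
        for n in ready:
            for d in graph[n]:
                indegree[d] -= 1
                if indegree[d] == 0:
                    nxt.append(d)
        ready = sorted(nxt)
    return levels
```

Here the two scans land in the first level because neither depends on the other, which is exactly the kind of opportunity a rule-based model optimizer exploits.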


Note: The planning engine will be included in the next release and will provide planning functions such as distribute and copy.

Example SQL function execution:

CREATE FUNCTION func1 (IN p1 INT, IN t1 ttype1, IN t2 ttype2, OUT outtab ttype2)

BEGIN

  v1 = SELECT c, d FROM @t1@ WHERE d > @p1@;    // query 1

  v2 = SELECT a, b FROM @t2@ WHERE b < 1000;    // query 2

  CALL func2(@v2@, v3);

  v4 = SELECT c, f FROM @v1@, @v3@ WHERE b > 0; // query 3

  CALL func3(@v4@, outtab);

END

3.3. SAP HANA Replication Technologies:

For analysis and reporting on top of SAP HANA, data has to be replicated from the source system into the SAP in-memory database. SAP HANA supports three replication methods.


Trigger-Based Replication - uses the standard SAP NetWeaver Landscape Transformation Replicator, which captures database changes at a high level of abstraction in the source ERP system. Once data loading has started, changes in the source system are captured in parallel with the replication process.

ETL-Based Replication - uses SAP BusinessObjects Data Services to specify and load the relevant business data into SAP HANA. 3rd-party data providers can be integrated using this method.

Log-Based Replication - uses the Sybase replication method, which captures table changes from low-level database log files. Database changes are propagated on a per-transaction basis and then replayed on the IMDB to maintain consistency.
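A minimal Python sketch of the trigger-based idea (all names invented; the real SLT mechanism is far more involved): a write to the source fires a trigger that records the change in a logging queue, and a replication job replays the captured changes on the target.

```python
# Illustrative trigger-based change capture and replay. The "trigger" is a
# plain callback; in a real database it would be a DB trigger on the table.

source, target, log_queue = {}, {}, []

def on_write_trigger(key, value):
    # Fired on every source write: record the change for replication.
    log_queue.append((key, value))

def write_source(key, value):
    source[key] = value
    on_write_trigger(key, value)

def replicate():
    # Replication job drains the queue, replaying changes on the target
    # in the order they were captured.
    while log_queue:
        key, value = log_queue.pop(0)
        target[key] = value
```

After `replicate()` runs, the target holds the same state as the source, and the queue is empty until the next captured change.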

3.4. Row Store:

The row store is one of the relational engines; it stores data in row format and is interfaced from the calculation/execution layer in the HANA architecture. It is a pure in-memory store whose data persistence is managed in the persistence layer.

Note: Page management is executed in the persistence layer, where the mapping between indexes and data volumes is done.

3.4.1. Row store Architecture/Block diagram:

The row store block diagram has five key components.

Transactional version memory is the heart of the row store; it contains temporary data versions used by database operations such as write, insert and read. All write operations mainly go into transactional version memory, and are also INSERTed into the persisted segment. Consolidation moves all visible versions from memory into the persisted segment permanently, after which the outdated entries are cleared from transactional version memory. Transactional version memory is needed for multi-version concurrency control (MVCC).

Note: MVCC is a concurrency control method commonly used by database management systems to provide concurrent access to the database, and in programming languages to implement transactional memory.

Segments are physical storage areas that contain the actual data (the contents of row store tables) in pages. Note: pages are fixed-length storage locations.

The page manager is a process that manages memory allocation for pages in the segment area and keeps track of used and free pages for row store table data. Row store tables are linked lists of memory pages, and these pages are grouped into segments.

Version memory consolidation works like a garbage collector for MVCC.
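The MVCC idea behind transactional version memory can be sketched as follows (an invented miniature, not HANA internals): every write adds a version stamped with a commit ID, a reader picks the newest version visible to its snapshot, and consolidation garbage-collects versions that no active transaction can still see.

```python
# Illustrative MVCC sketch. Versions for a key are appended in commit order.

versions = {}  # key -> list of (commit_id, value), ascending by commit_id

def write(key, value, commit_id):
    versions.setdefault(key, []).append((commit_id, value))

def read(key, snapshot_id):
    # A reader sees the newest version committed at or before its snapshot.
    visible = [v for cid, v in versions.get(key, []) if cid <= snapshot_id]
    return visible[-1] if visible else None

def consolidate(key, oldest_active_snapshot):
    # Garbage-collect versions no active transaction can still see, keeping
    # the newest version at or below the watermark (it is still readable).
    vs = versions[key]
    keep = [i for i, (cid, _) in enumerate(vs) if cid <= oldest_active_snapshot]
    cut = keep[-1] if keep else 0
    versions[key] = vs[cut:]
```

A transaction with snapshot 2 keeps seeing the old value even after a later commit writes a new one, which is how concurrent readers and writers avoid blocking each other.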


The persistence layer is invoked when write operations are performed by transactional version memory. This layer allows savepoints to be taken for database operations.

3.4.2. Row Store Architecture Operations flow:

Write operations mainly go to transactional version memory, while INSERT also writes to the persisted segment.

The persisted segment contains data that may be seen by any ongoing transaction: it holds the data that was committed before any active transaction started.

Version memory consolidation moves "visible versions" from transactional version memory into the persisted segment based on commit ID, and clears the outdated record versions from transactional version memory.

3.4.3. Indexes for Row Store tables:

Each row store table has a primary index that points to the ROWID of the row store table. A ROWID consists of the segment and page ID for the respective record; using these, records are searched in the persisted segment layer. These indexes are created on the fly when the system loads tables into memory at startup, so they are volatile. All index table definitions are stored in the row store table metadata. Secondary indexes can be created if needed.
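A toy Python model of this indexing scheme (the layout is invented for illustration): the primary index maps a key to a ROWID of (segment, page, slot), and a lookup follows that pointer into the segment's pages.

```python
# Illustrative row-store index sketch: fixed-length pages grouped into
# segments, with a primary index mapping key -> (segment, page, slot).

PAGE_SIZE = 2  # records per page, deliberately tiny for illustration

segments = [[[None, None]]]  # segments -> pages -> record slots
primary_index = {}           # key -> (segment_id, page_id, slot)

def insert(key, record):
    seg = segments[-1]
    page = seg[-1]
    slot = next((i for i, s in enumerate(page) if s is None), None)
    if slot is None:                    # page full: open a fresh page
        seg.append([None] * PAGE_SIZE)
        page, slot = seg[-1], 0
    page[slot] = record
    primary_index[key] = (len(segments) - 1, len(seg) - 1, slot)

def lookup(key):
    # Follow the ROWID straight to the record's storage location.
    seg_id, page_id, slot = primary_index[key]
    return segments[seg_id][page_id][slot]
```

With a page size of two, the third insert spills onto a new page, and its ROWID reflects that new location.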


3.5. Column Store:

The column store is the other relational engine. It is interfaced from the calculation/execution layer in the HANA architecture and is a pure in-memory store whose data persistence is managed in the persistence layer.

The column store engine significantly improves read performance, and write performance as well. Data in the column store is highly compressed; the column store does not contain the real data files but provides virtual access to them. It includes an optimizer and an executor, which handle queries and execution plans.

The column store engine has two main components: the main store and the delta store.

The main store is highly compressed and read-optimized, so data is read from the main store itself.

The delta store is mainly used for fast write operations. The data between these two layers is merged asynchronously; this asynchronous merge moves the data from the delta store to the main store.

3.5.1. Column Store Operations flow:

As noted, the column store's two storage areas (main and delta) enable high compression and high write performance at the same time.

Write operations go to the delta store: an update is performed by inserting a new entry into the delta storage.

Compression in the main store is done by creating a dictionary and applying further compression methods, which speeds up loading data into the CPU cache and speeds up search operations. This compression is performed during the delta merge operation.

Read operations read from both the main and the delta store and merge the results; the engine uses multi-version concurrency control (MVCC) to ensure consistent reads.
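Dictionary compression, the first step described above, can be sketched like this (illustrative only): the sorted distinct values become the dictionary, the column becomes an array of small integer value IDs, and an equality scan compares integers after translating the predicate once.

```python
# Illustrative dictionary encoding for a column store.

def dictionary_encode(column):
    # Sorted distinct values form the dictionary; the column becomes IDs.
    dictionary = sorted(set(column))
    ids = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [ids[v] for v in column]

def dictionary_decode(dictionary, encoded):
    return [dictionary[i] for i in encoded]

def scan_equals(dictionary, encoded, value):
    # Translate the predicate to a value ID once, then compare integers.
    try:
        vid = dictionary.index(value)
    except ValueError:
        return []          # value absent: no position can match
    return [pos for pos, e in enumerate(encoded) if e == vid]
```

Because the encoded column is just small integers, far more of it fits in the CPU cache, which is where the scan speed-up mentioned above comes from.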

3.5.2. Delta management in Column store:

The delta merge operation moves the changes (new and changed data) in the delta storage into the compressed, read-optimized main storage. This operation is done asynchronously.


Even during the merge operation, the columnar table remains available for read and write operations. To make this possible, a second delta and main storage are used internally.

Note: This merge operation can also be triggered manually with an SQL statement.
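The delta merge can be sketched as follows (a simplification: the second delta/main pair used during an online merge is left out): reads combine the read-optimized main with the write-optimized delta, while the merge rebuilds a re-compressed main from both and empties the delta.

```python
# Illustrative delta merge sketch; values and structures are invented.
# main holds dictionary-encoded data; delta holds raw recent inserts.

main = {"dictionary": ["A", "C"], "encoded": [0, 1, 0]}   # A, C, A
delta = ["B", "A"]                                         # recent writes

def read_all(main, delta):
    # Reads always combine both storage areas.
    return [main["dictionary"][i] for i in main["encoded"]] + list(delta)

def delta_merge(main, delta):
    # Rebuild a compressed main from main + delta; return an empty delta.
    values = read_all(main, delta)
    dictionary = sorted(set(values))
    ids = {v: i for i, v in enumerate(dictionary)}
    return {"dictionary": dictionary, "encoded": [ids[v] for v in values]}, []
```

The merged main decodes to exactly the data that was previously split across the two stores, which is why readers never notice the merge.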

3.6. Persistence Layer:

A persistence layer is needed because main memory is volatile. The persistence layer provides backup and restore functionality for database restarts, power outages and similar events, so data is stored in a non-volatile way.

A single persistence layer takes care of both the row store and the column store in the in-memory computing engine. It provides regular "savepoints" that give a full persisted image of the database at the time of the savepoint, logs capturing all database transactions since the last savepoint (redo and undo logs), the ability to restore the database from the latest savepoint onwards, and the ability to create "snapshots" used for backups.

System restart and population of the in-memory stores:

During a system restart the last savepoint must be restored, the undo logs must be read to roll back uncommitted transactions saved with that savepoint, and the redo logs must be applied. The complete content of the row store is loaded into memory during the start process.

For the column store, flags specify which tables are loaded during system restart: only tables whose flag is set are loaded into memory at startup. If a table is flagged for loading on demand, the restore procedure is invoked on its first access.

At the time of a system crash, a transaction that committed but whose records were not all persisted (transaction T1) requires a redo operation, while an uncommitted transaction (transaction T2) requires an undo operation: no record from transaction T2 is added to the system, so the whole of transaction T2 has to be committed again.
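The redo/undo behaviour described for transactions T1 and T2 can be sketched with an invented log format: restore the savepoint image, redo the writes of transactions that committed after it, and leave the writes of uncommitted transactions unapplied (undo).

```python
# Illustrative restart recovery sketch; log format and data are invented.

savepoint = {"x": 1}           # persisted image at the last savepoint
# Log entries written after the savepoint: (txn, op, key, value)
log = [
    ("T1", "write", "y", 2),
    ("T1", "commit", None, None),
    ("T2", "write", "x", 9),   # T2 never commits before the crash
]

def recover(savepoint, log):
    db = dict(savepoint)
    committed = {t for t, op, _, _ in log if op == "commit"}
    for txn, op, key, value in log:
        if op == "write" and txn in committed:
            db[key] = value    # redo committed work
        # Writes of uncommitted transactions are simply not applied,
        # which plays the role of the undo step for T2.
    return db
```

After recovery, T1's write survives and T2's write is gone, matching the T1/T2 scenario above.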

3.7. Modeling:

HANA has two relational data stores, the row store and the column store, but modeling is possible for column tables only. The Information Modeler, a key component of the HANA studio for modeling, works with column tables for two reasons:

The Replication Server creates tables in the column store by default.

Data Services creates tables in the column store by default.

SQL statements can define column tables, e.g. CREATE COLUMN TABLE, ALTER TABLE, etc.


System-generated tables are stored where they fit best: administrative tables, schema definition tables and the statistics server tables are created in the row store.

A few administrative tables in the column store:

Schema _SYS_BI -> metadata of created views + master data for MDX

Schema _SYS_BIC -> some generated tables for MDX

Schema _SYS_REPO -> e.g. lists of active/modified versions of models

3.8. SAP HANA In memory Computing Studio:

This is where the databases are modeled, using the Information Modeler/Composer, which runs in the Java-based Eclipse tool.

The Information Modeler/Composer has predefined features for modeling the database. It supports different database views and allows them to be published or consumed at four levels of modeling: attribute view, analytic view, analytic view with enhanced attribute view, and calculation view.

These information models are just virtual definitions and do not store physical data, but the Information Modeler allows physical data to be loaded into them.

The Information Modeler/Composer allows importing and exporting data source schemas, and supports mass and selective loads. It also provides data provisioning for SAP business applications, which allows loading and replicating the application data.

3.8.1. SLT integration in SAP HANA:

SAP Landscape Transformation is a procedure for getting SAP source data into HANA: the SAP BW data sources are replicated into the HANA studio for modeling. Once the data sources have been fetched into HANA, they can be modeled according to business needs.

To run an SAP BI system on HANA, BI must be on at least NetWeaver 7.3.

Note: Modeling in the HANA studio requires the SQLScript language.

3.8.2. Information Modeler Terminology:

Data in the Information Modeler is represented by attributes and measures.

Attributes are descriptive data, comparable to characteristics in SAP BW terminology; measures are data that can be quantified and calculated, known as key figures in SAP BW terminology.

Models/views are represented by attribute views, analytic views and calculation views.

Attribute View - Attributes are modeled using attribute views. An attribute view can be regarded as a master data table that is later linked to the fact table in an analytic view; a measure can also be defined as an attribute for modeling. In simple terms, an attribute view can be treated as a dimension in SAP BW terminology. Attribute views support left outer, right outer, full outer and text table joins, and all cardinalities except N:N.

Analytic View - This can be regarded as a cube, where the fact table (transactional data) is connected to attribute views. The analytic view itself does not contain data; the data is stored in the column store or a table view, based on the analytic view's structure. The properties of attributes and measures can be modified using the property tab.


All the views are organized in different folders under Information Modeler packages.

There are three main views one can select from when previewing data:

Raw Data - table format of the data

Distinct Values - graphical and text format identifying unique values

Analysis - selected fields (attributes and measures) displayed in graphical format

Calculation View:

Here views/models with custom functions and calculations can be defined; SQLScript is helpful in defining a calculation view. However, unlike SQL procedures, the SQLScript here cannot change any data: calculation views are read-only.

Hierarchies: The Information Modeler supports leveled hierarchies for structuring multiple attributes, as well as parent-child hierarchies.

3.9. HANA Modeling Process Flow:

Import source system metadata - The physical table structures are created dynamically (a 1:1 schema definition of the source system tables). SLT is used in the case of an SAP BW environment.

Provision data - Once the metadata has been replicated, the tables are loaded with content.

Create information models - With the physical data loaded into the system, modeling is performed: information models are defined based on the business requirements.

Deploy - Once modeling is done, column views are created and activated based on the information model structures. These views allow fast access to business data for reporting and include indexes for fast access.

Consume - After the column views are activated they are ready for reporting, depending on the choice of client tools: BICS, SQL or MDX.

4. SAP HANA Backup and Recovery:

The SAP HANA database holds the bulk of its data in memory for maximum performance, but still uses persistent storage to

provide a fallback in case of failure.

During normal operation of the database, data is automatically saved from memory to disk at regular savepoints. Additionally, all data changes are captured in the log: data is saved to disk at the regular savepoints, while the log is also saved to disk after each COMMIT of a database transaction.

After a power failure, the database can be restarted like any disk-based database and returns to its last consistent state by

replaying the log since the last save point.

The following actions are performed when restarting the system after a power failure.


The last savepoint is reloaded.

Uncommitted transactions are rolled back using the undo information contained in the savepoint; committed transactions are rolled forward using the log.

Data is loaded back into memory:

Tables are loaded lazily (lazy reloading) to keep the restart time short.

The complete content of the row store is loaded; column store tables are loaded if marked for preloading. If a table has been marked for loading on demand, it is reloaded when it is first accessed.
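Load-on-demand can be sketched as follows (table names and the loader are invented for illustration): tables flagged for preload are materialized at restart, while all others are loaded from disk on their first access.

```python
# Illustrative preload / load-on-demand sketch with invented tables.

loaded = {}                                      # in-memory tables
on_disk = {"SALES": [1, 2, 3], "DISPUTES": [4, 5]}
preload_flags = {"SALES": True, "DISPUTES": False}

def restart():
    # Only tables whose preload flag is set come up with the system.
    loaded.clear()
    for name, flag in preload_flags.items():
        if flag:
            loaded[name] = on_disk[name]

def access(name):
    # First access to a non-preloaded table triggers its reload.
    if name not in loaded:
        loaded[name] = on_disk[name]
    return loaded[name]
```

The trade-off is restart time versus first-query latency: preloading everything makes restarts slow, while lazy loading defers the cost to the first access.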

Note: While savepoints and log writing protect your data against power failures, they do not help when the persistent storage (disk) itself is damaged.

5. HANA Proof of Concept – Oil & Gas Industry

The results of the POC were outstanding: HANA technology delivered lightning-fast computation of time- and resource-heavy strategic reports while providing uncompromising precision for the operational reporting needs. It demonstrates how quicker business decisions and "speed to business" are possible using HANA at any level of an organization.

The oil downstream business is characterized by sales and distribution of fuels, lubricants, bitumen and services to many customers across different lines of business such as aviation, marine, commercial and retail. Considering aviation alone, the business spans about 90 countries and 1200 airports, fuelling an airplane on average every 15 seconds and recording several million sales orders annually.

Consider a case where, as the management team of an oil company, you would like to settle disputes with an airport operator and offer same-day dispute resolution, which can add a lot of brand value to the company.

The aim here is to build reports on HANA that read enormous volumes of data, to demonstrate flexibility, precision, speed and granularity where needed.

Dispute Resolution Report (operational report):

Sales orders are analyzed for disputes against the received POS data. Volume errors are detected quickly and dispute resolution is initiated.

Note: HANA offers parallel, thread-based data loading, for which incredible speeds have been seen.

Report Execution:

Uploaded records with dispute: 33000

Upload speed: 30 seconds (on 4 parallel threads)

Query Speed: 0.5 seconds
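The parallel, thread-based loading mentioned in the note can be sketched with Python's thread pool (the partition count of four mirrors the report above; the loader itself is an invented stand-in, not the HANA mechanism): the record set is split into partitions that are inserted concurrently.

```python
# Illustrative parallel loading sketch using a thread pool.
from concurrent.futures import ThreadPoolExecutor
import threading

store = []
lock = threading.Lock()

def load_partition(partition):
    # Serialize the shared append; each thread loads one partition.
    with lock:
        store.extend(partition)
    return len(partition)

def parallel_load(records, threads=4):
    # Split records into `threads` roughly equal partitions and load them
    # concurrently, returning the total number of records loaded.
    size = (len(records) + threads - 1) // threads
    parts = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(load_partition, parts))
```

All records arrive in the store regardless of thread interleaving; only the insertion order is nondeterministic, which is acceptable for a bulk load.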


References:

http://www.sap.com/platform/in-memory-computing

http://www.sdn.sap.com/irj/sdn/in-memory

http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/21575

http://www.sap.com/hana/index.epx

https://www.experiencesaphana.com/community/learn/content

http://www.jonerp.com/ for podcasts/discussions.
