Dr. Michael Rüter, Performance & Scalability, SAP AG
07/2012
Performance Guidelines for ABAP using the SAP HANA Database
Agenda
- Guidelines status quo: current guidelines with classic DBs
- HANA innovations
- How ABAP is affected
- Adapted guidelines for HANA
For details, check the recording and slide deck from a PUMA session I gave recently:
https://wiki.wdf.sap.corp/wiki/display/TIPCrossPM/Recordings+of+Previous+HANA+Sessions
Search for my name (spelled once as Rueter and once as Rüter).
Open SQL Statements in OLTP
Typical characteristics of OLTP SQL statements:
- Mainly SQL statements reading all attributes from tables with many columns, without any aggregation but with a very selective WHERE condition
- SELECTs supported by many dedicated DB indexes, designed by the application developers for a traditional DB system
- Most SQL statements are not joins, or only very simple joins (header-item)
- INSERTs and UPDATEs changing very few rows
- IN or OR lists due to FOR ALL ENTRIES
Open SQL Statements in OLTP
Existing performance guidelines for traditional databases (as defined, for example, in the product standard "Performance") are mostly still valid for HANA.
These common “SQL best practices” or “rules of thumb” are summarized on the following pages.
SQL Best Practices: Keep the Result Set Small
Don’t retrieve rows from the database and discard them on the application server using CHECK or EXIT, e.g. in SELECT loops.
Make the WHERE clause as specific as possible.
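A minimal before/after sketch in classic Open SQL; the table ZSALES and its fields CUSTID and AMOUNT are hypothetical, chosen only for illustration:

DATA: lv_custid TYPE char10,
      lv_amount TYPE p DECIMALS 2,
      lv_total  TYPE p DECIMALS 2,
      lv_wanted TYPE char10 VALUE '4711'.

* Anti-pattern: every row is transferred, most are discarded in ABAP
SELECT custid amount FROM zsales INTO (lv_custid, lv_amount).
  CHECK lv_custid = lv_wanted.  " filter applied only after the transfer
  lv_total = lv_total + lv_amount.
ENDSELECT.

* Better: the filter is part of the WHERE clause and runs in the DB
SELECT custid amount FROM zsales INTO (lv_custid, lv_amount)
  WHERE custid = lv_wanted.
  lv_total = lv_total + lv_amount.
ENDSELECT.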
SQL Best Practices: Minimize Amount of Transferred Data
Use SELECT with a field list instead of SELECT * in order to transfer just the columns you really need.
Use aggregate functions (COUNT, MIN, MAX, SUM, AVG) instead of transferring all the rows to the application server.
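A minimal sketch of the same idea, reusing the hypothetical ZSALES table from above:

DATA: lv_amount TYPE p DECIMALS 2,
      lv_total  TYPE p DECIMALS 2.

* Anti-pattern: all rows travel to the application server
SELECT amount FROM zsales INTO lv_amount.
  lv_total = lv_total + lv_amount.
ENDSELECT.

* Better: the database returns a single aggregated value
SELECT SUM( amount ) FROM zsales INTO lv_total.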
SQL Best Practices: Reduce the Number of Round Trips
Use JOINs and / or subqueries instead of nested SELECT loops.
Use SELECT … FOR ALL ENTRIES instead of lots of SELECTs or SELECT SINGLEs.
Use array variants of INSERT, UPDATE, MODIFY, and DELETE.
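A minimal sketch of the array variants, again against the hypothetical ZSALES table:

DATA: lt_sales TYPE STANDARD TABLE OF zsales,
      ls_sales TYPE zsales.

* Anti-pattern: one database round trip per row
LOOP AT lt_sales INTO ls_sales.
  INSERT zsales FROM ls_sales.
ENDLOOP.

* Better: a single round trip for the whole internal table
INSERT zsales FROM TABLE lt_sales.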
SQL Best Practices: Keep Load Away From the Database
Avoid reading data redundantly.
Use table buffering (if possible) and don’t bypass it.
Define appropriate secondary indexes specifically for HANA (HDB), if they are needed at all.
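One way to avoid reading the same data redundantly is to cache it in an internal table; a minimal sketch, assuming a hypothetical customizing table ZCONFIG with key field CFG_KEY:

DATA: lt_cfg TYPE SORTED TABLE OF zconfig WITH UNIQUE KEY cfg_key,
      ls_cfg TYPE zconfig,
      lv_key TYPE char10.

* Read the table once; serve all later lookups from the internal table
IF lt_cfg IS INITIAL.
  SELECT * FROM zconfig INTO TABLE lt_cfg.
ENDIF.
READ TABLE lt_cfg INTO ls_cfg WITH TABLE KEY cfg_key = lv_key.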
Agenda
- Guidelines status quo: current guidelines with classic DBs
- HANA innovations
- How ABAP is affected
- Adapted guidelines for HANA
Innovations in Hardware and Software Technology
Innovations in hardware technology:
- ~1 TB RAM per server
- Throughput of 100 GB/s
- Significant reduction in cost per GB
- Multi-core architecture
- Massive scaling with blade servers

Innovations in SAP software technology:
- Row- and column-based data store
- Compression
- Partitioning
- No pre-calculated aggregates

SAP HANA's strength comes from innovations in hardware and software. It is important to know these in order to understand the performance characteristics.
Technology / Architecture: Multi Core
Leverage the multi-core architecture: To achieve minimal runtime for your application on HANA, you want to exploit the multi-core architecture, which enables a high degree of parallelization.
However, if a single user already utilizes the server at 100%, the runtime will increase as soon as only two users run in parallel.
Keep this in mind when defining runtime KPIs and when sizing your application.
Technology / Architecture: Row vs. Column Store
Row store versus column store: Applications on HANA usually want to run OLTP and OLAP load on the same data.
For that, the important application master and transactional data must be stored in the column store.
For tables not needed for OLAP, frequently changed tables, or tables containing unstructured data (LOBs, ...), the row store might be the better choice. Prominent examples are queue tables, metadata, number range intervals, and message payloads.
Technology / Architecture: Partitioning
Typical use cases for partitioning:
- Column table limitation: a maximum of 2 billion records can be stored in one table without partitioning. With partitioning, 2 billion rows can currently be stored in each partition.
- Scale-out: With several servers, the partitions of one table may be distributed over the landscape, so queries can be processed on multiple servers.
- Parallelization: Operations are parallelized better by using several threads working on different partitions in parallel.
- Partition pruning: Scans are done on the relevant partitions only, which reduces the load. The precondition is that the query matches the given partition specification.
Technology / Architecture: Compression
Compression: The column store representation allows very good compression of typical business data (using dictionary compression). Compared to a traditional RDBMS, you can typically expect a compression factor of 3-10, depending on the structure of the data.
This reduces the penalty of a de-normalized data model compared to a normalized one, because the redundant data in the de-normalized model compresses very well. Taking into account that joining many tables is usually slower than accessing one table, you should consider whether a fully normalized data model is the best solution. Be aware that, on the other hand, inserts/updates might become slower in a de-normalized model.
Because GUIDs compress very badly, check, especially in a normalized data model with many tables, whether using GUIDs is really necessary everywhere.
Technology / Architecture: No Redundant Data
No pre-calculated aggregates: Avoid as much redundant data as possible. With HANA, there might no longer be a need for index-like tables, materialized views, totals/sum tables, or aggregates.
Avoiding that data helps performance/scalability mainly when creating or changing data. But check that read performance is still sufficient.
Don't forget possible archiving or data deletion requirements.
Especially in scale-out scenarios, totals/sum or aggregate tables can hurt performance a lot.
Avoiding redundant data also helps to minimize the memory needed in HANA. Keep in mind that the memory for the HANA server is currently the most important cost driver for the hardware.
Technology / Architecture: Architecture of HANA
It is important to know the high-level architecture of HANA to understand its most important performance characteristics.
Agenda
- Motivation and basics
- Guidelines status quo: current guidelines with classic DBs
- HANA innovations
- How ABAP is affected
- Adapted guidelines for HANA
- Q&A, discussion, outlook
Technology / Architecture: In-Memory / SAP Table Buffers / Other Buffers
Which DB tables are in memory? All of them, or is there a manual classification like for the table buffer?
Per se, all DB tables are buffered on HANA (this is managed by HANA; no classification is necessary).

Are SAP table buffers still needed with HANA?
Yes, the table buffer is still necessary. Using the classic table buffer is about 10 times faster than a HANA in-memory access. There is no change in the rules for when to use table buffering.

What about other buffers?
No changes compared to the classical world.
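A minimal sketch of statements that are served from the SAP table buffer versus statements that bypass it; ZCONFIG is again a hypothetical, fully buffered customizing table:

DATA: ls_cfg   TYPE zconfig,
      lv_key   TYPE char10,
      lv_count TYPE i.

* Served from the SAP table buffer (assuming ZCONFIG is fully buffered)
SELECT SINGLE * FROM zconfig INTO ls_cfg WHERE cfg_key = lv_key.

* These variants bypass the buffer and hit the database instead:
SELECT COUNT( * ) FROM zconfig INTO lv_count.    " aggregate functions
SELECT SINGLE * FROM zconfig BYPASSING BUFFER    " explicit bypass
  INTO ls_cfg WHERE cfg_key = lv_key.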
Technology / Architecture: In-Memory / SAP Table Buffers / Other Buffers – Access Times
What are standard execution times (single record)?
The old rules are still valid:
• An internal table (itab) is a factor of 10 faster than the table buffer
• The table buffer is a factor of 10 faster than the DB cache / HANA
• The DB cache / HANA is a factor of 10 faster than a DB disk access
Technology / Architecture: Indexes in the Column Store
Are indexes still needed in the column store?
By default, no secondary indexes are needed.
• Exceptions: very large tables and specific WHERE clauses may need additional indexes
• Indexes may be created on single selective columns; no concatenated indexes are needed (HANA can handle several indexes in parallel)
Technology / Architecture: DDIC View / HANA Views
When should HANA views or DDIC views be used?
If extra HANA-specific features are not needed, DDIC views should be used.
Technology / Architecture: Code Push-Down
Which parts of ABAP Open SQL should be rewritten as procedures and pushed down to HANA?
All ABAP programs that cause high traffic between the DB and the application server.
• Experience shows that normally it is not small ABAP code segments that are replaced by a DB procedure; rather, a complete report or a complete piece of business functionality is pushed down to HANA (see appendix).
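One common pattern for executing pushed-down logic from ABAP at the time was native SQL via ADBC. A minimal sketch follows; the aggregation query, the ZSALES table, and the result type are illustrative assumptions, not taken from the slides:

TYPES: BEGIN OF ty_result,
         custid TYPE char10,
         total  TYPE p LENGTH 15 DECIMALS 2,
       END OF ty_result.

DATA: lt_result TYPE STANDARD TABLE OF ty_result,
      lr_result TYPE REF TO data,
      lo_stmt   TYPE REF TO cl_sql_statement,
      lo_res    TYPE REF TO cl_sql_result_set.

GET REFERENCE OF lt_result INTO lr_result.

TRY.
    CREATE OBJECT lo_stmt.
*   The whole aggregation runs inside HANA; only the small
*   result set is transferred back to the application server.
    lo_res = lo_stmt->execute_query(
      'SELECT custid, SUM( amount ) AS total FROM zsales GROUP BY custid' ).
    lo_res->set_param_table( lr_result ).
    lo_res->next_package( ).
    lo_res->close( ).
  CATCH cx_sql_exception cx_parameter_invalid.
*   Error handling omitted in this sketch
ENDTRY.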
Agenda
- Guidelines status quo: current guidelines with classic DBs
- HANA innovations
- How ABAP is affected
- Adapted guidelines for HANA
All known "golden rules" are still valid:
- Restrict the result set of your SELECT - Only select the data / fields you really need - Use array operations instead of single operations
There is only a shift in priority for certain rules: In a column store and in-memory DB the following SQL statements are accelerated:
- All statements that involve physical I/O on traditional databasesE.g. WHERE clauses accessing DB table fields without index support
- All statements that scan a large dataset and provide a small result set E.g. all kinds of aggregations (group by, count, sum, avg, having …, �big full table / full index / range scans with a small result set
In a column store the following SQL "mistakes" are even more expensive - Retrieving unneeded columns (SELECT *) - Frequent single row based access (e.g. SELECT SINGLE in a LOOP, nested SELECTs)
Adapted Guidelines for HANA
OLTP SQLs: Index Support
Rule: Index support for selective WHERE clauses.

Comment: Not as important as it used to be, due to the changes in architecture (column store, compression). Indexes might still be needed, though. Indexing in HANA is still under investigation.
OLTP SQLs: Filters
Rule: No CHECK or EXIT in SELECT ... ENDSELECT loops.

SELECT f1 FROM ...                     " anti-pattern: filter in ABAP
  CHECK ... / EXIT.
ENDSELECT.

SELECT f1 FROM ... WHERE ... INTO ...  " better: filter in the DB

Comment: The negative effect on HANA can be even worse than on traditional DBs, especially if many fields are selected (see next slide). All filters should be pushed down to HANA as early as possible. The statement should be rewritten; it might be a candidate for a procedure.
OLTP SQLs: Field lists
Rule: Restrict the field list in SELECT (no SELECT *).

SELECT * ...                           " anti-pattern

SELECT f1, f2 WHERE f7 = ?;            " better: only the fields needed

SELECT f1, f2 WHERE f7 = ?;            " anti-pattern: two executions
SELECT f3, f4 WHERE f7 = ?;            " for fields needed together

SELECT f1, f2, f3, f4 WHERE f7 = ?;    " better: one execution

Comment: The negative effect on HANA can be even worse than on traditional DBs. The statement should be rewritten; it might be a candidate for a procedure. However, the number of statement executions matters more than the number of fields: all fields that are needed should be selected in one SQL statement.
SELECT FROM ... WHERE ... UP TO n ROWS: Run Time with Varying Number of Rows and Columns
[Chart: run time in ms (y-axis, 0-160) vs. number of selected columns (x-axis, 0-350), with curves for 10, 100, and 1000 selected rows]
Conclusion:
The more rows are selected, the more important the optimization of the field list becomes. Large factors (>20) are possible for 1000 rows.
OLTP SQLs: Number of Executions
Rule: Avoid SELECTs in LOOPs and nested SELECTs. Strive to use array SELECTs, JOINs, subqueries, FOR ALL ENTRIES, or views.

SELECT ... FROM t_head ... WHERE ...     " anti-pattern: nested loops
  SELECT ... FROM t_item ... WHERE ...
  ENDSELECT.
ENDSELECT.

SELECT ... FROM t_head JOIN t_item ON ... WHERE ...   " better: one JOIN

Comment: The negative effect on HANA can be even worse than on traditional DBs. The statements should be rewritten into one statement; it might be a candidate for a procedure.
OLTP SQLs: Aggregations
Rule: Use SQL aggregations instead of aggregating row by row in ABAP.

SELECT ... FROM table ... WHERE ...    " anti-pattern: aggregate in ABAP
  sum = sum + table-amount.
ENDSELECT.

SELECT SUM( amount ) ... FROM table WHERE ...   " better: aggregate in the DB

Comment: The negative effect on HANA can be even worse than on traditional DBs. The statement should be rewritten using aggregations; it might be a candidate for a procedure. Keep in mind that buffered tables should be read from the SAP table buffer (aggregate functions bypass the buffer).
OLTP SQLs: Table buffer
Rule: Use the SAP table buffer; avoid bypassing the table buffer.

Comment: No change on HANA.
OLTP SQLs: FOR ALL ENTRIES
Rule: Ensure that the driver table of a FOR ALL ENTRIES statement is never empty and does not contain duplicates.

Comment: No change on HANA. Keep in mind that a JOIN or a subquery is often the better alternative and should be preferred over FOR ALL ENTRIES (see the sketch below).
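A minimal sketch of the two classic FOR ALL ENTRIES safeguards; the driver table LT_KEYS and the ZSALES table are hypothetical:

TYPES: BEGIN OF ty_key,
         custid TYPE char10,
       END OF ty_key.

DATA: lt_keys  TYPE STANDARD TABLE OF ty_key,
      lt_sales TYPE STANDARD TABLE OF zsales.

SORT lt_keys BY custid.
DELETE ADJACENT DUPLICATES FROM lt_keys COMPARING custid.  " no duplicates
* An empty driver table would select ALL rows of ZSALES!
IF lt_keys IS NOT INITIAL.
  SELECT * FROM zsales INTO TABLE lt_sales
    FOR ALL ENTRIES IN lt_keys
    WHERE custid = lt_keys-custid.
ENDIF.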
OLTP SQLs: DB Changes in Loops

Rule: Avoid DB changes in LOOPs.

Comment: The negative effect on HANA can be even worse than on traditional DBs. The statement should be rewritten using array operations; it might be a candidate for a procedure (see the sketch below).
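A minimal sketch of replacing row-wise changes with an array operation, again using the hypothetical ZSALES table (the STATUS field is illustrative):

DATA: lt_sales TYPE STANDARD TABLE OF zsales,
      ls_sales TYPE zsales.

* Anti-pattern: one UPDATE round trip per loop pass
LOOP AT lt_sales INTO ls_sales.
  ls_sales-status = 'X'.
  UPDATE zsales FROM ls_sales.
ENDLOOP.

* Better: change the internal table first, then one array UPDATE
LOOP AT lt_sales INTO ls_sales.
  ls_sales-status = 'X'.
  MODIFY lt_sales FROM ls_sales.
ENDLOOP.
UPDATE zsales FROM TABLE lt_sales.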
Thank You!
Contact information:
SAP AG, Walldorf
Appendix
Partitioning Specifications
Application Design
Technology / Architecture: Partitioning
Partitioning pros and cons:
- If it is possible to use time-based partitioning for huge tables, this should be done. It involves active partition management by the application.
- Use as many columns in the partitioning schema as required for good balancing, but try to use only those that are typically specified in the query. Unfortunately, this is often hard to tell, if not impossible. In the worst case, only single selects can leverage pruning.
- Beware: If there is a unique constraint on a non-key or non-primary-key column, performance suffers exponentially with the number of partitions on other servers. Therefore, if partitioning is required, consider a low number of partitions.
- If joins are used frequently in scale-out scenarios, partitioning implies a performance penalty.
Partitioning Specifications
Range Partitioning
CREATE COLUMN TABLE mytab (a INT, b INT, c INT, PRIMARY KEY (a,b))
  PARTITION BY RANGE (a)
  (PARTITION 1 <= VALUES < 5,
   PARTITION 5 <= VALUES < 20,
   PARTITION VALUE = 44,
   PARTITION OTHERS)
- Create dedicated partitions for ranges of certain values (e.g. year)
- In-depth knowledge of the table data is required
- Partitions can be created / dropped individually
- Not suited for load distribution
- When inserting/modifying rows, the target partition is determined by the defined ranges
- If the table has a primary key, only primary-key columns can be used for partitioning
- Works only on the data types string, integer, and date
[Diagram: Sales table partitioned by RANGE on Year into partitions 2009, 2010, 2011, and others (*)]
Partitioning Specifications
Hash Partitioning
CREATE COLUMN TABLE mytab (a INT, b INT, c INT, PRIMARY KEY (a,b)) PARTITION BY HASH (a, b) PARTITIONS 4
- Distribute rows evenly across partitions for load balancing
- In-depth knowledge of the table data is NOT required
- Specify the partitioning columns
- If the table has a primary key, only primary-key columns can be used
- Use as many partitioning columns as required to achieve a good variety of values for an even distribution and to fit typical query requests
[Diagram: Sales table distributed by HASH on Year across partitions 1-4]
Partitioning Specifications
Round-Robin Partitioning
CREATE COLUMN TABLE mytab (a INT, b INT, c INT) PARTITION BY ROUNDROBIN PARTITIONS 4
- Distribute rows evenly across partitions for load balancing
- The table must not have a primary key
- No need to specify partitioning columns
- No pruning possible
- Depending on the scenario, it is possible that the data of semantically related tables resides on the same server
[Diagram: Sales table distributed ROUND-ROBIN across partitions 1-4]
Partitioning Specifications
Multi-Level Partitioning
CREATE COLUMN TABLE mytab (a INT, b INT, c INT, PRIMARY KEY (a,b))
  PARTITION BY HASH (a, b) PARTITIONS 4
  SUBPARTITION BY RANGE (c)
  (PARTITION 1 <= VALUES < 5,
   PARTITION 5 <= VALUES < 20);
- No key-column restriction on 2nd-level partitions
- Maximum of 2 levels
[Diagram: Sales table partitioned by HASH on Region, each hash partition sub-partitioned by RANGE on Year (2009-2012)]
Adoption of Application Design: General Guidelines
- Try to avoid updates and use HANA as an 'insert only' database.
- Cluster tables are no longer needed, due to the good column store compression.
- Some buffers on the application server are still needed for performance reasons:
  - Table buffer for customizing or metadata
  - Transactional buffers for master and transactional data
  - Shared objects for buffering more complex entities
- As with a traditional DB, try to minimize the communication overhead between the application and HANA: minimize the number of round trips and the data volume transferred. If mass data is transferred, evaluate whether code push-down can be used to optimize.
Adoption of Application Design: Code Push-Down
To leverage the performance of HANA, application logic has to be pushed down to the DB.
Adoption of Application Design: An Example of Code Push-Down – Liquidity Forecast in ByD
Measurements
• Runtime of the "old" Liquidity Forecast run vs. the new SQL Script based implementation

Improvements / Key Messages
• Runtime performance improved by a factor of 100+
• The business object "Liquidity Forecast" and the mass data run object "Liquidity Forecast Run" became obsolete

Data Volume
• Execution based on 18,000+ register entries

Figures
• Before (implementation via mass data run object on MaxDB/TREX): net runtime of the background job (scheduling activities excluded) ~232 seconds
• After (implementation via SQL Script on HANA): total runtime of the SQL Script ~2 seconds
[Chart: runtime in seconds (log scale, 0.1-10,000) vs. number of register entries (5,000 / 18,000 / 50,000 / 500,000), comparing the old background job with the new SQL script]
Liquidity Forecast – Old Solution without Code Push Down
[Diagram: 1. A run is scheduled; the Liquidity Forecast Creation Run (FSI/TREX) creates an instance of the Liquidity Forecast BO and executes its actions, which retrieve data from the registers on MaxDB (Trade Receivables/Payables Register, Tax Receivables/Payables Register, Payment Register, Expected Liquidity Item). 2. The analytics report is launched via an MDAV.]
Liquidity Forecast – New Solution with Code Push Down
[Diagram: 1. A run is scheduled; the Liquidity Forecast Creation Run (FSI/TREX) creates an instance of the Liquidity Forecast and executes its actions via a SQL Script directly on the registers in HANA (Trade Receivables/Payables Register, Tax Receivables/Payables Register, Payment Register, Expected Liquidity Item). 2. The analytics report is launched via a virtual MDAV with direct access to HANA.]
Adoption of Application Design: General Remarks about Code Push-Down
- Significant performance improvements for OLTP-like scenarios can only be expected from optimizing the application logic (pushing it down to HANA).
- The performance improvement can be several orders of magnitude.
- The possible performance boost is especially large for processes working on mass data.
- Performance analysis and tuning of applications pushed down to HANA is not easy; a lot of trial and error is still needed.
- A high degree of parallelization does not come for free from HANA; you might still have to take care of it in your application design.
- Try to avoid imperative programming; use declarative programming instead.