Dr. Michael Rüter, Performance & Scalability, SAP AG
07/2012
Performance Guidelines for ABAP using the SAP HANA Database
Agenda
- Guidelines status quo: current guidelines with classic DBs
- HANA innovations
- How ABAP is affected
- Adapted guidelines for HANA
For details, check the recording and slide deck from a PUMA session I gave recently:
https://wiki.wdf.sap.corp/wiki/display/TIPCrossPM/Recordings+of+Previous+HANA+Sessions
Search for my name (spelled once as Rueter and once as Rüter).
Open SQL Statements in OLTP
Typical characteristics of OLTP SQL statements:
- Mainly SQL statements reading all attributes from tables with many columns, without any aggregation but with a very selective WHERE condition
- SELECTs supported by many dedicated DB indexes, designed by the application developers for a traditional DB system
- Most SQL statements are not joins, or only very simple joins (header-item)
- INSERTs and UPDATEs changing very few rows
- IN or OR lists due to FOR ALL ENTRIES
Open SQL Statements in OLTP
Existing performance guidelines for traditional databases (as defined, for example, in the product standard "Performance") are mostly still valid for HANA.
These common “SQL best practices” or “rules of thumb” are summarized on the following pages.
SQL Best Practices: Keep the Result Set Small
Don’t retrieve rows from the database and discard them on the application server using CHECK or EXIT, e.g. in SELECT loops.
Make the WHERE clause as specific as possible.
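A minimal before/after sketch in classic Open SQL; the table ZSALES and its fields CUSTID and AMOUNT are hypothetical, chosen only for illustration:

DATA: lv_custid TYPE char10,
      lv_amount TYPE p DECIMALS 2,
      lv_total  TYPE p DECIMALS 2,
      lv_wanted TYPE char10 VALUE '4711'.

* Anti-pattern: every row is transferred, most are discarded in ABAP
SELECT custid amount FROM zsales INTO (lv_custid, lv_amount).
  CHECK lv_custid = lv_wanted.  " filter applied only after the transfer
  lv_total = lv_total + lv_amount.
ENDSELECT.

* Better: the filter is part of the WHERE clause and runs in the DB
SELECT custid amount FROM zsales INTO (lv_custid, lv_amount)
  WHERE custid = lv_wanted.
  lv_total = lv_total + lv_amount.
ENDSELECT.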
SQL Best Practices: Minimize Amount of Transferred Data
Use SELECT with a field list instead of SELECT * in order to transfer just the columns you really need.
Use aggregate functions (COUNT, MIN, MAX, SUM, AVG) instead of transferring all the rows to the application server.
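A minimal sketch of the same idea, reusing the hypothetical ZSALES table from above:

DATA: lv_amount TYPE p DECIMALS 2,
      lv_total  TYPE p DECIMALS 2.

* Anti-pattern: all rows travel to the application server
SELECT amount FROM zsales INTO lv_amount.
  lv_total = lv_total + lv_amount.
ENDSELECT.

* Better: the database returns a single aggregated value
SELECT SUM( amount ) FROM zsales INTO lv_total.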
SQL Best Practices: Reduce the Number of Round Trips
Use JOINs and / or subqueries instead of nested SELECT loops.
Use SELECT … FOR ALL ENTRIES instead of lots of SELECTs or SELECT SINGLEs.
Use array variants of INSERT, UPDATE, MODIFY, and DELETE.
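A minimal sketch of the array variants, again against the hypothetical ZSALES table:

DATA: lt_sales TYPE STANDARD TABLE OF zsales,
      ls_sales TYPE zsales.

* Anti-pattern: one database round trip per row
LOOP AT lt_sales INTO ls_sales.
  INSERT zsales FROM ls_sales.
ENDLOOP.

* Better: a single round trip for the whole internal table
INSERT zsales FROM TABLE lt_sales.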
SQL Best Practices: Keep Load Away From the Database
Avoid reading data redundantly.
Use table buffering (if possible) and don’t bypass it.
Define appropriate secondary indexes specifically for HANA (HDB), if they are needed at all.
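One way to avoid reading the same data redundantly is to cache it in an internal table; a minimal sketch, assuming a hypothetical customizing table ZCONFIG with key field CFG_KEY:

DATA: lt_cfg TYPE SORTED TABLE OF zconfig WITH UNIQUE KEY cfg_key,
      ls_cfg TYPE zconfig,
      lv_key TYPE char10.

* Read the table once; serve all later lookups from the internal table
IF lt_cfg IS INITIAL.
  SELECT * FROM zconfig INTO TABLE lt_cfg.
ENDIF.
READ TABLE lt_cfg INTO ls_cfg WITH TABLE KEY cfg_key = lv_key.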
Agenda
- Guidelines status quo: current guidelines with classic DBs
- HANA innovations
- How ABAP is affected
- Adapted guidelines for HANA
Innovations in Hardware and Software Technology
Innovations in hardware technology:
- ~1 TB RAM per server
- Throughput of 100 GB/s
- Significant reduction in cost per GB
- Multi-core architecture
- Massive scaling with blade servers

Innovations in SAP software technology:
- Row- and column-based data store
- Compression
- Partitioning
- No pre-calculated aggregates

SAP HANA's strength comes from innovations in hardware and software. It is important to know these in order to understand the performance characteristics.
Technology / Architecture: Multi Core
Leverage the multi-core architecture: To achieve minimal runtime for your application on HANA, you want to exploit the multi-core architecture, which enables a high degree of parallelization.
However, if a single user already utilizes the server at 100%, the runtime will increase as soon as only two users run in parallel.
Keep this in mind when defining runtime KPIs and when sizing your application.
Technology / Architecture: Row vs. Column Store
Row store versus column store: Applications on HANA usually want to run OLTP and OLAP load on the same data.
For that, the important application master and transactional data must be stored in the column store.
For tables not needed for OLAP, frequently changed tables, or tables containing unstructured data (LOBs, ...), the row store might be the better choice. Prominent examples are queue tables, metadata, number range intervals, and message payloads.
Technology / Architecture: Partitioning
Typical use cases for partitioning:
- Column table limitation: a maximum of 2 billion records can be stored in one table without partitioning. With partitioning, 2 billion rows can currently be stored in each partition.
- Scale-out: With several servers, the partitions of one table may be distributed over the landscape, so queries can be processed on multiple servers.
- Parallelization: Operations are parallelized better by using several threads working on different partitions in parallel.
- Partition pruning: Scans are done on the relevant partitions only, which reduces the load. The precondition is that the query matches the given partition specification.
Technology / Architecture: Compression
Compression: The column store representation allows very good compression of typical business data (using dictionary compression). Compared to a traditional RDBMS, you can typically expect a compression factor of 3-10, depending on the structure of the data.
This reduces the penalty of a de-normalized data model compared to a normalized one, because the redundant data in the de-normalized model compresses very well. Taking into account that joining many tables is usually slower than accessing one table, you should consider whether a fully normalized data model is the best solution. Be aware that, on the other hand, inserts/updates might become slower in a de-normalized model.
Because GUIDs compress very badly, check, especially in a normalized data model with many tables, whether using GUIDs is really necessary everywhere.
Technology / Architecture: No Redundant Data
No pre-calculated aggregates: Avoid as much redundant data as possible. With HANA, there might no longer be a need for index-like tables, materialized views, totals/sum tables, or aggregates.
Avoiding that data helps performance/scalability mainly when creating or changing data. But check that read performance is still sufficient.
Don't forget possible archiving or data deletion requirements.
Especially in scale-out scenarios, totals/sum or aggregate tables can hurt performance a lot.
Avoiding redundant data also helps to minimize the memory needed in HANA. Keep in mind that the memory for the HANA server is currently the most important cost driver for the hardware.
Technology / Architecture: Architecture of HANA
It is important to know the high-level architecture of HANA to understand its most important performance characteristics.
Agenda
- Motivation and basics
- Guidelines status quo: current guidelines with classic DBs
- HANA innovations
- How ABAP is affected
- Adapted guidelines for HANA
- Q&A, discussion, outlook
Technology / Architecture: In-Memory / SAP Table Buffers / Other Buffers
Which DB tables are in memory? All of them, or is there a manual classification like for the table buffer?
Per se, all DB tables are buffered on HANA (this is managed by HANA; no classification is necessary).

Are SAP table buffers still needed with HANA?
Yes, the table buffer is still necessary. Using the classic table buffer is about 10 times faster than a HANA in-memory access. There is no change in the rules for when to use table buffering.

What about other buffers?
No changes compared to the classical world.
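A minimal sketch of statements that are served from the SAP table buffer versus statements that bypass it; ZCONFIG is again a hypothetical, fully buffered customizing table:

DATA: ls_cfg   TYPE zconfig,
      lv_key   TYPE char10,
      lv_count TYPE i.

* Served from the SAP table buffer (assuming ZCONFIG is fully buffered)
SELECT SINGLE * FROM zconfig INTO ls_cfg WHERE cfg_key = lv_key.

* These variants bypass the buffer and hit the database instead:
SELECT COUNT( * ) FROM zconfig INTO lv_count.    " aggregate functions
SELECT SINGLE * FROM zconfig BYPASSING BUFFER    " explicit bypass
  INTO ls_cfg WHERE cfg_key = lv_key.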
Technology / Architecture: In-Memory / SAP Table Buffers / Other Buffers – Access Times
What are standard execution times (single record)?
The old rules are still valid:
• An internal table (itab) is a factor of 10 faster than the table buffer
• The table buffer is a factor of 10 faster than the DB cache / HANA
• The DB cache / HANA is a factor of 10 faster than a DB disk access
Technology / Architecture: Indexes in the Column Store
Are indexes still needed in the column store?
By default, no secondary indexes are needed.
• Exceptions: very large tables and specific WHERE clauses may need additional indexes
• Indexes may be created on single selective columns; no concatenated indexes are needed (HANA can handle several indexes in parallel)
Technology / Architecture: DDIC View / HANA Views
When should HANA views or DDIC views be used?
If extra HANA-specific features are not needed, DDIC views should be used.
Technology / Architecture: Code Push-Down
Which parts of ABAP Open SQL should be rewritten as procedures and pushed down to HANA?
All ABAP programs that cause high traffic between the DB and the application server.
• Experience shows that normally it is not small ABAP code segments that are replaced by a DB procedure; rather, a complete report or a complete piece of business functionality is pushed down to HANA (see appendix).
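One common pattern for executing pushed-down logic from ABAP at the time was native SQL via ADBC. A minimal sketch follows; the aggregation query, the ZSALES table, and the result type are illustrative assumptions, not taken from the slides:

TYPES: BEGIN OF ty_result,
         custid TYPE char10,
         total  TYPE p LENGTH 15 DECIMALS 2,
       END OF ty_result.

DATA: lt_result TYPE STANDARD TABLE OF ty_result,
      lr_result TYPE REF TO data,
      lo_stmt   TYPE REF TO cl_sql_statement,
      lo_res    TYPE REF TO cl_sql_result_set.

GET REFERENCE OF lt_result INTO lr_result.

TRY.
    CREATE OBJECT lo_stmt.
*   The whole aggregation runs inside HANA; only the small
*   result set is transferred back to the application server.
    lo_res = lo_stmt->execute_query(
      'SELECT custid, SUM( amount ) AS total FROM zsales GROUP BY custid' ).
    lo_res->set_param_table( lr_result ).
    lo_res->next_package( ).
    lo_res->close( ).
  CATCH cx_sql_exception cx_parameter_invalid.
*   Error handling omitted in this sketch
ENDTRY.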
Agenda
- Guidelines status quo: current guidelines with classic DBs
- HANA innovations
- How ABAP is affected
- Adapted guidelines for HANA
All known "golden rules" are still valid:
- Restrict the result set of your SELECT - Only select the data / fields you really need - Use array operations instead of single operations
There is only a shift in priority for certain rules: In a column store and in-memory DB the following SQL statements are accelerated:
- All statements that involve physical I/O on traditional databasesE.g. WHERE clauses accessing DB table fields without index support
- All statements that scan a large dataset and provide a small result set E.g. all kinds of aggregations (group by, count, sum, avg, having …, �big full table / full index / range scans with a small result set
In a column store the following SQL "mistakes" are even more expensive - Retrieving unneeded columns (SELECT *) - Frequent single row based access (e.g. SELECT SINGLE in a LOOP, nested SELECTs)
Adapted Guidelines for HANA
OLTP SQLs: Index Support
Rule: Index support for selective WHERE clauses.

Comment: Not as important as it used to be, due to the changes in architecture (column store, compression). Indexes might still be needed, though. Indexing in HANA is still under investigation.
OLTP SQLs: Filters
Rule: No CHECK or EXIT in SELECT ... ENDSELECT loops.

SELECT f1 FROM ...                     " anti-pattern: filter in ABAP
  CHECK ... / EXIT.
ENDSELECT.

SELECT f1 FROM ... WHERE ... INTO ...  " better: filter in the DB

Comment: The negative effect on HANA can be even worse than on traditional DBs, especially if many fields are selected (see next slide). All filters should be pushed down to HANA as early as possible. The statement should be rewritten; it might be a candidate for a procedure.
OLTP SQLs: Field lists
Rule: Restrict the field list in SELECT (no SELECT *).

SELECT * ...                           " anti-pattern

SELECT f1, f2 WHERE f7 = ?;            " better: only the fields needed

SELECT f1, f2 WHERE f7 = ?;            " anti-pattern: two executions
SELECT f3, f4 WHERE f7 = ?;            " for fields needed together

SELECT f1, f2, f3, f4 WHERE f7 = ?;    " better: one execution

Comment: The negative effect on HANA can be even worse than on traditional DBs. The statement should be rewritten; it might be a candidate for a procedure. However, the number of statement executions matters more than the number of fields: all fields that are needed should be selected in one SQL statement.
SELECT FROM ... WHERE ... UP TO n ROWS: Run Time with Varying Number of Rows and Columns
[Chart: run time in ms (y-axis, 0-160) vs. number of selected columns (x-axis, 0-350), with curves for 10, 100, and 1000 selected rows]
Conclusion:
The more rows are selected, the more important the optimization of the field list becomes. Large factors (>20) are possible for 1000 rows.
OLTP SQLs: Number of Executions
Rule: Avoid SELECTs in LOOPs and nested SELECTs. Strive to use array SELECTs, JOINs, subqueries, FOR ALL ENTRIES, or views.

SELECT ... FROM t_head ... WHERE ...     " anti-pattern: nested loops
  SELECT ... FROM t_item ... WHERE ...
  ENDSELECT.
ENDSELECT.

SELECT ... FROM t_head JOIN t_item ON ... WHERE ...   " better: one JOIN

Comment: The negative effect on HANA can be even worse than on traditional DBs. The statements should be rewritten into one statement; it might be a candidate for a procedure.
OLTP SQLs: Aggregations
Rule: Use SQL aggregations instead of aggregating row by row in ABAP.

SELECT ... FROM table ... WHERE ...    " anti-pattern: aggregate in ABAP
  sum = sum + table-amount.
ENDSELECT.

SELECT SUM( amount ) ... FROM table WHERE ...   " better: aggregate in the DB

Comment: The negative effect on HANA can be even worse than on traditional DBs. The statement should be rewritten using aggregations; it might be a candidate for a procedure. Keep in mind that buffered tables should be read from the SAP table buffer (aggregate functions bypass the buffer).
OLTP SQLs: Table buffer
Rule: Use the SAP table buffer; avoid bypassing the table buffer.

Comment: No change on HANA.
OLTP SQLs: FOR ALL ENTRIES
Rule: Ensure that the driver table of a FOR ALL ENTRIES statement is never empty and does not contain duplicates.

Comment: No change on HANA. Keep in mind that a JOIN or a subquery is often the better alternative and should be preferred over FOR ALL ENTRIES (see the sketch below).
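A minimal sketch of the two classic FOR ALL ENTRIES safeguards; the driver table LT_KEYS and the ZSALES table are hypothetical:

TYPES: BEGIN OF ty_key,
         custid TYPE char10,
       END OF ty_key.

DATA: lt_keys  TYPE STANDARD TABLE OF ty_key,
      lt_sales TYPE STANDARD TABLE OF zsales.

SORT lt_keys BY custid.
DELETE ADJACENT DUPLICATES FROM lt_keys COMPARING custid.  " no duplicates
* An empty driver table would select ALL rows of ZSALES!
IF lt_keys IS NOT INITIAL.
  SELECT * FROM zsales INTO TABLE lt_sales
    FOR ALL ENTRIES IN lt_keys
    WHERE custid = lt_keys-custid.
ENDIF.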
OLTP SQLs: DB Changes in Loops

Rule: Avoid DB changes in LOOPs.

Comment: The negative effect on HANA can be even worse than on traditional DBs. The statement should be rewritten using array operations; it might be a candidate for a procedure (see the sketch below).
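A minimal sketch of replacing row-wise changes with an array operation, again using the hypothetical ZSALES table (the STATUS field is illustrative):

DATA: lt_sales TYPE STANDARD TABLE OF zsales,
      ls_sales TYPE zsales.

* Anti-pattern: one UPDATE round trip per loop pass
LOOP AT lt_sales INTO ls_sales.
  ls_sales-status = 'X'.
  UPDATE zsales FROM ls_sales.
ENDLOOP.

* Better: change the internal table first, then one array UPDATE
LOOP AT lt_sales INTO ls_sales.
  ls_sales-status = 'X'.
  MODIFY lt_sales FROM ls_sales.
ENDLOOP.
UPDATE zsales FROM TABLE lt_sales.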
Thank You!
Contact information:
SAP AG, Walldorf
Appendix
Partitioning Specifications
Application Design
Technology / Architecture: Partitioning
Partitioning pros and cons:
- If it is possible to use time-based partitioning for huge tables, this should be done. It involves active partition management by the application.
- Use as many columns in the partitioning schema as required for good balancing, but try to use only those that are typically specified in the query. Unfortunately, this is often hard to tell, if not impossible. In the worst case, only single selects can leverage pruning.
- Beware: If there is a unique constraint on a non-key or non-primary-key column, performance suffers exponentially with the number of partitions on other servers. Therefore, if partitioning is required, consider a low number of partitions.
- If joins are used frequently in scale-out scenarios, partitioning implies a performance penalty.
Partitioning Specifications
Range Partitioning
CREATE COLUMN TABLE mytab (a INT, b INT, c INT, PRIMARY KEY (a,b))
  PARTITION BY RANGE (a)
  (PARTITION 1 <= VALUES < 5,
   PARTITION 5 <= VALUES < 20,
   PARTITION VALUE = 44,
   PARTITION OTHERS)
- Create dedicated partitions for ranges of certain values (e.g. year)
- In-depth knowledge of the table data is required
- Partitions can be created / dropped individually
- Not suited for load distribution
- When inserting/modifying rows, the target partition is determined by the defined ranges
- If the table has a primary key, only primary-key columns can be used for partitioning
- Works only on the data types string, integer, and date
[Diagram: Sales table partitioned by RANGE on Year into partitions 2009, 2010, 2011, and others (*)]
Partitioning Specifications
Hash Partitioning
CREATE COLUMN TABLE mytab (a INT, b INT, c INT, PRIMARY KEY (a,b)) PARTITION BY HASH (a, b) PARTITIONS 4
- Distribute rows evenly across partitions for load balancing
- In-depth knowledge of the table data is NOT required
- Specify the partitioning columns
- If the table has a primary key, only primary-key columns can be used
- Use as many partitioning columns as required to achieve a good variety of values for an even distribution and to fit typical query requests
[Diagram: Sales table distributed by HASH on Year across partitions 1-4]
Partitioning Specifications
Round-Robin Partitioning
CREATE COLUMN TABLE mytab (a INT, b INT, c INT) PARTITION BY ROUNDROBIN PARTITIONS 4
- Distribute rows evenly across partitions for load balancing
- The table must not have a primary key
- No need to specify partitioning columns
- No pruning possible
- Depending on the scenario, it is possible that the data of semantically related tables resides on the same server
[Diagram: Sales table distributed ROUND-ROBIN across partitions 1-4]
Partitioning Specifications
Multi-Level Partitioning
CREATE COLUMN TABLE mytab (a INT, b INT, c INT, PRIMARY KEY (a,b))
  PARTITION BY HASH (a, b) PARTITIONS 4
  SUBPARTITION BY RANGE (c)
  (PARTITION 1 <= VALUES < 5,
   PARTITION 5 <= VALUES < 20);
- No key-column restriction on 2nd-level partitions
- Maximum of 2 levels
[Diagram: Sales table partitioned by HASH on Region, each hash partition sub-partitioned by RANGE on Year (2009-2012)]
Adoption of Application Design: General Guidelines
- Try to avoid updates and use HANA as an 'insert only' database.
- Cluster tables are no longer needed, due to the good column store compression.
- Some buffers on the application server are still needed for performance reasons:
  - Table buffer for customizing or metadata
  - Transactional buffers for master and transactional data
  - Shared objects for buffering more complex entities
- As with a traditional DB, try to minimize the communication overhead between the application and HANA: minimize the number of round trips and the data volume transferred. If mass data is transferred, evaluate whether code push-down can be used to optimize.
Adoption of Application Design: Code Push-Down
To leverage the performance of HANA, application logic has to be pushed down to the DB.
Adoption of Application Design: An Example of Code Push-Down – Liquidity Forecast in ByD
Measurements
• Runtime of the "old" Liquidity Forecast run vs. the new SQL Script based implementation

Improvements / Key Messages
• Runtime performance improved by a factor of 100+
• The business object "Liquidity Forecast" and the mass data run object "Liquidity Forecast Run" became obsolete

Data Volume
• Execution based on 18,000+ register entries

Figures
• Before (implementation via mass data run object on MaxDB/TREX): net runtime of the background job (scheduling activities excluded) ~232 seconds
• After (implementation via SQL Script on HANA): total runtime of the SQL Script ~2 seconds
[Chart: runtime in seconds (log scale, 0.1-10,000) vs. number of register entries (5,000 / 18,000 / 50,000 / 500,000), comparing the old background job with the new SQL script]
Liquidity Forecast – Old Solution without Code Push Down
[Diagram: 1. A run is scheduled; the Liquidity Forecast Creation Run (FSI/TREX) creates an instance of the Liquidity Forecast BO and executes its actions, which retrieve data from the registers on MaxDB (Trade Receivables/Payables Register, Tax Receivables/Payables Register, Payment Register, Expected Liquidity Item). 2. The analytics report is launched via an MDAV.]
Liquidity Forecast – New Solution with Code Push Down
[Diagram: 1. A run is scheduled; the Liquidity Forecast Creation Run (FSI/TREX) creates an instance of the Liquidity Forecast and executes its actions via a SQL Script directly on the registers in HANA (Trade Receivables/Payables Register, Tax Receivables/Payables Register, Payment Register, Expected Liquidity Item). 2. The analytics report is launched via a virtual MDAV with direct access to HANA.]
Adoption of Application Design: General Remarks about Code Push-Down
- Significant performance improvements for OLTP-like scenarios can only be expected from optimizing the application logic (pushing it down to HANA).
- The performance improvement can be several orders of magnitude.
- The possible performance boost is especially large for processes working on mass data.
- Performance analysis and tuning of applications pushed down to HANA is not easy; a lot of trial and error is still needed.
- A high degree of parallelization does not come for free from HANA; you might still have to take care of it in your application design.
- Try to avoid imperative programming; use declarative programming instead.