Best Practices – BW Data Loading & Performance
TRANSCRIPT
SAP Business Information Warehouse Getting a better view of the business
The SAP Business Information Warehouse (SAP BW) allows you to analyze data from operational SAP applications, other business applications, and external data sources such as databases, online services and the Internet. The administrator functions are designed for controlling, monitoring and maintaining all data retrieval processes. Keeping the BW healthy is critical for optimal results.
Let’s see how!
1. Compress Data Regularly
Scenario: When data is loaded into an InfoCube, the data is organized by requests (InfoPackage IDs). Each request has its own request ID, which is included in the primary key of the fact table and in the packet dimension. Because the request ID is part of the primary key, the same data record (all characteristics identical, with the exception of the request ID) can be written to the fact table more than once. This unnecessarily increases the volume of data and reduces reporting performance.
The system must perform aggregation using the request ID every time a query is executed!
Compression solves this problem. Compressing the InfoCube eliminates the duplicate records by converting all request IDs to zero and rolling up the key figures for records that have common values for all characteristics.
Therefore… InfoCubes should be compressed regularly.
Uncompressed cubes increase data volume and have a negative effect on query and aggregate-build performance. If too many uncompressed requests are allowed to build up in an InfoCube, this can eventually cause unpredictable and severe performance problems.
After compressing InfoCube 0SD_C03, the runtime was reduced from 30 minutes to 5 minutes!
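The roll-up described above can be sketched as follows. This is a minimal, illustrative model of InfoCube compression (the sample rows are made up, not SAP code): rows whose characteristics agree are summed, and the request ID collapses to zero.

```python
from collections import defaultdict

def compress(fact_rows):
    """Roll up fact rows: discard the request ID (it becomes 0) and sum
    the key figures of rows whose characteristic values are identical.

    fact_rows: list of (request_id, characteristics_tuple, key_figure).
    """
    totals = defaultdict(float)
    for request_id, characteristics, key_figure in fact_rows:
        totals[characteristics] += key_figure   # request_id is dropped here
    # Compressed rows all carry request ID 0
    return [(0, chars, total) for chars, total in sorted(totals.items())]

rows = [
    (101, ("DE", "MAT1"), 10.0),
    (102, ("DE", "MAT1"), 5.0),   # same characteristics, different request
    (102, ("US", "MAT2"), 7.0),
]
print(compress(rows))
# → [(0, ('DE', 'MAT1'), 15.0), (0, ('US', 'MAT2'), 7.0)]
```

After compression, a query no longer has to aggregate away the request ID at runtime, which is where the saving comes from.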
2. Build Indices
Inefficient SQL can be expensive! Without a suitable index, a SELECT performs a full table scan.
Indexes solve the problem. An index is a data structure, sorted by values, containing pointers to the data records of a table. An index can improve read performance when data is searched by values of fields contained in the index.
It's worth the maintenance: build secondary indexes wherever necessary to improve loading performance.
An index can improve the following operations:
. SELECT ... WHERE <table field> = <value>
. UPDATE ... WHERE <table field> = <value>
. DELETE ... WHERE <table field> = <value>
. Table joins (<table1.field1> = <table2.field2>)
We did this for tables /BIC/AYCPSSPBI00 and /BIC/AYCPSSPBH00 and received a 90% performance improvement.
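A minimal model of a secondary index: field values kept sorted, each paired with the row numbers (the "pointers") where the value occurs, so a lookup is a binary search instead of a scan of every row. This is only an illustration of the idea; real databases use B-tree indexes, and the sample table is hypothetical.

```python
import bisect

class SecondaryIndex:
    """Sorted (value, row id) pairs modeling a secondary index on one field."""

    def __init__(self, table, field):
        pairs = sorted((row[field], i) for i, row in enumerate(table))
        self.keys = [k for k, _ in pairs]      # sorted field values
        self.rowids = [r for _, r in pairs]    # pointers to the data records

    def lookup(self, value):
        # Binary search over the sorted keys instead of a full table scan
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return self.rowids[lo:hi]

table = [{"plant": "P1"}, {"plant": "P2"}, {"plant": "P1"}]
idx = SecondaryIndex(table, "plant")
print(idx.lookup("P1"))   # → [0, 2]
```

The cost of the index is paid at write time (it must be maintained on every insert), which is why the section above says it's "worth the maintenance" only where lookups actually use it.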
3. Partitioning – Divide and Rule
What is it? Partitioning lets the database access data in smaller chunks rather than going through the complete fact table.
By using partitioning you can split the whole dataset of an InfoCube into several smaller, physically independent, redundancy-free units. Thanks to this separation, performance is increased when reporting, and also when deleting data from the InfoCube.
F & E Fact Table Partitioning
BENEFITS
Almost every reasonable OLAP query has a restriction based on time. Such restrictions can be exploited by the query optimizer to focus on those fact table partitions/fragments that contain the relevant data.
The technical term for this is partition pruning, as the query processing is pruned to the relevant partitions/fragments of the fact table.
Irrelevant partitions/fragments can be discarded at an early stage.
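Partition pruning can be sketched in a few lines. In this toy model (the monthly data is made up) the fact table is stored as monthly partitions, and a time-restricted query discards irrelevant partitions before reading a single row:

```python
# Fact table stored as monthly partitions: month → [(posting date, amount)]
partitions = {
    "2023-01": [("2023-01-05", 10.0), ("2023-01-20", 4.0)],
    "2023-02": [("2023-02-11", 7.0)],
    "2023-03": [("2023-03-02", 3.0)],
}

def query(month_from, month_to):
    # Pruning step: keep only partitions that can contain matching rows
    relevant = [m for m in partitions if month_from <= m <= month_to]
    total = sum(amount for m in relevant for _, amount in partitions[m])
    return relevant, total

scanned, total = query("2023-02", "2023-03")
print(scanned, total)   # only 2 of the 3 partitions are read
```

This is why the time restriction in "almost every reasonable OLAP query" matters: the partitioning key must appear in the query for the optimizer to prune.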
4. Build Aggregates – Let's have some baby Cubes
Resources aren't free! The main limited resources on the database server are:
. CPU capacity
. Memory capacity
. The number of physical I/O operations that can be performed efficiently
Expensive SQL statements read far more data from the database than the query actually needs, and can have a negative impact on overall performance. Improper or unnecessary use of the database buffers displaces other data blocks and affects overall system performance.
Indicators of Missing Aggregates
. Ratio of records selected (DBSEL) to records transferred (DBTRANS) > 10
. Records selected > 10,000
. Database time over 30 percent of total runtime
. Database time higher than three seconds
Query performance may be poor when aggregates are missing: far more data than necessary is selected on the database.
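The indicator thresholds above are easy to turn into a checklist. A small sketch, with field names mirroring the BW query statistics mentioned in the list (DBSEL = records selected, DBTRANS = records transferred); the sample numbers are invented:

```python
def missing_aggregate_indicators(dbsel, dbtrans, db_time, total_time):
    """Evaluate the four missing-aggregate indicators for one query run."""
    return {
        "selectivity_ratio": dbsel / dbtrans > 10,       # DBSEL/DBTRANS > 10
        "many_records_selected": dbsel > 10_000,
        "db_time_share": db_time / total_time > 0.30,    # > 30% of runtime
        "db_time_absolute": db_time > 3.0,               # > 3 seconds
    }

flags = missing_aggregate_indicators(dbsel=50_000, dbtrans=200,
                                     db_time=12.0, total_time=20.0)
print(flags)   # every indicator fires → a strong aggregate candidate
```

When several indicators fire for the same query, the query is a candidate for an aggregate that pre-summarizes the data it reads.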
It works! After creating an aggregate for the InfoCube YDME_CA1, the performance of the query YSDMGTCAMPAIGNCA improved by 97%, from 1150 seconds to 30 seconds.
Coke is in the process of creating a number of aggregates for all major reports.
5. Archive Data – Old is not always GOLD
Archiving Process
ILM – Information Lifecycle Management
Clear the unused
ILM - Benefits
6. Star Schema – Design it Well
[Diagram: the BW extended star schema. A central FACT table is surrounded by dimension tables; each dimension table links through SID tables to the master data, text, and hierarchy tables of its characteristics.]
Big performance comes in small size:
. Small dimensions.
. Few dimensions (less important than small dimensions).
. Only as many details as necessary.
. Hierarchies only if necessary.
. Time-dependent structures only if necessary.
. Avoid MIN/MAX aggregation for key figures in huge InfoCubes.
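The lookup path through the schema shows why small dimensions matter. A toy model with made-up data (fact rows carry only dimension IDs; the dimension table maps them to SIDs, and the SID table maps SIDs to characteristic values):

```python
sid_table = {1: "MAT1", 2: "MAT2"}                   # SID → material number
dimension_table = {10: 1, 11: 2}                     # DIMID → SID
fact_table = [(10, 100.0), (10, 50.0), (11, 30.0)]   # (DIMID, key figure)

def revenue_by_material():
    """Resolve each fact row's characteristic value via dimension and SID
    tables, then total the key figure per material."""
    totals = {}
    for dimid, amount in fact_table:
        material = sid_table[dimension_table[dimid]]
        totals[material] = totals.get(material, 0.0) + amount
    return totals

print(revenue_by_material())   # → {'MAT1': 150.0, 'MAT2': 30.0}
```

Every query joins the fact table against these dimension tables, so the smaller they are, the cheaper the joins; that is the rationale behind the guidelines above.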
RSRV Tool
Check and correct the dimensions in RSRV to remove unused entries in the dimension tables:
1. Run transaction RSRV.
2. Select 'All Elementary Tests' -> 'Transaction Data'.
3. Double-click 'Entries Not Used in the Dimension of an InfoCube'.
4. Click on both new entries, enter the InfoCube name and the dimension name, then 'Execute' to get the check result.
5. To correct entries in the dimension tables, click the 'Correct Error' button.
First-hand experience: after reducing the size of dimension table /BIC/DYCPS_MRP23, performance improved by 73% – that's SUBSTANTIAL!
7. PSA Maintenance – Make it a Habit
PSA size can adversely affect the performance of the production system.
. Keep an eye on your PSA tablespace size.
. Do a monthly review of the top 30 PSAs in the system.
. Include PSA deletion steps in the process chains.
. Run a weekly PSA deletion process.
After incorporating these processes we could save 600-700 GB of space!
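The monthly review can be a simple top-N report. A sketch under the assumption that PSA table sizes have already been pulled from the database catalog; the table names and sizes here are invented:

```python
# Hypothetical PSA table sizes in GB, e.g. gathered from the DB catalog
psa_sizes_gb = {
    "/BIC/B0000123000": 220.0,
    "/BIC/B0000456000": 95.5,
    "/BIC/B0000789000": 310.2,
    "/BIC/B0000321000": 12.1,
}

def top_psas(sizes, n=30):
    """Return the n largest PSA tables, biggest first."""
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:n]

for name, gb in top_psas(psa_sizes_gb, n=3):
    print(f"{name}: {gb} GB")
```

The output of such a report is what feeds the PSA deletion steps in the process chains.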
8. Parallelism – Time is Money
. Avoid FULL loads as far as possible.
. Divide heavy loads into parallel loads based on selections.
. Check whether the old data is no longer needed, so that the data selection can be reduced.
. Design process chains with parallelism.
. Always perform adequate testing to find the optimal degree of parallelism… don't go overboard.
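The "divide by selections" idea can be sketched like this. The `extract` function and the company-code selections are hypothetical stand-ins for InfoPackage selections; in a real system the parallelism is configured in the process chains, not in Python:

```python
from concurrent.futures import ThreadPoolExecutor

def extract(selection):
    # Placeholder for one load run restricted to this selection
    return f"loaded company code {selection}"

# One heavy full load split into four selection-based loads
selections = ["1000", "2000", "3000", "4000"]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(extract, selections))   # run selections in parallel

print(results)
```

Note how `max_workers` caps the parallelism: this is the knob the last bullet warns about, since too many simultaneous loads compete for the same database resources.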
9. CO-PA Datasource Lock
Coke's bone of contention for the past 5 years:
. The CO-PA DataSource extracts actual and plan data.
. CO-PA deltas are based on a timestamp, with a safety interval of 30 minutes.
. The CO-PA actual extractor runs every hour during business-critical weeks.
. The CO-PA actual extractor was failing every hour with the error message "The selected DataSource is locked by another process".
The 5-year legacy met its end. We identified the root cause by debugging the extractor thoroughly from head to toe.
A conflict between CO-PA and VF01 was unearthed: VF01 is a billing transaction that establishes an update lock on structure CESVB. The CO-PA extractor also accesses the CESVB structure during actual and plan extraction, and fails if it does not get the lock on it.
Resolution: We designed the solution below and made it a best practice during business-critical weeks.
. Watch SM12 in R/3 for locks on CESVB.
. Stop the CO-PA extractor if there is a lock on CESVB.
. Monitor this lock, as VF01 normally holds it for only a few seconds.
. If the lock persists for quite some time, delete the lock after taking the necessary approvals.
. Start the CO-PA extractor once the CESVB lock is released.
We also discussed this with SAP; an extractor code change is in WIP that will try to access CESVB 10 times before failing. Presently it tries only once.
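The retry behavior SAP proposed (try 10 times before failing instead of once) can be sketched as follows. `try_lock` is a hypothetical stand-in for the real enqueue call on CESVB:

```python
import time

def acquire_with_retries(try_lock, attempts=10, wait_seconds=1.0):
    """Attempt to acquire a lock up to `attempts` times, waiting between
    tries, and fail only after the last attempt."""
    for attempt in range(1, attempts + 1):
        if try_lock():
            return attempt            # lock obtained on this attempt
        time.sleep(wait_seconds)
    raise RuntimeError("The selected DataSource is locked by another process")

# Example: the lock (held briefly by VF01) frees up on the third try
state = {"calls": 0}
def try_lock():
    state["calls"] += 1
    return state["calls"] >= 3

print(acquire_with_retries(try_lock, wait_seconds=0.0))   # → 3
```

Because VF01 holds the lock only for seconds, a short retry loop absorbs the conflict without manual intervention in SM12.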
10. Miscellaneous
. Drop the indexes of a cube before loading.
. Distribute the workload among multiple server instances.
. Prefer delta loads, as they load only newly added or modified records.
. Deploy parallelism: run multiple InfoPackages simultaneously.
. Avoid update routines and transfer routines unless necessary, and keep any routine's code optimized.
. Load master data first and then transaction data: when you load master data, the SIDs are generated, and these SIDs are then reused by the transaction data load.
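The last point can be illustrated with a toy model (all data invented): the master data load generates the SIDs, and the transaction data load then simply looks them up instead of creating them on the fly.

```python
sid_table = {}   # characteristic value → SID

def load_master_data(materials):
    """Generate a SID for each new master data value."""
    for value in materials:
        if value not in sid_table:
            sid_table[value] = len(sid_table) + 1

def load_transaction_data(rows):
    """Each fact row stores the SID, not the characteristic value itself."""
    return [(sid_table[material], amount) for material, amount in rows]

load_master_data(["MAT1", "MAT2"])                        # SIDs created here
facts = load_transaction_data([("MAT1", 100.0), ("MAT2", 30.0)])
print(facts)   # → [(1, 100.0), (2, 30.0)]
```

If the order were reversed, the transaction load would have to generate SIDs record by record, which is exactly the overhead that loading master data first avoids.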
THANK YOU