Performance Optimization Guide

SAP Data Services
Document Version: 4.2 Support Package 14 (14.2.14.0) – 2021-06-02

PUBLIC
© 2021 SAP SE or an SAP affiliate company. All rights reserved.



Content

1 What's in this Guide? [page 6]
2 Naming conventions and variables [page 7]
3 Environment Test Strategy [page 11]
3.1 Tune source OS and database server [page 11]
3.2 Tune target OS and database server [page 12]
3.3 Tune the network [page 13]
3.4 Tune Job Server OS and job options [page 14]
    Set Monitor sample rate [page 15]
    Collect statistics for cache optimization [page 16]
4 Measure job performance [page 17]
4.1 Data Services processes and threads [page 17]
4.2 Checking system utilization [page 19]
    CPU utilization [page 20]
    CPU bottleneck factors [page 21]
    Memory [page 22]
    Analyze trace logs for task duration [page 24]
    Read the Monitor log for execution statistics [page 25]
    Read the Performance Monitor for execution statistics [page 27]
    Read Operational Dashboard for execution statistics [page 28]
5 Job execution strategies [page 30]
5.1 Using advanced tuning options [page 31]
6 Maximize push-down operations [page 33]
6.1 Full push-down operations [page 34]
    Auto-correct load [page 35]
6.2 Partial push-down operations [page 36]
6.3 Operations that cannot be pushed down [page 37]
6.4 Push-down examples [page 38]
    Example 1: Collapsing transforms to push down operations [page 39]
    Example 2: Full push down from source to target [page 40]
    Example 3: Full push down for auto correct load to target [page 41]
    Example 4: Partial push down to source [page 42]
    Example 5: Push-down SQL join [page 43]
6.5 Query transform viewing optimized SQL [page 44]
6.6 Data_Transfer transform for push-down operations [page 45]
    Push down an operation after a blocking operation [page 46]
    Using Data_Transfer tables to speed up auto correct loads [page 47]
6.7 Linked datastores [page 49]
    Linked datastores database software requirements [page 50]
    Use linked remote servers [page 52]
    Syntax for generated SQL statements [page 54]
    Tuning performance at the data flow or Job Server level [page 54]
7 Cache data [page 56]
7.1 Cache data sources [page 57]
7.2 Cache joins [page 58]
7.3 Setting data flow cache type [page 59]
7.4 Cache source of lookup function [page 60]
7.5 Lookup table as source outer join [page 61]
7.6 Cache table comparisons [page 62]
7.7 Specify a pageable cache directory [page 63]
7.8 Use persistent cache [page 64]
7.9 Monitoring and tuning caches [page 66]
    Setting cache type automatically [page 66]
    Monitoring and tuning In Memory and Pageable caches [page 67]
    Accessing the Administrator Performance Monitor [page 69]
8 Parallel Execution [page 71]
8.1 Parallel data flows and work flows [page 71]
    Changing the maximum number of engines [page 73]
8.2 Parallel execution in data flows [page 73]
    Degree of parallelism [page 74]
    Table partitioning [page 85]
    Combining table partitioning and DOP [page 95]
    Parallel process threads for flat files [page 99]
9 Distribute data flow execution [page 105]
9.1 Run as separate process [page 106]
9.2 Multiple processes with Data_Transfer [page 107]
    Data_Transfer transform [page 107]
    Example 1: Sub data flow that pushes down joins [page 108]
    Example 2: Sub data flow that pushes down memory-intensive operations [page 110]
9.3 Multiple processes for a data flow [page 112]
    Example 1: Multiple sub data flows and DOP of 1 [page 113]
    Example 2: Run multiple sub data flows with DOP greater than 1 [page 115]
10 Using grid computing to distribute data flow execution [page 117]
10.1 Server Group [page 117]
10.2 Distribution levels for data flow execution [page 118]
    Job level [page 118]
    Data flow level [page 119]
    Sub data flow level [page 121]
11 Bulk Loading and Reading [page 123]
11.1 Google BigQuery ODBC bulk loading [page 125]
11.2 Configuring bulk loading for Hive [page 126]
11.3 Bulk Loading in IBM DB2 Universal Database [page 128]
    When to use each DB2 bulk-loading method [page 129]
    About the DB2 CLI load method [page 130]
    IBM DB2 UDB bulk load Import method [page 132]
11.4 Bulk loading in Informix [page 133]
    Set Informix server variables [page 133]
11.5 Bulk loading in Microsoft SQL Server [page 134]
    Enabling the SQL Server ODBC bulk copy API [page 135]
    Network packet size option [page 135]
    Maximum rejects option [page 136]
    Bulk loading with DataDirect Wire Protocol SQL Server ODBC driver [page 137]
11.6 Bulk loading in Netezza [page 141]
    Netezza bulk-loading process [page 141]
    Options overview [page 142]
    Configuring bulk loading for Netezza [page 143]
    Netezza log files: nzlog and nzbad [page 144]
11.7 Bulk Loading in PostgreSQL [page 145]
    Bulk Loading options [page 146]
11.8 Bulk loading in Oracle [page 147]
    Use table partitioning for Oracle bulk loading [page 149]
    Oracle bulk loading method and mode combinations [page 150]
    Example 1: File method, Direct-path mode, and Number of loaders [page 151]
    Example 2: API method, Direct-path mode, partitioned tables [page 152]
11.9 Bulk loading in SAP HANA [page 154]
11.10 Bulk loading in SAP ASE [page 155]
11.11 Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]
    Configuring bulk loading for SAP Sybase IQ (SAP IQ) [page 157]
    SAP IQ log files [page 158]
11.12 Bulk loading and reading in Teradata [page 159]
    Data file [page 160]
    Named Pipes [page 161]
    When to use each Teradata bulk-loading method [page 164]
    Parallel Transporter (TPT) method [page 167]
    Teradata standalone utilities [page 172]
    Teradata UPSERT operation for bulk loading [page 184]
12 Performance options for tuning system performance [page 186]
12.1 Source-based performance tuning [page 187]
    Join rank settings [page 187]
    Minimize extracted data with CDC [page 193]
    Array fetch size [page 194]
12.2 Target-based performance options [page 195]
    Loading method [page 196]
    Rows per commit [page 197]
12.3 Job design performance options [page 198]
    Minimizing data type conversion [page 198]
    Minimizing locale conversion [page 200]
    Precision in operations [page 200]

1 What's in this Guide?

The Performance Optimization Guide contains information about how to use the features in SAP Data Services to improve performance.

For example, to enhance job execution performance, enable specific job execution settings. Determine whether your jobs can benefit from multi-threading or parallel execution. Also read about push-down operations and bulk loading.


2 Naming conventions and variables

This documentation uses specific terminology, location variables, and environment variables that describe various features, processes, and locations in SAP BusinessObjects and SAP Data Services.

Terminology

SAP Data Services documentation uses the following terminology:

● The terms Data Services system and SAP Data Services mean the same thing.
● The term BI platform refers to SAP BusinessObjects Business Intelligence platform.
● The term IPS refers to SAP BusinessObjects Information platform services.

Note: Data Services requires BI platform components. However, when you don't use other SAP applications, IPS, a scaled-back version of the BI platform, also provides these components for Data Services.

● CMC refers to the Central Management Console provided by the BI or IPS platform.
● CMS refers to the Central Management Server provided by the BI or IPS platform.

Variables

The following table describes the location variables and environment variables that are necessary when you install and configure Data Services and required components.

Variables Description

INSTALL_DIR The installation directory for SAP applications such as Data Services.

Default location:

● For Windows: C:\Program Files (x86)\SAP BusinessObjects

● For UNIX: $HOME/sap businessobjects

Note: INSTALL_DIR isn't an environment variable. The installation location of SAP software can differ from what we list for INSTALL_DIR, depending on the location that your administrator sets during installation.


BIP_INSTALL_DIR The directory for the BI or IPS platform.

Default location:

● For Windows: <INSTALL_DIR>\SAP BusinessObjects Enterprise XI 4.0

Example: C:\Program Files (x86)\SAP BusinessObjects\SAP BusinessObjects Enterprise XI 4.0

● For UNIX: <INSTALL_DIR>/enterprise_xi40

Note: These paths are the same for both BI and IPS.

Note: BIP_INSTALL_DIR isn't an environment variable. The installation location of SAP software can differ from what we list for BIP_INSTALL_DIR, depending on the location that your administrator sets during installation.

<LINK_DIR> An environment variable for the root directory of the Data Services system.

Default location:

● All platforms: <INSTALL_DIR>\Data Services

Example: C:\Program Files (x86)\SAP BusinessObjects\Data Services


<DS_COMMON_DIR> An environment variable for the common configuration directory for the Data Services system.

Default location:

● If your system is on Windows (Vista and newer): <AllUsersProfile>\SAP BusinessObjects\Data Services

Note: The default value of the <AllUsersProfile> environment variable for Windows Vista and newer is C:\ProgramData.

Example: C:\ProgramData\SAP BusinessObjects\Data Services

● If your system is on Windows (older versions such as XP): <AllUsersProfile>\Application Data\SAP BusinessObjects\Data Services

Note: The default value of the <AllUsersProfile> environment variable for older Windows versions is C:\Documents and Settings\All Users.

Example: C:\Documents and Settings\All Users\Application Data\SAP BusinessObjects\Data Services

● UNIX systems (for compatibility): <LINK_DIR>

The installer automatically creates this system environment variable during installation.

Note: Starting with Data Services 4.2 SP6, users can designate a different default location for <DS_COMMON_DIR> during installation. If you can't find <DS_COMMON_DIR> in the listed default location, ask your System Administrator where the default location is for <DS_COMMON_DIR>.

<DS_USER_DIR> The environment variable for the user-specific configuration directory for the Data Services system.

Default location:

● If you're on Windows (Vista and newer): <UserProfile>\AppData\Local\SAP BusinessObjects\Data Services

Note: The default value of the <UserProfile> environment variable for Windows Vista and newer versions is C:\Users\{username}.

● If you're on Windows (older versions such as XP): <UserProfile>\Local Settings\Application Data\SAP BusinessObjects\Data Services

Note: The default value of the <UserProfile> environment variable for older Windows versions is C:\Documents and Settings\{username}.

Note: The system uses <DS_USER_DIR> only for Data Services client applications on Windows. UNIX platforms don't use <DS_USER_DIR>.

The installer automatically creates this system environment variable during installation.


3 Environment Test Strategy

Knowing your SAP Data Services environment helps determine what components to test and adjust.

The environment for Data Services includes database servers, operating systems, networks, and job servers. Before you test and tune jobs, consider all aspects of the environment in which Data Services functions.

To test and tune jobs, work with all of the components in the following order:

1. Source operating system and source database server
2. Target operating system and target database server
3. Network
4. Operating system for Data Services Job Server
5. Job options in Data Services

In addition to SAP documentation, use your UNIX or Windows operating system documentation and the documentation for your database management system. Operating system and database server documentation may contain specific techniques, commands, and utilities to help you measure and tune your environment.

Tune source OS and database server [page 11]
To read data from disks quickly, tune the source operating system and database server.

Tune target OS and database server [page 12]
To write data to disks quickly, tune the target operating system and database server.

Tune the network [page 13]
When the read and write processes involve going through a network, tune the network so that it moves large amounts of data efficiently with minimal overhead.

Tune Job Server OS and job options [page 14]
Tune the Job Server operating system and set job execution options to improve performance and take advantage of the self-tuning features of SAP Data Services.

3.1 Tune source OS and database server

To read data from disks quickly, tune the source operating system and database server.

Source operating system tuning

To improve the speed of reading large amounts of data, make the OS input and output (I/O) operations as fast as possible by using read-ahead. Most operating systems offer read-ahead for tuning server performance. Read-ahead improves performance by adjusting the size of each cache block per I/O operation. The default is 4–8 KB per block of cache. When you have large files to input and output, setting the I/O operation to a higher number may improve performance. For example, try setting the I/O operation to at least 64 KB and run tests to determine the effect on processing speed.


See your operating system documentation for information about using the read-ahead protocol and setting the I/O operation.

Database tuning

Tune your database on the source side so that it performs SELECT statements efficiently. Check the following methods for improving SELECT statements:

● Create indexes on applicable fields that you use in the SQL SELECT statement (see the sketch below).
● Increase the size of each I/O operation from the database server so that it matches the OS read-ahead I/O operation size.
● Enable more cached data in the database server by increasing the size of the shared buffer.
● Cache tables that are small enough to fit in the shared buffer. For example, when jobs access the same table on a database server, and the table fits in the shared buffer, cache the table. Caching data on database servers reduces the number of I/O operations and speeds up access to database tables.

See your database server documentation for more information about techniques, commands, and utilities that help you measure and tune the source databases in your jobs.
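For instance, the first suggestion above (indexing the columns that a job filters or joins on) can be applied with plain DDL against the source database before the job runs. The following is a minimal, hedged sketch using the third-party pyodbc package; the connection string, table, and column names are placeholders, and the exact DDL syntax depends on your database server.

```python
import pyodbc  # third-party package: pip install pyodbc

# Connection details are placeholders; point this at your source database.
conn = pyodbc.connect("DSN=SOURCE_DB;UID=etl_user;PWD=secret")
cursor = conn.cursor()

# Index the column that the data flow's WHERE clause or join condition uses,
# so the SELECT statement sent to the source can avoid a full table scan.
cursor.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")

conn.commit()
conn.close()
```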

Parent topic: Environment Test Strategy [page 11]

Related Information

Tune target OS and database server [page 12]
Tune the network [page 13]
Tune Job Server OS and job options [page 14]

3.2 Tune target OS and database server

To write data to disks quickly, tune the target operating system and database server.

Operating system tuning

To enhance performance when you process large datasets, make your operating system input and output (I/O) operations as fast as possible. For example, most operating systems offer the asynchronous I/O operation, which can improve performance.

See your operating system documentation for information about setting the I/O operation.


Database tuning

To perform INSERT and UPDATE commands as quickly as possible, tune your database on the target side. There are several ways to improve the performance of INSERT and UPDATE in the database layer.

For example, for Oracle, improve INSERT and UPDATE performance in the following ways:

● Turn off archive logging.
● Turn off redo logging for all tables.
● Tune rollback segments for better performance.
● Put redo log files and data files on a raw device when possible.
● Increase the size of the shared buffer.

See your database server documentation for more information about techniques, commands, and utilities that can help you measure and tune the target databases in your jobs.

Parent topic: Environment Test Strategy [page 11]

Related Information

Tune source OS and database server [page 11]
Tune the network [page 13]
Tune Job Server OS and job options [page 14]

3.3 Tune the network

When the read and write processes involve going through a network, tune the network so that it moves large amounts of data efficiently with minimal overhead.

Do not underestimate the importance of network tuning, even when you have a very fast network with lots of bandwidth.

One setting you can make for network tuning is to set network buffers to reduce the number of round trips to the database servers across the network. For example, adjust the size of the network buffer in the database client so that each client request completely fills a small number of network packets.

Parent topic: Environment Test Strategy [page 11]

Related Information

Tune source OS and database server [page 11]
Tune target OS and database server [page 12]
Tune Job Server OS and job options [page 14]

3.4 Tune Job Server OS and job options

Tune the Job Server operating system and set job execution options to improve performance and take advantage of the self-tuning features of SAP Data Services.

Job Server operating system

Data Services jobs are multi-threaded applications. Typically a single data flow in a job initiates one AL_ENGINE process that in turn initiates at least four threads.

For maximum performance benefits, consider the following:

● Design the operating system so that it runs one AL_ENGINE process per CPU at a time.
● Tune the Job Server OS so that threads spread to all available CPUs.

Job options

Perform the following tasks before you tune job execution options:

● Tune the database and operating system on the source and target computers.
● Adjust the size of the network buffer.
● Make sure that your data flow design is optimal.

After you've done the prework, use the following information from the job execution to make job performance improvements:

● Monitor sample rates
● Collect and use statistics for optimization

Set Monitor sample rate [page 15]
Control the frequency of updates to the monitor log file for better performance.

Collect statistics for cache optimization [page 16]
SAP Data Services has a self-tuning feature in the Execution Properties dialog box named Collect statistics for optimization.

Parent topic: Environment Test Strategy [page 11]

Related Information

Tune source OS and database server [page 11]
Tune target OS and database server [page 12]
Tune the network [page 13]
Checking system utilization [page 19]
Change default paging limits [page 24]

3.4.1 Set Monitor sample rate

Control the frequency of updates to the monitor log file for better performance.

During job execution, SAP Data Services writes information to the monitor log file and job events at regular intervals. Increase the interval frequency for better job performance.

By default, Data Services updates the monitor log file and job events every 5 seconds. To have Data Services make fewer calls to the operating system for the monitor log and job events, increase the number of seconds between updates. Fewer calls, however, means that you see less frequent updates about the job execution. Before you change the default setting, consider the importance of performance improvements against fewer job execution updates. The Monitor sample rate option is in the Execution Properties dialog box.

Tip: If you use a virus scanner on your files, exclude the SAP Data Services log from the virus scan. Otherwise, the virus scanner analyzes the log repeatedly during the job execution, which degrades performance.

Parent topic: Tune Job Server OS and job options [page 14]

Related Information

Collect statistics for cache optimization [page 16]
Read the Monitor log for execution statistics [page 25]
Execution Options


3.4.2 Collect statistics for cache optimization

SAP Data Services has a self-tuning feature in the Execution Properties dialog box named Collect statistics for optimization.

When you enable this option, Data Services uses the job statistics to choose an optimal cache type: In-memory or pageable.

Run a test job with Collect statistics for optimization enabled using data volumes that represent your production environment. Data Services collects statistics that include the number of rows and width of each row.

To keep the caching type at the optimal setting, ensure that you rerun the job when the data volume changes.

Note: The Collect statistics for optimization option is not enabled by default. However, after you've enabled it for the first execution of a job, it remains enabled for subsequent job executions.

Parent topic: Tune Job Server OS and job options [page 14]

Related Information

Set Monitor sample rate [page 15]
Execution Options
Cache data [page 56]


4 Measure job performance

To measure job performance, use the information that SAP Data Services produces from job executions.

Adjust your jobs to enhance job performance by evaluating the processes and threads used by Data Services during job execution. Use the following output information:

● System utilization
● Log files
● Monitor logs
● Performance Monitor
● Operational Dashboard

Data Services processes and threads [page 17]
To enhance job performance, adjust the number of concurrent processes and threads during job execution.

Checking system utilization [page 19]
To determine the effect of multiple processes and threads on job performance, check specific system resources during testing.

4.1 Data Services processes and threads

To enhance job performance, adjust the number of concurrent processes and threads during job execution.

At a basic level, job execution involves extracting data from a source, transforming the data, and loading the transformed data into a target. A more complicated job may involve reading data from multiple sources, transforming data over multiple processes, and loading data to multiple targets. The number of processes and threads varies based on how complicated your jobs are.

Processes

Data Services uses two processes during job execution: AL_JOBSERVER and AL_ENGINE.

Process descriptions

Process Description

AL_JOBSERVER Launches a job and monitors the job execution.

Data Services initiates one AL_JOBSERVER process for each Job Server that you configure on your computer.

AL_JOBSERVER doesn't use a lot of CPU power.


AL_ENGINE Runs when a job starts. Data Services runs an AL_ENGINE process for each data flow in a job. It runs AL_ENGINE as a single process for real-time jobs.

The number of AL_ENGINE processes initiated by a batch job depends on the number of:

● Parallel work flows
● Parallel data flows
● Sub data flows

Threads

One data flow initiates one AL_ENGINE process, which creates one thread per data flow object. A data flow object falls into three categories: Source, transform, or target. For example, a data flow with two sources, a query, and a target could initiate four threads.

If you use parallel objects in data flows, the thread count increases to approximately one thread for each source or target table partition. If you set the Degree of parallelism (DOP) option for your data flow to greater than one, the thread count per transform increases. For example, a DOP of 5 allows five concurrent threads for a Query transform.
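The thread arithmetic above can be approximated with a small calculation. The sketch below is only an illustration of the guideline in this topic, not a Data Services API; the object counts and DOP value are hypothetical inputs, and actual thread counts also depend on table partitions and other factors.

```python
# Rough estimate of the threads one AL_ENGINE process may create for a data flow,
# based on one thread per source and target object and roughly DOP threads per transform.
def estimate_threads(sources: int, transforms: int, targets: int, dop: int = 1) -> int:
    return sources + targets + transforms * dop

# Example from the text: two sources, one Query transform, one target, DOP of 1 -> 4 threads.
print(estimate_threads(sources=2, transforms=1, targets=1, dop=1))   # 4
# With DOP of 5, the Query transform alone can run about five concurrent threads.
print(estimate_threads(sources=2, transforms=1, targets=1, dop=5))   # 8
```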

To run objects within data flows in parallel, use the following features:

● Table partitioning
● Degree of parallelism for data flows
● File multi-threading

Parent topic: Measure job performance [page 17]

Related Information

Checking system utilization [page 19]
Table partitioning [page 85]
Degree of parallelism [page 74]
Parallel process threads for flat files [page 99]


4.2 Checking system utilization

To determine the effect of multiple processes and threads on job performance, check specific system resources during testing.

The number of processes and threads concurrently executing affects the utilization of the following system resources:

● CPU
● Memory
● Disk
● Network

To monitor these system resources, use the tools listed in the following table based on operating system.

Operating system Tool

UNIX Use UNIX or Linux Top command or a third-party utility (such as Glance for HP-UX).

When you run Top, the system presents the total number of running processes and all related details such as CPU usage.

Windows Use the Performance tab of the Windows Task Manager during job execution.
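As a supplement to Top or Task Manager, a small script can sample the same resources on the Job Server host while a test job runs. This is a hedged sketch using the third-party psutil package; it is not part of Data Services, and the sampling interval and duration are arbitrary.

```python
import psutil  # third-party package: pip install psutil

# Sample CPU, memory, disk, and network usage on the Job Server host
# while a test job executes (roughly one minute of 5-second samples).
for _ in range(12):
    cpu = psutil.cpu_percent(interval=5)      # average CPU % over the sample window
    mem = psutil.virtual_memory().percent     # physical memory in use
    disk = psutil.disk_io_counters()          # cumulative disk read/write bytes
    net = psutil.net_io_counters()            # cumulative network bytes sent/received
    print(f"cpu={cpu}%  mem={mem}%  "
          f"disk_read={disk.read_bytes}  net_sent={net.bytes_sent}")
```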

CPU utilization [page 20]
SAP Data Services is designed to maximize the use of CPUs and memory available to run the job.

CPU bottleneck factors [page 21]
To help determine the source of CPU bottlenecks on database servers, understand the contributing factors.

Memory [page 22]
Memory use is an indicator for determining whether to adjust the number of concurrent processes and threads.

Analyze trace logs for task duration [page 24]
Trace logs show the progress of an execution through each component or object of a job.

Read the Monitor log for execution statistics [page 25]
The Monitor log file indicates how many rows SAP Data Services produces or loads for a job.

Read the Performance Monitor for execution statistics [page 27]
The Performance Monitor displays execution times for each work flow, data flow, and sub data flow within a job.

Read Operational Dashboard for execution statistics [page 28]
The Management Console Operational Dashboard provides graphical depictions of SAP Data Services job execution statistics.

Parent topic: Measure job performance [page 17]


Related Information

Data Services processes and threads [page 17]

4.2.1 CPU utilization

SAP Data Services is designed to maximize the use of CPUs and memory available to run the job.

The total number of concurrent threads a job runs depends on job design and environment. Test your job while watching multi-threaded processes to see how much CPU and memory the job requires. Make needed adjustments to your job design and environment and test again to confirm improvements.

For example, if you run a job and see that CPU utilization is very high, try decreasing the degree of parallelism (DOP) setting or running fewer parallel jobs or data flows. Resetting DOP and the number of parallel jobs could prevent CPU thrashing. On the other hand, if you see that the system uses only half of a CPU, or you run eight jobs on an eight-way computer and CPU usage is only 50%, you can interpret CPU utilization in the following ways:

● SAP Data Services can push most of the processing down to source and/or target databases.
● There are CPU bottlenecks in the database server or the network connection.

Note: Bottlenecks on database servers prevent readers or loaders in jobs from using Job Server CPUs efficiently.

Parent topic: Checking system utilization [page 19]

Related Information

CPU bottleneck factors [page 21]
Memory [page 22]
Analyze trace logs for task duration [page 24]
Read the Monitor log for execution statistics [page 25]
Read the Performance Monitor for execution statistics [page 27]
Read Operational Dashboard for execution statistics [page 28]
Parallel Execution [page 71]
Using grid computing to distribute data flow execution [page 117]
Array fetch size [page 194]


4.2.2 CPU bottleneck factors

To help determine the source of CPU bottlenecks on database servers, understand the contributing factors.

CPU bottlenecks on database servers prevent readers or loaders in jobs from using Job Server CPUs efficiently. To help you determine the source of CPU bottlenecks, read about the contributing factors in the following table.

CPU bottleneck factors

Factor Additional notes

Disk service time on database server computers Disk service time should be below 15 milliseconds. Consult your server documentation for methods of improving performance based on disk service time. For example, the following factors may improve disk service time:
● A fast disk controller
● Move database server log files to a raw device
● Increase log size

Number of threads per process allowed on each database server operating system Split a data flow into separate processes or reduce the DOP.

Network connection speed Determine the rate at which your system is transferring data across the network.
If the network is the source of the bottleneck, try changing your job execution distribution level from sub data flow to data flow. Or try changing the job so it executes the entire data flow on the local server.
If your network has a large capacity, try configuring the job to retrieve multiple rows from the source using fewer requests.

Under-utilized system Try increasing the value for the Degree of parallelism option and increase the number of parallel jobs and data flows.

Example: When the bottleneck factor is the number of threads per process on each database server operating system, split a data flow into separate processes or reduce the DOP. The following list provides suggestions for splitting the data flow or reducing the DOP:

● On HP-UX, the number of kernel threads per process is configurable. The CPU-to-thread ratio default is 1:1 (one to one). Try setting the number of kernel threads per CPU to a value from 512 through 1024.
● On Solaris and AIX, the number of threads per process is not configurable and depends on system resources. If a process terminates with a message like "Cannot create threads," consider tuning the job using SAP Data Services job-tuning features.
● Use the Run as a separate process option to split a data flow.
● Use the Data_Transfer transform to create two sub data flows that execute sequentially. Because a different AL_ENGINE process executes each sub data flow, the number of threads needed for the data flow is 50% less than in your previous job design.


● If you set the Degree of parallelism option in the data flow, also decrease the setting for DOP in the data flow Properties dialog box.

Parent topic: Checking system utilization [page 19]

Related Information

CPU utilization [page 20]
Memory [page 22]
Analyze trace logs for task duration [page 24]
Read the Monitor log for execution statistics [page 25]
Read the Performance Monitor for execution statistics [page 27]
Read Operational Dashboard for execution statistics [page 28]
Degree of parallelism [page 74]
Splitting a dataflow into sub data flows [page 106]

4.2.3 Memory

Memory use is an indicator for determining whether to adjust the number of concurrent processes and threads.

There are several cases in which memory utilization is affected during job execution. The following factors affect memory utilization:

● Low physical memory
● Under-utilized memory
● Pageable cache exceeds limits

Parent topic: Checking system utilization [page 19]

Related Information

CPU utilization [page 20]
CPU bottleneck factors [page 21]
Analyze trace logs for task duration [page 24]
Read the Monitor log for execution statistics [page 25]
Read the Performance Monitor for execution statistics [page 27]
Read Operational Dashboard for execution statistics [page 28]
Run as separate process [page 106]
Maximize push-down operations [page 33]
Data_Transfer transform [page 107]
Using grid computing to distribute data flow execution [page 117]
Cache data [page 56]

4.2.3.1 Utilize memory

Make specific system and data flow settings so your system uses memory more efficiently.

Low physical memory

When your system has low physical memory, try the following solutions to enhance performance:

● Add more memory to the Job Server.
● Redesign your data flow to run memory-consuming operations in separate sub data flows, which use less memory. To access memory on multiple machines, distribute the sub data flows over multiple Job Servers.
● Redesign your data flow to push down memory-consuming operations to the database server.

Example: A data flow performs the following tasks:

● Reads data from a table
● Joins the read data to an existing table
● Groups data using the group_by operation to calculate an average. Group_by can occur in memory.

To better use memory, redesign the data flow to include a staging table after the join and before the group_by operation. Configure the staging table so that SAP Data Services processes the data on a different computer. When a subdata flow reads the staged data and continues with the group processing, it utilizes memory from the database server on the other computer.

Under-utilized memory

If your system has a large amount of physical memory, you may be under-utilizing it. To make better use of your memory, cache more data. Caching data improves the performance of transforms because it reduces the number of times the system must access the database.

There are two types of caches available: In-memory and pageable.

Related Information

Cache data [page 56]


4.2.3.2 Change default paging limits

When pageable cache exceeds the virtual memory limits of your computer, increase the default paging limits.

SAP Data Services uses the pageable cache method for processing data flows. On Windows, UNIX, and Linux, the virtual memory available to the AL_ENGINE process is 3.5 GB.

Note: The system reserves 500 MB of virtual memory for other engine operations, for a total of 4 GB.

Change the default limit by increasing the value of MAX_64BIT_PROCESS_VM_IN_MB in the DSConfig.txt file.
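As a hedged illustration only, the entry might look like the following in DSConfig.txt. The section name shown here is an assumption about the file's layout, and 8192 MB is an arbitrary example value, not a recommendation; back up the file before editing it.

```
[AL_Engine]
MAX_64BIT_PROCESS_VM_IN_MB = 8192
```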

If your system needs more memory than the default virtual memory, Data Services starts paging to continue executing the data flow. Avoid paging by using the following techniques:

● Split the data flow into subdata flows. Each subdata flow uses the amount of memory set by the virtual memory limits.

● Run each data flow or each memory-intensive operation within a data flow as a separate process. Each process uses unused memory from the other processes to improve performance and throughput.

● To decrease the memory demand on the Job Server computer, push down memory-intensive operations from the Job Server computer to the database server computer.

Related Information

Distribute data flow execution [page 105]
Maximize push-down operations [page 33]

4.2.4 Analyze trace logs for task duration

Trace logs show the progress of an execution through each component or object of a job.

The following table contains a depiction of a trace log. The trace log shows a separate process ID (Pid) and time stamp for the job, data flow, and each sub data flow.

Pid   Tid   Type      Time stamp             Message
...   ...   ...       ...                    ...
5696  5964  JOB       2/11/2012 11:56:37 PM  Processing job <Job_Group_Orders>.
4044  4252  DATAFLOW  2/11/2012 11:56:38 PM  Process to execute data flow <DF_Group_Orders> is started.
1604  984   DATAFLOW  2/11/2012 11:56:42 PM  Process to execute sub data flow <DF_1_Group_Orders> is started.
...   ...   ...       ...                    ...
5648  5068  DATAFLOW  2/11/2012 11:56:48 PM  Process to execute sub data flow <DF_2_Group_Orders> is started.
...   ...   ...       ...                    ...

Trace logs also include messages about sub data flows, caches, and statistics.

Parent topic: Checking system utilization [page 19]

Related Information

CPU utilization [page 20]
CPU bottleneck factors [page 21]
Memory [page 22]
Read the Monitor log for execution statistics [page 25]
Read the Performance Monitor for execution statistics [page 27]
Read Operational Dashboard for execution statistics [page 28]
Run as separate process [page 106]
Cache data [page 56]
Log

4.2.5 Read the Monitor log for execution statistics

The Monitor log file indicates how many rows SAP Data Services produces or loads for a job.

By viewing the Monitor log during job execution, you can observe the progress of row-counts to determine the location of bottlenecks. Use the Monitor log to answer questions such as the following:

● What transform is running at the moment?
● How many rows have been processed so far?

Note: The frequency at which the Monitor log refreshes the statistics is based on the Monitor sample rate setting in Execution Options.

● How long does it take to build the cache for a lookup or comparison table?

Note: If it takes a long time to build the cache, change to the persistent cache type.


● How long does it take to process the cache?
● How long does it take to sort?

Note: If the job takes a long time to sort, redesign the data flow to push down the sort operation to the database server.

● How much time elapses before a blocking operation sends out the first row?

Note: A blocking operation prevents Data Services from performing a full push-down operation during job execution. If the data flow contains resource-intensive operations after a blocking operation, add Data_Transfer transforms to push down the resource-intensive operations to the database server.

Access the Monitor log during job execution in the following ways:

● Click the Monitor icon at the top of the execution dialog in Designer when it is activated.
● Click the Monitor link in the Batch Job Status page of the Management Console Administrator.

Example: The following is an example of the information in the Monitor log for a very simple batch job that ended with warnings:

Log: monitor_05_31_2019_12_36_02_11_3fe9558a_0758_40789fc9+a2ea80d4ec50.txt
Job Server: <job_server_name>
Job name: JOB_sample_job

Path name                        State    Row Count  Elapsed time (secs)  Absolute time (secs)
+JOB_sample_job/sample_reader    STOP     42         0.015                3.548
/JOB_sample_job/transform_name   PROCEED  42         0.000                00.000
/JOB_sample_job/Query            PROCEED  42         0.000                00.000
/JOB_sample_job/sample_loader    READY    42         0.000                00.000

Note: The Absolute time column in the Monitor log displays the total time from the start of the job execution to the completion of the data flow execution.
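Because the Monitor log is a plain text file, its row counts and timings can also be scanned programmatically when you want to compare many executions. The following is a minimal sketch, assuming a whitespace-delimited layout like the example above; the file name is hypothetical and column positions in your logs may differ, so treat the parsing logic as a starting point rather than a documented format.

```python
import re

def read_monitor_log(path):
    """Yield (object_path, state, row_count, elapsed_secs, absolute_secs) tuples
    from a Monitor log laid out like the example above."""
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            # Data rows start with the object path, e.g. +JOB_x/reader or /JOB_x/Query.
            match = re.match(r"^([+/]\S+)\s+(\w+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)", line)
            if match:
                path_name, state, rows, elapsed, absolute = match.groups()
                yield path_name, state, int(rows), float(elapsed), float(absolute)

# Example: print the slowest objects first to spot likely bottlenecks.
stats = sorted(read_monitor_log("monitor_sample.txt"), key=lambda r: r[3], reverse=True)
for path_name, state, rows, elapsed, absolute in stats:
    print(f"{elapsed:>8.3f}s  {rows:>8} rows  {state:<8} {path_name}")
```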

Parent topic: Checking system utilization [page 19]


Related Information

CPU utilization [page 20]
CPU bottleneck factors [page 21]
Memory [page 22]
Analyze trace logs for task duration [page 24]
Read the Performance Monitor for execution statistics [page 27]
Read Operational Dashboard for execution statistics [page 28]
Set Monitor sample rate [page 15]
Use persistent cache [page 64]
Data_Transfer transform for push-down operations [page 45]

4.2.6 Read the Performance Monitor for execution statistics

The Performance Monitor displays execution times for each work flow, data flow, and sub data flow within a job.

Use the Performance Monitor to answer the following questions:

● What data flows are possible bottlenecks?
● How much time did a data flow or sub data flow take to execute?
● How many rows did the data flow or sub data flow process?
● How much memory did a specific data flow use?

Note: Memory statistics in the Cache Size column appear in the Performance Monitor only when you select the Collect statistics for monitoring option in the Execution Options setting.

Parent topic: Checking system utilization [page 19]

Related Information

CPU utilization [page 20]
CPU bottleneck factors [page 21]
Memory [page 22]
Analyze trace logs for task duration [page 24]
Read the Monitor log for execution statistics [page 25]
Read Operational Dashboard for execution statistics [page 28]


4.2.6.1 Viewing the Performance Monitor

View the Performance Monitor in the Administrator of the Management Console.

Access the Management Console from within Designer using the Tools menu or from the Start menu in Windows.

1. Select Batch <repository_name> where <repository_name> is the name of the repository on which the job ran.

2. On the Batch Job Status page, find the applicable job execution instance.
3. Under Job Information for the instance, click Performance Monitor.
4. To view the cache size for each object in the data flow, click the data flow name under the Data Flow column.

Management Console opens the Transforms tab that displays the cache size and the object name, type, start time, end time, execution time, and row count for each object in the data flow.

Related Information

Monitoring and tuning In Memory and Pageable caches [page 67]

4.2.7 Read Operational Dashboard for execution statistics

The Management Console Operational Dashboard provides graphical depictions of SAP Data Services job execution statistics.

The Operational Dashboard provides graphs and statistics from which you can easily view operational information. View the status and performance of your job executions for a specific repository, or all repositories over a given time period, such as the last day or week, or the last 24 hours.

The Operational Dashboard provides job history and statistics information. Use the history and statistics for more processing control and increased job performance. Use the information in the detailed visualized dashboard to:

● Determine how many jobs succeeded, failed, or had warnings over a given time period.
● Identify job performance by checking the CPU or buffer usage of data flows.
● Monitor job scheduling and management for better performance.

Parent topic: Checking system utilization [page 19]

Related Information

CPU utilization [page 20]
CPU bottleneck factors [page 21]
Memory [page 22]
Analyze trace logs for task duration [page 24]
Read the Monitor log for execution statistics [page 25]
Read the Performance Monitor for execution statistics [page 27]

4.2.7.1 Comparing execution times for the same job over time

To compare execution times for jobs and data flow, use the Operational Dashboard and the Administrator in the Management Console.

Open the Management Console through the Tools menu in Designer, or by using the Start menu in Windows.

1. On the Home page of the Management Console, click Operational Dashboard.
2. In the Dashboard tab, view a pie chart of SAP Data Services job execution statistics. View a bar graph showing the job successes or failures over a specified time period.
3. View the job execution history of a specific job by double-clicking the name of the job in the table.
4. View audit data by double-clicking an instance of the job in the Job Execution History table.
5. To close the Job Execution Details and the Job Details dialog boxes, click the “X” in the upper right corner of the screen.
6. To go back to the Home page, click Home in the upper right of the screen.
7. Open the Administrator.
8. Select the Job Execution History node in the navigation tree at left.
9. Select a batch job from the Available batch jobs dropdown list.
10. Select the number of days from the View history for dropdown list.

The results appear in a table, which displays repository name, job name, start and end time, execution time, status, and so on.

11. Select the Data Flow Execution History tab.
12. Enter either a data flow name in Data Flow, or select a job name from the Job Name dropdown list.
13. Select a number from the View history for dropdown list.

Related information appears in a table that contains repository name, data flow name, job name, start and end time of the data flow execution, and so on.

Related Information

Operational Dashboard
Job Execution History node


5 Job execution strategies

To improve performance, use job execution strategies to manage job execution.

The following table describes job execution strategies for improving performance.

Increase push-down operations: SAP Data Services automatically distributes the processing workload by pushing down as many SQL SELECT operations as possible to the source database server.

Data flow design influences the number of SQL operations that the software pushes to the database server. Before executing a job, view the SQL operations and adjust your design to maximize the SQL that is pushed down.

Improve throughput: Use the following features to improve throughput:

● Caching for faster data access: Caching into memory limits the number of times the system has to access the database.
● Bulk loading to the target database: The software supports database bulk loading engines, including the Oracle bulk load API. Consider running multiple bulk loading processes in parallel when applicable.

Implement advanced tuning techniques: Use the following advanced tuning techniques:

● Parallel processing
● Parallel threads
● Server groups and distribution levels

Using advanced tuning options [page 31]
To improve performance for jobs with CPU-intensive and memory-intensive operations, use the advanced tuning techniques.

Related Information

Maximize push-down operations [page 33]
Parallel Execution [page 71]
Cache data [page 56]


5.1 Using advanced tuning options

To improve performance for jobs with CPU-intensive and memory-intensive operations, use the advanced tuning techniques.

The following table describes advanced tuning techniques.

Parallel processes: To save processing time, the software processes multiple operations in a job simultaneously.

Ensure that you do not connect individual work flows to data flows in the Designer workspace. For more details about parallel processes, see the topics in Parallel Execution [page 71].

Parallel threads: To control the number of instances for a source, target, and transform that can run in parallel within a data flow, use the following options:

● Partitioned source tables
● Partitioned target tables
● Degree of parallelism

Each instance runs as a separate thread and can run on a separate CPU.

Server groups and distribution levels: A server group is a logical component made up of Job Servers on different computers.

Server groups:

● Automatically measure resource availability on each Job Server in the group
● Distribute scheduled batch jobs to the computer with the lightest load at runtime
● Provide a hot backup method

Note
In a hot backup, if one Job Server in the server group is down, another Job Server in the group processes the job.

Distribution levels: Distribute resource-intensive operations across multiple Job Servers within the server group. For more information about distribution levels, see the topics in Distribute data flow execution [page 105].

Parent topic: Job execution strategies [page 30]


Related Information

Server Groups
Using grid computing to distribute data flow execution [page 117]


6 Maximize push-down operations

To optimize performance, SAP Data Services pushes down as many transform operations as possible to the source or target database and combines as many operations as possible into one request to the database.

The optimizer, which is the optimization application inside the engine, pushes down as many SELECT operations as possible to the source database. Then it combines as many operations as possible into one request and sends it to the database server. Push-down operations free up your system resources for processing jobs.

You can control the push-down operations through data flow design. For example, filters and aggregations minimize the amount of data your system sends over the network, and the number of rows retrieved. Therefore, design your data flow so that the job pushes down filters and aggregations to the database server.

After you save your job, Data Services generates database-specific SQL SELECT statements for SQL sources and targets based on the data flow diagram. You can examine and use the generated SQL to make the data flow design more efficient. For example, if you notice that your data flow includes resource-intensive operations such as joins, add the Data_Transfer transform to your data flow to push down the join operation to the database. Resource-intensive operations include joins, GROUP BY, ORDER BY, and DISTINCT.
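For example, a data flow that joins two source tables and aggregates the result is a good push-down candidate. A minimal sketch of the kind of statement you would want to see in the optimized SQL is shown below; the table and column names are hypothetical and are not from a delivered sample:

SELECT c.REGION, SUM(o.AMOUNT)
FROM ORDERS o
INNER JOIN CUSTOMER c ON o.CUST_ID = c.CUST_ID
GROUP BY c.REGION

If the optimized SQL instead shows separate SELECT statements for ORDERS and CUSTOMER, the join runs in the Data Services engine, and the data flow is a candidate for redesign or for a Data_Transfer transform.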

Data Services uses full and partial push-down operations. When determining which operations to push to the database, Data Services examines the database and its environment. Some operations cannot be pushed down.

Push-down types

Full: Pushes down all transform operations to the database servers, and the data streams directly from the source database to the target database.

Partial: Pushes down the SELECT statements to the source database server.

For a list of supported push-down operators, functions, and transforms for your database type, see SAP Note 2212730.

Full push-down operations [page 34]
Data Services pushes down all applicable transform operations to the database and data streams.

Partial push-down operations [page 36]
The optimizer in SAP Data Services pushes down SELECT statements to the source database when a full push-down operation is not possible.

Operations that cannot be pushed down [page 37]
There are some operations that SAP Data Services cannot push down for several reasons.

Push-down examples [page 38]
Push-down examples help you learn how to take advantage of the push-down features in SAP Data Services.

Query transform viewing optimized SQL [page 44]


View the optimized SQL before you execute a job to verify that the software generates the commands you expect.

Data_Transfer transform for push-down operations [page 45]
The Data_Transfer transform enables SAP Data Services to push down certain resource-intensive operations to a transfer object that pushes operations to the database server for more efficient processing.

Linked datastores [page 49]
SAP Data Services refers to the communication paths between databases as database links.

Related Information

Push-down optimizer

6.1 Full push-down operations

Data Services pushes down all applicable transform operations to the database and data streams.

The optimizer, which is the optimization application inside the engine, always tries to do a full push-down operation first. It pushes down all transform operations to the databases and the data streams directly from the source database to the target database. The process for a full push-down operation consists of the following:

● Data Services sends SQL INSERT INTO...SELECT statements to the target database server.
● A SELECT statement retrieves data from the source.
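As an illustration only (the object names here are hypothetical), a full push-down of a simple filter data flow produces a single statement of the following shape, which the target database executes:

INSERT INTO TARGET_SALES (CUST_ID, AMOUNT)
SELECT CUST_ID, AMOUNT
FROM SOURCE_SALES
WHERE AMOUNT > 0

Because the statement runs entirely in the database, no rows pass through the Data Services engine.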

Data Services performs a full push-down operation to the source and target databases under the following conditions:

● All of the operations between the source table and target table are push-down operations that the database type supports.

● The source and target tables are either from the same datastore, they have datastores with a database link defined between them, or the datastore has linked remote servers.

To ensure that the optimizer performs a full push-down from the source to the target, you may have to use one or more of the following features in your job setup:

● Data_Transfer transform
● Linked datastores
● Linked Remote Servers

Auto-correct load [page 35]
SAP Data Services has an auto-correct load feature that ensures that it does not send duplicated rows to the target table.

Parent topic: Maximize push-down operations [page 33]


Related Information

Partial push-down operations [page 36]
Operations that cannot be pushed down [page 37]
Push-down examples [page 38]
Query transform viewing optimized SQL [page 44]
Data_Transfer transform for push-down operations [page 45]
Linked datastores [page 49]

6.1.1 Auto-correct load

SAP Data Services has an auto-correct load feature that ensures that it does not send duplicated rows to the target table.

When you select Yes for Auto correct load in the target table editor, also select Yes for Allow merge or upsert. With Allow merge or upsert, the optimizer may use a MERGE statement to improve the performance of auto-correct load.

For a full push-down, when Data Services can push all other operations in the data flow to the source database, it can also push the auto-correct load operation to the target database.

The MERGE statement that Data Services generates for the Allow merge or upsert option uses the syntax SQL MERGE INTO <target>. To complete the MERGE statement, ensure that you set the Ignore columns with value and Ignore columns with null options. The MERGE statement implements the options during auto-correct load.
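The following sketch shows the general shape of such a statement with hypothetical table and column names; the exact SQL that Data Services generates depends on the database type and the options you set:

MERGE INTO TARGET_TABLE t
USING (SELECT KEY_COL, VAL_COL FROM SOURCE_TABLE) s
ON (t.KEY_COL = s.KEY_COL)
WHEN MATCHED THEN UPDATE SET t.VAL_COL = s.VAL_COL
WHEN NOT MATCHED THEN INSERT (KEY_COL, VAL_COL) VALUES (s.KEY_COL, s.VAL_COL)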

The following table describes the auto-correct load options, which are in the target table editor for supported databases.

Auto-correct load options

Allow merge or upsert: Specifies whether the optimizer may use a MERGE statement to improve the performance of auto-correct load functionality.

Ignore columns with value: Specifies a value that might appear in a source column that you do not want updated in the target table during auto-correct loading.

Ignore columns with null: Prevents Data Services from updating the target table with NULL source column values during auto-correct loading.

For full descriptions of the update control options in the target table editor, see the Reference Guide.

Parent topic: Full push-down operations [page 34]


Related Information

Update Control

6.2 Partial push-down operations

The optimizer in SAP Data Services pushes down SELECT statements to the source database when a full push-down operation is not possible.

The following table contains the operations within the SELECT statement that Data Services can push to the database.

SELECT statement operations for push-down

Aggregations: Typically used with a GROUP BY statement. Always produces a data set smaller than or the same size as the original data set.

Distinct rows: Outputs unique rows only.

Filtering: Produces a data set smaller than or equal to the original data set.

Joins: Typically produces a data set smaller than or similar in size to the original table. Data Services can push down joins when either of the following conditions exists:

● The source tables to be joined are in the same datastore.
● The source tables to be joined are in datastores that have a database link defined between them.

Ordering: Sorts data sets that fit in memory. Does not affect data-set size. Recommendation: push down the ORDER BY for very large data sets.

Projection: Produces a subset of columns that you map on the Mapping tab in the query editor. Produces a smaller data set because it only returns columns needed by subsequent operations in a data flow.

Functions: Translates most functions that have equivalents in the underlying database. Translated functions include decode, aggregation, and string functions.
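For illustration, a single pushed-down SELECT can combine several of these operations. The following sketch uses hypothetical table and column names; it filters, projects, aggregates, and orders in one statement that the source database executes:

SELECT REGION, COUNT(DISTINCT CUST_ID), SUM(AMOUNT)
FROM SALES
WHERE ORDER_DATE >= '2021-01-01'
GROUP BY REGION
ORDER BY REGION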

Parent topic: Maximize push-down operations [page 33]


Related Information

Full push-down operations [page 34]
Operations that cannot be pushed down [page 37]
Push-down examples [page 38]
Query transform viewing optimized SQL [page 44]
Data_Transfer transform for push-down operations [page 45]
Linked datastores [page 49]

6.3 Operations that cannot be pushed down

There are some operations that SAP Data Services cannot push down for several reasons.

SAP Data Services cannot push some transform operations to the database for the following reasons:

● When expressions have functions that do not have database correspondents.
● When load operations contain triggers.
● When jobs contain transforms other than the Query transform.
● When joins involve sources on different database servers that do not have a defined link between them.

Similarly, the software cannot always combine operations into single requests. For example, when a stored procedure contains a COMMIT statement or does not return a value, the software cannot combine the stored procedure SQL with the SQL for other operations in a query.
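For example, a stored procedure of the following shape cannot be combined with other SQL because it contains a COMMIT. This is a hypothetical Oracle-style sketch, not an object that ships with the product:

CREATE OR REPLACE PROCEDURE LOG_RUN AS
BEGIN
  INSERT INTO RUN_LOG (RUN_DATE) VALUES (SYSDATE);
  COMMIT; -- the COMMIT prevents combining this call with the SQL for other operations
END;

Data Services calls such a procedure in its own request rather than folding it into the surrounding query.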

Data Services only pushes operations supported by the DBMS down to that DBMS server. Therefore, for best performance, try not to intersperse SAP Data Services transforms among operations that can be pushed down to the database.

For a list of supported push-down operators and functions for your database type, see SAP Note 2212730.

Parent topic: Maximize push-down operations [page 33]

Related Information

Full push-down operations [page 34]
Partial push-down operations [page 36]
Push-down examples [page 38]
Query transform viewing optimized SQL [page 44]
Data_Transfer transform for push-down operations [page 45]
Linked datastores [page 49]


6.4 Push-down examples

Push-down examples help you learn how to take advantage of the push-down features in SAP Data Services.

The examples in this section help you understand full and partial push-downs, the auto-correct load options, and how collapsing transforms combines SELECT statements.

Example 1: Collapsing transforms to push down operations [page 39]
SAP Data Services collapses commands from two Query transforms so that there is a single command to push down to the database server.

Example 2: Full push down from source to target [page 40]
SAP Data Services can perform a full push down even when the source and target do not use the same datastore.

Example 3: Full push down for auto correct load to target [page 41]
The optimizer in SAP Data Services does a full push-down operation with auto-correct load enabled for supported databases.

Example 4: Partial push down to source [page 42]
SAP Data Services performs a partial push-down operation when the data flow contains operations that cannot be passed to the database.

Example 5: Push-down SQL join [page 43]
SAP Data Services pushes down a query when the tables in a join meet the requirements for a push-down operation.

Parent topic: Maximize push-down operations [page 33]

Related Information

Full push-down operations [page 34]
Partial push-down operations [page 36]
Operations that cannot be pushed down [page 37]
Query transform viewing optimized SQL [page 44]
Data_Transfer transform for push-down operations [page 45]
Linked datastores [page 49]


6.4.1 Example 1: Collapsing transforms to push down operations

SAP Data Services collapses commands from two Query transforms so that there is a single command to push down to the database server.

Data Services collapses all transforms in a data flow to form the minimum set of transformations. Then it determines the best way to push-down the operations to the database. The software pushes all possible operations on tables of the same database down to the database server.

Example
The following data flow collapses the Query1 and Query2 transforms and extracts the specified rows. There is one source, so the extracted rows are from the same data source.

The first query selects only the rows in the source where column A contains a value greater than 100. The second query refines the extraction further, reducing the number of columns returned and further reducing the qualifying rows.
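Conceptually, before collapsing, the two Query transforms correspond to statements like the following sketch; Data Services does not actually issue them separately:

-- Query1: keep only the rows where column A is greater than 100
SELECT A, B, C FROM source WHERE A > 100
-- Query2: restrict and aggregate the rows coming from Query1
SELECT A, MAX(B), C FROM Query1 WHERE B = C GROUP BY A, C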

The software collapses the two queries into a single command for the DBMS to execute. The following command uses AND to combine the WHERE clauses from the two queries:

SELECT A, MAX(B), C FROM source WHERE A > 100 AND B = C GROUP BY A, C

Instead of pushing down two commands, one from each Query transform, Data Services is able to combine the commands into one command to push down to the database server.

Parent topic: Push-down examples [page 38]


Related Information

Example 2: Full push down from source to target [page 40]
Example 3: Full push down for auto correct load to target [page 41]
Example 4: Partial push down to source [page 42]
Example 5: Push-down SQL join [page 43]

6.4.2 Example 2: Full push down from source to target

SAP Data Services can perform a full push down even when the source and target do not use the same datastore.

When the source and target are not from the same datastore, either create a database link between the datastores or use a Data_Transfer transform in the data flow. A Data_Transfer transform splits processing into two sub data flows, where one sub data flow has the resource-consuming operations pushed down to the database server.

Example
For the data flow in Example 1: Collapsing transforms to push down operations [page 39], a full push down passes the following statement to the database:

INSERT INTO target (A, B, C) SELECT A, MAX(B), C FROM source WHERE A > 100 AND B = C GROUP BY A, C

However, if the source and target do not use the same datastore, use one of the following features to ensure a full push-down:

● Add a Data_Transfer transform before the target object in the data flow.
● Define a database link between the two datastores.

Parent topic: Push-down examples [page 38]

Related Information

Example 1: Collapsing transforms to push down operations [page 39]
Example 3: Full push down for auto correct load to target [page 41]
Example 4: Partial push down to source [page 42]
Example 5: Push-down SQL join [page 43]
Using Data_Transfer tables to speed up auto correct loads [page 47]


6.4.3 Example 3: Full push down for auto correct load to target

The optimizer in SAP Data Services does a full push-down operation with auto-correct load enabled for supported databases.

The optimizer can perform a full push down when you enable the Auto correct load and Allow merge or upsert options. Data Services generates a MERGE statement when you enable Allow merge or upsert. The SQL statement uses the MERGE into the target with a SELECT statement from the source.

For the Allow merge or upsert option to generate a MERGE statement, the following factors must exist in the source and target tables:

● The primary key of the source table must be a subset of the primary key of the target table.
● The source row must be unique on the target primary key; there cannot be duplicate rows in the source data.

If your source and target tables meet these conditions, the optimizer pushes down the operation using a database-specific method to identify, update, and insert rows into the target table.
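One way to check the second condition before enabling the option is to look for source rows that repeat the target primary key. The following query is a sketch with hypothetical names, not SQL that Data Services generates:

SELECT KEY_COL, COUNT(*)
FROM SOURCE_TABLE
GROUP BY KEY_COL
HAVING COUNT(*) > 1

If the query returns any rows, the source contains duplicates on the target key and the Allow merge or upsert option does not generate a MERGE statement.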

Example
A data flow has Oracle source and target tables that are in the same datastore. The data flow has the following options set to Yes: Auto correct load and Allow merge or upsert.

The push-down operation passes the following statement to an Oracle database:

MERGE INTO "ODS"."TARGET" s
USING (SELECT "SOURCE"."A" A, "SOURCE"."B" B, "SOURCE"."C" C
FROM "ODS"."SOURCE" "SOURCE") n
ON ((s.A = n.A))
WHEN MATCHED THEN
UPDATE SET s."B" = n.B, s."C" = n.C
WHEN NOT MATCHED THEN
INSERT /*+ APPEND */ (s."A", s."B", s."C") VALUES (n.A, n.B, n.C)

Note
Each applicable database type that supports auto-correct load uses similar MERGE and SELECT statements.

Parent topic: Push-down examples [page 38]

Related Information

Example 1: Collapsing transforms to push down operations [page 39]
Example 2: Full push down from source to target [page 40]


Example 4: Partial push down to source [page 42]
Example 5: Push-down SQL join [page 43]

6.4.4 Example 4: Partial push down to source

SAP Data Services performs a partial push-down operation when the data flow contains operations that cannot be passed to the database.

If the data flow contains operations that cannot be passed to the DBMS, the software optimizes the transformation differently than in Examples 2 and 3.

Example
If Query1 called func(A) > 100, where func is a Data Services custom function, then the software generates two commands:

● The first query contains the following generated command, which the source database executes:

SELECT A, B, C FROM source WHERE B = C

● The second query contains the following generated command, which Data Services executes:

SELECT A, MAX(B), C FROM Query1 WHERE func(A) > 100 GROUP BY A, C

Data Services executes the second command because func is a custom function that the database does not support.

Parent topic: Push-down examples [page 38]

Related Information

Example 1: Collapsing transforms to push down operations [page 39]
Example 2: Full push down from source to target [page 40]
Example 3: Full push down for auto correct load to target [page 41]
Example 5: Push-down SQL join [page 43]


6.4.5 Example 5: Push-down SQL join

SAP Data Services pushes down a query when the tables in a join meet the requirements for a push-down operation.

After you configure the data flow, confirm that Data Services will use a push-down operation by reviewing the optimized SQL. If the optimized SQL shows a single SELECT statement, then Data Services performs a full push-down operation.

Example
In the following data flow, the Department and Employee tables are joined with an inner join. The result of the inner join is further joined with a left outer join to the Bonus table.

The resulting optimized SQL contains a single select statement. Therefore, Data Services pushes down the entire query to the database server.

SELECT DEPARTMENT.DEPTID, DEPARTMENT.DEPARTMENT, EMPLOYEE.LASTNAME, BONUS.BONUS
FROM (DEPARTMENT INNER JOIN EMPLOYEE ON (DEPARTMENT.DEPTID = EMPLOYEE.DEPTID))
LEFT OUTER JOIN BONUS ON (EMPLOYEE.EMPID = BONUS.EMPID)

Parent topic: Push-down examples [page 38]

Related Information

Example 1: Collapsing transforms to push down operations [page 39]
Example 2: Full push down from source to target [page 40]
Example 3: Full push down for auto correct load to target [page 41]
Example 4: Partial push down to source [page 42]


6.5 Query transform viewing optimized SQL

View the optimized SQL before you execute a job to verify that the software generates the commands you expect.

If you determine the generated SQL commands are not what you expect, alter the design to improve the data flow and to improve processing performance. To view the optimized SQL, perform the following steps:

1. In Designer, validate and save the applicable data flow.

2. With the data flow open in the workspace, select Validation > Display Optimized SQL.

Note
If you select Display Optimized SQL when there are no SQL sources in the data flow, Data Services issues an alert message and does not display the SQL.

The Optimized SQL dialog box opens displaying a list of datastores on the left, and the optimized SQL code for the selected datastore on the right. By default, the software selects the first datastore. Data Services shows only the SELECT generated for table sources and INSERT INTO...SELECT for table targets. Data Services does not show SQL generated for SQL sources that are not table sources. For example, it does not show SQL for:
○ Lookup function
○ Key_Generation function
○ Key_Generation transform
○ Table_Comparison transform

3. Select a datastore name from the list of datastores on the left to view the SQL that the data flow applies against the corresponding database or application.

Example
The following optimized SQL illustrates a full push-down operation (an INSERT INTO...SELECT statement). In the data flow, a Data_Transfer transform creates a transfer table that Data Services loads directly into the target.

INSERT INTO "DBO"."ORDER_AGG" ("SHIPCOUNTRY", "SHIPREGION", "SALES_AGG")
SELECT "TS_Query_Lookup"."SHIPCOUNTRY", "TS_Query_Lookup"."SHIPREGION", sum("TS_Query_Lookup"."SALES")
FROM "DBO"."TRANS2" "TS_Query_Lookup"
GROUP BY "TS_Query_Lookup"."SHIPCOUNTRY", "TS_Query_Lookup"."SHIPREGION"

4. Optional. Perform the following tasks with the optimized SQL in view:
○ Click Find to search for a string in the optimized SQL.
○ Click Save As to save the optimized SQL as an SQL file.

5. If you change anything in your data flow, make sure to save the data flow to update the displayed SQL.

Over time, you may change the data flow. After any change, remember to save the data flow and view the updated optimized SQL.

Task overview: Maximize push-down operations [page 33]


Related Information

Full push-down operations [page 34]
Partial push-down operations [page 36]
Operations that cannot be pushed down [page 37]
Push-down examples [page 38]
Data_Transfer transform for push-down operations [page 45]
Linked datastores [page 49]

6.6 Data_Transfer transform for push-down operations

The Data_Transfer transform enables SAP Data Services to push down certain resource-intensive operations to a transfer object that pushes operations to the database server for more efficient processing.

Resource-intensive operations include full push-down operations (INSERT INTO...SELECT), joins, GROUP BY, ORDER BY, and DISTINCT.

Find the Data_Transfer transform in the Data Integrator transform group in Designer. For more information about the Data_Transfer transform, see the Reference Guide.

Push down an operation after a blocking operation [page 46]
A blocking operation prevents SAP Data Services from performing a full push-down operation during job execution.

Using Data_Transfer tables to speed up auto correct loads [page 47]
The auto correct load operation prevents a full push-down operation from the source table to the target table when the source and target are in different datastores.

Parent topic: Maximize push-down operations [page 33]

Related Information

Full push-down operations [page 34]
Partial push-down operations [page 36]
Operations that cannot be pushed down [page 37]
Push-down examples [page 38]
Query transform viewing optimized SQL [page 44]
Linked datastores [page 49]


6.6.1 Push down an operation after a blocking operation

A blocking operation prevents SAP Data Services from performing a full push-down operation during job execution.

During job execution, any processes that appear after the blocking operation cannot be pushed down to the database server. To work around a blocking operation, use the Data_Transfer transform. The Data_Transfer transform unblocks the data flow when you place it after the blocking operation in the data flow. After you place the Data_Transfer transform in the data flow, Data Services performs a full push-down operation.

Example
Your data flow groups sales order records by country and region, and sums the sales amounts to find which regions are generating the most revenue. The following diagram shows a data flow that contains:

● A Pivot transform to obtain orders by Customer ID
● A Query transform that contains a lookup_ext function to obtain sales subtotals
● Another Query transform with GROUP BY to group the results by country and region

Because the Pivot transform and the Query transform with the lookup_ext function come before the Query transform with the GROUP BY clause, Data Services cannot push down the GROUP BY operation. The optimized SQL for the data flow shows the SELECT statement that the software pushes down to the source database:

SELECT "ORDERID", "CUSTOMERID", "EMPLOYEEID", "ORDERDATE", "REQUIREDDATE",
"SHIPPEDDATE", "SHIPVIA", "FREIGHT", "SHIPNAME", "SHIPADDRESS", "SHIPCITY",
"SHIPREGION", "SHIPPOSTALCODE", "SHIPCOUNTRY"
FROM "DBO"."ORDERS"

Now add a Data_Transfer transform before the second Query transform in the data flow. Select Table for the transfer type in the transform editor. Then specify a transfer table from the same datastore as the target table. Save the new data flow and view the optimized SQL to see that now Data Services pushes down the GROUP BY operation.


The new optimized SQL shows that the software pushed down the GROUP BY function to the transfer table TRANS2:

INSERT INTO "DBO"."ORDER_AGG" ("SHIPCOUNTRY", "SHIPREGION", "SALES_AGG")
SELECT "TS_Query_Lookup"."SHIPCOUNTRY", "TS_Query_Lookup"."SHIPREGION", sum("TS_Query_Lookup"."SALES")
FROM "DBO"."TRANS2" "TS_Query_Lookup"
GROUP BY "TS_Query_Lookup"."SHIPCOUNTRY", "TS_Query_Lookup"."SHIPREGION"

Parent topic: Data_Transfer transform for push-down operations [page 45]

Related Information

Using Data_Transfer tables to speed up auto correct loads [page 47]
Operations that cannot be pushed down [page 37]

6.6.2 Using Data_Transfer tables to speed up auto correct loads

The auto correct load operation prevents a full push-down operation from the source table to the target table when the source and target are in different datastores.

Auto correct loading ensures that SAP Data Services does not duplicate the same row in a target table, which is useful for data recovery operations. However, when you use auto correct load for source and targets from different datastores, a Data_Transfer transform can enable a full push-down operation. Also, Data_Transfer tables can speed up the auto correct load process when you have large loads.

For large loads that have database targets that support the Allow merge or upsert option for auto correct load, add a Data_Transfer transform before the target object in your data flow.

Ensure that the following conditions exist between your source and target tables so that the Allow merge or upsert option generates a MERGE statement and prevents duplicate rows in the source data:

● The primary key of the source table must be a subset of the primary key of the target table.
● The source row must be unique on the target primary key.


If your source and target tables do not meet these conditions, and there are duplicate rows in the source data, the Data Services optimizer pushes down the operation using a database-specific method to identify, update, and insert rows into the target table.

If the MERGE statement can be used, SAP Data Services generates a SQL MERGE INTO <target> statement that implements the following options in the target editor:

● Ignore columns with value: When the value you specify appears in the source column, Data Services does not update the corresponding target column during auto correct loading.

● Ignore columns with null: When enabled, and the source contains a NULL column, Data Services does not update the corresponding target column with NULL.

Example
A data flow loads sales orders into an Oracle target table that is in a different datastore than the source. The Auto correct load, Ignore columns with null, and Allow merge or upsert options are enabled.

The following optimized SQL shows the SELECT statement that the software pushes down to the source database:

SELECT "ODS_SALESORDER"."SALES_ORDER_NUMBER", "ODS_SALESORDER"."ORDER_DATE", "ODS_SALESORDER"."CUST_ID"
FROM "ODS"."ODS_SALESORDER" "ODS_SALESORDER"

To enable Data Services to perform a push-down operation:

● Place a Data_Transfer transform in the data flow before the target object.
● Define a transfer table in the Data_Transfer transform editor that is from the same datastore as the target table.

After you save the above changes to the data flow, the updated optimized SQL shows the MERGE statement that Data Services pushes down to the Oracle target:

MERGE INTO "TARGET"."AUTO_CORRECT_LOAD2_TARGET" s
USING (SELECT "AUTOLOADTRANSFER"."SALES_ORDER_NUMBER" SALES_ORDER_NUMBER,
"AUTOLOADTRANSFER"."ORDER_DATE" ORDER_DATE,
"AUTOLOADTRANSFER"."CUST_ID" CUST_ID
FROM "TARGET"."AUTOLOADTRANSFER" "AUTOLOADTRANSFER") n
ON ((s.SALES_ORDER_NUMBER = n.SALES_ORDER_NUMBER))
WHEN MATCHED THEN
UPDATE SET s."ORDER_DATE" = nvl(n.ORDER_DATE, s."ORDER_DATE"),
s."CUST_ID" = nvl(n.CUST_ID, s."CUST_ID")
WHEN NOT MATCHED THEN
INSERT (s."SALES_ORDER_NUMBER", s."ORDER_DATE", s."CUST_ID")
VALUES (n.SALES_ORDER_NUMBER, n.ORDER_DATE, n.CUST_ID)

Other database types use similar statements.

For complete information about the options for the Data_Transfer transform, see the Reference Guide.

Parent topic: Data_Transfer transform for push-down operations [page 45]

Related Information

Push down an operation after a blocking operation [page 46]


6.7 Linked datastores

SAP Data Services refers to the communication paths between databases as database links.

Various database types support one-way communication paths from one database server to another. The datastores in Data Services that are in a database link relationship are called “linked datastores”, or, in the case of SAP IQ (Sybase IQ), “linked remote server”.

Data Services uses linked datastores to enhance performance by pushing down operations to a target database using a target datastore. The advantages of pushing down operations to a database include:

● Reduces the amount of information to transfer between the databases and SAP Data Services
● Takes advantage of the capabilities of the various database management systems, such as their join algorithms

With support for database links, Data Services pushes processing down from different datastores, which can refer to the same or different database type.

Linked datastores allow a one-way path for data. For example, if you import a database link from target database B and link datastore B to datastore A, the software pushes the load operation down to database B, not to database A.

Linked datastores database software requirements [page 50]
SAP Data Services supports several database types for local linked datastores.

Use linked remote servers [page 52]
Linked datastores enable a full push-down operation (INSERT INTO... SELECT) to a target when all of the sources are linked with the target.

Syntax for generated SQL statements [page 54]
The SQL statement generated by SAP Data Services uses the syntax for the database type.

Tuning performance at the data flow or Job Server level [page 54]
Tune job performance for linked datastore jobs at the data flow or the Job Server levels.

Parent topic: Maximize push-down operations [page 33]

Related Information

Full push-down operations [page 34]
Partial push-down operations [page 36]
Operations that cannot be pushed down [page 37]
Push-down examples [page 38]
Query transform viewing optimized SQL [page 44]
Data_Transfer transform for push-down operations [page 45]


6.7.1 Linked datastores database software requirements

SAP Data Services supports several database types for local linked datastores.

To take advantage of linked datastores, create a database link on a database server that you intend to use as a target in a job.

The database software in the following table is required.

Note
To find the supported database versions based on your version of Data Services, see the Product Availability Matrix.

Database requirements for linked datastores

IBM DB2 for iSeries: Use the DB2 Information Services (previously known as Relational Connect) software and make sure that the database user has privileges to create and drop a nickname.

To end users and client applications, data sources appear as a single collective database in DB2. Users and applications interface with the database managed by the information server. Therefore, configure an information server and then add the external data sources. DB2 uses nicknames to identify remote tables and views.

See the DB2 database manuals for more information about how to create links for DB2 and non-DB2 servers.

Oracle: Use the Transparent Gateway (Oracle Database Gateways) for DB2 and MS SQL Server.

See the Oracle database manuals for more information about how to create database links for Oracle and non-Oracle servers.

MS SQL Server: No special software is required.

Microsoft SQL Server supports access to distributed data stored in multiple instances of SQL Server and heterogeneous data stored in various relational and non-relational data sources using an OLE database provider. SQL Server supports access to distributed or heterogeneous database sources in Transact-SQL statements by qualifying the data sources with the names of the linked server where the data sources exist.

See the MS SQL Server database manuals for more information.

Example: Push-down with linked datastores [page 51]
Use a linked datastore for push-down operations in SAP Data Services.


Parent topic: Linked datastores [page 49]

Related Information

Use linked remote servers [page 52]
Syntax for generated SQL statements [page 54]
Tuning performance at the data flow or Job Server level [page 54]

Example: Push-down with linked datastores [page 51]

6.7.1.1 Example: Push-down with linked datastores

Use a linked datastore for push-down operations in SAP Data Services.

Linked datastores enable a full push-down operation (INSERT INTO... SELECT) to the target when all the sources are linked with the target. The sources and target can be in datastores that use the same database type or different database types.

Example
The following diagram shows an example of a data flow that will take advantage of linked datastores:

The data flow joins three source tables from different database types:

● ora_source.HRUSER1.EMPLOYEE on \\oracle_server1
● ora_source_2.HRUSER2.PERSONNEL on \\oracle_server2
● mssql_source.DBO.DEPARTMENT on \\mssql_server3

The software loads the join result into the target table ora_target.HRUSER3.EMP_JOIN on \\oracle_server1.


In this data flow, the user (HRUSER3) created the following database links in the Oracle database oracle_server1.

Database Link Name | Local (to database link location) Connection Name | Remote (to database link location) Connection Name | Remote User
orasvr2 | oracle_server1 | oracle_server2 | HRUSER2
tg4msql | oracle_server1 | mssql_server | DBO
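In Oracle, the user typically creates such a link with a statement of the following shape. This is a sketch only; the password is a placeholder, and the connect string depends on your network configuration:

CREATE DATABASE LINK orasvr2
CONNECT TO HRUSER2 IDENTIFIED BY <password>
USING 'oracle_server2';

The tg4msql link is created the same way, with a connect descriptor that points to the Transparent Gateway service for SQL Server.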

To enable a full push-down operation, database links must exist from the target database to all source databases and links must exist between the following datastores:

● ora_target and ora_source
● ora_target and ora_source2
● ora_target and mssql_source

The software executes this data flow query as one SQL statement in oracle_server1:

INSERT INTO HR_USER3.EMP_JOIN (FNAME, ENAME, DEPTNO, SAL, COMM)
SELECT psnl.FNAME, emp.ENAME, dept.DEPTNO, emp.SAL, emp.COMM
FROM HR_USER1.EMPLOYEE emp, HR_USER2.PERSONNEL@orasvr2 psnl, oracle_server1.mssql_server.DBO.DEPARTMENT@tg4msql dept;

Parent topic: Linked datastores database software requirements [page 50]

6.7.2 Use linked remote servers

Linked datastores enable a full push-down operation (INSERT INTO... SELECT) to a target when all of the sources are linked with the target.

The sources and target can be in datastores that use the same database type or different database types.

Example
On an SAP IQ server that contains the target table for your job, create a remote server to the SAP IQ server that contains the source table for the job. On the target server, create an external logon to the remote server. Then create two datastores:

● Configure a datastore for the database on the local server.
● Configure a datastore for the remote server.

Edit the datastore that you created for the local server and add a linked datastore connection in the Advanced options.

For instructions to create linked datastore connection, see the Designer Guide.

Example: Push-down with linked remote servers [page 53]
Linked remote servers enable a full push-down operation (INSERT…LOCATION) to the target when all the target servers have remote links to the respective source server.


Parent topic: Linked datastores [page 49]

Related Information

Linked datastores database software requirements [page 50]
Syntax for generated SQL statements [page 54]
Tuning performance at the data flow or Job Server level [page 54]
Linking a target datastore to a source datastore using a database link

6.7.2.1 Example: Push-down with linked remote servers

Linked remote servers enable a full push-down operation (INSERT…LOCATION) to the target when all the target servers have remote links to the respective source server.

Example
The following diagram shows an example of a data flow that uses linked remote servers. The source is an SAP ASE table named ods_customer. The target is an SAP IQ table named cust_dim. The SAP IQ datastore configuration has a remote server connection to the SAP ASE server named ase_remote.

The data flow selects data from ods_customer, transforms it using a Query, and loads it to the cust_dim target table using the INSERT…LOCATION SQL statement.

In the data flow, the remote server link to the source table server, ase_remote, uses the CREATE SERVER SQL statement in SAP IQ. There is also a remote logon for the server, ase_remote, that uses the CREATE EXTERNLOGIN SQL statement.
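For orientation, the two definitions have the general shape shown below. This is a sketch only; the server class, connection information, and credentials are placeholders, so see the SAP IQ documentation for the exact values to use:

CREATE SERVER ase_remote CLASS '<server_class>' USING '<connection_info>';
CREATE EXTERNLOGIN DBA TO ase_remote REMOTE LOGIN <remote_user> IDENTIFIED BY <password>;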

The software executes the data flow query as one SQL statement in SAP IQ:

INSERT INTO "DBA"."cust_dim" ( "Cust_ID" , "Cust_classf" , "Name1" , "Address" , "City" , "Region_ID" , "Zip" )
LOCATION 'ase_remote.cms57u05' PACKETSIZE 512 QUOTED IDENTIFIER ON
' SELECT "ods_customer"."Cust_ID" , "ods_customer"."Cust_classf" ,
"ods_customer"."Name1" , "ods_customer"."Address" , "ods_customer"."City" ,
"ods_customer"."Region_ID" , "ods_customer"."Zip"
FROM "X9999"."ods_customer" "ods_customer" '

Parent topic: Use linked remote servers [page 52]

6.7.3 Syntax for generated SQL statements

The SQL statement generated by SAP Data Services uses the syntax for the database type.

To see the SAP Data Services optimized SQL statements, open the applicable data flow and select Display Optimized SQL from the Validation menu.

● For DB2, the SQL syntax uses nicknames to refer to remote table references in the SQL display.
● For Oracle, the SQL uses the following syntax to refer to remote table references: <remote_table>@<dblink_name>.
● For SQL Server, the SQL uses the following syntax to refer to remote table references: <linked_server>.<remote_database>.<remote_user>.<remote_table>.
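The following lines are illustrative sketches only; the object names are hypothetical and are not SQL that Data Services necessarily generates for your job. They show the shape of the remote references you might see in the optimized SQL:

-- Oracle: remote table referenced through the database link orasvr2
SELECT * FROM HRUSER2.PERSONNEL@orasvr2
-- SQL Server: four-part name referenced through a linked server
SELECT * FROM linkedsrv.remotedb.dbo.DEPARTMENT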

Parent topic: Linked datastores [page 49]

Related Information

Linked datastores database software requirements [page 50]
Use linked remote servers [page 52]
Tuning performance at the data flow or Job Server level [page 54]

6.7.4 Tuning performance at the data flow or Job Server level

Tune job performance for linked datastore jobs at the data flow or the Job Server levels.

Linked datastore push-downs

You may not see a job performance improvement for jobs that use linked datastores and that have push-down operations. During testing, also run the job with the linked-datastore push-down operations turned off and compare the results.

You may not see performance improvements, for example, when the underlying database doesn't process operations from different data sources well. For Oracle databases, SAP Data Services pushes down Oracle stored procedures and external functions. If the job uses linked datastores, performance may improve. However, Data Services does not push down functions imported from other databases, such as DB2. In this case, although you may be using database links, Data Services cannot push the processing down, so there is no performance gain.

Before you decide to use linked datastores with a large development effort, test your assumptions about individual databases and job designs.

Data flow level

On the data flow properties dialog, Data Services enables the Use database links option by default to allow push-down operations using linked datastores. If you do not want to use linked datastores in a data flow to push down processing, deselect the checkbox.

Data Services can perform push downs using datastore links when the source and target tables share the same database type, database connection name, or datasource name. Data Services can perform push downs even when the tables have different schema names. However, you may experience problems with push downs, for example, when the user of one datastore does not have access privileges to the tables of another datastore. When there are access problems, disable Use database links.

Job Server level

You can disable linked datastores at the Job Server level. However, the Use database links option at the data flow level takes precedence, so remember to also disable it at the data flow level.

For information about changing Job Server options, see the Designer Guide.

Parent topic: Linked datastores [page 49]

Related Information

Linked datastores database software requirements [page 50]
Use linked remote servers [page 52]
Syntax for generated SQL statements [page 54]
Changing Job Server options


7 Cache data

Improve the performance of data transformations that occur in memory by caching as much data as possible.

By caching data, you limit the number of times the system accesses the database. SAP Data Services provides the following types of caches that your data flow can use for all of the operations it contains:

● In Memory: Use in-memory cache when your data flow processes a small amount of data that fits in memory.

● Pageable: Use pageable cache when your data flow processes a large amount of data that does not fit in memory. When memory-intensive operations, such as Group By and Order By, exceed available memory, the software uses pageable cache to complete the operation.

Pageable cache is the default cache type. To change the cache type, use the Cache type option in data flow Properties.

Note
If your data fits in memory, use in-memory cache because pageable cache incurs an overhead cost.

Cache data sources [page 57]
Cache source data in memory on the Job Server computer.

Cache joins [page 58]
The join operation in a Query transform uses the cache settings from the source, unless you change the setting in the Query editor.

Setting data flow cache type [page 59]
Set cache type at the data flow level to limit the number of times SAP Data Services has to access the database, which improves data transformation performance.

Cache source of lookup function [page 60]
Improve performance by caching the source of lookup functions.

Lookup table as source outer join [page 61]
To look up the required data, expose a lookup table as a source table in the data flow and use the table as an outer join in the Query transform.

Cache table comparisons [page 62]
Improve the performance of the Table Comparison transform by caching the comparison table.

Specify a pageable cache directory [page 63]
Specify a different directory for pageable cache when memory-consuming operations in a data flow exceed the available memory.

Use persistent cache [page 64]
Persistent cache tables enable you to cache large amounts of data from relational database tables and files.

Monitoring and tuning caches [page 66]
Determine cache type manually or let SAP Data Services select the cache type automatically.


7.1 Cache data sources

Cache source data in memory on the Job Server computer.

If your source is a table or file, SAP Data Services has the Cache option set to Yes by default. If your job has a Query transform that joins sources, the cache setting in the Query transform overrides the setting in the source table or file.

When you enable cache, also select a value for Cache type; either Pageable or In Memory. The data flow default cache type is Pageable. However, for smaller tables, remember to change the value of the Cache type option to In Memory in the data flow Properties.

To determine the cache type to use, calculate the approximate size of a table with the following formula:

Formula to determine cache type

table size (in bytes) = number of rows × number of columns × 20 bytes × 1.3

Note
● 20 bytes is the average column size.
● 1.3 is 30% overhead.
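For example, a hypothetical table with 1,000,000 rows and 10 columns works out to roughly:

1,000,000 rows × 10 columns × 20 bytes × 1.3 = 260,000,000 bytes (about 260 MB)

Whether 260 MB fits comfortably in memory depends on the memory available on the Job Server; a 10,000-row table with the same 10 columns is only about 2.6 MB and is a clear candidate for in-memory cache.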

To keep your job running with the correct cache type, compute row count and table size on a regular basis, especially under the following circumstances:

● A table has significantly changed in size.
● You experience decreased system performance.

Parent topic: Cache data [page 56]

Related Information

Cache joins [page 58]
Setting data flow cache type [page 59]
Cache source of lookup function [page 60]
Lookup table as source outer join [page 61]
Cache table comparisons [page 62]
Specify a pageable cache directory [page 63]
Use persistent cache [page 64]
Monitoring and tuning caches [page 66]


7.2 Cache joins

The join operation in a Query transform uses the cache settings from the source, unless you change the setting in the Query editor.

Cache settings in the source include the following:

● Cache is enabled or disabled.
● If enabled, the cache type: Pageable or In Memory.

In the Query editor, the cache setting is set to Automatic by default. The Automatic setting carries forward the cache settings from the source table.

When you configure joined sources in the Query transform, and you change the cache setting from Automatic, the cache setting in the Query transform overrides the setting in the source.

Note
If any one input schema in the Query editor has a cache setting other than Automatic, the optimizer considers only the Query editor cache settings and ignores all source editor cache settings.

The following table shows the relationship between cache settings in the source and cache settings in the Query editor, and the effective cache setting for the join.

Cache Setting in Source | Cache Setting in Query Editor | Effective Cache Setting
Yes | Automatic | Yes
No | Automatic | No
Yes | Yes | Yes
No | Yes | Yes
Yes | No | No
No | No | No

Note
For the best results when joining sources, we recommend that you define the join rank and cache settings in the Query editor.

The effect of cache setting on joins

In the Query editor, cache a source only when you use it as an inner source in a join.

If caching is enabled, and Data Services determines that data caching is possible, Data Services uses the source data in an inner join under the following conditions:

● The source is specified as the inner source of a left outer join.
● When using an inner join between the two tables, the source has a lower join rank.

Caching does not affect the order in which tables are joined.


If Data Services pushes down operations to the underlying database because of optimization conditions, it ignores the cache setting.

If a table becomes too large to fit in the cache, ensure that you set the cache type to Pageable.

Parent topic: Cache data [page 56]

Related Information

Cache data sources [page 57]
Setting data flow cache type [page 59]
Cache source of lookup function [page 60]
Lookup table as source outer join [page 61]
Cache table comparisons [page 62]
Specify a pageable cache directory [page 63]
Use persistent cache [page 64]
Monitoring and tuning caches [page 66]
Join rank settings [page 187]

7.3 Setting data flow cache type

Set cache type at the data flow level to limit the number of times SAP Data Services has to access the database, which improves data transformation performance.

Open Designer and perform the following steps to change the cache type for a data flow:

1. Open the Data Flow tab in the object library.
2. Right-click on the applicable data flow name and select Properties.
3. On the General tab of the Properties dialog box, select the desired cache type in the Cache type dropdown list.

Data flow cache types

In Memory: Uses available memory for storing data. Select when your data flow processes a small amount of data that can fit into the available memory.

Pageable: Uses pageable cache when memory-intensive operations such as Group By and Order By exceed available memory.

Task overview: Cache data [page 56]


Related Information

Cache data sources [page 57]
Cache joins [page 58]
Cache source of lookup function [page 60]
Lookup table as source outer join [page 61]
Cache table comparisons [page 62]
Specify a pageable cache directory [page 63]
Use persistent cache [page 64]
Monitoring and tuning caches [page 66]

7.4 Cache source of lookup function

Improve performance by caching the source of lookup functions.

The lookup function retrieves a value in a table or file based on the values in a different source table or file. Caching lookup sources improves performance because SAP Data Services avoids the expensive task of creating a database query or full file scan on each row.

Data Services supports caching for two out of the three lookup functions: lookup and lookup_ext. It does not support caching for the lookup_seq function.

Set cache options when you specify a lookup function. The following table describes the three caching options that you set in the <cache_spec> value in the syntax of the function.

Cache options

NO_CACHE: Reads values from the <lookup_table> for every row without caching values.

PRE_LOAD_CACHE: Preloads the result column and compare column into memory before executing the lookup. Select this option if the number of rows in the table is small or you expect to access a high percentage of the table values.


DEMAND_LOAD_CACHE: Loads <return_column_list>, <compare_column> (see <condition_list>), and <orderby_column_list> into memory as the function identifies them.

Select this option if:

● The table has a large number of rows, and you expect to frequently access a low percentage of table values.
● You use the table in multiple lookups with highly selective compare conditions, resulting in a small subset of data.

Use this option when looking up repetitive values that make up a small subset of the data and when missing values are unlikely.

Demand-load caching of lookup values is helpful when the lookup results in the same value multiple times. Each time Data Services cannot find the value in the cache, it makes a new request to the database for that value. Even if the value is invalid, Data Services has no way of knowing if it is missing or just has not been cached yet.

When there are many values and some values might be missing, demand-load caching is significantly less efficient than caching the entire source.

Parent topic: Cache data [page 56]

Related Information

Cache data sources [page 57]
Cache joins [page 58]
Setting data flow cache type [page 59]
Lookup table as source outer join [page 61]
Cache table comparisons [page 62]
Specify a pageable cache directory [page 63]
Use persistent cache [page 64]
Monitoring and tuning caches [page 66]

7.5 Lookup table as source outer join

To look up the required data, expose a lookup table as a source table in the data flow and use the table as an outer join in the Query transform.

You can choose to use the lookup function in SAP Data Services queries. However, an alternative is to expose the lookup table as the source table in a data flow and not use the lookup function.


After the source table, include the Query transform in your data flow and, if necessary, set the source table as an outer join. Using the lookup table as a source in a data flow and creating an outer join in the Query transform has some advantages:

● Provides a graphic view of the source table in the data flow diagram where you can view the table and easily maintain the data flow.

● Data Services can push down the execution of the join to the underlying database, even if the job requires an outer join.

There are also some disadvantages to using the lookup table as a source instead of using the lookup function:

● You cannot specify default values in an outer join because the default is always NULL. If you use the lookup_ext function instead, you can specify default values.

● If an outer join returns multiple rows, you cannot specify which row to return. If you use the lookup_ext function instead, you can specify MIN or MAX.

● The workspace can become cluttered if there are too many objects in the data flow.
● There is no option to use DEMAND_LOAD_CACHE in an outer join, which is useful when looking up only a few repetitive values in a very large table.

Tip: If you use the lookup table in multiple jobs, you can create a persistent cache that multiple data flows can access. For more information, see Use persistent cache [page 64].

Parent topic: Cache data [page 56]

Related Information

Cache data sources [page 57]
Cache joins [page 58]
Setting data flow cache type [page 59]
Cache source of lookup function [page 60]
Cache table comparisons [page 62]
Specify a pageable cache directory [page 63]
Use persistent cache [page 64]
Monitoring and tuning caches [page 66]

7.6 Cache table comparisons

Improve the performance of the Table Comparison transform by caching the comparison table.

Use the Table_Comparison transform to compare two data sets and produce the difference between them as a data set with rows flagged as INSERT or UPDATE. There are three comparison methods in the Table_Comparison transform:


Comparison methods

● Row-by-row select: Looks up the target table using SQL every time it receives an input row.

● Cached comparison table: Loads the comparison table into memory. Queries to the comparison table access memory rather than the actual table.

● Sorted input: Reads the comparison table in the order of the primary key columns using a sequential read. Improves performance because Data Services reads the comparison table only once.

Of the three, Row-by-row select is likely the slowest and Sorted input the fastest.

Tip: If you sort the input to the Table_Comparison transform, choose the Sorted input option for the comparison method.

Tip: If the input is not sorted, then choose the Cached comparison table option.

Read about the Table Comparison transform in the Reference Guide.
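As a rough mental model of the cached comparison method, the following Python sketch loads the comparison table into an in-memory dictionary keyed by the primary key and flags each incoming row as an INSERT or an UPDATE. It is a simplified illustration, not the Table_Comparison transform itself; the key column name and the equality check for unchanged rows are assumptions.

```python
# Simplified model of the "Cached comparison table" method (illustration only).

def compare_with_cached_table(input_rows, comparison_rows, key="id"):
    # Load the comparison table into memory once, keyed by the primary key.
    cache = {row[key]: row for row in comparison_rows}

    for row in input_rows:
        existing = cache.get(row[key])
        if existing is None:
            yield ("INSERT", row)      # key not present in the comparison table
        elif existing != row:
            yield ("UPDATE", row)      # key present, but non-key columns differ
        # identical rows produce no output

rows = compare_with_cached_table(
    input_rows=[{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}],
    comparison_rows=[{"id": 1, "name": "Anne"}],
)
print(list(rows))  # [('UPDATE', {'id': 1, ...}), ('INSERT', {'id': 2, ...})]
```

Row-by-row select would replace the dictionary probe with a database query per input row, and Sorted input would read both inputs in key order and advance through them in a single pass.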

Parent topic: Cache data [page 56]

Related Information

Cache data sources [page 57]
Cache joins [page 58]
Setting data flow cache type [page 59]
Cache source of lookup function [page 60]
Lookup table as source outer join [page 61]
Specify a pageable cache directory [page 63]
Use persistent cache [page 64]
Monitoring and tuning caches [page 66]
Table_Comparison transform

7.7 Specify a pageable cache directory

Specify a different directory for pageable cache when memory-consuming operations in a data flow exceed the available memory.

If the memory-consuming operations in your data flow exceed the available memory, SAP Data Services uses pageable cache to complete the operation. Memory-intensive operations include the following:


● Distinct
● Count_distinct
● Lookup_ext
● Group By (Query transform)
● Order By (Query transform)
● Hierarchy_Flattening transform

Note: The default pageable cache directory is <DS_COMMON_DIR>\Log\PCache. If your data flows contain memory-consuming operations, change to a directory that:

● Contains enough disk space for the amount of data you plan to profile.
● Is on a separate disk or file system from the SAP Data Services system.

For details about specifying a directory in the SAP Data Services Server Manager, see the Administrator Guide. There is a “Configuring run-time resources” topic for Windows and Unix.

Parent topic: Cache data [page 56]

Related Information

Cache data sources [page 57]
Cache joins [page 58]
Setting data flow cache type [page 59]
Cache source of lookup function [page 60]
Lookup table as source outer join [page 61]
Cache table comparisons [page 62]
Use persistent cache [page 64]
Monitoring and tuning caches [page 66]
Configuring run-time resources

7.8 Use persistent cache

Persistent cache tables enable you to cache large amounts of data from relational database tables and files.

Persistent cache datastores are containers for persistent cache tables. They provide benefits for data flows that process large volumes of data:

● Store a large amount of data in persistent cache, which SAP Data Services quickly pages into memory each time the job executes.


Example: Access a lookup table or comparison table locally, instead of reading from a remote database.

● Create cache tables that multiple data flows can share, unlike a memory table, which cannot be shared between real-time jobs.

Example: If a large lookup table in a lookup_ext function rarely changes, create a cache once and then subsequent jobs can use this cache instead of re-creating the cache each time.

Persistent cache tables can cache data from relational database tables and files. However, you cannot do the following with persistent cache tables:

● Cache data from hierarchical data files such as XML messages or IDocs that contain nested schemas
● Perform incremental inserts, deletes, or updates

Create a persistent cache table by loading data into the persistent cache target table using one data flow. Then read from the cache table in another data flow. When you load data into a persistent cache table, Data Services always truncates and re-creates the table.

After you create a persistent cache table as a target in one data flow, use the persistent cache table as a source in any data flow. You can also use it as a lookup table or comparison table.

For more information about persistent cache datastores, see the Designer Guide. For more information about persistent cache sources, see the Reference Guide.

Parent topic: Cache data [page 56]

Related Information

Cache data sources [page 57]
Cache joins [page 58]
Setting data flow cache type [page 59]
Cache source of lookup function [page 60]
Lookup table as source outer join [page 61]
Cache table comparisons [page 62]
Specify a pageable cache directory [page 63]
Monitoring and tuning caches [page 66]
Persistent cache datastores
Create persistent cache tables
Persistent cache source

7.9 Monitoring and tuning caches

Determine cache type manually or let SAP Data Services select the cache type automatically.

Data Services automatically sets the cache type for data flows. It uses cache statistics collected from previous job runs. Cache statistics include the number of rows processed.

The default cache type is Pageable. Data Services switches the cache type to In Memory when it determines that your data flow processes a small amount of data that fits in memory.

You can also monitor and choose the cache type to use for the data flow.

Setting cache type automatically [page 66]
To have SAP Data Services select the cache type for your job, execute the job twice.

Monitoring and tuning In Memory and Pageable caches [page 67]
To set the cache type for data flows manually, monitor and tune caches on a regular basis.

Accessing the Administrator Performance Monitor [page 69]
The Performance Monitor in the SAP Data Services Management Console contains information to help you determine the cache type to use.

Parent topic: Cache data [page 56]

Related Information

Cache data sources [page 57]
Cache joins [page 58]
Setting data flow cache type [page 59]
Cache source of lookup function [page 60]
Lookup table as source outer join [page 61]
Cache table comparisons [page 62]
Specify a pageable cache directory [page 63]
Use persistent cache [page 64]

7.9.1 Setting cache type automatically

To have SAP Data Services select the cache type for your job, execute the job twice.

Log in to Designer and open the applicable job in the workspace.

1. Right-click the job name and select Execute.
2. Make sure to select the Collect statistics for optimization option in the Execution Parameters dialog box.
3. After the job completes without errors, run the job again. Make sure to select the Use collected statistics option, which should be selected by default.


Task overview: Monitoring and tuning caches [page 66]

Related Information

Monitoring and tuning In Memory and Pageable caches [page 67]
Accessing the Administrator Performance Monitor [page 69]

7.9.2 Monitoring and tuning In Memory and Pageable caches

To set the cache type for data flows manually, monitor and tune caches on a regular basis.

Perform the following steps in SAP Data Services Designer:

1. Run a test job with your data and select the following options in the job execution dialog: Collect statistics for optimization and Collect statistics for monitoring.

Note: The option Collect statistics for monitoring is costly to run because it determines the cache size for each row processed.

2. Test run the job again with the option Use collected statistics selected.
3. Open the Trace Log to determine the cache type that Data Services used.

There are several messages that could appear in the Trace Log. For example, a message that indicates that cache statistics are not available, or that the subdata flows use the default cache type, Pageable. The following table contains example messages. Note that we've inserted line breaks for readability; the actual Trace Log does not contain these line breaks.

Situation: The first time you run the job, or if you have not previously collected statistics

Trace Log messages:

Cache statistics for sub data flow <data flow name> are not available to be used for optimization and need to be collected before they can be used.

Sub data flow <subdata flow name> using PAGEABLE Cache with <1280 MB> buffer pool.

Situation: The software is switching to In-memory cache

Trace Log messages:

Cache statistics determined that sub data flow <subdata flow name> uses <1> caches with a total size of <1920> bytes. This is less than (or equal to) the virtual memory <1342177280> bytes available for caches. Statistics is switching the cache type to IN MEMORY.

Sub data flow <subdata flow name> using IN MEMORY Cache.

Because pageable cache is the default cache type for a data flow, you might want to permanently change the Cache type to In Memory in the data flow Properties.

Situation: One subdata flow uses IN MEMORY cache and the other subdata flow uses PAGEABLE cache

Trace Log messages:

Sub data flow <subdata flow1 name> using IN MEMORY Cache.

...

Sub data flow <subdata flow2 name> using PAGEABLE Cache with <1536 MB> buffer pool.

4. Open the SAP Data Services Management Console Administrator to view the Performance Monitor, which contains data flow statistics and the cache size.

For instructions to view the Administrator Performance Monitor, see Accessing the Administrator Performance Monitor [page 69].

If the value of Cache Size in the Administrator Performance Monitor is approaching the physical memory limit on the job server, consider changing the Cache type of the data flow from In Memory to Pageable.

Task overview: Monitoring and tuning caches [page 66]

Related Information

Setting cache type automatically [page 66]
Accessing the Administrator Performance Monitor [page 69]

7.9.3 Accessing the Administrator Performance Monitor

The Performance Monitor in the SAP Data Services Management Console contains information to help you determine the cache type to use.

Before you perform the following steps, perform steps 1 through 3 in Monitoring and tuning In Memory and Pageable caches [page 67].

1. Log in to the Management Console and open the Administrator.
2. Expand the Batch node at left and select the applicable repository name.
   The Administrator displays a list of batch jobs for the selected repository at right.
3. Find the applicable batch job execution instance and click Performance Monitor under the Job Information column.

The Administrator opens the Table tab of the Performance Monitor page. This tab shows a tabular view of the start time, stop time, and execution time for each work flow, data flow, and subdata flow within the job.

4. To display statistics for each object within a data flow or subdata flow, click one of the data flow names on the Table tab.

The Transform tab displays the statistics described in the following table.

● Name: Name that you gave the object (source, transform, or target) in the Designer.

● Type: Type of object within the data flow: Source, Mapping, Target.

● Start time: Date and time this object instance started execution.

● End time: Date and time this object instance stopped execution.

● Execution time (seconds): Time (in seconds) the object took to complete execution.

● Row Count: Number of rows that this object processed.

● Cache Size (KB): Size (in kilobytes) of the cache that Data Services used to process this object.

  Note: This statistic displays only when you select Collect statistics for monitoring for the job execution.

If the value of Cache Size is approaching the physical memory limit on the job server, consider changing the Cache type of the data flow from In Memory to Pageable.

Task overview: Monitoring and tuning caches [page 66]


Related Information

Setting cache type automatically [page 66]
Monitoring and tuning In Memory and Pageable caches [page 67]

8 Parallel Execution

When you run operations in SAP Data Services in parallel, you save processing time and improve performance.

Set Data Services to perform data extraction, transformation, and loads in parallel by setting parallel options for sources, transforms, and targets. In addition, set individual data flows and work flows to run in parallel by not connecting them in the workspace. If the Job Server runs on a multi-processor computer, it takes full advantage of available CPUs.

Parallel data flows and work flows [page 71]
During parallel execution of data flows and work flows, SAP Data Services coordinates the parallel steps, waits for all steps to complete, then starts the next sequential step.

Parallel execution in data flows [page 73]
When you execute batch jobs, SAP Data Services uses parallel threads for tasks such as reading and loading files, to maximize performance.

8.1 Parallel data flows and work flows

During parallel execution of data flows and work flows, SAP Data Services coordinates the parallel steps, waits for all steps to complete, then starts the next sequential step.

You can explicitly execute data flows and work flows in parallel by not connecting them in a work flow or job. In the following example, parallel engine processes execute the parallel data flow processes.

Example: Use parallel processing to load dimension tables by calling work flows in parallel. Then specify that your job creates dimension tables before the Facts table by moving the Dimension work flow to the left of a second (parent) work flow (Facts) and connecting the flows.

Note: If you have more than eight CPUs on your Job Server computer, you can increase the number of engines for your Job Server to improve performance. Increase the number of engines in the Tools > Options dialog.

For more information about work flows, see the Designer Guide.

Changing the maximum number of engines [page 73]
If you have more than eight CPUs on your Job Server computer, increase the number of engines to improve performance of parallel execution.

Parent topic: Parallel Execution [page 71]

Related Information

Parallel execution in data flows [page 73]
Work flows

8.1.1 Changing the maximum number of engines

If you have more than eight CPUs on your Job Server computer, increase the number of engines to improve performance of parallel execution.

Perform the following steps in Designer:

1. Select Tools > Options.
2. In the Options dialog box, expand Job Server and select Environment.
3. Enter a number for Maximum number of engine processes.
   The default setting is 2.
4. Click OK.

The setting that you make for the Maximum number of engine processes affects all processes for the specific Job Server. You can override the Job Server engine processes for a specific data flow by setting the Degree of Parallelism (DOP) to 1 or more for that data flow.

Task overview: Parallel data flows and work flows [page 71]

Related Information

Setting the DOP for a data flow [page 77]

8.2 Parallel execution in data flows

When you execute batch jobs, SAP Data Services uses parallel threads for tasks such as reading and loading files, to maximize performance.

A thread is an instance of a process run by Data Services. During execution, there are several processes that you can configure to run in parallel threads, including the following:

● Partitioned source or target table: Contains a thread for each partition.
● Degree of Parallelism: Determines the number of transform replications to run in parallel, with each run as a separate thread.
● Combined table partitioning and degree of parallelism: Creates multiple threads based on the numbers of partitions and parallel processes.

Degree of parallelism [page 74]
The degree of parallelism (DOP) is a data flow property in which you define the number of times SAP Data Services replicates the transform to process a subset of data in parallel.

Table partitioning [page 85]
SAP Data Services processes table partitions in parallel for source and target tables.

Combining table partitioning and DOP [page 95]
When you partition tables and use them in data flows that have a Degree of parallelism set to greater than 1, expect different results and effects on job performance.

Parallel process threads for flat files [page 99]
To enhance performance of time-consuming flat file reading and loading processes, enable SAP Data Services to run the processes in parallel threads.

Parent topic: Parallel Execution [page 71]

Related Information

Parallel data flows and work flows [page 71]

8.2.1 Degree of parallelism

The degree of parallelism (DOP) is a data flow property in which you define the number of times SAP Data Services replicates the transform to process a subset of data in parallel.

The DOP setting affects all transforms in the data flow. If there are multiple transforms in a data flow, SAP Data Services chains them together until it reaches a merge point. A merge point combines all data streams into one data stream before continuing on to the next step in the data flow.

Example: You set the Degree of parallelism option to 2 for a data flow. The data flow has a source, a Query transform, and a target. Data Services processes the data flow as follows:

● Replicates the Query transform two times, divides data into two groups, and processes the data groups in parallel threads

● After all data has gone through the Query transform, merges the parallel threads into a single data stream

● Uploads the single data stream into the target
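The round-robin split, replicated transform threads, and final merge described in this example can be sketched in a few lines of Python. This is a conceptual model of the mechanism, not the engine's actual implementation; the transform function and the integer row source are hypothetical.

```python
# Conceptual model of DOP = 2: round-robin split, parallel transforms, merge.
from concurrent.futures import ThreadPoolExecutor

def round_robin_split(rows, dop):
    groups = [[] for _ in range(dop)]
    for i, row in enumerate(rows):
        groups[i % dop].append(row)          # internal Round Robin Split (RRS)
    return groups

def query_transform(rows):
    return [row * 10 for row in rows]        # stand-in for the replicated Query transform

def run_with_dop(rows, dop=2):
    groups = round_robin_split(rows, dop)
    with ThreadPoolExecutor(max_workers=dop) as pool:
        results = pool.map(query_transform, groups)      # replicated threads in parallel
    return [row for group in results for row in group]   # internal Merge into one stream

print(run_with_dop(range(10), dop=2))        # the single merged stream loaded to the target
```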

DOP vs Maximum number of engines [page 75]
Use the Degree of Parallelism (DOP) setting to override the Maximum number of engines setting for the Job Server.

Before you use DOP for enhanced performance [page 76]
If you do not use the degree of parallelism judiciously, it can degrade job performance.

Setting the DOP for a data flow [page 77]
Set the Degree of parallelism (DOP) in the Properties dialog box of the data flow.

DOP and transforms [page 78]
SAP Data Services always replicates certain transforms and operations when you set the Degree of parallelism (DOP) to a value greater than 1.

DOP and joins [page 80]
The Degree of parallelism (DOP) setting determines the number of times SAP Data Services replicates a join process.

DOP and functions [page 83]
Set stored procedures and custom functions to replicate with the transforms in which you use them.

Enabling stored procedures to run in parallel [page 84]
SAP Data Services runs stored procedures in parallel when the transforms in which they execute run in parallel.

Enabling custom functions to run in parallel [page 85]
SAP Data Services runs functions in parallel when the transforms in which they execute run in parallel.

Parent topic: Parallel execution in data flows [page 73]

Related Information

Table partitioning [page 85]
Combining table partitioning and DOP [page 95]
Parallel process threads for flat files [page 99]
Setting the DOP for a data flow [page 77]

8.2.1.1 DOP vs Maximum number of engines

Use the Degree of Parallelism (DOP) setting to override the Maximum number of engines setting for the Job Server.

When you set the DOP for a data flow, you set it at the local level through the data flow Properties. You can also set a global DOP to affect the degree of parallelism for all data flows processed by a specific Job Server. Set a global DOP in the Maximum number of engine processes option in the Job Server environment settings.

Use local or global settings exclusively, or in combination. The default value for the local DOP in the Degree of parallelism option is 0. The default for the global DOP is 2. To rely on a global DOP setting, leave each local data flow DOP set to the default (0). If you set a local data flow DOP to 1 or more, you override the global DOP setting.

Example: Use the local and global DOP options in the following ways:

● You have a global DOP of 4 for your Job Server. SAP Data Services runs all data flows on the Job Server with four threads in parallel, when applicable.

● You want a particularly complicated data flow to run as a single data stream. Override the global setting of 4 by setting the data flow DOP to 1.

● You prefer to set the DOP on a case-by-case basis for each data flow. Therefore, you set the global DOP to 0 and you set the local DOP for each data flow to greater than 0, as applicable.
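A minimal sketch of the precedence rule described above, assuming a small helper that is not part of Data Services: a local data flow DOP of 0 defers to the Job Server's global setting, and a local value of 1 or more overrides it.

```python
# Hypothetical helper illustrating local vs. global DOP precedence.
def effective_dop(local_dop: int, global_dop: int) -> int:
    """Return the DOP the data flow actually runs with."""
    return local_dop if local_dop >= 1 else global_dop

assert effective_dop(local_dop=0, global_dop=4) == 4   # defer to the Job Server setting
assert effective_dop(local_dop=1, global_dop=4) == 1   # force a single data stream
assert effective_dop(local_dop=3, global_dop=2) == 3   # local value overrides global
```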


Parent topic: Degree of parallelism [page 74]

Related Information

Before you use DOP for enhanced performance [page 76]
Setting the DOP for a data flow [page 77]
DOP and transforms [page 78]
DOP and joins [page 80]
DOP and functions [page 83]
Enabling stored procedures to run in parallel [page 84]
Enabling custom functions to run in parallel [page 85]
Changing the maximum number of engines [page 73]

8.2.1.2 Before you use DOP for enhanced performance

If you do not use the degree of parallelism judiciously, it can degrade job performance.

The best value to use depends on the complexity of the data flow and the number of CPUs available.

Example: On a computer with four CPUs, setting a DOP greater than 2 for the following data flow may not improve performance, but could degrade it due to thread contention. The data flow in the Designer workspace looks as follows:

Data Services processes the data flow as shown in the following diagram:

Data Services processes the data flow as follows:

● Performs an internal Round Robin Split (RRS) operation and divides the data into two threads.
● Instantiates two Query transforms and processes the threads.
● Instantiates two Table comparison transforms and processes the threads.
● Merges the threads and outputs data to the target.

A better optimization for this example is setting the DOP to 4, which is the same number as the number of CPUs. With a DOP of 4, Data Services runs the data in four parallel threads.

Order By and Group By are always merge points, after which Data Services proceeds as if the DOP value is 1. Therefore, if you have a Query transform with an Order By or a Group By sort that Data Services does not push down to the database level, place them as close to the target as possible. This way, Data Services processes all possible operations in parallel before it has to merge and process in a single thread.

Parent topic: Degree of parallelism [page 74]

Related Information

DOP vs Maximum number of engines [page 75]
Setting the DOP for a data flow [page 77]
DOP and transforms [page 78]
DOP and joins [page 80]
DOP and functions [page 83]
Enabling stored procedures to run in parallel [page 84]
Enabling custom functions to run in parallel [page 85]
Query transform viewing optimized SQL [page 44]
Maximize push-down operations [page 33]

8.2.1.3 Setting the DOP for a data flow

Set the Degree of parallelism (DOP) in the Properties dialog box of the data flow.

1. Open the Data Flow tab in the object library of SAP Data Services Designer.
2. Right-click the applicable data flow and select Properties.
   The Properties dialog box opens.
3. Enter a number in the Degree of parallelism option.

The default value is 0. If you leave the default value, control the DOP using the Maximum number of engines setting for the Job Server (global DOP). The global DOP setting affects all data flows that the Job Server processes. However, when you set the data flow DOP to 1 or higher, you can override the global DOP setting for the specific data flow.

4. Click OK.

Task overview: Degree of parallelism [page 74]


Related Information

DOP vs Maximum number of engines [page 75]
Before you use DOP for enhanced performance [page 76]
DOP and transforms [page 78]
DOP and joins [page 80]
DOP and functions [page 83]
Enabling stored procedures to run in parallel [page 84]
Enabling custom functions to run in parallel [page 85]
Changing the maximum number of engines [page 73]

8.2.1.4 DOP and transforms

SAP Data Services always replicates certain transforms and operations when you set the Degree of parallelism (DOP) to a value greater than 1.

Data Services replicates the Query transform and Query operations such as Order By, Group By, and joins. It also replicates functions such as lookup_ext.

Data Services replicates the Table Comparison transform when you use the Row-by-row select and Cached comparison table comparison methods. Other transforms that Data Services replicates are:

● Map_Operation
● History_Preserving
● Pivot

To help you understand how the DOP setting affects the transforms that Data Services automatically replicates, consider the following scenarios:

● DOP and a data flow with a single transform
● DOP and a data flow with multiple transforms

DOP and a data flow with a single transform

The following example shows runtime instances for a data flow with a DOP of 1, and then a DOP of 2.

Example: With the DOP of 1, Data Services processes the job without any replication:

The following example shows how Data Services processes a data flow with a DOP of 2, or a setting greater than 1.

Example: With a DOP of 2, or a setting greater than 1, Data Services inserts an internal Round Robin Split (RRS) that transfers data to each of the replicated queries. The replicated queries execute in parallel, and an internal Merge transform merges the results into a single stream:

DOP and a data flow with multiple transforms

The following example shows runtime instances of a data flow with a DOP of 1 and then a DOP of 2. Data Services replicates and chains multiple transforms in a data flow when the DOP is greater than 1. Chaining is when Data Services waits until all threads complete for a transform before it starts processing the data through the next transform.

Example: With the DOP of 1, Data Services processes the job without any replication:

With multiple transforms in a data flow and the DOP of greater than 1, Data Services carries the replicated stream as far as possible, then merges the data into a single stream:


Parent topic: Degree of parallelism [page 74]

Related Information

DOP vs Maximum number of engines [page 75]
Before you use DOP for enhanced performance [page 76]
Setting the DOP for a data flow [page 77]
DOP and joins [page 80]
DOP and functions [page 83]
Enabling stored procedures to run in parallel [page 84]
Enabling custom functions to run in parallel [page 85]

8.2.1.5 DOP and joins

The Degree of parallelism (DOP) setting determines the number of times SAP Data Services replicates a join process.

If you set a Query transform to join sources in a data flow, the DOP setting determines the number of times Data Services replicates the join process. Each replication processes a subset of data. After Data Services completes processing all threads in the join, it moves the data to the next process in the data flow.

To help you understand how the DOP setting affects the join processes that Data Services replicates, consider the following scenarios:

● DOP and executing a join as a single process
● DOP and executing a join as multiple processes


DOP and executing a join as a single process

Example: The following diagrams show runtime instances of a data flow that contains a join. The first diagram has a DOP of 1 and the second diagram has a DOP of 2.

Use join ranks to define the outer source and inner source. In both data flows, the inner source is cached in memory.

With a DOP of 2, Data Services inserts an internal Round Robin Split (RRS) that transfers outer source data to each of the replicated joins. Data Services caches the inner source in memory once, and joins each half of the outer source with the cached data in the replicated joins. It then executes the replicated joins in parallel, and merges the results into a single stream by an internal Merge transform:
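The following Python sketch models the mechanism just described: the inner source is cached in memory once, the outer source is split round-robin, and each replicated join probes the same cache before the partial results are merged (shown sequentially for brevity). It is a conceptual illustration under an assumed join column name, not the engine's join implementation.

```python
# Conceptual model of a DOP-2 join: inner cached once, outer split, results merged.

def replicated_join(outer_rows, inner_rows, dop=2):
    inner_cache = {row["key"]: row for row in inner_rows}   # cached once, shared

    # Internal RRS: split the outer source across the replicated joins.
    splits = [outer_rows[i::dop] for i in range(dop)]

    def join_one_split(split):
        return [
            {**outer, **inner_cache[outer["key"]]}
            for outer in split
            if outer["key"] in inner_cache
        ]

    # Each replicated join probes the same in-memory inner cache;
    # an internal Merge combines the partial results into one stream.
    return [row for split in splits for row in join_one_split(split)]
```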


DOP and executing a join as multiple processes

When you select the Run JOIN as a separate process option in the Query transform, you can split the execution of a join into multiple processes. Data Services creates a subdata flow for each process.

Example: The following diagram shows a runtime instance of a data flow with a join. The DOP is set to 2 and the join has the Run JOIN as a separate process option selected.

Data Services divides the data flow into four sub data flows, indicated in the diagram by the blue dotted and dashed boxes:

● In the first sub data flow, Data Services uses an internal hash algorithm to split the data.
● In the next two sub data flows, Data Services runs the replicated joins as separate processes in parallel.
● In the last sub data flow, Data Services merges the data into one stream and loads the data to the target.

Tip: If DOP is greater than 1, set the Distribution level option in the Execution Properties dialog box to either job or data flow. If you set the option to sub data flow, the Hash Split algorithm sends data to replicated queries that might be executing on different Job Servers. If the queries execute on different Job Servers, Data Services sends the data on the network between two Job Servers, which may slow down the entire data flow.

Parent topic: Degree of parallelism [page 74]


Related Information

DOP vs Maximum number of engines [page 75]
Before you use DOP for enhanced performance [page 76]
Setting the DOP for a data flow [page 77]
DOP and transforms [page 78]
DOP and functions [page 83]
Enabling stored procedures to run in parallel [page 84]
Enabling custom functions to run in parallel [page 85]
Join rank settings [page 187]
Cache joins [page 58]
Using grid computing to distribute data flow execution [page 117]

8.2.1.6 DOP and functions

Set stored procedures and custom functions to replicate with the transforms in which you use them.

To replicate stored procedures and custom functions with the transform, select the Enable parallel execution checkbox in the function Properties dialog box. If you do not select this option and you add the function to a transform, SAP Data Services does not replicate the transform nor run it in parallel even when the parent data flow has a DOP greater than 1.

When enabling functions to run in parallel, verify the following information:

● Your database type allows a stored procedure to run in parallel.
● The custom function set to run in parallel actually improves performance.

Most built-in functions replicate when the transform in which you use them replicates because of the DOP value. However, the following functions do not replicate because of the DOP value:

● avg ()
● count ()
● count_distinct ()
● double_metaphone ()
● exe ()
● get_domain_description ()
● gen_row_num ()
● gen_row_num_by_group ()
● is_group_changed ()
● key_generation ()
● mail_to ()
● max ()
● min ()
● previous_row_value ()
● print ()


● raise_exception ()
● raise_exception_ext ()
● set_env ()
● sleep ()
● smtp_to ()
● soundex ()
● sql ()
● sum ()
● total_rows ()

Parent topic: Degree of parallelism [page 74]

Related Information

DOP vs Maximum number of engines [page 75]
Before you use DOP for enhanced performance [page 76]
Setting the DOP for a data flow [page 77]
DOP and transforms [page 78]
DOP and joins [page 80]
Enabling stored procedures to run in parallel [page 84]
Enabling custom functions to run in parallel [page 85]

8.2.1.7 Enabling stored procedures to run in parallel

SAP Data Services runs stored procedures in parallel when the transforms in which they execute run in parallel.

1. In the Datastores tab of the object library, expand the applicable datastore node.
2. Expand the Functions node.
3. Right-click the applicable function and select Properties from the dropdown list.
4. In the Properties dialog box, open the Function tab.
5. To enable parallel execution, check the Enable Parallel Execution checkbox.
6. Click OK.

Task overview: Degree of parallelism [page 74]

Related Information

DOP vs Maximum number of engines [page 75]
Before you use DOP for enhanced performance [page 76]
Setting the DOP for a data flow [page 77]
DOP and transforms [page 78]
DOP and joins [page 80]
DOP and functions [page 83]
Enabling custom functions to run in parallel [page 85]

8.2.1.8 Enabling custom functions to run in parallel

SAP Data Services runs functions in parallel when the transforms in which they execute run in parallel.

1. In the Custom Functions tab of the object library, right-click the applicable function name and select Properties.
   The Properties dialog box opens.
2. Open the Function tab.
3. Check the Enable Parallel Execution checkbox.
4. Click OK.

Task overview: Degree of parallelism [page 74]

Related Information

DOP vs Maximum number of engines [page 75]
Before you use DOP for enhanced performance [page 76]
Setting the DOP for a data flow [page 77]
DOP and transforms [page 78]
DOP and joins [page 80]
DOP and functions [page 83]
Enabling stored procedures to run in parallel [page 84]

8.2.2 Table partitioning

SAP Data Services processes table partitions in parallel for source and target tables.

The way in which you partition tables is based on your database type. Data Services processes data flows with partitioned tables based on the number of partitions that you define. It processes each partition in parallel, which improves performance.

The following are examples of various database types, their partitioning types, and what you can do with partitioned tables in Data Services.


Oracle

Oracle databases support range, list, and hash partitioning. With partitioned Oracle data, you can do the following in Data Services:

● Import partition table metadata and use it to extract data in parallel.
● Use range and list partitions to load data to Oracle targets.
● Specify logical range and list partitions using Data Services metadata for Oracle tables.

SAP HANA

SAP HANA supports partitions for column store tables. With partitioned SAP HANA column store tables, you can do the following in SAP Data Services:

● Use physical and logical partitions for parallel reading.
● Import partition table metadata and range partitioned tables for parallel reading.
● Use logical partitions of tables and range mixed with list, which is similar to physical range partition syntax.

Other databases

Specify logical range and list partitions by modifying imported table metadata for the following databases:

● DB2
● Microsoft SQL Server
● SAP ERP
● SAP ASE
● SAP IQ

Viewing table partition information [page 87]
View partition information in the Properties of the applicable table to determine partition names and columns.

Creating and editing table partition information [page 88]
Create logical partitions in the table Properties dialog box when an imported table is not partitioned.

Enabling partition settings in a source or target table [page 90]
To enable partitions for source and target tables, make settings in the respective editor in a data flow.

When range partitioning is not supported [page 91]
If you know that a table has a natural distribution of ranges, SAP Data Services allows you to define table ranges even when the database type doesn't support range partitioning.

Example 1: Data flow with source partitions only [page 91]
SAP Data Services translates a data flow with a partitioned source table as a data flow with a source for each partition.

Example 2: Data flow with target partitions only [page 92]
SAP Data Services performs internal operations for a data flow with a target table that has two partitions.

Example 3: Data flow with source and target partitions [page 94]
SAP Data Services performs internal processes when both the source and target tables are partitioned.

Parent topic: Parallel execution in data flows [page 73]

Related Information

Degree of parallelism [page 74]
Combining table partitioning and DOP [page 95]
Parallel process threads for flat files [page 99]

8.2.2.1 Viewing table partition information

View partition information in the Properties of the applicable table to determine partition names and columns.

Ensure that you have already imported partitioned tables for the specific datastore. Or, log in to SAP Data Services Designer and import the applicable partitioned tables. To view partitioning information, perform the following steps:

1. Open the Datastores tab in the Object Library and find the applicable datastore.
2. Expand the Datastore node and then expand the Table node.
3. Right-click the name of the table and select Properties from the dropdown menu.
4. In the Properties dialog box, open the Partitions tab.

The partition name appears in the first column. The columns that Data Services uses for partitioning appear as column headings in the second row.

Task overview: Table partitioning [page 85]

Related Information

Creating and editing table partition information [page 88]
Enabling partition settings in a source or target table [page 90]
When range partitioning is not supported [page 91]
Example 1: Data flow with source partitions only [page 91]
Example 2: Data flow with target partitions only [page 92]
Example 3: Data flow with source and target partitions [page 94]

8.2.2.2 Creating and editing table partition information

Create logical partitions in the table Properties dialog box when an imported table is not partitioned.

Make sure that the applicable database type supports table partitioning. Ensure that you know the guidelines for the partition type that you choose.

Log in to SAP Data Services Designer and perform the following steps to create logical partitions for an imported table:

1. Find the table in the Datastores tab in the Object Library under the applicable datastore.
2. Right-click the name of the table and select Properties from the dropdown menu.
3. In the Properties dialog box, open the Partitions tab.
4. Select a type from the Partition type dropdown list.

Partition types

● None: Does not create partitions.

● Range: Creates partitions with a set of rows that contain column values that are less than the specified values.

  Example: If the value of Column_One is 100,000, then the data set for Partition_One includes rows with values less than 100,000 in Column_One.

● List: Creates partitions that each contain a set of rows with the specified column values.

● Hash: Creates partitions based on placement of hash keys, which are placed to evenly distribute rows across partitions. Applicable for databases that support hash partitions.

  Note: If you imported an Oracle table with hash partitions, you cannot edit the hash settings in SAP Data Services. The Partitions tab displays the hash partition name and ID as read-only information. However, you can change the partition type to Range or List to create logical range or list partitions for an Oracle table imported with hash partitions.

5. Add, insert, or remove partitions and columns based on the partition type you choose. Use the tool bar as described in the following table.


Note: The number of partitions in a table equals the maximum number of parallel instances that the software can process for a source or target created from this table.

The toolbar icons perform the following actions:

● Adds a partition for which you enter a name
● Inserts a partition above the existing partition
● Removes a selected partition
● Adds a column to the partition
● Inserts a column above the existing partition
● Removes a selected column

6. Click the Add Partition icon and enter a name for the partition.
7. Click the Add Column icon and select a column name from the dropdown list.

SAP Data Services validates the column values entered for each partition according to the following rules:

○ Values can be literal numbers and strings or date/time types.
○ Column values must match column data types.
○ Literal strings must include single quotes: 'Director'.
○ For range partitions, the values for a partition must be greater than the values for the previous partition.
○ For the last partition, you can enter the value MAXVALUE to include all values.

8. Continue defining each partition based on the partition type and the data in the table, then click OK to save your partitions.

If your partitions do not meet the validation rules, Data Services displays an error message.

Task overview: Table partitioning [page 85]

Related Information

Viewing table partition information [page 87]
Enabling partition settings in a source or target table [page 90]
When range partitioning is not supported [page 91]
Example 1: Data flow with source partitions only [page 91]
Example 2: Data flow with target partitions only [page 92]
Example 3: Data flow with source and target partitions [page 94]

8.2.2.3 Enabling partition settings in a source or target table

To enable partitions for source and target tables, make settings in the respective editor in a data flow.

Make sure that the applicable database type supports table partitioning.

Log in to SAP Data Services Designer and open the applicable project in the Project area. Open the data flow that contains the partitioned table in the workspace.

1. Open the source or target editor for the partitioned table.
2. Enable partitioning:
   a. For a source table, click the Enable Partitioning checkbox.
   b. For a target table, open the Options tab and click the Enable Partitioning checkbox.

3. Click OK.

When you execute the job, Data Services generates parallel instances based on the partition information. Data Services executes the load in parallel based on the number of partitions in the table.

Note: If you set Enable Partitioning to Yes and Include in transaction to Yes, the Include in transaction setting overrides the Enable Partitioning setting.

Example: You design your job to load to a partitioned table. You make the following settings:

● Include in transaction = Yes
● Transaction order = <value>

Even though your intention is parallel processing, when you execute the job, Data Services includes the table in a transaction load and does not load to the partitioned table in parallel.

Task overview: Table partitioning [page 85]

Related Information

Viewing table partition information [page 87]
Creating and editing table partition information [page 88]
When range partitioning is not supported [page 91]
Example 1: Data flow with source partitions only [page 91]
Example 2: Data flow with target partitions only [page 92]
Example 3: Data flow with source and target partitions [page 94]

8.2.2.4 When range partitioning is not supported

If you know that a table has a natural distribution of ranges, SAP Data Services allows you to define table ranges even when the database type doesn't support range partitioning.

A natural distribution of ranges could be when a table has a primary key column such as an employee key number. When you edit the imported table metadata and define table ranges for a table from a non-supporting database type, Data Services processes the table as follows:

● Instantiates multiple reader threads, one for each defined range
● Executes the reader threads in parallel to extract the data

Note: Table metadata editing for partitioning is designed for source tables. If you use a partitioned table as a target, the physical table partitions in the database must match the metadata table partitions in Data Services. If there is a mismatch, Data Services does not use the partition name to load partitions. Consequently, execution updates the whole table.

Parent topic: Table partitioning [page 85]

Related Information

Viewing table partition information [page 87]
Creating and editing table partition information [page 88]
Enabling partition settings in a source or target table [page 90]
Example 1: Data flow with source partitions only [page 91]
Example 2: Data flow with target partitions only [page 92]
Example 3: Data flow with source and target partitions [page 94]

8.2.2.5 Example 1: Data flow with source partitions only

SAP Data Services translates a data flow with a partitioned source table as a data flow with a source for each partition.

Example: The data flow has a source table divided into two partitions. The following image shows the data flow in Designer:

At runtime, Data Services instantiates a source thread for each partition and runs the threads in parallel. Data Services then merges the two threads into a single stream using an internal Merge transform and processes the data through the Query transform. Then the data flow continues in a single process, loading data in a single stream into the target.

Parent topic: Table partitioning [page 85]

Related Information

Viewing table partition information [page 87]
Creating and editing table partition information [page 88]
Enabling partition settings in a source or target table [page 90]
When range partitioning is not supported [page 91]
Example 2: Data flow with target partitions only [page 92]
Example 3: Data flow with source and target partitions [page 94]

8.2.2.6 Example 2: Data flow with target partitions only

SAP Data Services performs internal operations for a data flow with a target table that has two partitions.

Example: The data flow has a source table that is not partitioned, a Query transform, and a target table that has two partitions. The following image shows the order of objects in Designer:

At runtime, Data Services performs the following internal operations:

● Round Robin Split (RRS) operation after the Query transform
● Parallel Case transforms
● Parallel Merge transforms

The RRS operation routes incoming rows from the Query transform in a round-robin fashion into two internal Case transforms. The Case transforms evaluate the rows to determine the partition ranges. Finally, internal Merge transforms collect the incoming rows from the Case transforms and output two single streams of rows, one to each target partition.

Data Services executes the data flow in parallel beginning with the flow of data into the internal Case transforms.
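A small Python sketch can make the routing concrete: rows arrive from the Query transform, an internal Case-style check assigns each row to a partition by its range boundary, and each partition receives its own stream. The column name "amount" and the boundary value are assumptions for illustration; this is a conceptual model, not the generated runtime plan.

```python
# Conceptual model of loading a two-partition, range-partitioned target (illustration only).

def route_to_partitions(rows, boundary=100000):
    partition_low, partition_high = [], []
    for row in rows:                        # rows arrive from the Query transform (via RRS)
        if row["amount"] < boundary:        # internal Case-style evaluation of the range
            partition_low.append(row)
        else:
            partition_high.append(row)
    # Each list is the single stream loaded into one target partition,
    # and the two partition loads can proceed in parallel.
    return partition_low, partition_high

low, high = route_to_partitions([{"amount": 50}, {"amount": 250000}])
```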

Parent topic: Table partitioning [page 85]

Related Information

Viewing table partition information [page 87]
Creating and editing table partition information [page 88]
Enabling partition settings in a source or target table [page 90]
When range partitioning is not supported [page 91]
Example 1: Data flow with source partitions only [page 91]
Example 3: Data flow with source and target partitions [page 94]

8.2.2.7 Example 3: Data flow with source and target partitions

SAP Data Services performs internal processes when both the source and target tables are partitioned.

Example: The data flow has a source table with two partitions. The data flow contains a Query Transform and a target table with two partitions. The data flow appears in the Designer workspace as follows:

At runtime, Data Services translates the data flow as shown in the following diagram:

Data Services performs the following internal processes:

● Creates a Merge transform to merge the source table partitions into one data stream
● Creates two Case transforms that separate the data stream into two data streams
● Creates two Merge transforms that form the table partitions to load into the target tables

Parent topic: Table partitioning [page 85]

Related Information

Viewing table partition information [page 87]
Creating and editing table partition information [page 88]
Enabling partition settings in a source or target table [page 90]
When range partitioning is not supported [page 91]
Example 1: Data flow with source partitions only [page 91]
Example 2: Data flow with target partitions only [page 92]

8.2.3 Combining table partitioning and DOP

When you partition tables and use them in data flows that have a Degree of parallelism set to greater than 1, expect different results and effects on job performance.

The combination of settings for source and target partitions and the set degree of parallelism results in different behaviors in SAP Data Services. To help with your testing in this area, we present several examples that consist of various setting combinations.

For all of the examples, the data flow in Designer appears as follows:

Example 1: Two source partitions and a DOP of 3 [page 96]
SAP Data Services instantiates a Merge Round Robin Split (MRRS) and a Merge transform to execute a job with a DOP of 3 and a source table that has two partitions.

Example 2: Two source partitions and a DOP of 2 [page 97]
With the DOP set to equal the number of partitions, SAP Data Services processes the data flow without running a Merge Round Robin Split.

Example 3: Two source partitions, DOP of 3, two target partitions [page 97]
SAP Data Services runtime processes are more complicated when both the source and target tables are partitioned, and the DOP is greater than the number of partitions.

Example 4: Two source and target partitions, DOP of two [page 98]
When the table partitions are equal to the DOP setting, the runtime processes are simple.

Parent topic: Parallel execution in data flows [page 73]

Related Information

Degree of parallelism [page 74]
Table partitioning [page 85]
Parallel process threads for flat files [page 99]

8.2.3.1 Example 1: Two source partitions and a DOP of 3

SAP Data Services instantiates a Merge Round Robin Split (MRRS) and a Merge transform to execute a job with a DOP of 3 and a source table that has two partitions.

Example: At runtime, Data Services replicates the source table and creates two sub data groups. It feeds the data sub groups into a merge-round-robin split (MRRS) that splits the data streams into three Query transform processes. Data Services replicates the Query transform and runs three threads in parallel. Finally, Data Services merges the data streams output from the Query transforms and loads data to the target.

The following diagram shows the runtime process:

Tip: To produce a data flow without the merge-round-robin splitter, perform the following setup steps:

1. Set the DOP value for the data flow equal to the number of source partitions.
2. With a nonpartitioned target, set the Number of loaders option in the target editor equal to the DOP value.

As a general rule, and depending on the number of CPUs available, this setup eliminates the need for an MRRS, and produces a data flow where each partition pipes the data directly into the consuming transform.

Parent topic: Combining table partitioning and DOP [page 95]

Related Information

Example 2: Two source partitions and a DOP of 2 [page 97]
Example 3: Two source partitions, DOP of 3, two target partitions [page 97]
Example 4: Two source and target partitions, DOP of two [page 98]

8.2.3.2 Example 2: Two source partitions and a DOP of 2

With the DOP set to equal the number of partitions, SAP Data Services processes the data flow without running a Merge Round Robin Split.

Example: When the number of source partitions is the same as the value for DOP, Data Services doesn't merge the data streams until it loads data into the target. If the Query transform contains an operation that requires a merge, such as an aggregation, Data Services merges the data streams before the aggregation operation.

Parent topic: Combining table partitioning and DOP [page 95]

Related Information

Example 1: Two source partitions and a DOP of 3 [page 96]
Example 3: Two source partitions, DOP of 3, two target partitions [page 97]
Example 4: Two source and target partitions, DOP of two [page 98]

8.2.3.3 Example 3: Two source partitions, DOP of 3, two target partitions

SAP Data Services runtime processes are more complicated when both the source and target tables are partitioned, and the DOP is greater than the number of partitions.

ExampleWhen the number of source partitions is less than the value for DOP, Data Services feeds the input into a merge-round-robin split (MRRS). The MRRS merges the input streams and splits them into three parallel threads. In this example, the number of target partitions is also less than the DOP. Therefore, Data Services merges the three streams into two streams and loads the data to the two partitions of the target table.


Tip
To produce a runtime process without the Merge Round Robin Split, set up the data flow as follows:

1. If the number of target partitions is not equal to the number of source partitions, set the Number of loaders option in the target editor equal to the DOP value, and disable target table partitioning.

2. As a general rule, set the DOP value equal to the number of source partitions. Consider the number of CPUs available for your Job Server before you perform this step.

This setup produces a data flow without the MRRS, and each source partition pipes the data directly into the consuming transform.

Parent topic: Combining table partitioning and DOP [page 95]

Related Information

Example 1: Two source partitions and a DOP of 3 [page 96]
Example 2: Two source partitions and a DOP of 2 [page 97]
Example 4: Two source and target partitions, DOP of two [page 98]

8.2.3.4 Example 4: Two source and target partitions, DOP of two

When the table partitions are equal to the DOP setting, the runtime processes are simple.

The best case situation for simple runtime processing is when you meet the following conditions:

● Partition the source and target the same way.
● Use the same number of partitions for the source and the target.
● DOP is equal to the number of partitions.

Example
SAP Data Services replicates all objects in the data flow twice when you partition the source and target into two and you set the DOP to 2:

● When a source has two partitions, Data Services replicates it twice.
● When the DOP is 2, Data Services splits the Query transform into two.
● When a target has two partitions, Data Services replicates it twice.

At runtime, Data Services essentially processes two identical data flows in parallel as shown in the following diagram:

Parent topic: Combining table partitioning and DOP [page 95]

Related Information

Example 1: Two source partitions and a DOP of 3 [page 96]
Example 2: Two source partitions and a DOP of 2 [page 97]
Example 3: Two source partitions, DOP of 3, two target partitions [page 97]

8.2.4 Parallel process threads for flat files

To enhance performance of time-consuming flat file reading and loading processes, enable SAP Data Services to run the processes in parallel threads.

When you do not enable parallel processing, Data Services reads and loads flat files without the benefits of multi-threading. The following table describes how Data Services reads or loads flat files when you do not enable parallel process threads for the listed process.


Process behavior without parallel process threads:

Delimited file reading: Data Services reads a block of data from the file system and scans each character to determine whether the character is a column, row, or text delimiter. It then builds a row using an internal format.

Positional file reading: Data Services does not scan character by character as it does for delimited file reading, but it still builds a row using an internal format.

File loading: Data Services builds a character-based row from the internal row format.

Because reading and loading flat files is time consuming, you benefit by configuring Data Services to read and load flat files in parallel. Set the number of threads in the Parallel process threads option. Find the option in one of the following editors:

● File format editor
● Source file editor
● Target file editor
● Properties dialog box for an ABAP data flow

Note
Data Services does not support CPU hyperthreading. CPU hyperthreading can negatively affect the performance of servers.

Considerations for tuning parallel thread processing [page 101]
Tune performance of parallel thread processing by balancing the number of parallel processes with the number of CPUs on your Job Server.

More tips for setting Parallel process threads [page 103]
The best setting for the Parallel process threads option depends on the complexity of your data flow and the number of available processes.

Parent topic: Parallel execution in data flows [page 73]

Related Information

Degree of parallelism [page 74]
Table partitioning [page 85]
Combining table partitioning and DOP [page 95]


8.2.4.1 Considerations for tuning parallel thread processing

Tune performance of parallel thread processing by balancing the number of parallel processes with the number of CPUs on your Job Server.

The Parallel process threads option is a performance enhancement for some sources and targets. Performance is defined as the total elapsed time used to read a file source. Therefore, your goal in tuning performance is to shorten the total elapsed time.

For SAP Data Services to achieve high performance when reading a multi-threaded file source or target, maximize CPU usage on your Job Server computer. When you enable parallel process threading, there is higher CPU usage. You might also notice higher memory usage because all of the process threads that you set reside in memory at the same time. Each thread consists of blocks of rows that use 128 kilobytes of memory.

To begin tuning performance, set the value for Parallel process threads to the number of CPUs that your Job Server has. For example, if you enter the value 4 for Parallel process threads, make sure that you have at least four CPUs on your Job Server computer. This is a good starting point, but it doesn't always yield improved performance.

Data Services achieves the best performance for file reading and loading under the following circumstances:

● The work load is distributed evenly among all of the CPUs.
● The speed of the file input/output (I/O) thread is comparable to the speed of the process threads.

The I/O thread for a file source reads data from a file and feeds it to process threads. The I/O thread for a file target takes data from process threads and loads it to a file. Therefore, if a source file I/O thread is too slow to keep the process threads busy, there is no need to increase the number of process threads.

If there is more than one process thread on one CPU, the CPU must switch between the threads. There is an overhead incurred in creating these threads and switching the CPU between them.
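The division of labor described above can be pictured with a small, self-contained Python sketch. It is an analogy only, not Data Services code: the generated sample file, the fixed 128 KB block size, and the simple newline count stand in for the real file and delimiter scanning.

# Sketch of one I/O thread feeding several process threads with 128 KB blocks.
import os
import queue
import tempfile
import threading

BLOCK_SIZE = 128 * 1024        # block size mentioned in this section
NUM_PROCESS_THREADS = 4

# Build a small sample delimited file so the sketch is self-contained.
sample = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
sample.write("\n".join(f"{i},region_{i % 3}" for i in range(100_000)))
sample.close()

blocks = queue.Queue(maxsize=8)
row_counts = [0] * NUM_PROCESS_THREADS

def io_thread():
    # The I/O thread reads raw blocks and feeds them to the process threads.
    with open(sample.name, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            blocks.put(block)
    for _ in range(NUM_PROCESS_THREADS):
        blocks.put(b"")        # sentinel: no more data

def process_thread(index):
    # Each process thread scans its blocks for row delimiters.
    while block := blocks.get():
        row_counts[index] += block.count(b"\n")

reader = threading.Thread(target=io_thread)
workers = [threading.Thread(target=process_thread, args=(i,))
           for i in range(NUM_PROCESS_THREADS)]
for t in [reader, *workers]:
    t.start()
for t in [reader, *workers]:
    t.join()

print("rows scanned per process thread:", row_counts)
os.unlink(sample.name)

If the reader cannot keep the queue full, adding more worker threads does not help, which is the same trade-off the paragraph above describes for the source file I/O thread.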

Parallel process threads for flat file sources [page 102]
To determine the best settings when you configure parallel process threads for flat file sources, ensure that you meet certain conditions in the source or flat file editor.

Parallel process threads for flat file targets [page 103]
Unlike for flat file sources, SAP Data Services does not have a lot of conditions for parallel processing of flat file targets.

Parent topic: Parallel process threads for flat files [page 99]

Related Information

More tips for setting Parallel process threads [page 103]


8.2.4.1.1 Parallel process threads for flat file sources

To determine the best settings when you configure parallel process threads for flat file sources, ensure that you meet certain conditions in the source or flat file editor.

When you enable parallel process threads for multiple flat file sources, SAP Data Services performs file multi-threading one file at a time. It completes data processing in one file before it processes the data in the next file.

Example
In the file format editor, you specify a file or files in the File(s) option. When you use a wildcard to specify a file, such as *.txt, there may be multiple files to read.

File format editor

Ensure that you meet the following conditions before you use the Parallel process threads option in the file format editor:

● Do not define text delimiters for delimited files. However, you may have text delimiters defined for fixed-width files.

Note
In most cases, you can set Data Services to read flat file data in parallel because most jobs use fixed-width or column-delimited source files that do not use text delimiters.

● Do not specify an end-of-file (EOF) marker for the file input and output style.
● Do not set the row delimiter value to {none} unless it is a fixed-width file.
● For files with a multi-byte locale, set the row delimiter as follows to take advantage of parallel process threads:
  ○ Set the length of the row delimiter to 1, unless the file uses a code page of UTF-16; then set the row delimiter length to 2.
  ○ Set the row delimiter hex value to less than 0x40.

Source file editor

Ensure that you meet the following conditions before you use the Parallel process threads option in the source file editor:

● Do not set a value for the Rows to read option. Leave the default value of none.

Note
The Rows to read option sets the maximum number of rows that Data Services reads. You normally use this setting for debugging.

● The maximum row size does not exceed 128 KB.


Parent topic: Considerations for tuning parallel thread processing [page 101]

Related Information

Parallel process threads for flat file targets [page 103]

8.2.4.1.2 Parallel process threads for flat file targets

Unlike for flat file sources, SAP Data Services does not have a lot of conditions for parallel processing of flat file targets.

SAP Data Services requires that a flat file target have a maximum row size of no more than 128 KB before it processes the target file with parallel threads.

Parent topic: Considerations for tuning parallel thread processing [page 101]

Related Information

Parallel process threads for flat file sources [page 102]

8.2.4.2 More tips for setting Parallel process threads

The best setting for the Parallel process threads option depends on the complexity of your data flow and the number of available processes.

To start with, set the Parallel process threads option to 2 when your Job Server is on a computer with multiple CPUs.

After you set the Parallel process threads option to 2, experiment with different values to determine the best setting for your environment.

The following lists Data Services behavior when you set the Parallel process threads option to certain values:

● Parallel process threads = None: Data Services does not read or load flat files in parallel.
● Parallel process threads = 1 and the Job Server has one CPU: Data Services reads and loads faster than single-threaded file reads and loads because it runs the I/O thread separately and concurrently with the process thread.
● Parallel process threads = 4: Data Services creates four process threads. You can run these threads on a single CPU; however, running them on four CPUs may maximize performance of flat file reading or loading.
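One rough way to experiment, along the lines suggested above, is to time the same scan at several candidate thread counts and keep the fastest value. The helper below is only a stand-in for re-running your real job with different Parallel process threads settings; the file name in the usage comment is a placeholder, not a file this guide defines.

# Hedged sketch: compare elapsed time for a block-wise file scan at several
# thread counts. It approximates, rather than reproduces, the engine's behavior.
import time
from concurrent.futures import ThreadPoolExecutor

def read_with_threads(path, threads, block_size=128 * 1024):
    # Count rows by handing fixed-size blocks to a pool of worker threads.
    with open(path, "rb") as f, ThreadPoolExecutor(max_workers=threads) as pool:
        futures = []
        while block := f.read(block_size):
            futures.append(pool.submit(block.count, b"\n"))
        return sum(future.result() for future in futures)

def benchmark(path, candidates=(1, 2, 4, 8)):
    for threads in candidates:
        start = time.perf_counter()
        rows = read_with_threads(path, threads)
        elapsed = time.perf_counter() - start
        print(f"threads={threads}: {rows} rows in {elapsed:.3f}s")

# Example usage; orders.txt is a placeholder for one of your own flat files:
# benchmark("orders.txt")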


Parent topic: Parallel process threads for flat files [page 99]

Related Information

Considerations for tuning parallel thread processing [page 101]


9 Distribute data flow execution

Data flow distribution, combined with other performance enhancement settings, provides additional optimization.

SAP Data Services can run a single process in multiple threads that run in parallel on a multiprocessor computer that has 2 GB or more of memory. By using multiple threads and degree of parallelism (DOP), Data Services executes each thread on a separate CPU.

Data Services can split a data flow into sub data flows. Sub data flows use available memory from multiple computers, or from the same computer that has 2 GB or more of memory. For example, if your computer has 8 GB of memory, Data Services forms four sub data flows, each of which has up to 2 GB of memory. With this capability, Data Services distributes CPU-intensive and memory-intensive operations, such as joins, GroupBy, table comparison, and LookUp. Distribution of data flow execution provides the following potential benefits:

● Better memory management by taking advantage of more CPU power and physical memory.
● Better job performance and scalability by taking advantage of grid computing (server groups).

Other data flow distribution techniques include creating sub data flows so that Data Services doesn't have to process the entire data flow in memory at one time.

Run as separate process [page 106]
To enhance performance, enable the Run as separate process option to split resource-intensive operations into sub data flows.

Multiple processes with Data_Transfer [page 107]
Save Job Server resources by pushing down resource-intensive processes to the database server.

Multiple processes for a data flow [page 112]
Configuring multiple processes in a data flow moves resource-intensive operations to a different computer.

Related Information

Parallel Execution [page 71]
Using grid computing to distribute data flow execution [page 117]
Server group architecture


9.1 Run as separate process

To enhance performance, enable the Run as separate process option to split resource-intensive operations into sub data flows.

Each time you configure SAP Data Services to run a separate process, you create a sub data flow. To create a separate process, select the Run as separate process option in resource-intensive transforms, functions, and query operations.

Each sub data flow uses separate memory and computer resources. When you specify a Run as separate process option for multiple data flow objects, Data Services splits the data flow into sub data flows that run in parallel.
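The general idea of giving each resource-intensive operation its own process and memory space can be illustrated with standard Python multiprocessing. The sketch below is a conceptual analogy only, with an invented lookup table and sample rows; it mirrors the shape of a lookup step and a group-by step running as separate processes connected by queues, not the actual sub data flow mechanism.

# Conceptual sketch: two resource-intensive steps run as separate OS processes.
import multiprocessing as mp
from collections import defaultdict

def lookup_step(inbox, outbox):
    # Process 1: enrich each row from a small lookup table, then pass it on.
    subtotals = {"P1": 10.0, "P2": 25.0}              # invented lookup table
    while (row := inbox.get()) is not None:
        row["subtotal"] = subtotals.get(row["product"], 0.0)
        outbox.put(row)
    outbox.put(None)                                   # propagate end-of-data

def group_by_step(inbox, results):
    # Process 2: group enriched rows by country and sum the subtotals.
    totals = defaultdict(float)
    while (row := inbox.get()) is not None:
        totals[row["country"]] += row["subtotal"]
    results.put(dict(totals))

if __name__ == "__main__":
    to_lookup, to_group, results = mp.Queue(), mp.Queue(), mp.Queue()
    procs = [mp.Process(target=lookup_step, args=(to_lookup, to_group)),
             mp.Process(target=group_by_step, args=(to_group, results))]
    for p in procs:
        p.start()
    for row in [{"product": "P1", "country": "US"},
                {"product": "P2", "country": "DE"},
                {"product": "P1", "country": "US"}]:
        to_lookup.put(row)
    to_lookup.put(None)                                # signal end of the source
    print(results.get())                               # {'US': 20.0, 'DE': 25.0}
    for p in procs:
        p.join()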

The following are resource-intensive transforms that have the Run as separate process option:

● Hierarchy Flattening transform
● Associate transform
● Country ID transform
● Global Address Cleanse transform
● Global Suggestion Lists transform
● Match Transform
● United States Regulatory Address Cleanse transform
● User-Defined transform
● Table Comparison transform

The following are resource-intensive functions that have the Run as separate process option:

● Lookup_ext function
● Count distinct function
● Search_replace function

The following are resource-intensive query operations that have the Run as separate process option:

● Joins
● GROUP BY
● ORDER BY
● DISTINCT

Parent topic: Distribute data flow execution [page 105]

Related Information

Multiple processes with Data_Transfer [page 107]
Multiple processes for a data flow [page 112]


9.2 Multiple processes with Data_Transfer

Save Job Server resources by pushing down resource-intensive processes to the database server.

To split a data flow with resource-intensive processes into sub data flows, use the Data_Transfer transform. SAP Data Services pushes down the sub data flows to the database server, which saves resources on the Job Server.

Data_Transfer transform [page 107]
The Data_Transfer transform creates transfer tables in datastores to enable the software to push down operations to the database server.

Example 1: Sub data flow that pushes down joins [page 108]
To push down a join of two source files to the database, configure the Data_Transfer transform to create a sub data flow for the join operation.

Example 2: Sub data flow that pushes down memory-intensive operations [page 110]
To push down memory-intensive operations such as Group By or Order By, use the Data_Transfer transform.

Parent topic: Distribute data flow execution [page 105]

Related Information

Run as separate process [page 106]
Multiple processes for a data flow [page 112]

9.2.1 Data_Transfer transform

The Data_Transfer transform creates transfer tables in datastores to enable the software to push down operations to the database server.

The Data_Transfer transform creates two sub data flows and a transfer table that distributes the data from one sub data flow to the other. The sub data flows execute serially. For more information about the Data_Transfer transform, see the Reference Guide.
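As a rough analogy, the following sketch uses SQLite to show the transfer-table pattern: the first step stages its output into a transfer table, and the second step is expressed as SQL that the database engine runs against that table only after the staging step finishes, which mirrors the serial execution described above. The table and column names are invented for the example; the real transfer table lives in the datastore that you choose in the transform editor.

# Illustration of the transfer-table idea, not Data Services internals.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders_fromfile (orderid INTEGER, country TEXT, sales REAL)")

# Step 1 (first sub data flow): read/transform rows, then load the transfer table.
staged_rows = [(1, "US", 10.0), (2, "DE", 25.0), (3, "US", 5.5)]
con.executemany("INSERT INTO orders_fromfile VALUES (?, ?, ?)", staged_rows)

# Step 2 (second sub data flow): pushed down to the database as one SQL statement
# that runs only after the transfer table is fully loaded.
totals = con.execute(
    "SELECT country, SUM(sales) FROM orders_fromfile "
    "GROUP BY country ORDER BY country"
).fetchall()
print(totals)   # [('DE', 25.0), ('US', 15.5)]
con.close()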

Parent topic: Multiple processes with Data_Transfer [page 107]

Related Information

Example 1: Sub data flow that pushes down joins [page 108]
Example 2: Sub data flow that pushes down memory-intensive operations [page 110]


9.2.2 Example 1: Sub data flow that pushes down joins

To push down a join of two source files to the database, configure the Data_Transfer transform to create a sub data flow for the join operation.

Example
In the following data flow, the Query transform joins data from two sources:

● orders*.txt is a flat file
● ORDERS is a table

The joined source data goes through a Query transform that performs a lookup_ext function. The Lookup function obtains the sales subtotals from the source. The next Query transform performs a GroupBy operation to group results by country and region.

To create a sub data flow for the join operation, add a Data_Transfer transform after the flat file, orders*.txt. SAP Data Services creates a sub data flow, which it pushes down to the database server for processing.

Solution 1: Adding Data_Transfer to push down join operation [page 109]
As a solution to Example 1, configure a Data_Transfer transform to push the join down to the database server.

Parent topic: Multiple processes with Data_Transfer [page 107]

Related Information

Data_Transfer transform [page 107]
Example 2: Sub data flow that pushes down memory-intensive operations [page 110]


9.2.2.1 Solution 1: Adding Data_Transfer to push down join operation

As a solution to Example 1, configure a Data_Transfer transform to push the join down to the database server.

Refer to Example 1: Sub data flow that pushes down joins [page 108] for information related to the following steps.

Note
Use the following example process as a guide to configure your own data flows for pushing down operations.

To place a Data_Transfer transform in the data flow and configure it to push down the join operation, perform the following steps:

1. Drag the Data_Transfer icon from the Transform tab under the Data Integrator node, to the data flow in the workspace.

2. Drop the Data_Transfer transform between the orders.txt flat file source and the Query transform, and connect the objects.

3. Open the Data_Transfer transform editor in your workspace and click the Enable transfer option to select it.
4. Select Table from the Transfer type dropdown list.
5. In the Table options group, click the Browse icon at the end of the Table name text box.
The Input table for Data_Transfer dialog box opens.
6. Double-click the datastore that contains the second source.
For Example 1, the datastore that contains configuration information for the ORDERS table.
7. Enter a name for the transfer table in Table name.
Enter the name for a new table. For Example 1, we used Orders_FromFile.
8. Complete any remaining setup tasks for the data flow and save the data flow.
9. Optional. To verify that Data Services pushes down the sub data flow to the database server, view the optimized SQL for the data flow by clicking Validation > Display Optimized SQL.


The Optimized SQL window shows that Data Services pushes the join operation between the transfer table, Orders_FromFile, and the source table, ORDERS, to the database server.

SELECT "Data_Transfer_Orders_Flatfile"."PRODUCTID" , "ORDERS"."SHIPCOUNTRY" , "ORDERS"."SHIPREGION" , "Data_Transfer_Orders_Flatfile"."ORDERID" FROM "DBO"."ORDERS_FROMFILE" "Data_Transfer_Orders_Flatfile","DBO"."ORDERS""ORDERS" WHERE ("Data_Transfer_Orders_Flatfile"."ORDERID" = "ORDERS"."ORDERID")

Note
SAP Data Services can push down many operations without using the Data_Transfer transform.

When you execute the job, the Trace Log shows messages that indicate that Data Services creates two sub data flows with different process IDs (PID) to run the operations serially.

Task overview: Example 1: Sub data flow that pushes down joins [page 108]

Related Information

Maximize push-down operations [page 33]

9.2.3 Example 2: Sub data flow that pushes down memory-intensive operations

To push down memory-intensive operations such as Group By or Order By, use the Data_Transfer transform.

Example
Use the same data flow as in Example 1.

Add a second Data_Transfer transform to the data flow to push down a Group By operation to the database server. The Data_Transfer transform creates a sub data flow for the Group By operation. Add the Data_Transfer transform between the Lookup Query transform and the GroupBy Query transform.


Solution 2: Adding Data_Transfer to push down GroupBy operation [page 111]
The position in the data flow of the Data_Transfer transform determines how SAP Data Services interprets your settings for pushing down the GroupBy operation.

Parent topic: Multiple processes with Data_Transfer [page 107]

Related Information

Data_Transfer transform [page 107]
Example 1: Sub data flow that pushes down joins [page 108]

9.2.3.1 Solution 2: Adding Data_Transfer to push down GroupBy operation

The position in the data flow of the Data_Transfer transform determines how SAP Data Services interprets your settings for pushing down the GroupBy operation.

In Example 1, we placed the Data_Transfer transform after the source flat file in the data flow. For this example, we place another Data_Transfer transform just before the GroupBy Query transform.

Note
Use the following process as a guide to configure your own data flows for pushing down operations.

1. Drag the Data_Transfer icon from the Transform tab under the Data Integrator node to the data flow in the workspace.

2. Drop the Data_Transfer transform between the Lookup Query and the GroupBy query and connect the objects.


3. Open the Data_Transfer transform editor in your workspace and click the Enable transfer option to select it.
4. Select Table from the Transfer type dropdown list.
5. In the Table options group, click the Browse icon at the end of the Table name text box.
The Input table for Data_Transfer dialog box opens.
6. Double-click the datastore that contains the target table.
For Example 2, open the datastore that contains the JOINTARGET table.
7. Enter a name for the transfer table and click OK.
8. Complete any remaining setup tasks for the data flow and save the data flow.
9. Optional. To view the optimized SQL, click Validation > Display Optimized SQL.

The Optimized SQL window shows that the software pushes the GroupBy down to the target database.

INSERT INTO "DBO"."JOINTARGET"("PRODUCTID","SHIPCOUNTRY","SHIPREGION","SALES") SELECT "Data_Transfer_1_Lookup"."PRODUCTID", "Data_Transform_1_Lookup"."SHIPCOUNTRY", "Data_Transfer_1_Lookup"."SHIPREGION",sum("Data_Transfer_1_Lookup"."SALES") FROM "DBO"."GROUPTRANS""Data_Transfer_1_Lookup" GROUP BY "Data_Transfer_1_Lookup"."PRODUCTID","Data_Transfer_1_Lookup"."SHIPCOUNTRY", "Data_Transfer_1_Lookup"."SHIPREGION"

Note
Data Services can push down many operations without using the Data_Transfer transform.

During job execution, the messages indicate that Data Services created three sub data flows to run the different operations serially.

Task overview: Example 2: Sub data flow that pushes down memory-intensive operations [page 110]

Related Information

Data_Transfer transform for push-down operations [page 45]

9.3 Multiple processes for a data flow

Configuring multiple processes in a data flow moves resource-intensive operations to a different computer.

A data flow can contain multiple resource-intensive operations that require large amounts of memory or CPU utilization. When you have a server group, run each resource-intensive operation as a separate process, with each process running on a different computer. If you have a multiprocessor computer that has 2 GB or more of memory, run the operations on the same computer.

Example
A data flow performs the following processes:


● Sums sales amounts from a lookup table
● Groups sales by country and region
● Determines what regions generate the most revenue

In addition to the source and target, the data flow contains the following objects:

● Query transform for the lookup_ext function
● Query transform that groups the results by country and region

To define separate processes in this sample data flow, take one of the following actions:

● In the lookup_ext function in the first Query, select the Run as a separate process option.
● In the Group By operation in the second Query, select the Run GROUP BY as a separate process option.

Example 1: Multiple sub data flows and DOP of 1 [page 113]
To configure multiple sub data flows for a data flow that contains resource-intensive processes, select to run the data flows as separate processes.

Example 2: Run multiple sub data flows with DOP greater than 1 [page 115]
With the degree of parallelism set to more than one, SAP Data Services creates multiple processes that each run on a different computer.

Parent topic: Distribute data flow execution [page 105]

Related Information

Run as separate process [page 106]
Multiple processes with Data_Transfer [page 107]

9.3.1 Example 1: Multiple sub data flows and DOP of 1

To configure multiple sub data flows for a data flow that contains resource-intensive processes, select to run the data flows as separate processes.

The following example shows the runtime processes when SAP Data Services processes the example job in Multiple processes for a data flow [page 112].

Example
● Degree of Parallelism (DOP) = 1
● Lookup_ext has Run as a separate process enabled
● GroupBy has Run GROUP BY as a separate process enabled


At runtime, SAP Data Services splits the data flow into one group of two sub data flows. Data Services automatically names the sub data flows using the following syntax: <DFName>_<executionGroupNumber>_<indexInExecutionGroup>, where:

● <DFName>: Name of the data flow
● <executionGroupNumber>: Order in which the software executes the sub data flow group
● <indexInExecutionGroup>: Sub data flow within an execution group
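A tiny sketch that simply reproduces this naming pattern, using an example data flow name:

# Reproduces the <DFName>_<executionGroupNumber>_<indexInExecutionGroup> pattern.
def sub_data_flow_names(df_name, execution_group, count):
    return [f"{df_name}_{execution_group}_{index}" for index in range(1, count + 1)]

print(sub_data_flow_names("GroupBy_DF", 1, 2))
# ['GroupBy_DF_1_1', 'GroupBy_DF_1_2']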

During execution, the trace log shows the two sub data flows that execute in parallel with a different process ID (PID) for each sub data flow.

The following table shows trace log information. GroupBy_DF_1_1 and GroupBy_DF_1_2 start at the same time and have different PIDs than the parent data flow GroupBy_DF.

Example trace log information

PID TID Type Message

... ... ... ...

964 5128 DATAFLOW Process to execute data flow <GroupBy_DF> is started.

... ... ... ...

4512 5876 DATAFLOW Process to execute sub data flow <GroupBy_DF_1_1> is started.

396 6128 DATAFLOW Process to execute sub data flow <GroupBy_DF_1_2> is started.

... ... ... ...

Parent topic: Multiple processes for a data flow [page 112]

Related Information

Example 2: Run multiple sub data flows with DOP greater than 1 [page 115]


9.3.2 Example 2: Run multiple sub data flows with DOP greater than 1

With the degree of Parallelism set to more than one, SAP Data Services creates multiple processes that each run on a different computer.

The following example starts with the data flow described in the example at Multiple processes for a data flow [page 112].

Example
● Degree of Parallelism (DOP) = 2
● Lookup_ext has Run as a separate process enabled
● GroupBy has Run GROUP BY as a separate process enabled

The following diagram shows the runtime processes:

Data Services automatically names each sub data flow using the syntax: DFName_executionGroupNumber_indexInExecutionGroup. For this example, Data Services generates one (1) group of four sub data flows for the data flow named GroupBy_DOP2_DF. The sub data flows have an <indexInExecutionGroup> field value of 1 through 4. The following lists the sub data flow name and the color in which it is depicted in the diagram:

● GroupBy_DOP2_DF_1_3 (gold)
● GroupBy_DOP2_DF_1_2 (purple)
● GroupBy_DOP2_DF_1_4 (blue)
● GroupBy_DOP2_DF_1_1 (green)

Note
See the sub data flow naming convention in Example 1: Multiple sub data flows and DOP of 1 [page 113].

When you execute the job, the trace log shows that the software creates sub data flows that execute in parallel with different process IDs (PIDs). The following table shows what the trace log displays for the following four sub data flows that start concurrently.

Example Trace log information

PID TID Type Message

... ... ... ...


4288 1960 DATAFLOW Process to execute data flow <GroupBy_DOP2_DF> is started.

... ... ... ...

5548 3636 DATAFLOW Process to execute sub data flow <GroupBy_DOP2_DF_1_1> is started.

4032 2868 DATAFLOW Process to execute sub data flow <GroupBy_DOP2_DF_1_2> is started.

1800 5848 DATAFLOW Process to execute sub data flow <GroupBy_DOP2_DF_1_3> is started.

4416 6128 DATAFLOW Process to execute sub data flow <GroupBy_DOP2_DF_1_4> is started.

... ... ... ...

Tip
If DOP is greater than 1, select either Job or Data Flow for the Distribution level option in the execution properties. If you execute the job with the value Sub data flow for Distribution level, the Round-Robin-Split (RRS) or Hash Split sends data to the replicated queries that could be executing on different Job Servers. Because the data is sent on the network between different Job Servers, the entire data flow might be slower.

Parent topic: Multiple processes for a data flow [page 112]

Related Information

Example 1: Multiple sub data flows and DOP of 1 [page 113]
Degree of parallelism [page 74]
Using grid computing to distribute data flow execution [page 117]


10 Using grid computing to distribute data flow execution

Grid computing uses a network of computers in which all computers in the network share resources.

You can take advantage of grid computing in SAP Data Services by completing the following tasks:

● Define a server group, which is a group of Job Servers that act as a server grid. Data Services leverages available CPU and memory from the computers in the server group where the Job Servers execute.

● Specify distribution levels when you execute data flows to process smaller data sets or fewer transforms on different Job Servers in a Server Group. Each data flow or sub data flow consumes less virtual memory.

Server Group [page 117]
With server groups, SAP Data Services automatically measures resource availability on each Job Server in the group and distributes scheduled batch jobs to the Job Server with the lightest load at runtime.

Distribution levels for data flow execution [page 118]
Selecting the right distribution level is important when you use grid computing for your Job Server resources.

10.1 Server Group

With server groups, SAP Data Services automatically measures resource availability on each Job Server in the group and distributes scheduled batch jobs to the Job Server with the lightest load at runtime.
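As a simple illustration of the least-load idea, the following sketch picks the Job Server that reports the lowest system load, using figures like the server group statistics shown in the trace logs later in this guide. It is only a sketch of the general idea, not the algorithm that Data Services actually implements.

# Sketch: choose the Job Server with the lightest reported system load.
servers = {
    "mssql_lap_js SJ-C 3502": {"system_load_pct": 47, "cpus": 1},
    "MSSQL2005_JS SJ-W-C 3500": {"system_load_pct": 70, "cpus": 2},
}

def pick_job_server(stats):
    # Return the name of the Job Server reporting the lowest system load.
    return min(stats, key=lambda name: stats[name]["system_load_pct"])

print(pick_job_server(servers))   # mssql_lap_js SJ-C 3502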

There are two rules for creating server groups:

● All job servers in the server group must be associated with the same repository, which must be your default repository.

● Each computer in a server group can contribute only one Job Server to a server group.

Each rule has more details than what we list here. For more information about the rules, and for details about server group architecture and how to create server groups, see the Management Console Guide.

Parent topic: Using grid computing to distribute data flow execution [page 117]

Related Information

Distribution levels for data flow execution [page 118]
Server group architecture


10.2 Distribution levels for data flow execution

Selecting the right distribution level is important when you use grid computing for your Job Server resources.

Select a distribution level when you execute a job. Before job execution, SAP Data Services opens the Execution Properties dialog box. You set the distribution level in the Execution Properties as well as other important job execution options.

The following values are available to select in the Execution Properties dialog box for the Distribution level option:

Distribution level descriptions

Job level: Executes an entire job on an available Job Server.

Data flow level: Executes each data flow in a job on an available Job Server, taking advantage of additional memory (up to 2 GB) for both in-memory and pageable cache on another computer.

Sub data flow level: Executes sub data flows in a data flow on available Job Servers. Applicable for resource-intensive operations, such as joins, table comparisons, or table lookups. Each operation can take advantage of up to 2 GB of additional memory for both in-memory and pageable cache on another computer.

Job level [page 118]
Job level is the default distribution level when you use a server group for job execution.

Data flow level [page 119]
The Data flow level enables all processes in the data flow to run on different computers.

Sub data flow level [page 121]
The sub data flow level executes each sub data flow within a data flow on a different computer.

Parent topic: Using grid computing to distribute data flow execution [page 117]

Related Information

Server Group [page 117]

10.2.1 Job level

Job level is the default distribution level when you use a server group for job execution.

At the job level, all sub data flows process on the same computer. For resource-intensive processes, the single computer should have at least 2 GB of memory.


Example
When you execute the job in the example at Example 2: Run multiple sub data flows with DOP greater than 1 [page 115], the following Trace Log messages indicate the distribution level for each sub data flow:

Starting sub data flow <GroupBy_DOP2_DF_1_1> on job server host <SJ-C>, port <3502>. Distribution level <Job>.
Starting sub data flow <GroupBy_DOP2_DF_1_2> on job server host <SJ-C>, port <3502>. Distribution level <Job>.
Starting sub data flow <GroupBy_DOP2_DF_1_3> on job server host <SJ-C>, port <3502>. Distribution level <Job>.
Starting sub data flow <GroupBy_DOP2_DF_1_4> on job server host <SJ-C>, port <3502>. Distribution level <Job>.

Data Services uses named pipes to send data between the sub data flow processes on the same computer, as the following diagram indicates with the red arrows.

Parent topic: Distribution levels for data flow execution [page 118]

Related Information

Data flow level [page 119]
Sub data flow level [page 121]

10.2.2 Data flow level

The Data flow level enables all processes in the data flow to run on different computers.

With Data flow selected as the Distribution level, each data flow in the following example executes on a different computer. SAP Data Services uses Inter-Process Communications (IPC) to send data between the job and data flows on the different computers. IPC uses the peer-to-peer port numbers specified in the Start port and End port options in the Server Manager.

Note
The default values for Start port and End port are 1025 and 32767, respectively. To restrict the number of ports or to use a port that isn't already in use, change these values.


Example
A job named GroupBy_Q1_Q2_Job processes orders for the first quarter and second quarter, respectively. It has two data flows:

● GroupQ1_DF
● GroupQ2_DF

In the following diagram, the two data flows under the Job box execute on separate job servers that are different than the computer where you executed the job. Each data flow contains two sub data flows.

When you execute the job, the Trace log displays messages that indicate the communication port for the data flow and the distribution level for each data flow. All of the sub data flows run on the same computer as the parent data flow. The following is an example of text from a Trace log for the example job:

Data flow communication using peer-to-peer method with the port range <1025> to <32767>.
...
Peer-to-peer connection server for session process is listening at host <SJ-C>, port <1025>.
Job <GroupBy_Q1_Q2_Job> is started.
Starting data flow </GroupBy_Q1_Q2_Job/GroupBy_Q1_DF> on job server host <SJ-C>, port <3502>. Distribution level <Data flow>.
Data flow submitted to server group <sg_direpo>. Load balancing algorithm <Least load>. Server group load statistics from job server <mssql_lap_js SJ-C 3502>:
<mssql_lap_js SJ-C 3502> System Load <47%> Number of CPUs <1>
<MSSQL2005_JS SJ-W-C 3500> System Load <70%> Number of CPUs <2>
Process to execute data flow <GroupBy_Q1_DF> is started.
Starting sub data flow <GroupBy_Q1_DF_1_1> on job server host <SJ-C>, port <3502>. Distribution level <Data flow>.
Starting sub data flow <GroupBy_Q1_DF_1_2> on job server host <SJ-C>, port <3502>. Distribution level <Data flow>.
Starting sub data flow <GroupBy_Q1_DF_1_3> on job server host <SJ-C>, port <3502>. Distribution level <Data flow>.
Starting sub data flow <GroupBy_Q1_DF_1_4> on job server host <SJ-C>, port <3502>. Distribution level <Data flow>.


Parent topic: Distribution levels for data flow execution [page 118]

Related Information

Job level [page 118]
Sub data flow level [page 121]

10.2.3 Sub data flow level

The sub data flow level executes each sub data flow within a data flow on a different computer.

Example
The job named GroupBy_DOP2_Job has a data flow that divides into four sub data flows. The data flow name is GroupBy_DOP2_DF. SAP Data Services automatically names the sub data flows based on the syntax described in the example in Example 1: Multiple sub data flows and DOP of 1 [page 113]. The following diagram shows the sub data flows. Each sub data flow processes on a different computer because the Distribution level is Sub data flow.

In the diagram, each colored box represents a sub data flow. The arrows labeled IPC represent the Inter-Process Communications (IPC). SAP Data Services uses IPC to send data between the job and the sub data flows.

You specify the IPC port numbers when you set the Start port and End port options in the Server Manager. The default start port number is 1025. The default end port number is 32767. Change these values when you want to restrict the number of ports used or when a port number is already in use.
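The following sketch shows why a port window matters when several peer-to-peer listeners start on the same host: it scans a configured range and binds to the first free TCP port. It is an illustration only, not how Data Services itself allocates its peer-to-peer ports.

# Illustrative port probe over a Start port / End port window such as 1025-32767.
import socket

def first_free_port(start=1025, end=32767, host="127.0.0.1"):
    for port in range(start, end + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
                return port          # this port is currently free on the host
            except OSError:
                continue             # port already in use; try the next one
    raise RuntimeError("no free port in the configured range")

print(first_free_port())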

Note
With Sub data flow selected for the Distribution level, the Hash Split sends data to the replicated Lookup queries that could be executing on different Job Servers. Because the data is sent on the network between different Job Servers, the entire data flow might be slower. If you find your job execution is slower, consider changing the Distribution level to Data flow or Job.


When you execute the job, the Trace log displays messages that show that Data Services selects a job server for each sub data flow based on the system load on each computer. The following is an example of Trace log messages:

Starting sub data flow <GroupBy_DOP2_DF_1_1> on job server host <SJ-C>, port <3502>. Distribution level <Sub data flow>.
Sub data flow submitted to server group <sg_direpo>. Load balancing algorithm <Least load>. Server group load statistics from job server <mssql_lap_js SJ-C 3502>:
<mssql_lap_js SJ-C 3502> System Load <21%> Number of CPUs <1>
<MSSQL2005_JS SJ-W-C 3500> System Load <70%> Number of CPUs <1>
Starting sub data flow <GroupBy_DOP2_DF_1_2> on job server host <SJ-C>, port <3502>. Distribution level <Sub data flow>.
Sub data flow submitted to server group <sg_direpo>. Load balancing algorithm <Least load>. Server group load statistics from job server <mssql_lap_js SJ-C 3502>:
<mssql_lap_js SJ-C 3502> System Load <21%> Number of CPUs <1>
<MSSQL2005_JS SJ-W-C 3500> System Load <70%> Number of CPUs <2>

The following messages show the communication port that each sub data flow uses:

Peer-to-peer connection server for sub data flow <GroupBy_DOP2_DF_1_1> is listening at host <SJ-C>, port <1027>.
Process to execute sub data flow <GroupBy_DOP2_DF_1_4> is started.
Peer-to-peer connection server for sub data flow <GroupBy_DOP2_DF_1_2> is listening at host <SJ-C>, port <1028>.
Peer-to-peer connection server for sub data flow <GroupBy_DOP2_DF_1_3> is listening at host <SJ-C>, port <1029>.
Peer-to-peer connection server for sub data flow <GroupBy_DOP2_DF_1_4> is listening at host <SJ-C>, port <1030>.

Parent topic: Distribution levels for data flow execution [page 118]

Related Information

Job level [page 118]
Data flow level [page 119]


11 Bulk Loading and Reading

SAP Data Services supports bulk loading for most supported databases, which enables you to load, and in some cases read, data in bulk rather than using SQL statements.

The following list contains some general considerations when you use bulk loading and reading:

● Specify bulk-loading options on the Data Services target table editor in the Options and Bulk Loader Options tabs.

● Specify Teradata reading options on the source table editor Teradata options tab for bulk reading.
● Most databases don’t support bulk loading with a template table.
● The operation codes that you can use with bulk loading differ between databases. The following table lists databases and the operation codes they support for bulk loading.

● DB2 Universal Database: INSERT, NORMAL
● Netezza: DELETE, INSERT, NORMAL, UPDATE
● Oracle: INSERT, NORMAL
● SAP HANA: DELETE, INSERT, NORMAL, UPDATE
● SAP ASE: INSERT, NORMAL
● SAP IQ: DELETE, INSERT, NORMAL, UPDATE
● Teradata: INSERT, UPSERT

Note
If a job using bulk load functionality fails, Data Services saves data files containing customer data in the Bulkload directory. Review and analyze the data to determine why the bulk loading failed. The default bulk load location is <DS_COMMON_DIR>/log/BulkLoader. Data Services doesn’t remove these files. Therefore, it is your responsibility to remove the files after you’re done analyzing them.
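Because Data Services leaves these files in place, a small housekeeping script can help once you have finished analyzing them. The sketch below is an assumption-laden example: the directory path must be replaced with your own <DS_COMMON_DIR>/log/BulkLoader location, and the seven-day age threshold is arbitrary.

# Hedged housekeeping sketch for leftover bulk load files; adjust path and age.
import time
from pathlib import Path

def clean_bulkloader_dir(directory, older_than_days=7, delete=False):
    # Return (and optionally remove) bulk loader files older than the threshold.
    cutoff = time.time() - older_than_days * 24 * 3600
    old_files = [p for p in Path(directory).glob("*")
                 if p.is_file() and p.stat().st_mtime < cutoff]
    for path in old_files:
        print("stale bulk load file:", path)
        if delete:
            path.unlink()            # only after you have analyzed the data
    return old_files

# Example usage; replace the path with your actual <DS_COMMON_DIR>/log/BulkLoader:
# clean_bulkloader_dir("/path/to/DS_COMMON_DIR/log/BulkLoader", delete=False)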

For more information about the bulk loading options for each database type, see the Reference Guide.

Google BigQuery ODBC bulk loading [page 125]
Use the bulk loading settings in a Google BigQuery ODBC target editor to enhance upload performance.

Configuring bulk loading for Hive [page 126]
Use a combination of Hadoop objects to configure bulk loading for Hive targets in a data flow.

Bulk Loading in IBM DB2 Universal Database [page 128]
The bulk loading method you choose for IBM DB2 Universal Database (UDB) depends on whether you use a server-named (DSN-less) connection or a DSN connection.

Bulk loading in Informix [page 133]
SAP Data Services provides Informix bulk-loading support only for single-byte character ASCII-delimited files, and not for fixed-width files.

Bulk loading in Microsoft SQL Server [page 134]
To utilize bulk loading with Microsoft SQL Server, use the SQL Server ODBC bulk copy API.

Bulk loading in Netezza [page 141]
SAP Data Services supports bulk loading to Netezza Performance Servers.

Bulk Loading in PostgreSQL [page 145]
SAP Data Services supports bulk loading for PostgreSQL using the PSQL tool and DSN-less connections.

Bulk loading in Oracle [page 147]
SAP Data Services supports Oracle bulk loading using an API or a staging file.

Bulk loading in SAP HANA [page 154]
SAP Data Services improves bulk loading for SAP HANA by using a staging mechanism to load data to the target table.

Bulk loading in SAP ASE [page 155]
SAP Data Services supports bulk loading for SAP ASE databases through the SAP ASE bulk copy utility.

Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]
SAP Data Services supports bulk loading to SAP IQ databases via the SAP IQ LOAD TABLE SQL command.

Bulk loading and reading in Teradata [page 159]
SAP Data Services supports several Teradata bulk loading utilities.


Related Information

Types of target tables
Teradata source

11.1 Google BigQuery ODBC bulk loading

Use the bulk loading settings in a Google BigQuery ODBC target editor to enhance upload performance.

Configure bulk loading options in the Bulk Loader Options tab of the target editor. The following table contains option descriptions.

Bulk loading option descriptions

Bulk Load: Select the checkbox to enable bulk loading.

Mode: Specifies how SAP Data Services updates the target file with new data.
● Append: Appends new data to the existing data in the target table.
● Truncate: Deletes all existing data in the target table and loads generated data.

Remote Storage: Specifies where to store the data in the remote location. Select Google Cloud Storage to indicate that Data Services stores the uploaded file first in your Google Cloud Storage before uploading to Google BigQuery.

File Location: Specifies the name of the Google Cloud Storage (GCS) file location object. Data Services copies the local data file into GCS, then uploads the data file to Google BigQuery.

Field Delimiter: Specifies the character to use as the field (column) delimiter for the temporary staging file. The specified character can be any printable or nonprintable ASCII character. The default character is a comma. Make sure to designate a character that isn't already present in the input file (see the sketch after this table).

Generate files only: Specifies to generate data files only. Select to load data into a data file or files instead of the target in the data flow. Data Services writes the data files into the bulk loader directory specified in the datastore configuration.

Clean up bulk loader directory after load: Specifies to delete the local staging files and the remote bulk load-oriented files from the Google remote system after the load is complete. If you select the Generate files only option, do not enable this option.
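The following sketch illustrates the Field Delimiter guidance above: before writing the temporary staging file, check that the intended delimiter does not occur in the data, and fall back to another candidate if it does. The sample rows, candidate characters, and staging file name are illustrative assumptions, not part of the Data Services configuration.

# Sketch: pick a staging-file delimiter that does not appear in the data.
import csv

def choose_delimiter(rows, candidates=",|\t;"):
    # Return the first candidate character that appears in none of the values.
    for candidate in candidates:
        if not any(candidate in value for row in rows for value in row):
            return candidate
    raise ValueError("every candidate delimiter appears in the data")

rows = [["1001", "Smith, J.", "US"], ["1002", "Meier", "DE"]]
delimiter = choose_delimiter(rows)          # ',' is rejected, '|' is chosen
with open("staging_file.txt", "w", newline="") as staging:
    csv.writer(staging, delimiter=delimiter).writerows(rows)
print("field delimiter used:", delimiter)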

Parent topic: Bulk Loading and Reading [page 123]

Related Information

Configuring bulk loading for Hive [page 126]
Bulk Loading in IBM DB2 Universal Database [page 128]
Bulk loading in Informix [page 133]
Bulk loading in Microsoft SQL Server [page 134]
Bulk loading in Netezza [page 141]
Bulk Loading in PostgreSQL [page 145]
Bulk loading in Oracle [page 147]
Bulk loading in SAP HANA [page 154]
Bulk loading in SAP ASE [page 155]
Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]
Bulk loading and reading in Teradata [page 159]

11.2 Configuring bulk loading for Hive

Use a combination of Hadoop objects to configure bulk loading for Hive targets in a data flow.

Create the following objects:

● HDFS file location object
● HDFS file format
● Hive database datastore

To set up bulk loading to Hive, follow these steps:

1. Open the Format tab in the Local Object Library and expand the HDFS Files node.
2. Select the HDFS file format that you created for this task and drag it onto your data flow workspace.
3. Select Make Source.
4. Add the applicable transform objects to your data flow.
5. Add a template table as a target to the data flow:


a. Select the template table icon from the tool palette at right.
b. Click on a blank space in your data flow workspace.
The Create Template dialog box opens.
6. Complete Template name with a new name for the target.
7. Select the Hive database datastore that you created for this task from the In datastore dropdown list.
8. Select a format from the Formats dropdown list.
9. Click OK.
10. Connect the template to the data flow.
11. In your data flow workspace, open the target table and open the Bulk Loader Options tab.
The Bulk Load option is selected by default.
12. Select a mode from the Mode dropdown list.
Because the target is a newly-created table, there is no data in the table. However, if you use the data flow in subsequent runs, the Mode affects the data in the target table.
○ Append: Adds new records generated from Data Services processing to the existing data in the target table.
○ Truncate: Replaces all existing records in the existing target table with the records generated from Data Services processing.
13. Select the HDFS file location object that you created for this task from the HDFS File Location drop-down list.
14. Complete the remaining target options as applicable.

Task overview: Bulk Loading and Reading [page 123]

Related Information

Google BigQuery ODBC bulk loading [page 125]
Bulk Loading in IBM DB2 Universal Database [page 128]
Bulk loading in Informix [page 133]
Bulk loading in Microsoft SQL Server [page 134]
Bulk loading in Netezza [page 141]
Bulk Loading in PostgreSQL [page 145]
Bulk loading in Oracle [page 147]
Bulk loading in SAP HANA [page 154]
Bulk loading in SAP ASE [page 155]
Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]
Bulk loading and reading in Teradata [page 159]
Data flows
Apache Hadoop


11.3 Bulk Loading in IBM DB2 Universal Database

The bulk loading method you choose for IBM DB2 Universal Database (UDB) depends on whether you use a server-named (DSN-less) connection or a DSN connection.

Select bulk loading when you use an IBM DB2 UDB as a target in a data flow.

When you use an IBM DB2 target in a data flow, the related datastore configuration affects the options available for bulk loading. To enable bulk loading, consider the following restrictions when you configure your DB2 datastore:

● When you use a server-named (DSN-less) connection, the bulk-load method in the target editor is CLI Load only.

● When you use a DSN connection, the bulk-load method in the target editor can be either Import or CLI Load.

When to use each DB2 bulk-loading method [page 129]
SAP Data Services supports two bulk-loading methods for DB2 Universal Database (UDB) on Windows and UNIX.

About the DB2 CLI load method [page 130]
The DB2 Call Level Interface (CLI) load method performs faster than the bulk-load or import utilities.

IBM DB2 UDB bulk load Import method [page 132]
The import mode for IBM DB2 bulk loading works the best when the DB2 server and the Job Server are on the same system.

Parent topic: Bulk Loading and Reading [page 123]

Related Information

Google BigQuery ODBC bulk loading [page 125]
Configuring bulk loading for Hive [page 126]
Bulk loading in Informix [page 133]
Bulk loading in Microsoft SQL Server [page 134]
Bulk loading in Netezza [page 141]
Bulk Loading in PostgreSQL [page 145]
Bulk loading in Oracle [page 147]
Bulk loading in SAP HANA [page 154]
Bulk loading in SAP ASE [page 155]
Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]
Bulk loading and reading in Teradata [page 159]


11.3.1 When to use each DB2 bulk-loading method

SAP Data Services supports two bulk-loading methods for DB2 Universal Database (UDB) on Windows and UNIX.

Access the bulk load configuration options by opening the DB2 target table editor in a data flow.

Note
You cannot bulk load data to DB2 databases that run on AS/400 or z/OS (MVS) systems.

The following table lists the bulk load methods and the advantages and restrictions for each method.

Load methods for bulk loading in DB2

CLI Load
Description: Call Level Interface (CLI). Loads a large volume of data at high speed by passing it directly from memory to the table on the DB2 UDB server.
Advantages:
● The fastest way to bulk load.
● Does not require intermediate data files and therefore reduces the number of options to complete.
● Places rows that violate the unique key constraint into an exception table.
● Can have either DSN or DSN-less connection.
Restrictions:
● DB2 logging is not enabled. Therefore, to enable recovery, you must specify the Recoverable and Copy target directory options.
● Use DB2 UDB server and client version 8.0 or later.
● Stops loading when it encounters the first rejected row.

Import
Description: Loads a large volume of data by using a SQL INSERT statement to write data from an input file into a table or view.
Advantages:
● Recovery is enabled automatically because DB2 logging occurs during import.
● Performs referential integrity or table constraint checking in addition to unique key constraint checking.
Restrictions:
● The slowest method to bulk load data because DB2 logs each INSERT statement.
● Requires the Data Services Job Server and DB2 UDB server be on the same computer.
● Must use a DSN connection.

Parent topic: Bulk Loading in IBM DB2 Universal Database [page 128]


Related Information

About the DB2 CLI load method [page 130]
IBM DB2 UDB bulk load Import method [page 132]

11.3.2 About the DB2 CLI load method

The DB2 Call Level Interface (CLI) load method performs faster than the bulk-load or import utilities.

The CLI load method is faster because it doesn’t write the data to an intermediate file. Instead, the CLI load method writes data from memory directly to the table on the DB2 server. SAP Data Services extracts or transforms data in the memory location.
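The in-memory buffering that this implies can be sketched in a few lines. The fragment below is a conceptual illustration only: it collects rows in memory and flushes them in batches of up to the Maximum bind array size (10,000 by default, as described later in this section), with a placeholder send_batch function standing in for the actual load call, which this sketch does not define.

# Conceptual sketch of batching rows in memory before each load call.
def batches(rows, max_bind_array=10000):
    # Yield lists of at most max_bind_array rows, keeping everything in memory.
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == max_bind_array:
            yield batch
            batch = []
    if batch:
        yield batch

def send_batch(batch):
    # Placeholder for the real bulk call; here we only report the batch size.
    print(f"sending {len(batch)} rows directly from memory")

source_rows = ({"id": i} for i in range(25000))
for batch in batches(source_rows):
    send_batch(batch)
# Output: "sending 10000 rows ..." twice, then "sending 5000 rows ..."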

To use the CLI load method, first configure your system to use the method, and then select the load method in your job.

Configuring your system to use CLI load [page 130]
To configure your system to use CLI load, make bulk-load settings in the DB2 database datastore.

Configuring the CLI load method in a job [page 131]
After you configure your system for the CLI load method in the datastore editor, configure the load method in the target editor in a data flow.

Parent topic: Bulk Loading in IBM DB2 Universal Database [page 128]

Related Information

When to use each DB2 bulk-loading method [page 129]
IBM DB2 UDB bulk load Import method [page 132]

11.3.2.1 Configuring your system to use CLI load

To configure your system to use CLI load, make bulk-load settings in the DB2 database datastore.

Create an IBM DB2 Universal Database (UDB) datastore or open an existing IBM DB2 datastore to edit it.

1. Click Advanced.
2. Enter a user name in Bulk loader user name.

SAP Data Services uses this name when it loads data with the CLI load option.

Note
For bulk loading, the user must have import and load permissions.


3. Enter a password in Bulk loader password.

Enter the password associated with the name you entered in Bulk loader user name.
4. Set the location of the working directory in DB2 server working directory.
5. Complete the remaining options in the datastore editor and save the DB2 datastore.

For descriptions of the datastore options for IBM DB2 UDB, see the Designer Guide.

Task overview: About the DB2 CLI load method [page 130]

Related Information

Configuring the CLI load method in a job [page 131]
IBM DB2 datastores

11.3.2.2 Configuring the CLI load method in a job

After you configure your system for the CLI load method in the datastore editor, configure the load method in the target editor in a data flow.

Configure your system to use the CLI load method in the datastore editor by following the steps in Configuring your system to use CLI load [page 130]. In SAP Data Services Designer, create a data flow or open an existing DB2 data flow in your workspace.

1. Double-click the target icon in the data flow to open the target editor.
2. Open the Bulk Loader Options tab.
3. Select CLI load in the Bulk loader list.

The Bulk Loader Options tab updates to show all CLI load options, including any additional or changed options.

4. Enter a value for the number of rows in Maximum bind array.

The Maximum bind array controls the maximum number of rows that Data Services extracts or transforms before it sends the rows to the DB2 table or view. The default setting is 10000 rows.

5. Select Clean up bulk loader directory after load.

Data Services deletes the message file when the CLI load completes successfully.

Caution: Other than the message file, there are no control or data files to clean up because the CLI load obtains the data from memory.

Task overview: About the DB2 CLI load method [page 130]


Related Information

Configuring your system to use CLI load [page 130]

11.3.3 IBM DB2 UDB bulk load Import method

The import mode for IBM DB2 bulk loading works best when the DB2 server and the Job Server are on the same system.

You can still use the Import loading method for bulk loading when your DB2 server and the Job Server are not on the same system. However, make sure that you generate the data and control files, and save them in the target file location before you process your job.
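Although Data Services generates and runs the import for you, it can help to picture the kind of statement the Import method relies on. The following is a minimal sketch of a DB2 command line processor import, assuming a hypothetical delimited data file /tmp/customer.del and target table SALES.CUSTOMER; the real files and options come from your target settings:

db2 "IMPORT FROM /tmp/customer.del OF DEL COMMITCOUNT 1000 INSERT INTO SALES.CUSTOMER"

Because DB2 logs each inserted row, recovery is automatic, but throughput is lower than with the CLI load method.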

Configuring for DB2 Import mode when using separate systems [page 132]
If you choose to use the Import bulk load mode for IBM DB2 UDB, and the DB2 server is on a different system than your Job Server, perform additional steps before you execute the job.

Parent topic: Bulk Loading in IBM DB2 Universal Database [page 128]

Related Information

When to use each DB2 bulk-loading method [page 129]
About the DB2 CLI load method [page 130]

11.3.3.1 Configuring for DB2 Import mode when using separate systems

If you choose to use the Import bulk load mode for IBM DB2 UDB, and the DB2 server is on a different system than your Job Server, perform additional steps before you execute the job.

Configure your system to use the CLI load method in the datastore editor by following the steps in Configuring your system to use CLI load [page 130]. In SAP Data Services Designer, create a data flow or open an existing DB2 data flow in your workspace. Include a DB2 template table or table as a target in the data flow.

1. Open the target editor by double-clicking the target icon in the data flow.
2. Open the Bulk Loading Options tab.
3. Select the option Generate files only.
4. Complete all other target options and generate the file.
5. Move the generated data and control file to the same location as the target database.
6. Use the data file and the control file as targets in the database.


Task overview: IBM DB2 UDB bulk load Import method [page 132]

11.4 Bulk loading in Informix

SAP Data Services provides Informix bulk-loading support only for single-byte character ASCII-delimited files, and not for fixed-width files.

For detailed information about Informix bulk-loading utility options and their behavior in the Informix DBMS environment, see the relevant Informix product documentation.

Setting up Informix for bulk-loading requires that you set the INFORMIXDIR, INFORMIXSERVER, and PATH environment variables.

For Data Services to initiate Informix bulk loading directly, the Job Server and the target database must be located on the same system.

Set Informix server variables [page 133]
Configure environment variables to set information for the Informix server.

Parent topic: Bulk Loading and Reading [page 123]

Related Information

Google BigQuery ODBC bulk loading [page 125]
Configuring bulk loading for Hive [page 126]
Bulk Loading in IBM DB2 Universal Database [page 128]
Bulk loading in Microsoft SQL Server [page 134]
Bulk loading in Netezza [page 141]
Bulk Loading in PostgreSQL [page 145]
Bulk loading in Oracle [page 147]
Bulk loading in SAP HANA [page 154]
Bulk loading in SAP ASE [page 155]
Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]
Bulk loading and reading in Teradata [page 159]

11.4.1 Set Informix server variables

Configure environment variables to set information for the Informix server.

For Windows platforms, configure the environment variables in the LINK_DIR\bin\dbloadIfmx.bat script.


Example: Use the following script to set INFORMIXDIR, INFORMIXSERVER, and PATH on your Windows platform:

set INFORMIXDIR=<C:\path\to\informix\installation>
set INFORMIXSERVER=ol_svr_custom
set PATH=%INFORMIXDIR%\bin;%PATH%

For UNIX platforms, configure the environment variables in the $LINK_DIR/bin/dbloadIfmx.sh script.

Example: Use the following script to set INFORMIXDIR, INFORMIXSERVER, and PATH on your UNIX platform:

export INFORMIXDIR=</path/to/informix/installation>
export INFORMIXSERVER=ol_svr_custom
export PATH=$INFORMIXDIR/bin:$PATH

Parent topic: Bulk loading in Informix [page 133]

11.5 Bulk loading in Microsoft SQL Server

To utilize bulk loading with Microsoft SQL Server, use the SQL Server ODBC bulk copy API.

For detailed information about the SQL Server ODBC bulk copy API options and their behavior in the Microsoft SQL Server DBMS environment, see the relevant Microsoft SQL Server product documentation.

Enabling the SQL Server ODBC bulk copy API [page 135]
To enable the SQL Server ODBC bulk copy API, configure your Job Server for bulk loading.

Network packet size option [page 135]
Enhance bulk loading performance for SQL Server by tuning the network packet and commit size.

Maximum rejects option [page 136]
The setting for Maximum rejects can either hurt or enhance performance.

Bulk loading with DataDirect Wire Protocol SQL Server ODBC driver [page 137]
Use the DataDirect Wire Protocol SQL Server ODBC driver bulk-load feature to quickly insert and update a large number of records into a database.

Parent topic: Bulk Loading and Reading [page 123]

Related Information

Google BigQuery ODBC bulk loading [page 125]


Configuring bulk loading for Hive [page 126]
Bulk Loading in IBM DB2 Universal Database [page 128]
Bulk loading in Informix [page 133]
Bulk loading in Netezza [page 141]
Bulk Loading in PostgreSQL [page 145]
Bulk loading in Oracle [page 147]
Bulk loading in SAP HANA [page 154]
Bulk loading in SAP ASE [page 155]
Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]
Bulk loading and reading in Teradata [page 159]

11.5.1 Enabling the SQL Server ODBC bulk copy API

To enable the SQL Server ODBC bulk copy API, configure your Job Server for bulk loading.

Perform the following steps in SAP Data Services Designer:

1. Select Tools > Options, expand the Job Server node, and click General.

General options appear at right.
2. Enter AL_Engine in the Section text box.

3. Enter UseSQLServerBulkCopy in the Key text box.

4. Enter TRUE or FALSE for Value based on the following information:

○ TRUE: Data Services uses the SQL Server ODBC bulk copy API, which is the value in Key. TRUE is the default setting.

○ FALSE: Data Services overrides the default and uses the SQL Bulk Operations API, which is much slower than the default.
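The Section and Key values that you enter here map to a plain key and value entry in the Job Server configuration (commonly the DSConfig.txt file; the exact file name and location can vary by installation and release). As an illustration only, the resulting entry conceptually looks like this:

[AL_Engine]
UseSQLServerBulkCopy=TRUE

Setting the value to FALSE switches the engine to the slower SQL Bulk Operations API, so keep the default unless you have a specific reason to change it.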

Task overview: Bulk loading in Microsoft SQL Server [page 134]

Related Information

Network packet size option [page 135]
Maximum rejects option [page 136]
Bulk loading with DataDirect Wire Protocol SQL Server ODBC driver [page 137]

11.5.2 Network packet size option

Enhance bulk loading performance for SQL Server by tuning the network packet and commit size.

When the client loads data to SQL Server, it caches rows until it either fills a network packet or reaches the commit size. When the load reaches the commit size limit, it sends the network packet to the server even when it hasn't reached the network packet size limit. To improve performance, tune the commit size and network packet size. Adjust the packet size and the commit size in the Bulk Loader Options tab in the target editor.

● Rows per commit: Specifies the number of rows to put in cache before sending the rows to the server.
● Network packet size: Specifies the network packet size in kilobytes. The default network packet size is 4 KB.

After you initially set the options, run a test job and analyze the results. Adjust the settings as appropriate to improve the results.

Tuning Rows per commit and Network packet size benefits performance in the following ways:

● Avoids sending several partially filled network packets over the network.
● Ensures that packets contain all of the rows in the commit.
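As a hypothetical illustration of the arithmetic, assume rows of roughly 400 bytes and the default 4 KB packet: each packet holds only about 10 rows, so a Rows per commit of 10,000 makes the client send roughly 1,000 packets per commit. Raising Network packet size to 32 KB cuts that to roughly 125 fuller packets, and choosing a Rows per commit value that is a multiple of the rows that fit in one packet avoids sending a final partially filled packet. Your own row sizes and results will differ, so always verify with a test job.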

Parent topic: Bulk loading in Microsoft SQL Server [page 134]

Related Information

Enabling the SQL Server ODBC bulk copy API [page 135]
Maximum rejects option [page 136]
Bulk loading with DataDirect Wire Protocol SQL Server ODBC driver [page 137]

11.5.3 Maximum rejects option

The setting for Maximum rejects can either hurt or enhance performance.

The Maximum rejects parameter specifies the maximum number of errors encountered in the data before the bulk load operation stops. For SQL Server, when the maximum rejects setting is reached, Data Services stops processing and rolls back all the previous transactions.

Set this parameter when you don't expect errors, but you want to verify that Data Services loaded the correct file and table.

● Blank: No errors are allowed. If an error is encountered, Data Services stops processing and rolls back all the previous transactions.
● 0: Data Services continues bulk loading regardless of the number of errors it encounters.
● Positive number (greater than 1): Data Services stops processing after <n> errors.

Note: A rejected row contains data that doesn't match the expected format or information specified for the table being loaded.


Parent topic: Bulk loading in Microsoft SQL Server [page 134]

Related Information

Enabling the SQL Server ODBC bulk copy API [page 135]
Network packet size option [page 135]
Bulk loading with DataDirect Wire Protocol SQL Server ODBC driver [page 137]
Microsoft SQL Server target table options

11.5.4 Bulk loading with DataDirect Wire Protocol SQL Server ODBC driver

Use the DataDirect Wire Protocol SQL Server ODBC driver bulk-load feature to quickly insert and update a large number of records into a database.

The DataDirect Wire Protocol SQL Server ODBC driver doesn't require a separate database load utility because the bulk-load feature is built into the driver. DataDirect drivers are included in the Data Services installation.

For more information about the Wire Protocol SQL Server ODBC driver, see the DataDirect documentation.

Consider the following information when you use the DataDirect Wire Protocol SQL Server ODBC driver for bulk loading:

● Enable the bulk-load option only to optimize load performance. Leaving the bulk-load option enabled at all times could lead to undesired results or even corrupt data.

● Don't select the Enable Bulk Load option if any of the following Data Services loader options are enabled:
  ○ Include in Transaction
  ○ Use Overflow File
  ○ Auto Correct Load
  ○ Load Triggers

Note: Enable Bulk Load is in the Bulk Loader Options tab in the target table editor.

● Create a different DSN and datastore under the following circumstances:
  ○ You use the same SQL Server database server in different datastores.
  ○ You use loaders that have different bulk-load options.

Enabling DataDirect bulk load in Windows [page 138]
Configure the bulk loading feature with the DataDirect Wire Protocol SQL Server ODBC driver in Windows using the ODBC Data Source Administrator.

Enabling DataDirect bulk load in UNIX [page 139]
Configure the bulk loading feature with the DataDirect Wire Protocol SQL Server ODBC driver in UNIX using the odbc.ini file.


Parent topic: Bulk loading in Microsoft SQL Server [page 134]

Related Information

Enabling the SQL Server ODBC bulk copy API [page 135]
Network packet size option [page 135]
Maximum rejects option [page 136]

11.5.4.1 Enabling DataDirect bulk load in Windows

Configure the bulk loading feature with the DataDirect Wire Protocol SQL Server ODBC driver in Windows using the ODBC Data Source Administrator.

1. Open the ODBC Data Source Administrator in one of two ways:
   ○ Use the Windows Start menu.
   ○ Select Use data source name and click ODBC Admin in the datastore editor.
2. Open the User DSN tab and click Add.
   The Create New Data Source dialog box opens.
3. Select the applicable DataDirect SQL Server Wire Protocol driver and click Finish.
   The ODBC SQL Server Wire Protocol Driver Setup dialog box opens.
4. Enter driver setup information, such as the name of the data source and the host number.
5. Select the Enable Bulk Loading option in the Bulk tab.
6. Set the Bulk Options as described in the following table:

● Keep Identity: Keeps source identity values.
● Check Constraints: Checks constraints while data is inserted into the database.
● Keep Nulls: Keeps null values in the destination table.
● Table Lock: Locks the table while the bulk copy operation is taking place. This option is checked by default.
● Fire Triggers: Executes a trigger each time a row is inserted into the database.
● Bulk Binary Threshold (KB): Maximum amount of data exported to the bulk data file.
● Batch Size: Number of rows the driver sends to the database at one time.
● Bulk Character Threshold (KB): Maximum amount of character data to export to the bulk data file.

Task overview: Bulk loading with DataDirect Wire Protocol SQL Server ODBC driver [page 137]


Related Information

Enabling DataDirect bulk load in UNIX [page 139]

11.5.4.2 Enabling DataDirect bulk load in UNIX

Configure the bulk loading feature with the DataDirect Wire Protocol SQL Server ODBC driver in UNIX using the odbc.ini file.

1. Open the odbc.ini file associated with the applicable data source name (DSN) with a text editor.

2. Set the EnableBulkLoad option to 1.
3. Enter a BulkLoadOptions value.

The value you enter depends on which options you enable. The following list describes the options and their values. Add the option values together and use the total as the BulkLoadOptions value.

● Keep Identity: Value is 1. Keeps the source identity values.
● Check Constraints: Value is 16. Checks constraints while the loader inserts data into the database.
● Keep Nulls: Value is 64. Keeps null values in the destination table.
● Table Lock: Value is 2. Locks the table during the bulk copy operation.
● Fire Triggers: Value is 32. Executes triggers when the loader inserts a row into the database.

Example: 16 (Check Constraints) + 32 (Fire Triggers) + 1 (Keep Identity) = 49

BulkLoadOptions=49

4. Set the BulkLoadBatchSize option.

The BulkLoadBatchSize option sets the number of rows the driver sends to the database at one time. The default value is 1024.


Example

The following code shows an example of an updated odbc.ini file:

[ddsql]
Driver=/build/ds41/dataservices/DataDirect/odbc/lib/DAsqls25.so
Description=DataDirect 6.1 SQL Server Wire Protocol
AlternateServers=
AlwaysReportTriggerResults=0
AnsiNPW=1
ApplicationName=
ApplicationUsingThreads=1
AuthenticationMethod=1
BulkBinaryThreshold=32
BulkCharacterThreshold=-1
BulkLoadBatchSize=1024
BulkLoadOptions=2
ConnectionReset=0
ConnectionRetryCount=0
ConnectionRetryDelay=3
Database=ods
EnableBulkLoad=1
EnableQuotedIdentifiers=0
EncryptionMethod=0
FailoverGranularity=0
FailoverMode=0
FailoverPreconnect=0
FetchTSWTZasTimestamp=0
FetchTWFSasTime=1
GSSClient=native
HostName=vantgvmwin470
HostNameInCertificate=
InitializationString=
Language=
LoadBalanceTimeout=0
LoadBalancing=0
LoginTimeout=15
LogonID=
MaxPoolSize=100
MinPoolSize=0
PacketSize=-1
Password=
Pooling=0
PortNumber=1433
QueryTimeout=0
ReportCodePageConversionErrors=0
SnapshotSerializable=0
TrustStore=
TrustStorePassword=
ValidateServerCertificate=1
WorkStationID=
XML Describe Type=-10

Task overview: Bulk loading with DataDirect Wire Protocol SQL Server ODBC driver [page 137]

Related Information

Enabling DataDirect bulk load in Windows [page 138]


11.6 Bulk loading in Netezza

SAP Data Services supports bulk loading to Netezza Performance Servers.

For detailed information about Netezza loading options and their behavior in the Netezza environment, see the relevant Netezza product documentation.

Netezza recommends using the bulk-loading method to load data for faster performance. Unlike some other bulk loaders, the SAP Data Services bulk loader for Netezza supports UPDATE and DELETE as well as INSERT operations, which allows for more flexibility and performance.

Netezza bulk-loading process [page 141]
SAP Data Services follows a specific process to bulk load data into Netezza databases.

Options overview [page 142]
Select a bulk loading method in the Netezza target editor.

Configuring bulk loading for Netezza [page 143]
To configure Netezza for bulk loading, make settings in the datastore editor and then enable and configure bulk loading in the table target editor.

Netezza log files: nzlog and nzbad [page 144]
Netezza generates the nzlog and the nzbad files when it writes data from the external table to the staging table.

Parent topic: Bulk Loading and Reading [page 123]

Related Information

Google BigQuery ODBC bulk loading [page 125]
Configuring bulk loading for Hive [page 126]
Bulk Loading in IBM DB2 Universal Database [page 128]
Bulk loading in Informix [page 133]
Bulk loading in Microsoft SQL Server [page 134]
Bulk Loading in PostgreSQL [page 145]
Bulk loading in Oracle [page 147]
Bulk loading in SAP HANA [page 154]
Bulk loading in SAP ASE [page 155]
Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]
Bulk loading and reading in Teradata [page 159]

11.6.1 Netezza bulk-loading process

SAP Data Services follows a specific process to bulk load data into Netezza databases.

To bulk load to a Netezza target table, Data Services performs the following tasks:


● Creates an external table that is associated with a local file or named pipe.
● Loads data from the source into the file or named pipe.
● Loads data from the external table into a staging table by executing an INSERT statement.
● Loads data from the staging table to the target table by executing a set of INSERT/UPDATE/DELETE statements.

Example: The flow of data moves from the source, through the external table and the staging table, to the target.
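The SQL that Data Services generates for this flow is internal, but the staged pattern can be sketched in Netezza SQL roughly as follows. The table names, file path, and delimiter are hypothetical, and the real statements differ:

CREATE EXTERNAL TABLE ext_orders SAMEAS stg_orders
  USING (DATAOBJECT ('/tmp/orders.dat') DELIMITER '|');

INSERT INTO stg_orders SELECT * FROM ext_orders;

UPDATE orders SET amount = s.amount
  FROM stg_orders s WHERE orders.order_id = s.order_id;
INSERT INTO orders
  SELECT order_id, amount FROM stg_orders s
  WHERE NOT EXISTS (SELECT 1 FROM orders t WHERE t.order_id = s.order_id);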

Parent topic: Bulk loading in Netezza [page 141]

Related Information

Options overview [page 142]
Configuring bulk loading for Netezza [page 143]
Netezza log files: nzlog and nzbad [page 144]

11.6.2 Options overview

Select a bulk loading method in the Netezza target editor.

Bulk loading options are in the Bulk Loader Options tab of the Netezza target table editor. Open the editor by clicking the target object in a data flow. The method you choose is based on your Netezza environment.

Netezza bulk loading methods:

● Named pipe: Streams data as it is written to the named pipe through the external table to the staging table. Select Named pipe for faster performance for files that are larger than 4 GB.
● File: Writes the data to a file before loading through the external table to the staging table. Select File for faster performance for files that are smaller than 4 GB.
● None: Does not use bulk loading.

Set options in the target table editor Options tab when you select Update or Delete-insert update methods.

Find the following options in the General section of the Options tab:

● Column comparison: Specifies how SAP Data Services maps the input columns to the output columns.
● Number of loaders: Specifies the number of loaders the software uses.

Find the following options in the Update Control settings in the Options tab:

● Use input keys: Uses primary key from the input when target does not have primary key.
● Update key columns: Updates key column values when it loads data to the target.
● Auto correct load: Ensures that Data Services does not duplicate rows in the target.

Parent topic: Bulk loading in Netezza [page 141]

Related Information

Netezza bulk-loading process [page 141]
Configuring bulk loading for Netezza [page 143]
Netezza log files: nzlog and nzbad [page 144]
Target Table editor: Options tab
Update Control

11.6.3 Configuring bulk loading for Netezza

To configure Netezza for bulk loading, make settings in the datastore editor and then enable and configure bulk loading in the table target editor.

Perform the following steps in SAP Data Services Designer:

1. Create a new Netezza datastore or edit an existing one.
2. In the Netezza datastore editor, click Advanced.
3. Type the directory path or browse for the path in the Bulk loader directory text box.
   Data Services uses this directory for writing SQL and for storing data files for bulk loading.
4. Enter the FTP host name, logon user name, logon password, and host working directory in the FTP category options.
   These options are used to transfer the Netezza nzlog and nzbad files.

   Note: If you use the Netezza datastore for purposes other than Netezza bulk loading, Data Services ignores any FTP option entries.

5. Set the Code page option as follows:
   ○ For loading non-ASCII character data, set the Code page to latin-9.
   ○ For loading multibyte data, set the Code page to utf-8.
6. Click OK or Apply.
7. Open the applicable data flow in the workspace and click the target object to open the table editor.
8. Open the Bulk Loader Options tab and select a bulk-loading method.
9. Set the remaining options on the Bulk Loader Options tab and on the Options tab as described in Options overview [page 142].
10. Save the data flow.

Task overview: Bulk loading in Netezza [page 141]

Related Information

Netezza bulk-loading process [page 141]
Options overview [page 142]
Netezza log files: nzlog and nzbad [page 144]

11.6.4 Netezza log files: nzlog and nzbad

Netezza generates the nzlog and the nzbad files when it writes data from the external table to the staging table.

Netezza writes the nzlog and nzbad log files to one of the following database server working directories:

● The directory that you set in the Database server working directory option in the datastore editor.
● The Netezza /tmp directory when you do not specify a database server working directory.

For SAP Data Services to access and manage these logs, configure the FTP parameters in the datastore editor. After a load, Data Services copies the log files from the specified location to the specified Bulk loader directory and deletes them from the Netezza server.

Ensure that you enable the option Clean up bulk loader directory after load in the target table editor. Based on the success of the load, Data Services behaves as follows:

● For successful loads, Data Services deletes the log files from the Bulk loader directory when you enable the Clean up bulk loader directory after load option in the target table editor.

● For failed loads, Data Services does not delete the log files from the Bulk loader directory, even when you enable the Clean up bulk loader directory after load option.


Parent topic: Bulk loading in Netezza [page 141]

Related Information

Netezza bulk-loading process [page 141]
Options overview [page 142]
Configuring bulk loading for Netezza [page 143]

11.7 Bulk Loading in PostgreSQL

SAP Data Services supports bulk loading for PostgreSQL using the PSQL tool and DSN-less connections.

To prepare for bulk loading, download the PSQL tool from the official PostgreSQL website. For convenience, create a global variable for the PSQL tool location. Then use the global variable for the PSQL Full Path option when you complete the Advanced options in the PostgreSQL datastore configuration.

Configure bulk loading for PostgreSQL server-based (DSN-less) connections only. Data Services supports the append and truncate modes of bulk loading for PostgreSQL.

Data Services creates a bulk loading log file and error file that contain log and error messages. Data Services stores the log and error files in the directory that you set in the Bulk Loader Directory option in the datastore configuration.

To enable bulk loading in your job, complete the options in the Bulk Loader Options tab in the target editor.
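Data Services builds and runs the PSQL command for you, so the following is illustrative only. Conceptually, the load resembles a client-side COPY issued through psql; the host, database, user, table, and file names here are hypothetical:

psql -h dbhost -p 5432 -d salesdb -U loader -c "\copy public.orders FROM '/data/orders.dat' WITH (FORMAT csv, DELIMITER ',')"

In append mode the rows are added to the existing table contents; in truncate mode the existing rows are removed before the load.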

Parent topic: Bulk Loading and Reading [page 123]

Related Information

Google BigQuery ODBC bulk loading [page 125]
Configuring bulk loading for Hive [page 126]
Bulk Loading in IBM DB2 Universal Database [page 128]
Bulk loading in Informix [page 133]
Bulk loading in Microsoft SQL Server [page 134]
Bulk loading in Netezza [page 141]
Bulk loading in Oracle [page 147]
Bulk loading in SAP HANA [page 154]
Bulk loading in SAP ASE [page 155]
Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]
Bulk loading and reading in Teradata [page 159]


11.7.1 Bulk Loading options

Complete options for bulk loading in the datastore and in the Bulk Loader Options tab in the target editor.

Create a PostgreSQL datastore, and configure it for a DSN-less connection. Complete all required fields in the datastore editor. Ensure that you complete the fields listed in the following table to configure PostgreSQL for bulk loading.

Required bulk loader datastore options:

● Database server name: Specifies the host name of the applicable database server.
● Database name: Specifies the name of the PostgreSQL database.
● Port: Specifies the port number to use to access the specified PostgreSQL database.
● User name: Specifies the name of the user authorized to access the specified database.
● Password: Specifies the password related to the stated user name.
● Bulk loader directory: Specifies the directory where Data Services stores the files related to bulk loading, such as the log file, error file, and temporary files. Click the Browse icon at the end of the field and either browse for the location or select the global variable. If you leave this field blank, Data Services writes the files to %DS_COMMON_DIR%/log/BulkLoader.
● PSQL full path: Specifies the full path to the location of the psql tool. For convenience, create a global variable for this value before you configure the datastore. Click the Browse icon at the end of the field and either browse to the location, or select the global variable.

In a data flow, include the PostgreSQL target object, such as a PostgreSQL table. Then open the target editor and the Bulk Loader Options tab. The following table contains descriptions for the bulk loader options.

PostgreSQL Bulk Loader Options tab:

● Bulk Load: Specifies to implement bulk loading for this target. Select to enable bulk loading.
● Mode: Specifies how Data Services updates the target table:
  ○ Append: Append new data to the existing data in the target table.
  ○ Truncate: Delete all existing data in the target table and load generated data.
● Generate files only: Specifies to generate data files only. Select to load data into a data file or files instead of the target in the data flow. Data Services writes the data files into the bulk loader directory specified in the datastore configuration.
  Data Services automatically names the file in the following format: PGS_<datastore name>_<Owner>_<table name>_<PID>_<loader number>_<number_of_files_generated_by_each_loader>_<timestamp>.dat, where <table name> is the name of the target table.
● Clean up bulk loader directory after load: Specifies to delete all bulk load-oriented files from the bulk-load directory and the PostgreSQL remote system after the load is complete. If you select the Generate files only option, do not enable this option.

11.8 Bulk loading in Oracle

SAP Data Services supports Oracle bulk loading using an API or a staging file.

Data Services offers two options for the Oracle bulk loading methods. Select the method in the Bulk Loader Options tab in the target table editor.

Oracle bulk loading methods:

● API: Accesses the direct path engine of the Oracle database server associated with the target table and connected to the target database. With the Oracle Direct-Path Load API, Data Services feeds input data directly into database files. To use this option, you must have Oracle version 8.1 or later.
● File: Writes an intermediate staging file, control file, and log files to the local disk, and invokes the Oracle SQL*Loader. The File method requires more processing time than the API method. For detailed information about the Oracle SQL*Loader options, see the relevant Oracle product documentation.

Oracle supports two bulk loading modes as described in the following table. Select the mode in the Bulk Loader Options tab in the target table editor.

Oracle bulk loading modes:

● Conventional-path: Uses SQL INSERT statements to load data to tables. Implicit for the File bulk loading method.
● Direct-path: Uses multiple buffers for a number of formatted blocks that load data directly to database files associated with tables. Implicit for the API bulk loading method.

For all Oracle table target options, see the Reference Guide.

Use table partitioning for Oracle bulk loading [page 149]
For performance optimization, use table partitioning when you implement bulk loading for Oracle databases.

Oracle bulk loading method and mode combinations [page 150]
Use combinations of bulk loading methods and modes to configure bulk loading in various ways for the best performance optimization for your situation.

Example 1: File method, Direct-path mode, and Number of loaders [page 151]
The following example illustrates SAP Data Services performance when you use the File method, the Direct-path mode, and you adjust the number of loaders in the table target editor.

Example 2: API method, Direct-path mode, partitioned tables [page 152]
The following example illustrates how SAP Data Services processes a data flow when you use bulk loading with the API method, the Direct-path mode, and a partitioned target table.

Parent topic: Bulk Loading and Reading [page 123]

Related Information

Google BigQuery ODBC bulk loading [page 125]
Configuring bulk loading for Hive [page 126]
Bulk Loading in IBM DB2 Universal Database [page 128]
Bulk loading in Informix [page 133]
Bulk loading in Microsoft SQL Server [page 134]
Bulk loading in Netezza [page 141]
Bulk Loading in PostgreSQL [page 145]
Bulk loading in SAP HANA [page 154]
Bulk loading in SAP ASE [page 155]
Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]
Bulk loading and reading in Teradata [page 159]
Oracle target table options

11.8.1 Use table partitioning for Oracle bulk loading

For performance optimization, use table partitioning when you implement bulk loading for Oracle databases.

Enable partitioning by setting options in the Options tab of the table target editor. SAP Data Services processes the partitioned tables in parallel for performance optimization.

When you select the API bulk loading method, check the Enable partitioning checkbox. With partitioning enabled, SAP Data Services generates the number of target parallel instances based on the number of partitions in the target table. If you don't enable partitioning, or if your table target is not partitioned, Data Services uses one loader by default.

When you select the File bulk loading method, enter a value in the Number of loaders text box or click the Enable partitioning checkbox.

Note: If the target table is not partitioned, the Enable partitioning checkbox does not appear on the Options tab.

Parent topic: Bulk loading in Oracle [page 147]

Related Information

Oracle bulk loading method and mode combinations [page 150]
Example 1: File method, Direct-path mode, and Number of loaders [page 151]
Example 2: API method, Direct-path mode, partitioned tables [page 152]


11.8.2 Oracle bulk loading method and mode combinations

Use combinations of bulk loading methods and modes to configure bulk loading in various ways for the best performance optimization for your situation.

The following table shows all possible combinations of settings for Oracle bulk loading. The combinations include the method, mode, and partitioning setting.

● API method, Direct-path mode, Enable partitioning not selected: Data Services uses one loader by default.
● API method, Direct-path mode, Enable partitioning selected.
● File method, Direct-path mode, Number of loaders set to 1.
● File method, Direct-path mode, Number of loaders set to greater than 1.
● File method, Direct-path mode, Enable partitioning selected.
● File method, Conventional mode, Number of loaders set to 1.
● File method, Conventional mode, Number of loaders set to greater than 1.
● File method, Conventional mode, Enable partitioning selected.

Information about the API method

The following list contains information about the API method:

● The API method always uses the direct-path load type.
● SAP Data Services processes loads in parallel when you use the direct-path load type with a partitioned target table.
● Data Services instantiates multiple loaders based on the number of partitions in a table. Each loader receives rows that meet the conditions specified by the partition.

Information about the File method

The following list contains information about the File method:

● The direct-path mode is faster than conventional load for the File method.
● The File method is slower than the API method because, during the File method, Data Services must generate a staging file and logs, and invoke the Oracle SQL*Loader.


● When you set the Number of Loaders to greater than 1, or you select the Enable partitioning option, loads can’t truly run in parallel; Data Services creates the staging file and log file for each loader serially.

Parent topic: Bulk loading in Oracle [page 147]

Related Information

Use table partitioning for Oracle bulk loading [page 149]
Example 1: File method, Direct-path mode, and Number of loaders [page 151]
Example 2: API method, Direct-path mode, partitioned tables [page 152]

11.8.3 Example 1: File method, Direct-path mode, and Number of loaders

The following example illustrates SAP Data Services performance when you use the File method, the Direct-path mode, and you adjust the number of loaders in the table target editor.

At runtime, Data Services instantiates multiple loaders based on your setting for Number of loaders in the Options tab of the table target editor. Each loader receives rows equal to the amount specified in the Rows per commit text box on the Bulk Loader Options tab. The loaders pipe rows to a staging file, then call the SQL*Loader to load the staging file contents into the table.

This loading process occurs in a “round-robin” fashion.

Example: Your data flow has the following bulk loader option settings:

● Bulk Loader method: File
● Bulk Loader mode: Direct-path
● Rows per commit = 5000
● Number of loaders = 2

Data Services performs the loading in a round-robin fashion as follows:

● Data Services sends 5000 rows to the first loader, which writes the rows to a staging file and invokes the SQL*Loader to load the data into the target table.

● Concurrently, the second loader receives the second batch of 5000 rows, writes them to a staging file, and waits for the first loader to complete the loading.

● When the first loader completes the bulk load, the second loader invokes the SQL*Loader and loads the second batch of 5000 rows into the target table.

● Concurrently, the first loader receives the third batch of 5000 rows, writes to a staging file, and waits for the second loader to complete the loading.

This process continues until Data Services loads all the data.

The SQL*Loader uses a control file to read staging files and load data. Data Services either creates this control file at runtime or uses one that is specified on the Bulk Loader Options tab at design time.


For parallel loading, Data Services uses the following naming convention for the generated control files, data files, and log files:

<TableName><TID><PID>_<LDNUM>_<BATCHNUM>

Where:

● <TableName>: Name of the table into which data loads.
● <TID>: The thread ID.
● <PID>: The process ID.
● <LDNUM>: The loader number, which ranges from 0 to the number of loaders minus 1. For single loaders, <LDNUM> is always 0.
● <BATCHNUM>: The batch number that the loader is processing. For single loaders, <BATCHNUM> is always 0.

Processing performance during this type of parallel loading depends on a number of factors such as distribution of incoming data and underlying DBMS capabilities.

Note: Under some circumstances, it is possible that specifying parallel loaders can be detrimental to performance. Always test the parallel loading process before moving to production.

Parent topic: Bulk loading in Oracle [page 147]

Related Information

Use table partitioning for Oracle bulk loading [page 149]
Oracle bulk loading method and mode combinations [page 150]
Example 2: API method, Direct-path mode, partitioned tables [page 152]

11.8.4 Example 2: API method, Direct-path mode, partitioned tables

The following example illustrates how SAP Data Services processes a data flow when you use bulk loading with the API method, the Direct-path mode, and a partitioned target table.

When you select Enable partitioning in the target table editor, Data Services performs the following operations:

● Instantiates multiple loaders based on the number of partitions in the table.
● Sends rows to each loader based on the number set for Rows per commit in the Bulk Loader Options tab.
● Sends rows to each loader based on the conditions specified by the partition.


Example: When you configure bulk loading and table partitioning as follows, the first loader commits after receiving all 2500 of its rows while the second loader runs concurrently:

● Bulk load (method): API
● Mode: Direct-path
● Rows per commit = 5000
● Enable Partitioning: Yes
● Table has 2 partitions
● First partition has 2500 rows

If you change the first partition to include 10,000 rows, the first loader commits twice: Once after receiving 5000 rows and again after receiving the second batch of 5000 rows. The second loader runs concurrently with the first loader.

The loaders pipe rows directly to Oracle database files by using Oracle direct-path load APIs associated with the target database. Obtain the direct-path load APIs with your Oracle client installation.

The API method enables Data Services to bypass the SQL*Loader and the control and staging files it needs. In addition, by using table partitioning, bulk loaders pass data to different partitions in the same target table at the same time.

Note: For Oracle, import partitioned tables as Data Services metadata. However, if you plan to use a partitioned table as a target, ensure that the physical table partitions in the database match the metadata table partitions in Data Services. If there is a mismatch, Data Services does not use the partition name to load partitions, which impacts processing time.

For the API method, Data Services records and displays error and trace logs as it does for any job. A monitor log records connection activity between components; however, it does not record activity while the API is handling the data.

Parent topic: Bulk loading in Oracle [page 147]

Related Information

Use table partitioning for Oracle bulk loading [page 149]
Oracle bulk loading method and mode combinations [page 150]
Example 1: File method, Direct-path mode, and Number of loaders [page 151]


11.9 Bulk loading in SAP HANA

SAP Data Services improves bulk loading for SAP HANA by using a staging mechanism to load data to the target table.

When Data Services uses changed data capture (CDC) or auto correct load, it uses a temporary staging table to load the target table. Data Services loads the data to the staging table and applies the operation codes INSERT, UPDATE, and DELETE to update the target table. With the Bulk load option selected in the target table editor, any one of the following conditions triggers the staging mechanism:

● The data flow contains a Map CDC Operation transform.
● The data flow contains a Map Operation transform that outputs UPDATE or DELETE rows.
● The data flow contains a Table Comparison transform.
● The Auto correct load option in the target table editor is set to Yes.

If none of these conditions are met, the input data contains only INSERT rows. Therefore Data Services performs only a bulk insert operation, which does not require a staging table or the need to execute any additional SQL.
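The staging SQL that Data Services generates is internal, but the pattern it follows can be sketched with generic SQL. The table, column, and operation-code names below are hypothetical and only illustrate the idea of loading a staging table and then applying the op codes to the target:

INSERT INTO ORDERS_STG (ORDER_ID, AMOUNT, OP_CODE) VALUES (1001, 250.00, 'I');

INSERT INTO ORDERS (ORDER_ID, AMOUNT)
  SELECT ORDER_ID, AMOUNT FROM ORDERS_STG WHERE OP_CODE = 'I';
UPDATE ORDERS SET AMOUNT =
  (SELECT S.AMOUNT FROM ORDERS_STG S WHERE S.ORDER_ID = ORDERS.ORDER_ID AND S.OP_CODE = 'U')
  WHERE ORDER_ID IN (SELECT ORDER_ID FROM ORDERS_STG WHERE OP_CODE = 'U');
DELETE FROM ORDERS
  WHERE ORDER_ID IN (SELECT ORDER_ID FROM ORDERS_STG WHERE OP_CODE = 'D');

In practice the staging table is filled by the bulk loader rather than by individual INSERT statements.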

By default, Data Services automatically detects the SAP HANA target table type. Then Data Services updates the table based on the table type for optimal performance.

The bulk loader for SAP HANA is scalable and supports UPDATE and DELETE operations. Therefore, the following options in the target table editor are also available for bulk loading:

● Use input keys: Uses the primary keys from the input table when the target table does not contain a primary key.

● Auto correct load: If a matching row to the source table does not exist in the target table, Data Services inserts the row in the target. If a matching row exists, Data Services updates the row based on other update settings in the target editor.

Find these options in the target editor under Update Control.

For more information about SAP HANA bulk loading and option descriptions, see the Data Services Supplement for Big Data.

Parent topic: Bulk Loading and Reading [page 123]

Related Information

Google BigQuery ODBC bulk loading [page 125]
Configuring bulk loading for Hive [page 126]
Bulk Loading in IBM DB2 Universal Database [page 128]
Bulk loading in Informix [page 133]
Bulk loading in Microsoft SQL Server [page 134]
Bulk loading in Netezza [page 141]
Bulk Loading in PostgreSQL [page 145]
Bulk loading in Oracle [page 147]
Bulk loading in SAP ASE [page 155]
Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]
Bulk loading and reading in Teradata [page 159]
SAP HANA target table options

11.10 Bulk loading in SAP ASE

SAP Data Services supports bulk loading for SAP ASE databases through the SAP ASE bulk copy utility.

Set options in the Bulk Loader Options tab of the target editor when you use the SAP ASE table as a target in a data flow. The following table describes the applicable options.

SAP ASE bulk loading options:

● Maximum rejects: Specifies the maximum number of warnings for rejected records received before Data Services stops bulk loading.
● Rows per commit: Specifies the number of rows Data Services processes before it commits data.
● Network packet size: Specifies a network packet size in kilobytes.

Use the Rows per commit, Maximum rejects, and Network packet size options to tune performance in the following ways:

● Rows per commit and Network packet size work together to determine when the software stops caching and commits the data to the target.

● Maximum rejects stops processing when the software reaches a set number of rejects in the database.
● While loading, the client caches rows until it fills a network packet or reaches the number of rows set for Rows per commit. If the client reaches the number of rows first, it sends the packet to the server regardless of whether the network packet is full.

For detailed information about the SAP ASE bulk loader options and their behavior in the SAP ASE DBMS environment, see the relevant SAP ASE product documentation. For descriptions of the table target editor options, see the Reference Guide.

Parent topic: Bulk Loading and Reading [page 123]

Related Information

Google BigQuery ODBC bulk loading [page 125]
Configuring bulk loading for Hive [page 126]
Bulk Loading in IBM DB2 Universal Database [page 128]
Bulk loading in Informix [page 133]
Bulk loading in Microsoft SQL Server [page 134]
Bulk loading in Netezza [page 141]
Bulk Loading in PostgreSQL [page 145]
Bulk loading in Oracle [page 147]
Bulk loading in SAP HANA [page 154]
Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]
Bulk loading and reading in Teradata [page 159]
SAP ASE target table options

11.11 Bulk loading in SAP Sybase IQ (SAP IQ)

SAP Data Services supports bulk loading to SAP IQ databases via the SAP IQ LOAD TABLE SQL command.

For detailed information about the SAP IQ LOAD TABLE parameters and their behavior in the SAP IQ database environment, see the relevant SAP IQ product documentation.

For improved performance when you use changed-data capture or auto correct load, Data Services uses a temporary staging table to load the target table. Data Services first loads the data to the staging table, then it applies the operation codes INSERT, UPDATE, and DELETE to update the target table. When you select the Bulk load option in the table target editor, any one of the following conditions triggers the staging mechanism:

● The data flow contains a Map_CDC_Operation transform.
● The data flow contains a Map_Operation transform that outputs UPDATE or DELETE rows.
● The Auto correct load option in the table target editor is set to Yes.

If none of these conditions exist in your data flow, the input data contains only INSERT rows. Therefore, Data Services performs a bulk INSERT operation, which does not require a staging table or the need to execute any additional SQL.
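In the plain bulk INSERT case, the work is done by the SAP IQ LOAD TABLE statement. The following is a minimal, illustrative sketch only; Data Services generates the actual statement and options from your datastore and target editor settings, and the table, column, and file names here are hypothetical:

LOAD TABLE dbo.orders ( order_id, amount, order_date )
  FROM '/loads/orders.dat'
  FORMAT ascii
  DELIMITED BY '|'
  ESCAPES OFF
  QUOTES OFF;

For the full set of LOAD TABLE parameters and their effects, see the SAP IQ product documentation.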

Because the bulk loader for SAP IQ also supports UPDATE and DELETE operations, the following options in the Update control section of the table target editor are also available for bulk loading:

● Use input keys: Uses the primary keys from the input table when the target table does not contain a primary key.

● Auto correct load: Prevents Data Services from adding a duplicate row to the target table.

For all target table option descriptions for SAP IQ target tables, see the Reference Guide.

Configuring bulk loading for SAP Sybase IQ (SAP IQ) [page 157]
To configure bulk loading for SAP IQ, make settings in the datastore editor and the table target editor.

SAP IQ log files [page 158]
SAP Data Services writes information to the message log and row log files to store information about the bulk load.

Parent topic: Bulk Loading and Reading [page 123]


Related Information

Google BigQuery ODBC bulk loading [page 125]
Configuring bulk loading for Hive [page 126]
Bulk Loading in IBM DB2 Universal Database [page 128]
Bulk loading in Informix [page 133]
Bulk loading in Microsoft SQL Server [page 134]
Bulk loading in Netezza [page 141]
Bulk Loading in PostgreSQL [page 145]
Bulk loading in Oracle [page 147]
Bulk loading in SAP HANA [page 154]
Bulk loading in SAP ASE [page 155]
Bulk loading and reading in Teradata [page 159]
SAP Sybase IQ target table options

11.11.1 Configuring bulk loading for SAP Sybase IQ (SAP IQ)

To configure bulk loading for SAP IQ, make settings in the datastore editor and the table target editor.

Find SAP IQ datastore option descriptions in the Designer Guide. Find SAP IQ table target options in the Reference Guide.

Perform the following steps in SAP Data Services Designer.

1. Create a new SAP Sybase IQ datastore or edit an existing one.
2. In the datastore editor, click Advanced.
3. Type the directory path or browse for the path in the Server working directory text box.
   Data Services uses this directory for writing commands and for storing data files for bulk loading.
4. Depending on the version of SAP Sybase IQ, set options in the Bulk Loader group or the FTP option group.
   For details, see Option considerations for SAP Sybase IQ (SAP IQ) bulk loading.
5. Save the datastore.
6. Open the data flow in your workspace and click the target table object to open the editor.
7. On the Bulk Loader Options tab, select Bulk load to enable it.
8. To enable the file bulk load method, select Generate files only.
9. Optional. Specify whether Data Services cleans up the bulk loading directory by selecting Clean up bulk loader directory after load.
10. Configure the remaining options to complete the target setup.
11. Save the data flow.

Task overview: Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]


Related Information

SAP IQ log files [page 158]
SAP Sybase IQ target table options
Sybase IQ

11.11.2 SAP IQ log files

SAP Data Services writes information to the message log and row log files to store information about the bulk load.

When you execute an SAP IQ job that you configure for bulk loading, Data Services saves information in log files. Data Services stores the log files in the directory that you specified in the Server working directory option in the datastore editor. If you don't specify a location, Data Services saves the files to <DS_COMMON_DIR>\log\bulkloader by default.

The following table contains the log files and contents.

SAP IQ bulk loading log files:

● Message log: Contains constraint violations specified in the Error handling group in the Bulk Loader Options tab of the target table.
● Row log: Contains the data from the row that had the constraint violation. Data Services delimits the Row log using the delimiter you set in the Field delimiter option in the Bulk Loader Options tab.

If you select No for Use named pipe in the datastore editor, and you don’t specify a directory for Server working directory, Data Services also saves the data file in this location.

To make sure that Data Services deletes the log files and data file after data loading, select Clean up bulk loader directory after load in the table target editor. If you choose not to clean up the bulk loader directory, or if your job results in errors captured in the logs, the software doesn't delete the files.

Parent topic: Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]

Related Information

Configuring bulk loading for SAP Sybase IQ (SAP IQ) [page 157]


11.12 Bulk loading and reading in Teradata

SAP Data Services supports several Teradata bulk loading utilities.

SAP Data Services supports the following Teradata utilities and tools for bulk loading and reading:

● Parallel Transporter (TPT): Uses a combination of loading and reading utilities such as FastLoad, MultiLoad, and TPump.

● FastLoad: Quickly loads large volumes of data to empty Teradata tables.
● MultiLoad: Provides parallel loading for high-volume batch files using commands.
● TPump (Teradata Parallel Data Pump): Loads data one row at a time using row hash locks.
● Load utility: Invokes FastLoad, MultiLoad, or TPump with Data Services. Applicable for Data Services version 11.5.1 or earlier.
● None: Data Services uses ODBC.

Note: If your Job Server is on a UNIX platform, and you're taking advantage of bulk loading on Teradata version 13 databases, you must set the required environment variables in the file located at $LINK_DIR/bin/td_env.config. Find instructions inside the file.

For details about Teradata options and their behavior in the Teradata environment, see the relevant Teradata product documentation.

For all bulk loader methods, choose to use either data files or named pipes. Named pipes include either generic named pipes or named pipes access module. Select the bulk loader method in the File option parameter in the Bulk Loader Options tab of the target table editor.

Data file [page 160]
With the Data file option, SAP Data Services loads a large volume of data into a staging file, and then passes data to the Teradata server.

Named Pipes [page 161]
Teradata uses two types of named pipes: Generic named pipes or Named Pipe Access Module.

When to use each Teradata bulk-loading method [page 164]

Parallel Transporter (TPT) method [page 167]
The Teradata Parallel Transporter (TPT) encompasses several stand-alone utilities into one interface.

Teradata standalone utilities [page 172]
Select the individual utility to load and extract from the Teradata database.

Teradata UPSERT operation for bulk loading [page 184]
Use the UPSERT operation during Teradata bulk loading to either update a matching row, or to insert a new row into the target object.

Parent topic: Bulk Loading and Reading [page 123]


Related Information

Google BigQuery ODBC bulk loading [page 125]
Configuring bulk loading for Hive [page 126]
Bulk Loading in IBM DB2 Universal Database [page 128]
Bulk loading in Informix [page 133]
Bulk loading in Microsoft SQL Server [page 134]
Bulk loading in Netezza [page 141]
Bulk Loading in PostgreSQL [page 145]
Bulk loading in Oracle [page 147]
Bulk loading in SAP HANA [page 154]
Bulk loading in SAP ASE [page 155]
Bulk loading in SAP Sybase IQ (SAP IQ) [page 156]
Teradata source
Teradata target table options

11.12.1 Data file

With the Data file option, SAP Data Services loads a large volume of data into a staging file, and then passes data to the Teradata server.

Data Services runs bulk-loading jobs using a data file as follows:

● Generates a staging data file or files that contain data to be loaded into a Teradata table.
● Generates a loading script that the Teradata Parallel Transporter (TPT) uses. The script defines read and load operators.
  ○ The read operator in the Teradata Parallel Transporter reads the staging data file and passes the data to the load operator.
  ○ The load operator loads data into the Teradata table.

Parent topic: Bulk loading and reading in Teradata [page 159]

Related Information

Named Pipes [page 161]
When to use each Teradata bulk-loading method [page 164]
Parallel Transporter (TPT) method [page 167]
Teradata standalone utilities [page 172]
Teradata UPSERT operation for bulk loading [page 184]


11.12.2 Named Pipes

Teradata uses two types of named pipes: Generic named pipes or Named Pipe Access Module.

With both types of named pipes:

● The pipe can contain a large volume of data to load into a Teradata table.
● Data Services generates the load script.

However, there’s a difference between how the two types create the pipe and execute the loading script.

Generic named pipe [page 161]
With the Generic named pipe file option, SAP Data Services creates a pipe that contains the data for loading, and executes the loading script.

Named pipes access module [page 162]
With the Named pipes access module file option, the Teradata Parallel Transporter (TPT) creates the pipe and defines read and load operators to load the data.

Increasing time for Job Server connection [page 163]
SAP Data Services tries to connect to the named pipes for up to 30 seconds, which may not be enough time for the Teradata Parallel Transporter to create the named pipes.

Parent topic: Bulk loading and reading in Teradata [page 159]

Related Information

Data file [page 160]
When to use each Teradata bulk-loading method [page 164]
Parallel Transporter (TPT) method [page 167]
Teradata standalone utilities [page 172]
Teradata UPSERT operation for bulk loading [page 184]

11.12.2.1 Generic named pipe

With the Generic named pipe file option, SAP Data Services creates a pipe that contains the data for loading, and executes the loading script.

Data Services runs bulk-loading jobs using a generic named pipe as follows:

● Data Services generates a script that the loader, such as Teradata Parallel Transporter (TPT), uses to load the database.
● Data Services creates a pipe to contain the data to load into a Teradata table.
  ○ On UNIX, the pipe is a FIFO (first in, first out) file that has the following name format:

    /temp/<filename>.dat

  ○ On Windows, the file has the following name format:

    \\.\pipe\<datastorename_ownername_tablename_loadernum>.dat

● Data Services executes the loading script.

  Example
  If you use TPT, the script starts the TPT and defines read and load operators.

● Data Services writes data to the pipes.
● The loader connects to the pipes.
● The read operator reads the named pipe and passes the data to the load operator.
● The load operator loads the data into the Teradata table.
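The following Python sketch, which is not part of Data Services, illustrates the generic named pipe idea on UNIX: a producer creates a FIFO and writes delimited rows while a reader consumes them as soon as they arrive. The pipe path, delimiter, and sample rows are hypothetical.

    # Conceptual sketch only: a UNIX FIFO lets a loader read rows while the
    # producer is still writing them. Data Services and the Teradata loader
    # handle this internally; names here are illustrative.
    import os
    import threading

    PIPE_PATH = "/tmp/example_load.dat"   # hypothetical pipe name

    def producer(rows):
        # Opening for write blocks until a reader opens the pipe, then streams rows.
        with open(PIPE_PATH, "w") as pipe:
            for row in rows:
                pipe.write("|".join(str(v) for v in row) + "\n")

    def consumer():
        # A loader-like reader consumes each row as soon as it is written.
        with open(PIPE_PATH) as pipe:
            for line in pipe:
                print("loaded:", line.rstrip("\n").split("|"))

    if __name__ == "__main__":
        if not os.path.exists(PIPE_PATH):
            os.mkfifo(PIPE_PATH)          # create the FIFO (first in, first out) file
        reader = threading.Thread(target=consumer)
        reader.start()
        producer([(1, "Alpha"), (2, "Beta")])
        reader.join()
        os.remove(PIPE_PATH)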

Parent topic: Named Pipes [page 161]

Related Information

Named pipes access module [page 162]
Increasing time for Job Server connection [page 163]

11.12.2.2 Named pipes access module

With the Named pipes access module file option, the Teradata Parallel Transporter (TPT) creates the pipe and defines read and load operators to load the data.

SAP Data Services runs bulk-loading jobs using a named pipe access module as follows:

● Data Services generates the load script.
● The TPT defines read and load operators.
● The TPT utility creates named pipes to contain the data to load into a Teradata table.
  ○ On UNIX, the pipe is a FIFO (first in, first out) file that has the following name format:

    /temp/<filename>.dat

  ○ On Windows, the file has the following name format:

    \\.\pipe\<datastorename_ownername_tablename_loadernum>.dat

● Data Services connects to the pipes and writes data to them.

  Note
  When Data Services tries to connect to the pipes, Teradata Parallel Transporter may not have created them yet. Data Services tries to connect every second for up to 30 seconds. You can increase the 30-second connection time to up to 100 seconds in the Job Server General settings in Designer.

● The Teradata Parallel Transporter read operator reads the named pipe and passes the data to the load operator.
● The load operator loads the data into the Teradata table.

Parent topic: Named Pipes [page 161]

Related Information

Generic named pipe [page 161]
Increasing time for Job Server connection [page 163]

11.12.2.3 Increasing time for Job Server connection

SAP Data Services tries to connect to the named pipes for up to 30 seconds, which may not be enough time for the Teradata Parallel Transporter to create the named pipes.

Perform the following steps in Designer to increase the number of seconds that Data Services tries to connect to the named pipe:

1. Select Tools > Options.
2. Expand the Job Server node and then select the General node.
   The General settings appear at right.
3. Enter AL_Engine in the Section text box.
4. Enter NamedPipeWaitTime in the Key text box.
5. Enter a value from 30 through 100 in the Value text box.
6. Click OK to save your settings and to close the Options dialog box.
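The following Python sketch illustrates the retry pattern that NamedPipeWaitTime controls: attempt a connection once per second until a configurable number of seconds has elapsed. The connect_to_pipe callable is a hypothetical placeholder, not a Data Services API.

    # Sketch of the retry pattern described above: try once per second until a
    # configurable wait time elapses. connect_to_pipe stands in for whatever
    # actually opens the named pipe.
    import time

    def connect_with_retry(connect_to_pipe, wait_time_seconds=30):
        """wait_time_seconds plays the role of NamedPipeWaitTime (30 through 100)."""
        deadline = time.monotonic() + wait_time_seconds
        while True:
            try:
                return connect_to_pipe()
            except OSError:
                if time.monotonic() >= deadline:
                    raise         # give up after the configured wait time
                time.sleep(1)     # retry every second

    # Usage with a stand-in connector that fails a few times before succeeding:
    if __name__ == "__main__":
        attempts = {"n": 0}
        def fake_connect():
            attempts["n"] += 1
            if attempts["n"] < 3:
                raise OSError("pipe not created yet")
            return "connected"
        print(connect_with_retry(fake_connect, wait_time_seconds=10))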

Task overview: Named Pipes [page 161]

Related Information

Generic named pipe [page 161]
Named pipes access module [page 162]

11.12.3 When to use each Teradata bulk-loading method

SAP Data Services supports multiple bulk-loading methods for Teradata on Windows and UNIX. The following table lists the methods and file options that you can select, depending on your requirements.


Bulk loader method: Parallel Transporter

File option: Data file
Advantages:
● Can use Data Services parallel processing.
● Data Services creates the loading script.
Restrictions:
● The Teradata Server Tools and Utilities must be Version 7.0 or later.
● If you use TTU 7.0 or 7.1, see the Release Notes.

File option: Generic named pipe
Advantages:
● Provides a fast way to bulk load because:
  ○ As soon as Data Services writes to a pipe, Teradata can read from the pipe.
  ○ Can use Data Services parallel processing.
  ○ On Windows, no I/O to an intermediate data file occurs because a pipe is in memory.
● Data Services creates the loading script.
Restrictions:
● A job that uses a generic pipe is not restartable.
● The Teradata Server Tools and Utilities must be Version 7.0 or later.
● If you use TTU 7.0 or 7.1, see the Release Notes.

File option: Named pipes access module
Advantages:
● The job is restartable.
● Provides a fast way to bulk load because:
  ○ As soon as Data Services writes to a pipe, Teradata can read from the pipe.
  ○ Can use Data Services parallel processing.
  ○ On Windows, no I/O to an intermediate data file occurs because a pipe is in memory.
● Data Services creates the loading script.
Restrictions:
● The Teradata Server Tools and Utilities must be Version 7.0 or later.
● If you use TTU 7.0 or 7.1, see the Release Notes.

Bulk loader method: Load utility

File option: Data file
Advantages:
● Load utilities are faster than INSERT statements through the ODBC driver.
Restrictions:
● User must provide the loading script.
● Cannot use Data Services parallel processing.

File option: Generic named pipe
Advantages:
● Load utilities are faster than INSERT statements through the ODBC driver.
● Named pipes are faster than data files because:
  ○ As soon as Data Services writes to a pipe, Teradata can read from the pipe.
  ○ On Windows, no I/O to an intermediate data file occurs because a pipe is in memory.
Restrictions:
● User must provide the loading script.
● Cannot use Data Services parallel processing.
● A job that uses a generic pipe is not restartable.

File option: Named pipes access module
Advantages:
● Load utilities are faster than INSERT statements through the ODBC driver.
● Named pipes should be faster than data files because:
  ○ As soon as Data Services writes to a pipe, Teradata can read from the pipe.
  ○ On Windows, no I/O to an intermediate data file occurs because a pipe is in memory.
● The job is restartable.
Restrictions:
● User must provide the loading script.
● Cannot use Data Services parallel processing.

Bulk loader method: FastLoad, MultiLoad, and TPump

File option: Data file
Advantages:
● Load utilities are faster than INSERT or UPSERT statements through the ODBC driver.
● Data Services creates the loading script.
Restrictions:
● Cannot use Data Services parallel processing.

File option: Generic named pipe
Advantages:
● Load utilities are faster than INSERT or UPSERT statements through the ODBC driver.
● Named pipes are faster than data files because:
  ○ As soon as Data Services writes to a pipe, Teradata can read from the pipe.
  ○ On Windows, no I/O to an intermediate data file occurs because a pipe is in memory.
● Data Services creates the loading script.
Restrictions:
● Cannot use Data Services parallel processing.
● A job that uses a generic pipe is not restartable.

File option: Named pipes access module
Advantages:
● Load utilities are faster than INSERT or UPSERT statements through the ODBC driver.
● Named pipes should be faster than data files because:
  ○ As soon as Data Services writes to a pipe, Teradata can read from the pipe.
  ○ On Windows, no I/O to an intermediate data file occurs because a pipe is in memory.
● The job is restartable.
● Data Services creates the loading script.
Restrictions:
● Cannot use Data Services parallel processing.

Bulk loader method: None (use ODBC)

File option: Uses the Teradata ODBC driver to send separate SQL INSERT statements to load data.
Advantages:
● INSERT statements through the ODBC driver are simpler to use than a data file or pipe.
Restrictions:
● This method does not bulk-load data.

Parent topic: Bulk loading and reading in Teradata [page 159]


Related Information

Data file [page 160]
Named Pipes [page 161]
Parallel Transporter (TPT) method [page 167]
Teradata standalone utilities [page 172]
Teradata UPSERT operation for bulk loading [page 184]
Designer Guide: Recovery Mechanisms, Automatically recovering jobs

11.12.4 Parallel Transporter (TPT) method

The Teradata Parallel Transporter (TPT) combines several stand-alone utilities into one interface.

The TPT is an ETL tool that enables you to use parallel processing in SAP Data Services. Use TPT with Data Services to specify a number of source and target options, including the number of data files or named pipes to use when you process large volumes of data.

Before TPT, Teradata provided several stand-alone methods that each required a script. For example, the FastLoad method required a loading script, and the MultiLoad method required an update script. Teradata still supports the stand-alone utilities, but TPT makes it more convenient when you want to combine utilities for one process.

Teradata source performance tuning [page 168]
With the Parallel Transporter (TPT) method, tune reading performance by setting options in the source table editor in SAP Data Services Designer.

Target performance tuning [page 169]
SAP Data Services provides the option for parallel processing when you bulk load data using the Parallel Transporter (TPT) method.

Parent topic: Bulk loading and reading in Teradata [page 159]

Related Information

Data file [page 160]
Named Pipes [page 161]
When to use each Teradata bulk-loading method [page 164]
Teradata standalone utilities [page 172]
Teradata UPSERT operation for bulk loading [page 184]


11.12.4.1 Teradata source performance tuning

With the Parallel Transporter (TPT) method, tune reading performance by setting options in the source table editor in SAP Data Services Designer.

Teradata source options are in the Teradata options tab in the source table editor. The following table describes source table options that you can adjust to tune reading performance.

Teradata source editor options for reading performance

● Maximum number of sessions: Enables Data Services to read data in parallel when you have large volumes of data. More sessions mean that Data Services reads more data in parallel. Ideally the number of sessions equals the number of Access Module Processors (AMPs).

● Number of export operator instances: Enables Data Services to use multiple export instances to read data in parallel. Ideally, the number of export operator instances equals the number of CPUs.

  Note
  An export operator functions like the Teradata stand-alone utility, FastExport.

● Parallel process threads: Breaks buffered data into rows and columns, which maximizes CPU usage on the Job Server computer. Ideally this number of parallel process threads equals the number of CPUs.

Parent topic: Parallel Transporter (TPT) method [page 167]

Related Information

Target performance tuning [page 169]
Teradata source

11.12.4.1.1 Special considerations for source tuning

When you use the Teradata Parallel Transporter (TPT), be aware of limitations when you tune source loading.

In certain situations, Teradata doesn’t support SQL constructs from the TPT method, so you must use the ODBC method instead. Data Services automatically switches the source mode to ODBC under specific conditions, regardless of the mode that you select. Specific conditions include the following:

● The WHERE clause of a Query contains <primary key>=<value> predicates. In this case, the primary key can be a single column or a composite key.
● The input schema contains columns of the LOB (CLOB or BLOB) data type.

Other limitations include the following:

● Data Services pushes down the same database functions as it does for ODBC readers, except for the functions Year and Month.
● Parallel Transporter doesn’t accept parameterized SQL. Therefore, if a Teradata table is the inner loop of a join, Data Services always caches the table.
● Readers generated from the Table Comparison transform and Lookup function families do not use TPT.
● When Data Services optimizes multiple Teradata readers by collapsing them into one reader, Data Services uses TPT whenever possible. When not possible, Data Services uses ODBC instead.

Data Services doesn’t automatically change the mode to None (ODBC) when a WHERE clause of a Query contains unique secondary index columns. Data Services doesn’t allow a unique secondary index column in the WHERE clause because it doesn’t know whether the WHERE clause predicate is part of a unique secondary index. Therefore, it pushes the WHERE clause down to the Parallel Transporter reader. If you select TPT mode and the WHERE clause contains a unique secondary index column, Data Services issues an error at runtime. To avoid the runtime error, you must manually set the source mode to None (ODBC).

11.12.4.2 Target performance tuning

SAP Data Services provides the option for parallel processing when you bulk load data using the Parallel Transporter (TPT) method.

Use a combination of settings in the target table editor in the Options and Bulk Loader Options tabs. Specify the number of data files or named pipes as well as the number of read and load operator Instances.

In the target table Options tab, specify a value for the Number of Loaders option to control the number of data files or named pipes that Data Services or TPT generates. The Number of Loaders option distributes the workload while the operators perform parallel processing. Data Services writes data to the loader files in batches of 999 rows.

Example
If you set Number of Loaders to 2, Data Services generates two data files and processes the data as follows:

● Writes 999 rows to the first file.
● Writes the next 999 rows to the second file.
● Data Services continues alternating between file 1 and file 2, loading groups of 999 rows, until there are no more rows to load.
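The following Python sketch, which is not part of Data Services, illustrates the batching behavior described above: rows are written to a configurable number of loader files in alternating groups of 999. The file names and delimiter are hypothetical.

    # Illustrative sketch of round-robin batching across N loader files in groups
    # of 999 rows. Data Services generates its own staging files or pipes.
    import itertools

    BATCH_SIZE = 999

    def write_round_robin(rows, number_of_loaders=2, name_pattern="loader_{}.dat"):
        files = [open(name_pattern.format(i), "w") for i in range(1, number_of_loaders + 1)]
        try:
            targets = itertools.cycle(files)          # alternate file 1, file 2, file 1, ...
            it = iter(rows)
            while True:
                batch = list(itertools.islice(it, BATCH_SIZE))
                if not batch:
                    break
                target = next(targets)
                for row in batch:
                    target.write("|".join(str(v) for v in row) + "\n")
        finally:
            for f in files:
                f.close()

    # Example: 2500 rows end up as 999 + 502 rows in file 1 and 999 rows in file 2.
    if __name__ == "__main__":
        write_round_robin(((i, f"row{i}") for i in range(2500)), number_of_loaders=2)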

On the Bulk Loader Options tab, implement parallel processing using the Number of DataConnector instances and Number of instances options to specify the number of instances in the loading scripts.

Example
If you set Number of DataConnector instances to 2 and Number of instances to 2, TPT processes as follows:

● Assigns the first DataConnector read operator instance to read one data file.
● Assigns the second DataConnector read operator instance to read another data file in parallel to the first file.
● The DataConnector read operator instances then pass the data to the load operator instance for parallel loading into Teradata.

The TPT uses a control file to read staging files or pipes and load data.

Note
Performance using this type of parallel loading depends on a number of factors such as distribution of incoming data and underlying database capabilities. Under some circumstances, it’s possible that specifying parallel loaders can be detrimental to performance. Always test the parallel loading process before moving to production.

Parent topic: Parallel Transporter (TPT) method [page 167]

Related Information

Teradata source performance tuning [page 168]

11.12.4.2.1 Configuring the bulk loader for parallel processing

Configure SAP Data Services to process bulk loading for Teradata tables in parallel.

Perform the following steps in Data Services Designer:

1. To open the target table editor in your workspace, open the applicable data flow and click the target object.
2. Enter a value for the Number of loaders in the Options tab.
   The number of loaders controls the number of data files or named pipes. Data Services writes data to these files in batches of 999 rows.
3. Open the Bulk Loader Options tab.
4. Choose Parallel Transporter from the Bulk loader dropdown list.
5. Choose an option from the File Option dropdown list.
   Choose one of the following options to contain the data to bulk load:
   ○ Data file
   ○ Generic named pipe
   ○ Named pipes access module

The following table describes how Data Services processes the target loading based on your selection.


File option: Data file or Generic named pipe

Additional settings: Number of instances, Number of DataConnector instances

Results: Data Services sends the first data file or pipe to the first read operator, the second data file or pipe to the second operator, and the DataConnector reads the files in parallel.

Note
If you chose Data file, ensure that the value for Number of DataConnector instances (read operators) is less than or equal to the number of data files.

File option: Named pipes access module

Additional settings: Number of instances (load operators) in the loading scripts.

Results: Teradata uses the value you specify in Number of loaders to determine the number of read operator instances, as well as the number of named pipes. The DataConnector instances are not applicable when you select Named pipes access module.

Example
If you set Number of loaders (read operators) to 2:
○ TPT generates two named pipes.
○ TPT assigns one read operator instance to read from one pipe.
○ TPT assigns another operator instance to read from the other pipe in parallel.
If you set Number of instances (load operators) to 2, the read operator instances pass the data to the load operator instances for parallel loading into Teradata.

File option: Named pipes access module

Additional settings: You can override the default settings for the following Teradata Access Module parameters:
○ Log directory
○ Log level
○ Block size
○ Fallback file name
○ Fallback directory
○ Signature checking

Results: The Teradata Access Module creates a log file to record the load status and writes information to fallback data files. If the job fails, the Teradata Access Module uses the fallback data files to restart the load.

The Access Module log file differs from the build log that you specify in the Log directory option in the Teradata datastore.

Note
Data Services sets the bulk loader directory as the default value for both Log Directory and Fallback Directory.

For more information about these parameters, see the relevant Teradata tools and utilities documentation.

Related Information

Teradata target table options

11.12.5 Teradata standalone utilities

Select the individual utility to load and extract from the Teradata database.

If you don't select TPT (Teradata Parallel Transporter), select from individual utilities. Each load utility is a separate executable designed to move data into a Teradata database. The following table describes the stand-alone utilities.

Teradata stand-alone bulk-load utilities

● FastLoad: Loads to unpopulated tables only. Provides a high-performance load (inserts only) to one empty table each session. Both the client and server environments support FastLoad.

● MultiLoad: Loads large quantities of data into populated tables. Supports bulk inserts, updates, upserts, and deletions against populated tables.

● TPump: Loads small data volumes. Uses standard SQL Data Manipulation Language (DML) to maintain data in tables. Specifies the percentage of system resources necessary for operations on tables. Allows background maintenance for insert, update, upsert, and delete operations to take place at any time you specify.

● Load Utility: Invokes MultiLoad, FastLoad, or TPump with Data Services version 11.5.1 or earlier.

Bulk loading to a table using FastLoad [page 173]
Configure SAP Data Services to use the FastLoad utility to bulk load data to an empty table.

Bulk loading to a table using MultiLoad [page 175]
Configure SAP Data Services to use the MultiLoad utility for parallel loading using commands.

Bulk loading to a table using TPump [page 178]
Configure SAP Data Services to bulk load data using TPump, which uses standard SQL Data Manipulation Language (DML) to load data to tables.

Bulk loading a table using Load Utility [page 183]
If you use SAP Data Services version 11.5.1 or earlier, use the Load Utility to bulk load a Teradata table.

Parent topic: Bulk loading and reading in Teradata [page 159]

Related Information

Data file [page 160]
Named Pipes [page 161]
When to use each Teradata bulk-loading method [page 164]
Parallel Transporter (TPT) method [page 167]
Teradata UPSERT operation for bulk loading [page 184]

11.12.5.1 Bulk loading to a table using FastLoad

Configure SAP Data Services to use the FastLoad utility to bulk load data to an empty table.

Ensure that you configure a Teradata datastore that specifies a value for the Teradata Director Program Identifier in the TdpId option. TdpId is a required option that identifies the name of the Teradata database to load.

Perform the following steps in Data Services Designer:

1. To open the target table editor, open the applicable data flow and click the target icon.
2. Open the Bulk Loader Options tab and select FastLoad from the Bulk loader dropdown list.
3. Select the type of file from the File option dropdown list.
   Select Data file, Generic named pipe, or Named pipes access module.
4. Set the FastLoad options as described in the following table.

FastLoad options

● Data encryption: Encrypts data and requests in all sessions that the job uses. The default is no data encryption.

● Print all requests: Prints every request sent to the Teradata database. The default is to print all requests and not to reduce print output.

● Buffer size: Specifies the number of kilobytes for the output buffer. FastLoad uses the buffer for messages to the Teradata database. The default is 63 KB, which is also the maximum size.

● Character set: Specifies the character set to use when mapping between characters and byte strings, such as ASCII or UTF-8.

For more information about these parameters, see the Teradata FastLoad Reference in your Teradata documentation.

5. Set values in Attributes to use in the FastLoad script that Data Services generates.
   The following table describes the available attributes and lists the default settings.

   ● AccountId: Specifies the number of characters in the account identifier. The account identifier is associated with the user name that you use to log into the Teradata database. Specify a value from 1 through 30 characters.

   ● CheckpointRate: Specifies the number of rows the loader sends to the Teradata database between checkpoint operations. The default is to not use checkpointing.

   ● ErrorLimit: Specifies the maximum number of rejected records that Teradata writes to ErrorTable 1 while inserting data to a FastLoad table.

   ● ErrorTable1: Stores records that are rejected for errors other than unique primary index or duplicate row violation.

   ● ErrorTable2: Stores records that violate the unique primary index constraint.

   ● MaxSessions: Specifies the maximum number of FastLoad sessions for the load job.

   ● MinSessions: Specifies the minimum number of FastLoad sessions required for the load job to continue.

   ● TenacityHours: Specifies the number of hours that the FastLoad utility tries to log in while it runs the maximum number of load jobs on the Teradata database.

   ● TenacitySleep: Specifies the number of minutes that the FastLoad utility waits before it retries a logon operation. The default is 6 minutes.

Note
By default, Data Services uses the bulk loader directory to store the script, data, error, log, and command (bat) files.

6. Optional. Increase the Number of loaders on the Options tab when you select Data file for File Option.

Increasing the number of loaders increases the number of data files. The software can use parallel processing to write data to multiple data files in batches of 999 rows.

If you select Generic named pipe or Named pipes access module, Data Services supports only one loader and disables the Number of loaders option.

Task overview: Teradata standalone utilities [page 172]

Related Information

Bulk loading to a table using MultiLoad [page 175]
Bulk loading to a table using TPump [page 178]
Bulk loading a table using Load Utility [page 183]
Teradata

11.12.5.2 Bulk loading to a table using MultiLoad

Configure SAP Data Services to use the MultiLoad utility for parallel loading using commands.

Ensure that you configure a Teradata datastore that specifies a value for the Teradata Director Program Identifier in the TdpId option. TdpId is a required option that identifies the name of the Teradata database to load.

Perform the following steps in Data Services Designer:

1. To open the target table editor, open the applicable data flow and click the target icon.
2. Open the Bulk Loader Options tab and select MultiLoad from the Bulk loader dropdown list.
3. Select the type of file from the File option dropdown list.
   Select Data file, Generic named pipe, or Named pipes access module.


4. Set the MultiLoad options as described in the following table.

MultiLoad options

● Reduced print output: Limits MultiLoad printout to the minimal information required to determine job success. The default is not to reduce print output.

● Data Encryption: Encrypts data and requests in all sessions that the job uses. The default is no data encryption.

● Character set: Specifies the character set to use when mapping between characters and byte strings, such as ASCII or UTF-8.

For more information about these parameters, see the Teradata MultiLoad Reference in your Teradata documentation.

5. Set values in Attributes to use in the MultiLoad script that Data Services generates.

● LogTable: Specifies the table in which Teradata stores checkpoint information for the MultiLoad job.

● AccountId: Specifies the number of characters in the account identifier. The account identifier is associated with the user name that you use to log into the Teradata database. Specify a value from 1 through 30 characters.

● WorkTable: Specifies the table in which Teradata stages input data.

● ErrorTable1: Specifies the table in which Teradata stores errors detected during the acquisition phase of the MultiLoad import task.

● ErrorTable2: Specifies the table in which Teradata stores errors detected during the application phase of the MultiLoad import task.

● ErrorLimit: Specifies the maximum rejected records that Teradata writes to ErrorTable 1 during the acquisition phase of the MultiLoad import task. If you use it with ErrorPercentage, specifies the number of records that MultiLoad must send to the Teradata database before ErrorPercentage takes effect.

● ErrorPercentage: Specifies the approximate percentage of total records sent currently to the Teradata database that the acquisition phase might reject. Base the percent on the setting in ErrorLimit. Specify percent as an integer.

● CheckpointRate: Specifies the interval between checkpoint operations during the acquisition phase. Express this value as either of the following values:
  ○ The number of rows read from the client system or sent to the Teradata database.
  ○ An amount of time in minutes.

● MaxSessions: Specifies the maximum number of MultiLoad sessions for the load job.

● MinSessions: Specifies the minimum number of MultiLoad sessions required for the load job to continue.

● TenacityHours: Specifies the number of hours that the MultiLoad utility tries to log in while it runs the maximum number of load jobs on the Teradata database.

● TenacitySleep: Specifies the number of minutes that the MultiLoad utility waits before it retries a logon operation. The default is 6 minutes.

● TableWait: Specifies the number of hours that MultiLoad tries to start while another job loads one of the target tables.

● AmpCheck: Specifies how MultiLoad responds when an Access Module Processor (AMP) is down.

● IgnoreDuplicate: Prevents Data Services from placing duplicate rows in ErrorTable 2. The default is to load the duplicate rows.

Note
By default, Data Services uses the bulk loader directory to store the script, data, error, log, and command (bat) files.

For more information about these parameters, see the Teradata MultiLoad Reference.

6. Optional. Increase the Number of loaders on the Options tab when you select Data file for File Option.

Increasing the number of loaders increases the number of data files. The software can use parallel processing to write data to multiple data files in batches of 999 rows.

If you select Generic named pipe or Named pipes access module, Data Services supports only one loader and disables the Number of loaders option.

Task overview: Teradata standalone utilities [page 172]

Related Information

Bulk loading to a table using FastLoad [page 173]
Bulk loading to a table using TPump [page 178]
Bulk loading a table using Load Utility [page 183]
Teradata
Teradata target table options


11.12.5.3 Bulk loading to a table using TPump

Configure SAP Data Services to bulk load data using TPump, which uses standard SQL Data Manipulation Language (DML) to load data to tables.

Ensure that you configure a Teradata datastore that specifies a value for the Teradata Director Program Identifier in the TdpId option. TdpId is a required option that identifies the name of the Teradata database to load.

Perform the following steps in Data Services Designer:

1. To open the target table editor, open the applicable data flow and click the target icon.
2. Open the Bulk Loader Options tab and select TPump from the Bulk loader dropdown list.
3. Select the type of file from the File option dropdown list.
   Select Data file, Generic named pipe, or Named pipes access module.
4. Set the TPump options as described in the following table.

● Reduced print output: Limits TPump printout to the minimal information required to determine job success. The default is not to reduce print output.

● Retain Macros: Keeps macros created during the job run. Use these macros as predefined macros for subsequent runs of the same job.

● Data Encryption: Encrypts data and requests in all sessions used by the job. The default is not to encrypt all sessions.

● Number of buffers: Specifies the number of request buffers that TPump uses for SQL statements to maintain the Teradata database.

● Character set: Specifies the character set to use when mapping between characters and byte strings, such as ASCII or UTF-8.

● Configuration file: Specifies the configuration file for the TPump job.

● Periodicity value: Controls the rate at which TPump transfers SQL statements to the Teradata database. Enter an integer from 1 through 600 that represents the number of periods per minute. The default value is 4, which is four 15-second periods per minute.

● Print all requests: Enables verbose mode, which provides additional statistical data in addition to the regular statistics.

For more information about these parameters, see the Teradata TPump Reference.

5. Specify the Data Services parameters that correspond to Teradata parameters in TPump commands in Attributes.
   Refer to TPump commands for Data Services parameters [page 179] for descriptions of the commands, and the corresponding Data Services parameter in Attributes.

   Note
   For most settings, use the default settings that Data Services generates in the TPump script.


Note
By default, SAP Data Services uses the bulk loader directory to store the script, data, error, log, and command (bat) files.

6. Optional. Increase the Number of loaders on the Options tab when you select Data file for File Option.

Increasing the number of loaders increases the number of data files. The software can use parallel processing to write data to multiple data files in batches of 999 rows.

If you select Generic named pipe or Named pipes access module, Data Services supports only one loader and disables the Number of loaders option.

TPump commands for Data Services parameters [page 179]
When you configure TPump bulk loading, you set attributes that correspond with SAP Data Services parameters.

Task overview: Teradata standalone utilities [page 172]

Related Information

Bulk loading to a table using FastLoad [page 173]
Bulk loading to a table using MultiLoad [page 175]
Bulk loading a table using Load Utility [page 183]
Teradata

11.12.5.3.1 TPump commands for Data Services parameters

When you configure TPump bulk loading, you set attributes that correspond with SAP Data Services parameters.

The following table contains descriptions of the TPump command, and lists the corresponding Data Services option.

● AccountId (TPump command: NAME): Specifies the number of characters in the account identifier. The account identifier is associated with the user name that you use to log into the Teradata database. Specify a value from 1 through 30 characters.

● Append (TPump command: BEGIN LOAD): Appends data to the error table specified for ErrorTable. TPump creates the table when it doesn’t exist. If the structure of the existing error table isn’t compatible with this ErrorTable, Data Services issues an error when TPump tries to update data or insert data to the error table.

● CheckpointRate (TPump command: BEGIN LOAD): Specifies the interval between checkpoint operations. Enter an unsigned integer from 0 through 60. The default is 15 minutes.

● ErrorLimit (TPump command: BEGIN LOAD): Specifies the maximum rejected records that TPump can write to the error table while maintaining a table. The default is no limit. If you use it with ErrorPercentage, specifies the number of records that TPump must send to the Teradata database before ErrorPercentage takes effect.

  Example
  If you set ErrorLimit to 100 and ErrorPercentage to 5, TPump must send 100 records to the Teradata database before Teradata applies the approximate 5% rejection limit. If Teradata rejects only five records when it sends the 100th record, TPump doesn’t exceed the limit. However, if Teradata rejected six records when TPump sends the 100th record, TPump stops processing because the limit is exceeded.

● ErrorPercentage (TPump command: BEGIN LOAD): Specifies the approximate percentage of total records sent currently to the Teradata database that Teradata might reject. Base the percent on the setting in ErrorLimit. Specify percent as an integer.

● ErrorTable (TPump command: BEGIN LOAD): Specifies the name of the table in which TPump stores information about errors and the rejected records.

● ExecuteMacro (TPump command: EXECUTE): Specifies the name of the macro to execute. Use predefined macros to save time. With predefined macros, TPump doesn't need to create and drop new macros each time you run a TPump job script.

● Ignore duplicate inserts (TPump command: DML LABEL): Prevents TPump from placing duplicate rows in the error table.

● JobName (TPump command: NAME): Identifies the job name. The maximum length is 16 characters.

● Latency (TPump command: BEGIN LOAD): Specifies the number of seconds that the oldest record resides in the buffer before TPump flushes it to the Teradata database. Enter a value greater than 1 second. If you don't specify the SerializeOn attribute, only the current buffer is stale. If you specify SerializeOn, the number of stale buffers can range from zero through the number of sessions.

● LogTable (other TPump commands): Specifies the name of the table for writing checkpoint information that is required for the safe and automatic restart of a TPump job. The default name has the following format:

  <owner>.<table>_LT

● MacroDatabase (TPump command: BEGIN LOAD): Specifies the name of the database for the macros that TPump uses or builds. The default is to place macros in the same database as the TPump target table.

● MaxSessions (TPump command: BEGIN LOAD): Specifies the maximum number of sessions for TPump to update the Teradata database. SAP Data Services uses a default of three sessions.

● MinSessions (TPump command: BEGIN LOAD): Specifies the minimum number of sessions for TPump to update the Teradata database.

● NoDrop (TPump command: BEGIN LOAD): Specifies to not drop the error table, even when it’s empty, at the end of a job. Use NoDrop with Append to persist the error table, or use NoDrop alone.

● NoMonitor (TPump command: BEGIN LOAD): Prevents TPump from checking for statement rate changes from, or updating status information for, the TPump Monitor.

● NoStop (TPump command: IMPORT INFILE): Prevents TPump from terminating because of an error associated with a variable-length record.

● Pack (TPump command: BEGIN LOAD): Specifies the number of SQL statements to pack into a multiple-statement request. The default is 20 statements per request. The maximum value is 600.

● PackMaximum (TPump command: BEGIN LOAD): Specifies the maximum number of records to pack. TPump dynamically determines the number of records to pack within one request, and doesn’t exceed the maximum. The default maximum value is 600.

● Rate (TPump command: BEGIN LOAD): Specifies the initial maximum rate at which TPump sends SQL statements to the Teradata database. The value must be a positive integer. If unspecified, Rate is unlimited.

● Robust (TPump command: BEGIN LOAD): Specifies whether to use robust restart logic. The value can be YES or NO.
  ○ NO: Specifies simple restart logic, which causes TPump to begin where the last checkpoint occurred in the job. TPump reprocesses anything that occurred after the checkpoint.
  ○ YES: Specifies robust restart logic, which you use for DML statements that change the results when you repeat the operation. For example, an INSERT statement into a table that allows duplicate rows, or an update such as the following:

    UPDATE foo SET A=A+1...

● SerializeOn (TPump command: BEGIN LOAD): Specifies a comma-separated list of columns to use as the key for rows. Guarantees that operations on these rows occur serially.

● TenacityHours (TPump command: BEGIN LOAD): Specifies the number of hours that TPump tries to log in to sessions that are required to perform the TPump job. The default is 4 hours.

● TenacitySleep (TPump command: BEGIN LOAD): Specifies the number of minutes that TPump waits before it retries a logon operation. The default is 6 minutes.

For more information about these parameters, see the Teradata Parallel Data Pump Reference.

Parent topic: Bulk loading to a table using TPump [page 178]

11.12.5.4 Bulk loading a table using Load Utility

If you use SAP Data Services version 11.5.1 or earlier, use the Load Utility to bulk load a Teradata table.

Perform the following steps in SAP Data Services Designer:

1. To open the target table editor, open the applicable data flow and click the target icon.
2. Open the Bulk Loader Options tab and select Load Utility from the Bulk loader dropdown list.
3. Select the file type from the File Option dropdown list.
   Options include Data file, Generic named pipe, or Named pipes access module. This file holds the data to bulk load.
4. Enter a command in the Command line text box. Data Services invokes this command.

   Example

   fastload<C:\tera_script\float.ctl

5. Enter the path, or browse to the directory for the data file if you selected Data file for File Option.
6. Enter the pipe name if you selected Generic named pipe or Named pipes access module for File Option.
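The following Python sketch illustrates what conceptually happens when a user-supplied command line such as the one in step 4 runs: the operating system invokes the load utility against the control file. The function name and error handling are hypothetical; Data Services performs the actual invocation of the command you enter.

    # Conceptual sketch only: run a user-supplied load command such as the one in
    # step 4. The command and control-file path are illustrative examples.
    import subprocess

    def run_load_command(command_line):
        # shell=True so redirection such as "fastload < C:\tera_script\float.ctl" works
        result = subprocess.run(command_line, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"load utility failed: {result.stderr}")
        return result.stdout

    # Example usage (hypothetical path):
    # print(run_load_command(r"fastload < C:\tera_script\float.ctl"))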

Task overview: Teradata standalone utilities [page 172]

Related Information

Bulk loading to a table using FastLoad [page 173]
Bulk loading to a table using MultiLoad [page 175]
Bulk loading to a table using TPump [page 178]


11.12.6 Teradata UPSERT operation for bulk loading

Use the UPSERT operation during Teradata bulk loading to either update a matching row, or to insert a new row into the target object.

The UPSERT operation for Teradata uses bulk loading features to update the target Teradata object that already contains data, such as an existing table. When you use an already-populated table as the target, the UPSERT operation updates the table with new information. SAP Data Services first looks for a row in the target that matches the row generated from the data flow and then performs the following processes:

● If Data Services finds a row that matches the output-generated row from the data flow, it updates the matching row with changed data only. The existing row contains all of the matching data, with some updated data.

● If Data Services doesn’t find a matching row, it inserts the new row to the target table and adds the new data.
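The following Python sketch illustrates the update-or-insert rule described above, using an in-memory dictionary keyed on a primary key. It is a conceptual illustration only; Teradata performs the real UPSERT during bulk loading, and the key and row values here are hypothetical.

    # Minimal illustration of the UPSERT rule: update the matching target row with
    # changed values, or insert the row when no match exists. The dictionary stands
    # in for the target table.
    def upsert(target, incoming_rows, key="id"):
        for row in incoming_rows:
            match = target.get(row[key])
            if match is not None:
                match.update(row)              # update matching row with changed data only
            else:
                target[row[key]] = dict(row)   # insert new row with the new data
        return target

    # Example usage:
    if __name__ == "__main__":
        target_table = {1: {"id": 1, "name": "Alpha", "qty": 10}}
        changes = [{"id": 1, "qty": 12}, {"id": 2, "name": "Beta", "qty": 5}]
        print(upsert(target_table, changes))
        # {1: {'id': 1, 'name': 'Alpha', 'qty': 12}, 2: {'id': 2, 'name': 'Beta', 'qty': 5}}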

The Teradata UPSERT operation is available only with the following Bulk loader methods:

● MultiLoad
● TPump
● Parallel Transporter

The UPSERT operation is available only with the following Operators:

● Stream
● Update

Bulk loading a table using UPSERT [page 185]
Configure SAP Data Services to use UPSERT when bulk loading to a Teradata table.

Parent topic: Bulk loading and reading in Teradata [page 159]

Related Information

Data file [page 160]
Named Pipes [page 161]
When to use each Teradata bulk-loading method [page 164]
Parallel Transporter (TPT) method [page 167]
Teradata standalone utilities [page 172]
Teradata target table options


11.12.6.1 Bulk loading a table using UPSERT

Configure SAP Data Services to use UPSERT when bulk loading to a Teradata table.

Perform the following steps in Data Services Designer:

1. Open the applicable data flow and click the target icon.
   The target editor opens.
2. Open the Bulk Loader Options tab and select Upsert from the Bulk operation option.
3. Open the Options tab and select Use input keys in the Update Control group.
   The Use input keys option enables Data Services to use the primary key from the input file when the target file does not contain a primary key. Data Services uses the primary key to help identify matching rows.
4. Complete all other applicable target table options and close the target table editor.
5. Right-click the target icon and select Properties.
   The Properties dialog box opens.
6. Open the Attributes tab.
   View the following additional attributes:
   ○ Ignore missing updates: Specifies to write the inserted rows into the error table. Yes is the default setting.
   ○ Ignore duplicate updates: Specifies to write the existing rows that are updated into the error table. No is the default setting.

Task overview: Teradata UPSERT operation for bulk loading [page 184]


12 Performance options for tuning system performance

To enhance system performance, make small adjustments to source objects, target objects, and job design, then test and re-adjust.

Small adjustments make it easier to pinpoint where to make an adjustment or when to re-adjust a setting. Tune your jobs in your test environment. When you use the tuning techniques suggested in the following table, use the following process for tuning:

● Make small adjustments.
● Execute the job and monitor the change in performance after each adjustment.
● Make additional adjustments based on monitoring results as necessary.

Job areas for tuning system performance

Source-based performance: To improve system performance in your source objects, consider the following tuning techniques:

● Set join ranking so the largest tables have a higher rank and are joined first.
● Minimize extracted data by retrieving only the data that has changed.
● Use array fetch size to retrieve data using fewer requests.

Target-based performance: To improve target performance in your target objects, consider the following tuning techniques:

● Choose regular loading or bulk loading based on what your database system supports.
● Set Rows per commit to control the number of rows Data Services processes before it saves data.

Job design performance: To improve performance based on job design, consider the following tuning techniques:

● Use changed data capture and load only changed data to output.
● Minimize data type conversion to save time.
● Minimize locale conversion to save time.
● Use precision in operations so that you use the correct level of precision for best performance.

Source-based performance tuning [page 187]
Make performance adjustments using settings related to the source in your jobs.

Target-based performance options [page 195]
To tune target loading performance, use SAP Data Services features such as bulk loading and rows per commit.

Job design performance options [page 198]
Enhance performance by making specific design decisions when you configure your jobs in SAP Data Services Designer.

12.1 Source-based performance tuning

Make performance adjustments using settings related to the source in your jobs.

To tune performance, adjust settings in the following three areas related to data sources:

● When configuring joins, use join ranking.
● Minimize extracted data using change data capture (CDC) when available.
● Retrieve source data with fewer requests by using array fetch size.

Join rank settings [page 187]
Enhance performance by assigning a join rank to each join in your setup.

Minimize extracted data with CDC [page 193]
To speed up job processing in SAP Data Services, minimize the amount of data extracted from the source systems by using changed data capture (CDC).

Array fetch size [page 194]
To reduce network traffic and retrieve data using fewer requests, use the array fetch size feature in SAP Data Services.

Parent topic: Performance options for tuning system performance [page 186]

Related Information

Target-based performance options [page 195]
Job design performance options [page 198]

12.1.1 Join rank settings

Enhance performance by assigning a join rank to each join in your setup.

When you rank each join, SAP Data Services considers the rank relative to other tables and files joined in the data flow. The optimizer, which is the optimization application inside the Data Services engine, joins sources with higher rank values before joining sources with lower rank values.

The order of execution depends on join rank and, for left outer joins, the order defined in the FROM clause. Setting the join rank for each join pair doesn’t affect the result, but it can enhance performance by changing the order in which the optimizer performs the joins.


Set up joins in the Query transform. In a data flow that contains adjacent Query transforms, the ranking determination can be complex. The optimizer bases the way it joins your data on the following:

● The optimizer can combine the joins from consecutive Query transforms into a single Query transform, reassigning join ranks.
● The optimizer can consider the upstream join rank settings when it makes joins.

Example
In a data flow with multiple Query transforms with joins, we present four scenarios to demonstrate how the Data Services optimizer determines join order under different circumstances. The scenarios are based on the following data flow example:

● Query_1 contains an inner join between T1 and T2.
● Query_2 contains an inner join between the result of Query_1 and T3.

Scenario 1: All joins have join rank values [page 189]
SAP Data Services determines the join ranks when all sources have join rank values.

Scenario 2: Query_2 join ranks not defined [page 190]
SAP Data Services determines the join ranks when the sources in Query_2 aren’t defined.

Scenario 3: T1 and T2 join ranks not defined [page 191]
SAP Data Services determines the join ranks when there are no rank values set for the source tables T1 and T2.

Scenario 4: No joins have join rank values [page 192]
SAP Data Services determines join ranks when there are no join rank values for any sources.

Parent topic: Source-based performance tuning [page 187]

Related Information

Minimize extracted data with CDC [page 193]
Array fetch size [page 194]
Query transform joins


12.1.1.1 Scenario 1: All joins have join rank values

SAP Data Services determines the join ranks when all sources have join rank values.

Use the example in Join rank settings [page 187] for the following scenario.

The following table shows the join rank values for the joins in Query_1 and Query_2 as set in the data flow.

Join ranks set in data flow

Query_1:
● T1: join rank 30
● T2: join rank 40

Query_2:
● Query_1 result set: join rank 10
● T3: join rank 20

When the optimizer, which is the optimization application inside the Data Services engine, combines the joins in Query_2, it internally determines new join ranking based on the values in the original joins. The following table contains the join rank values determined by the optimizer for the combined joins in Query_2.

Joins combined in Query_2

Query_2:
● T1: join rank 30
● T2: join rank 40
● T3: join rank 41

Internally, the optimizer adjusts the join rank value for T3 from 20 to 41 because, in the data flow, Query_2 has a higher join rank value assigned to T3 than to “Query_1 result set.”

Parent topic: Join rank settings [page 187]


Related Information

Scenario 2: Query_2 join ranks not defined [page 190]
Scenario 3: T1 and T2 join ranks not defined [page 191]
Scenario 4: No joins have join rank values [page 192]

12.1.1.2 Scenario 2: Query_2 join ranks not defined

SAP Data Services determines the join ranks when the sources in Query_2 aren’t defined.

Use the example in Join rank settings [page 187] for the following scenario.

In this scenario, there are no settings for join ranks in Query_2. When you don’t specify a join rank, Data Services uses the default of zero (0). Therefore, in Query_2, Data Services uses the join rank values of zero (0).

Join ranks set in data flow

Query_1:
● T1: join rank 30
● T2: join rank 40

Query_2:
● Query_1 result set: join rank not set (default = 0)
● T3: join rank not set (default = 0)

Internally, the optimizer, which is the optimization application inside the Data Services engine, assigns an internal join ranking in the combined joins in Query_2 as shown in the following table.

Joins combined in Query_2

Query_2:
● T1: join rank 30
● T2: join rank 40
● T3: join rank 40


You may be surprised to see a join rank value of 40 for T3. The optimizer considered that, even though “Query_1 result set” had a zero (0) join rank in the data flow, the result set consisted of sources that do have join ranks. The optimizer used the higher join rank from T1 and T2.
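The following Python sketch is an interpretation, for illustration only, of the ranking behavior shown in these scenarios: a result set with no explicit join rank (default 0) takes on the highest rank of its constituent sources, and the optimizer joins higher-ranked sources before lower-ranked ones. It does not reproduce the optimizer's full internal rescaling logic (for example, the adjustment of T3 to 41 in Scenario 1).

    # Simplified, interpretive sketch of the join rank rules described in the scenarios.
    def effective_rank(explicit_rank, constituent_ranks=()):
        if explicit_rank:                              # a non-zero rank set in the data flow wins
            return explicit_rank
        return max(constituent_ranks, default=0)       # otherwise inherit the highest constituent rank

    def join_order(sources):
        """sources: {name: rank}; higher ranks are joined before lower ranks."""
        return sorted(sources, key=sources.get, reverse=True)

    # Scenario 2: the Query_1 result set has no explicit rank, but its sources do.
    print(effective_rank(0, constituent_ranks=(30, 40)))   # -> 40
    # Scenario 1: explicit ranks are respected and higher ranks are joined first.
    print(join_order({"T1": 30, "T2": 40, "T3": 41}))      # -> ['T3', 'T2', 'T1']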

Parent topic: Join rank settings [page 187]

Related Information

Scenario 1: All joins have join rank values [page 189]
Scenario 3: T1 and T2 join ranks not defined [page 191]
Scenario 4: No joins have join rank values [page 192]

12.1.1.3 Scenario 3: T1 and T2 join ranks not defined

SAP Data Services determines the join ranks when there are no rank values set for the source tables T1 and T2.

Use the example in Join rank settings [page 187] with the following scenario.

In this scenario, there are no join ranks set for T1 and T2 source tables in Query_1. When there are no set join ranks, then the optimizer, which is the optimization application inside the Data Services engine, applies the default join rank of zero (0). The following table shows the Join rank values in the data flow, before the optimizer combines the joins into Query_2.

Join ranks in data flow

Query_1:
● T1: join rank not set (default = 0)
● T2: join rank not set (default = 0)

Query_2:
● Query_1 result set: join rank 10
● T3: join rank 20


Internally, the optimizer assigns a join rank of 10 to T1 and T2 because, in the data flow, the result of combining T1 and T2, named “Query_1 result set,” has a join rank of 10.

Joins combined in Query_2

Query_2:
● T1: join rank 10
● T2: join rank 10
● T3: join rank 20

Parent topic: Join rank settings [page 187]

Related Information

Scenario 1: All joins have join rank values [page 189]
Scenario 2: Query_2 join ranks not defined [page 190]
Scenario 4: No joins have join rank values [page 192]

12.1.1.4 Scenario 4: No joins have join rank values

SAP Data Services determines join ranks when there are no join rank values for any sources.

Use the example in Join rank settings [page 187] with the following scenario.

When you do not set join rank values in the data flow, the optimizer, which is the optimization application inside the engine, cannot optimize the joins. The optimizer uses the default setting of zero (0) for all tables in the joins.

Parent topic: Join rank settings [page 187]


Related Information

Scenario 1: All joins have join rank values [page 189]
Scenario 2: Query_2 join ranks not defined [page 190]
Scenario 3: T1 and T2 join ranks not defined [page 191]

12.1.2 Minimize extracted data with CDC

To speed up job processing in SAP Data Services, minimize the amount of data extracted from the source systems by using changed data capture (CDC).

Minimize extracted data by retrieving only the data that has changed since the last time you performed the extraction. This technique is called changed-data capture (CDC). Use CDC when both your database and Data Services support it.

Use CDC after you perform an initial data load. With CDC enabled, Data Services behaves as follows:

● Searches the existing target table for rows that match the source table.
● For matching rows:
  ○ Extracts only the data that is new or modified since the last refresh cycle from the source.
  ○ Updates the matching target rows by replacing the old data with the new or modified data from the source.
● For source rows that don’t have a match in the target table:
  ○ Inserts a new row in the target table.
  ○ Adds the new data from the source table to the added row in the target table.

CDC enhances performance because Data Services has less data to extract, transform, and load.
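As an illustration outside the product, not Data Services code, the following Python sketch models a CDC refresh cycle: only rows changed since the last extraction are read, matching target rows are updated, and unmatched rows are inserted. The column names and timestamps are assumed.

    from datetime import datetime

    def cdc_refresh(source_rows, target, last_refresh):
        """Apply only the changes made since last_refresh to the target.
        source_rows: iterable of dicts with 'id', 'data', 'modified_on'.
        target:      dict keyed by id, standing in for the target table."""
        updated = inserted = 0
        for row in source_rows:
            if row["modified_on"] <= last_refresh:
                continue                      # unchanged since last cycle: skip
            if row["id"] in target:
                target[row["id"]] = row       # matching row: replace old data
                updated += 1
            else:
                target[row["id"]] = row       # no match: insert a new row
                inserted += 1
        return updated, inserted

    source = [
        {"id": 1, "data": "updated",   "modified_on": datetime(2021, 6, 2)},
        {"id": 2, "data": "unchanged", "modified_on": datetime(2021, 1, 1)},
        {"id": 3, "data": "new",       "modified_on": datetime(2021, 6, 1)},
    ]
    target = {
        1: {"id": 1, "data": "old",       "modified_on": datetime(2021, 1, 1)},
        2: {"id": 2, "data": "unchanged", "modified_on": datetime(2021, 1, 1)},
    }
    print(cdc_refresh(source, target, last_refresh=datetime(2021, 5, 1)))
    # (1, 1): one matching row updated, one new row inserted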

For complete information about CDC, see the Designer Guide.

Parent topic: Source-based performance tuning [page 187]

Related Information

Join rank settings [page 187]
Array fetch size [page 194]
Changed Data capture


12.1.3 Array fetch size

To reduce network traffic and retrieve data using fewer requests, use the array fetch size feature in SAP Data Services.

The array fetch size feature requests and retrieves multiple rows of data per database request, which reduces CPU usage on the Job Server computer.

The array fetch feature lowers the number of database requests by “fetching” multiple rows (an array) of data with each request. Enter the number of rows to fetch per request in the Array fetch size option in any source editor or SQL transform editor in the data flow. The default setting is 1000, which means that Data Services fetches 1000 rows of data from the source database with each database request. The maximum array fetch size is 5000 rows.

Set the Array fetch size option based on your network speed. To set the optimal value for array fetch size, consider the following factors:

● The size of the source table rows, including the number and type of columns.
● Network round-trip time for database requests and responses.

If you have a powerful computing environment, meaning that the computers running the Job Server, related databases, and connections are fast, then try setting a higher value for Array fetch size. Test the performance of your jobs to find the best setting.

Note
Higher array fetch settings consume more processing memory, proportionally to the following factors:

● Length of the data in each row
● Number of rows in each fetch

Restriction
For Oracle LONG data types: Regardless of the array fetch setting, sources that read columns with an Oracle LONG data type can’t take advantage of array fetch size. If a selected data column is of type LONG, the array fetch size internally defaults to 1 row per request.
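Outside of Data Services, the same trade-off applies to any database client. The following Python sketch uses the standard DB-API cursor interface (sqlite3 purely as a stand-in driver; the table and column names are invented) to show how a larger arraysize turns many single-row requests into fewer multi-row fetches.

    import sqlite3  # any DB-API driver exposes the same cursor interface

    def process(row):
        pass  # stands in for downstream transform logic

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    cur.executemany("INSERT INTO orders VALUES (?, ?)",
                    [(i, i * 1.5) for i in range(5000)])

    cur.arraysize = 1000                  # rows returned per fetchmany() call
    cur.execute("SELECT id, amount FROM orders")
    while True:
        batch = cur.fetchmany()           # one request returns up to arraysize rows
        if not batch:
            break
        for row in batch:
            process(row)

A larger arraysize means fewer round trips but more memory held per fetch, which mirrors the sizing factors listed above.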

Setting array fetch size [page 195]
Set the array fetch size in either the source table editor or a SQL transform for data flows that access sources over the network.

Parent topic: Source-based performance tuning [page 187]

Related Information

Join rank settings [page 187]
Minimize extracted data with CDC [page 193]


12.1.3.1 Setting array fetch size

Set the array fetch size in either the source table editor or a SQL transform for data flows that access sources over the network.

To use the SQL transform to set the array fetch size, ensure that your data flow contains the SQL transform.

Perform the following steps in SAP Data Services Designer with the applicable data flow opened in your workspace:

1. To set the array fetch size in the source table editor, open the source table editor by clicking the source icon in the data flow.
   a. Open the Source tab.
   b. Enter a value in the Array fetch size text box in the Performance section.

2. To set the array fetch size in a SQL transform, open the transform editor by clicking the transform icon in the data flow.
   a. Enter a value in the Array fetch size text box.

Task overview: Array fetch size [page 194]

12.2 Target-based performance options

To tune target loading performance, use SAP Data Services features such as bulk loading and rows per commit.

Several databases support loading data to the target using bulk loading instead of regular loading with SQL statements. In addition, parallel loading, partitioned tables, and rows per commit can improve job performance when Data Services loads data to the target.

Loading method [page 196]
Combine the loading method with other data flow settings to optimize job performance.

Rows per commit [page 197]
Rows per commit regulates the number of rows that SAP Data Services processes before it commits the rows to the target.

Parent topic: Performance options for tuning system performance [page 186]

Related Information

Source-based performance tuning [page 187]
Job design performance options [page 198]


12.2.1 Loading method

Combine the loading method with other data flow settings to optimize job performance.

Choose either regular load or bulk load for your data flow. When your database doesn't support bulk load, use regular load.

Regular load

For regular load, SAP Data Services uses parameterized SQL to load data to the target database.

When the job generates, parses, and compiles a SQL statement, the optimizer, which is the optimization application inside the Data Services engine, automatically enables the Parameterized SQL option. Parameterized SQL minimizes SQL statement processing by using one handle for a set of values instead of one handle per value.
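The following Python sketch is a generic illustration of the parameterized-SQL idea, not Data Services-generated SQL; sqlite3 and the table name are placeholders. One prepared statement handle serves every row instead of one statement per value.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE target (id INTEGER, name TEXT)")
    rows = [(1, "a"), (2, "b"), (3, "c")]

    # Literal SQL: every row produces a distinct statement that must be
    # parsed and compiled separately.
    for row_id, name in rows:
        cur.execute(f"INSERT INTO target (id, name) VALUES ({row_id}, '{name}')")

    cur.execute("DELETE FROM target")   # reset so the comparison loads the same rows

    # Parameterized SQL: one statement handle, one parse, many value sets.
    cur.executemany("INSERT INTO target (id, name) VALUES (?, ?)", rows)
    conn.commit()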

To improve performance for a regular load with parameterized SQL, use parallel loading with one of the following options from the target table editor:

● Enable Partitioning: The number of partitions in the target table determines the number of parallel loads.
● Number of Loaders: The number of loaders determines the number of parallel loads to the target table.

Note
You can use only one of these options per target table.

Bulk load

When your database supports bulk loading, it also supports the following options that enhance bulk-loading processing:

● Auto-correct load
● Enable Partitioning
● Number of Loaders
● Full push down to a database
● Overflow file
● Transactional loading

The optimizer automatically selects Full push down to a database when your job meets the following conditions:

● The source and target in a data flow are on the same database.
● The database supports the operations in the data flow.

If the optimizer pushes down the target operations, it ignores the performance options set for sources because the Data Services engine no longer processes the data flow itself. For example, the optimizer ignores your settings for Array fetch size, Caching, and Join rank.
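As a generic sketch of the full push-down pattern (sqlite3 and the table names are placeholders, not SQL that Data Services generates), the whole data flow runs as a single SQL statement inside the database, so no rows pass through the client and source-side options have nothing to act on.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE src (id INTEGER, amount REAL)")
    cur.execute("CREATE TABLE tgt (id INTEGER, amount REAL)")
    cur.executemany("INSERT INTO src VALUES (?, ?)", [(1, 10.0), (2, -5.0), (3, 7.5)])

    # Filter and load both run inside the database as one statement, so no
    # rows travel to the client for fetching, caching, or joining.
    cur.execute("INSERT INTO tgt (id, amount) SELECT id, amount FROM src WHERE amount > 0")
    conn.commit()
    print(cur.execute("SELECT COUNT(*) FROM tgt").fetchone()[0])  # 2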


Parent topic: Target-based performance options [page 195]

Related Information

Rows per commit [page 197]
Maximize push-down operations [page 33]
Table partitioning [page 85]
Bulk Loading and Reading [page 123]
Types of target tables

12.2.2 Rows per commit

Rows per commit regulates the number of rows that SAP Data Services processes before it commits the rows to the target.

When you use regular loading, Data Services automatically sets the Rows per commit to 1000 rows. Set the value based on the size of your source table. The more rows that Data Services processes before a commit, the faster the processing. Adjust the rows per commit value in the Options tab of the target table editor. Ensure that you adhere to the following rules:

● Set the value to a non-negative integer, and don’t include any non-numeric characters, such as a plus sign.

● Don't set the value to zero or blank. If you set the value to zero or blank, Data Services changes your setting to 1000.

● The maximum setting is 5000. If you enter a number greater than the maximum, Data Services changes your setting to 5000.

We recommend that you set the Rows per commit option to a value between 500 and 2000 for best performance. However, you can use the following formula to calculate a value based on your data:

rows per commit = max_IO_size / row size (in bytes)

For most platforms, max_IO_size is 64 K. For Solaris, the max_IO_size is 1024 K.
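A short worked example of the formula above; the row size is an assumed value for illustration.

    # Worked example of rows per commit = max_IO_size / row size (in bytes).
    max_io_size = 64 * 1024        # 64 K on most platforms (1024 K on Solaris)
    row_size_bytes = 80            # assumed average row size for this example

    rows_per_commit = max_io_size // row_size_bytes
    print(rows_per_commit)         # 819, which falls in the recommended 500-2000 range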

Data Services ignores a setting in Rows per commit under the following situations:

● You load into a database with a column that contains a LONG data type attribute (not applicable for Oracle).
● You use an overflow file and the transaction fails.

Note
However, once Data Services loads all the rows successfully, the commit size reverts to the number you entered. In this case, depending on how often a load error happens, performance could become worse than when you leave Rows per commit at the default setting of 1000.

Parent topic: Target-based performance options [page 195]


Related Information

Loading method [page 196]
Cache data sources [page 57]

12.3 Job design performance options

Enhance performance by making specific design decisions when you configure your jobs in SAP Data Services Designer.

Minimizing data type conversion [page 198]
SAP Data Services converts data types so that it processes data the same regardless of the source.

Minimizing locale conversion [page 200]
For enhanced performance, ensure that you configure your environment and jobs so that locales and single and multi-byte code pages are compatible.

Precision in operations [page 200]
For data types with precision settings, decreasing precision enhances SAP Data Services performance for arithmetic and comparison operations.

Parent topic: Performance options for tuning system performance [page 186]

Related Information

Source-based performance tuning [page 187]
Target-based performance options [page 195]

12.3.1 Minimizing data type conversion

SAP Data Services converts data types so that it processes data the same regardless of the source.

You can enhance performance by knowing when Data Services converts incoming data types, and how to reduce the occurrence of data type conversions.

Control data type conversion through column mapping in the schema of various data flow objects. When you design your jobs, consider the following ways to enhance Data Services data type conversion processes:

● Avoid unnecessary data conversions.
● Verify that Data Services is performing the “implicit” conversions.


Note
Select implicit conversion when you drag and drop columns from input to output schemas in data flow objects. Also, read job validation warnings to ensure you've taken advantage of implicit conversions when possible.

The following list summarizes the data type conversion areas where data type handling can slow Data Services performance:

● Dates: Performs arithmetic on date-type data using implicit data type conversions.
● Unsupported data types: Reads, loads, and calls stored procedures with unknown data types, and doesn’t import metadata for columns of unsupported data types.
● Conversion to or from internal data types: Converts unsupported data types to internal data types on reading, and converts back to the original data type on loading.
● Conversion within expressions: Pushes down expressions to the underlying database manager but must convert unknown data types before push-down.
● Conversion of number data types in expressions: Uses a type-promotion algorithm to evaluate expressions that contain more than one number data type.
● Conversion between explicit data types: Uses functions to convert data types between explicit data types.
● Conversion between native data types: Gets and sets field data in a format other than the declared data type, except under certain circumstances.

For descriptions of data type conversions for all of the areas, see the Designer Guide.
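As a language-neutral analogy rather than Data Services' own conversion rules, the following minimal Python sketch shows the cost pattern behind mixed number types in an expression: the narrower type is promoted before the operation can run. The values are assumed for illustration.

    from decimal import Decimal

    price = Decimal("19.99")   # value of a decimal column
    quantity = 3               # value of an integer column

    # The integer is promoted to Decimal before the multiplication can run,
    # mirroring the extra conversion step that mixed number types require.
    total = price * quantity
    print(type(total).__name__, total)   # Decimal 59.97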

Parent topic: Job design performance options [page 198]

Related Information

Minimizing locale conversion [page 200]
Precision in operations [page 200]
Data type conversion


12.3.2 Minimizing locale conversion

For enhanced performance, ensure that you configure your environment and jobs so that locales and single and multi-byte code pages are compatible.

SAP Data Services supports the use of different locales in the Job Server, sources, and targets. It also supports single and multibyte code pages. Therefore, ensure that you set your environment and data flows with compatible locales and code pages.

To avoid processing errors, read about how Data Services supports locales and multi-byte code pages in the Reference Guide.

Parent topic: Job design performance options [page 198]

Related Information

Minimizing data type conversion [page 198]
Precision in operations [page 200]
Locales and multi-byte functionality

12.3.3 Precision in operations

For data types with precision settings, decreasing precision enhances SAP Data Services performance for arithmetic and comparison operations.

Data Services supports decimal precision ranges of 0-28, 29-38, 39-67, and 68-96. You can also enhance performance by ensuring that the values in an arithmetic operation are in the same precision range. When processing an arithmetic or Boolean operation that includes decimals in different precision ranges, Data Services converts all values to the highest precision range because it can’t process more than one decimal precision range in a single operation.

Example
Data Services performs an arithmetic operation on two decimals: one with a precision of 28 and the other with a precision of 38. Data Services converts the decimal with precision 28 to precision 38 and then performs the operation.
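Python's decimal module isn't Data Services, but a minimal sketch with it shows the same effect: the operation runs at the higher precision, and the narrower value is widened first. The specific numbers are assumed for illustration.

    from decimal import Decimal, getcontext

    # A precision-28 value combined with a precision-38 value forces the
    # whole operation to run at precision 38.
    getcontext().prec = 38

    a = Decimal("1234567890.123456789012345678")            # 28 significant digits
    b = Decimal("12345678901234567890.123456789012345678")  # 38 significant digits

    result = a + b   # the narrower value is widened to precision 38 first
    print(result)    # 12345678902469135780.246913578024691356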

Parent topic: Job design performance options [page 198]

Related Information

Minimizing data type conversion [page 198]


Minimizing locale conversion [page 200]


Important Disclaimers and Legal Information

Hyperlinks
Some links are classified by an icon and/or a mouseover text. These links provide additional information. About the icons:

● Links with the icon : You are entering a Web site that is not hosted by SAP. By using such links, you agree (unless expressly stated otherwise in your agreements with SAP) to this:
  ○ The content of the linked-to site is not SAP documentation. You may not infer any product claims against SAP based on this information.
  ○ SAP does not agree or disagree with the content on the linked-to site, nor does SAP warrant the availability and correctness. SAP shall not be liable for any damages caused by the use of such content unless damages have been caused by SAP's gross negligence or willful misconduct.
● Links with the icon : You are leaving the documentation for that particular SAP product or service and are entering a SAP-hosted Web site. By using such links, you agree that (unless expressly stated otherwise in your agreements with SAP) you may not infer any product claims against SAP based on this information.

Videos Hosted on External Platforms
Some videos may point to third-party video hosting platforms. SAP cannot guarantee the future availability of videos stored on these platforms. Furthermore, any advertisements or other content hosted on these platforms (for example, suggested videos or by navigating to other videos hosted on the same site), are not within the control or responsibility of SAP.

Beta and Other Experimental Features
Experimental features are not part of the officially delivered scope that SAP guarantees for future releases. This means that experimental features may be changed by SAP at any time for any reason without notice. Experimental features are not for productive use. You may not demonstrate, test, examine, evaluate or otherwise use the experimental features in a live operating environment or with data that has not been sufficiently backed up.
The purpose of experimental features is to get feedback early on, allowing customers and partners to influence the future product accordingly. By providing your feedback (e.g. in the SAP Community), you accept that intellectual property rights of the contributions or derivative works shall remain the exclusive property of SAP.

Example Code
Any software coding and/or code snippets are examples. They are not for productive use. The example code is only intended to better explain and visualize the syntax and phrasing rules. SAP does not warrant the correctness and completeness of the example code. SAP shall not be liable for errors or damages caused by the use of example code unless damages have been caused by SAP's gross negligence or willful misconduct.

Gender-Related Language
We try not to use gender-specific word forms and formulations. As appropriate for context and readability, SAP may use masculine word forms to refer to all genders.


www.sap.com/contactsap

© 2021 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company. The information contained herein may be changed without prior notice.

Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors. National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. All other product and service names mentioned are the trademarks of their respective companies.

Please see https://www.sap.com/about/legal/trademark.html for additional trademark information and notices.
