
Performance Optimization with IDAA

Roberto Gioi, Manager Capacity & Optimization

XIX EPV UG, October 11-14, 2021



IDAA in a nutshell

A high-performance appliance that integrates PureData, based on IIAS (IBM Integrated Analytics System) technology, with zEnterprise technology, with the aim of delivering extremely high performance to analytical workloads.

Main features:

o DB2 is the data owner (for OLTP, Batch and DW), with all the benefits of security, consistency and integrity

o Virtual component of DB2, transparent to the application

o DB2 Quality of Service extended to the analytical workload

o Mixed workload support: the DB2 optimizer decides whether or not to accelerate each query

o Access to the accelerator is exclusively through DB2

o No specific skills required; complete transparency for the end user

[Figure: Transactional Workload and Analytics Workload]

Query execution flow

[Figure: query execution flow. Labels: Application, Application Interface, Optimizer, Accelerator DRDA Requestor, Heartbeat (availability and performance indicators), Query execution run-time for queries that cannot be or should not be routed to the Accelerator, Queries executed with Accelerator, Queries executed without Accelerator; accelerator nodes node0101 to node0103 holding partitions 0 to 10.]

MPS Configuration: Active-Active on M4002-003 appliance and v.7.5.5.1

[Figure: Primary Site and DR Site]

Performance optimization process

IBM Query Monitor for query discovery, and EPV for DB2 for statistical data analysis.

Evaluation of IDAA eligibility through Data Studio, and cost/benefit analysis (data snapshot + execution).

Eligible batch load detection (no CDC)

1. Queries with high OPEN cost (> 500 seconds)

2. Queries with high elapsed time (> 500 seconds)

3. Queries with similar SQL in the same application

Frequent batch types: downloading large sets of data from tables for internal application service processing, or producing files for other uses such as the DWH.

Eligible OLTP load detection (with CDC)

1. Queries with high CPU cost (> 2 seconds)

2. Queries with high elapsed time (> 20 seconds)

OLTP query types: low-frequency, high-impact queries (query parallelism in IDAA below 100 threads per appliance)
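The thresholds above can be written as a simple screening rule. The following is a minimal sketch, assuming per-statement metrics (OPEN cost, CPU and elapsed seconds) have already been exported from a monitor such as Query Monitor or EPV. The `QueryStats` record and its field names are hypothetical. The "similar SQL in the same application" criterion needs human review and is not modelled. OLTP candidates are further assumed to be served by CDC replication.

```python
from dataclasses import dataclass

# Thresholds from the selection criteria above (seconds).
BATCH_OPEN_COST_S = 500
BATCH_ELAPSED_S = 500
OLTP_CPU_S = 2
OLTP_ELAPSED_S = 20


@dataclass
class QueryStats:
    """Hypothetical per-statement metrics exported from a monitoring tool."""
    name: str
    workload: str       # "batch" or "oltp"
    open_cost_s: float  # OPEN cost in seconds (batch criterion)
    cpu_s: float        # CPU seconds per execution (OLTP criterion)
    elapsed_s: float    # elapsed seconds per execution


def is_candidate(q: QueryStats) -> bool:
    """Return True if the statement exceeds at least one threshold for its workload."""
    if q.workload == "batch":
        return q.open_cost_s > BATCH_OPEN_COST_S or q.elapsed_s > BATCH_ELAPSED_S
    if q.workload == "oltp":
        return q.cpu_s > OLTP_CPU_S or q.elapsed_s > OLTP_ELAPSED_S
    raise ValueError(f"unknown workload: {q.workload}")


if __name__ == "__main__":
    sample = [
        QueryStats("nightly_unload", "batch", 620.0, 300.0, 1800.0),
        QueryStats("small_lookup", "oltp", 0.0, 0.1, 0.5),
        QueryStats("credit_report", "oltp", 0.0, 1.5, 25.0),
    ]
    for q in sample:
        print(q.name, is_candidate(q))
```

The filter only produces a shortlist. Each candidate still goes through the eligibility check (Explain) and the cost/benefit analysis.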

Query selection process - DDF

Search for DDF queries that use dynamic SQL statements, using IBM DB2 Query Monitor and looking for high CPU/elapsed figures.

In this case: the C03020507A call accounts for 25% of CPU time.

Query selection process - DDF

IDAA eligibility check:

1. Detect all tables within the scope of the query

2. Define the tables on IDAA using Data Studio

3. Perform a refresh on IDAA

4. Use DB2 Explain to verify IDAA eligibility

5. Estimate the benefit of the optimization


Query selection process – DDF: ADD in DataStudio

Query selection process – DDF: LOAD and Replication

Query selection process – DDF: Explain

Run Explain after SET CURRENT QUERY ACCELERATION = ALL


Query selection process – DDF: findings

Query selection process – CICS Transaction

Search for queries that use static SQL statements in CICS transactions, using IBM DB2 Query Monitor and looking at packages whose SQL statements show high CPU or elapsed time.

In this case: package B2C5005 shows a high elapsed time.


Query selection process – CICS Transaction: ADD in DataStudio


Query selection process – CICS Transaction: Refresh in DataStudio

Query selection process – CICS Transaction: Rebind

In the REBIND step, set the QUERYACCELERATION parameter to ELIGIBLE ('EL').

Query selection process – CICS Transaction: simulation in Production

Snapshot from Production: program B2C5005

Run in Production: statement simulation using a Java program.

In this case: elapsed time cut to 3 seconds.

Query selection process: example of an e-mail to engage the AM team

Job LKY95111 has been selected as eligible for IDAA and needs only minimal changes.

Current average consumption: 8,000 GP CPU seconds and 35,000 zIIP CPU seconds. These figures show a growing trend due to the type and goals of the job.

Average elapsed time: more than 7 hours.

What is IDAA, and why use it?

IDAA is an IBM appliance connected to DB2 for z/OS, used to accelerate certain types of complex queries (e.g. analytical SQL).

How to use it:

1. On IDAA, define the tables in the scope of the IDAA-eligible queries

2. On IDAA, enable and populate the tables just defined

3. On z/OS, run the query

• When DB2 determines that the query can be accelerated, it executes it on IDAA (using the data as of the refresh timestamp; see point 1 below)

4. On IDAA, disable the tables

A cross-check run in Production showed an elapsed time under 60 seconds.

The JCLs for the cost-benefit analysis and deployment are listed below:

1. REFRESH on IDAA:

SYS5.DM.CNTL(IDAALKEN)

2. RUN JCL LKY95111

SYS5.DM.CNTL(IDAALKUN)

3. DISABLE on IDAA

SYS5.DM.CNTL(IDAALKDI)
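The REFRESH, RUN and DISABLE sequence can be wrapped so that the tables are always disabled on IDAA, even when an earlier step fails. The following is a minimal sketch in Python. The `submit` callable is hypothetical and stands in for whatever scheduler or submit mechanism is actually used. It is not an IBM or z/OS API. The acceptable return-code threshold of 4 is also an assumption, chosen because both job logs below end with RC=0004. Only the member names come from the list above.

```python
from typing import Callable

# JCL members listed above (library SYS5.DM.CNTL).
REFRESH = "SYS5.DM.CNTL(IDAALKEN)"
RUN = "SYS5.DM.CNTL(IDAALKUN)"
DISABLE = "SYS5.DM.CNTL(IDAALKDI)"


def run_accelerated_job(submit: Callable[[str], int], max_rc: int = 4) -> int:
    """Run REFRESH -> RUN -> DISABLE and return the return code of the RUN step.

    `submit` is a hypothetical callable: it submits one JCL member and
    returns its return code. `max_rc` is an assumed acceptance threshold.
    """
    try:
        rc = submit(REFRESH)
        if rc > max_rc:
            raise RuntimeError(f"REFRESH failed with RC={rc}")
        return submit(RUN)
    finally:
        # Step 4 of the procedure: disable the tables on IDAA
        # even if REFRESH or RUN failed.
        submit(DISABLE)


if __name__ == "__main__":
    def fake_submit(member: str) -> int:
        print("submitted", member)
        return 0

    print("RUN RC =", run_accelerated_job(fake_submit))
```

Using `try`/`finally` is a design choice of this sketch. It guarantees the DISABLE step runs, at the cost of also disabling the tables after a failed REFRESH.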


Query selection process: standard run on DB2

1 J E S 2 J O B L O G -- S Y S T E M C I C S -- N O D E P R O D

0

23.54.49 JOB20085 ---- FRIDAY, 24 MAY 2019 ----

23.54.49 JOB20085 IRR010I USERID OPCACID IS ASSIGNED TO THIS JOB.

23.54.49 JOB20085 ICH70001I OPCACID LAST ACCESS AT 23:54:49 ON FRIDAY, MAY 24, 2019

23.54.49 JOB20085 $HASP373 LKY95111 STARTED - WLM INIT - SRVCLASS MPBATNOR - SYS CICS

23.54.49 JOB20085 IEF403I LKY95111 - STARTED - TIME=23.54.49

23.54.49 JOB20085 - --TIMINGS (MINS.)-- ----PAGING COUNTS---

23.54.49 JOB20085 -JOBNAME STEPNAME PROCSTEP RC EXCP CPU SRB CLOCK SERV PG PAGE SWAP VIO SWAPS

23.54.49 JOB20085 -LKY95111 ULKFAN DELETE 00 23 .00 .00 .0 925 0 0 0 0 0

23.54.49 JOB20085 -LKY95111 ULKFAN OCS0067L 00 65 .00 .00 .0 1845 0 0 0 0 0

09.01.46 JOB20085 ---- SATURDAY, 25 MAY 2019 ----

09.01.46 JOB20085 -LKY95111 ULKFAN UNLOAD 04 1181 200.70 .00 546.9 1976M 0 0 0 0 0

09.01.46 JOB20085 -LKY95111 ULKFAN HT200 FLUSH 0 .00 .00 .0 0 0 0 0 0 0

09.01.46 JOB20085 -LKY95111 ULKFAN CATFILES 00 17 .00 .00 .0 845 0 0 0 0 0

09.01.46 JOB20085 -LKY95111 CE£S010A 00 35 .00 .00 .0 2776 0 0 0 0 0

09.01.46 JOB20085 -LKY95111 ICEGENA 00 129 .00 .00 .0 6390 0 0 0 0 0

09.01.46 JOB20085 IEF404I LKY95111 - ENDED - TIME=09.01.46

09.01.46 JOB20085 -LKY95111 ENDED. NAME- TOTAL CPU TIME=200.70 TOTAL ELAPSED TIME= 546.9

09.01.46 JOB20085 $HASP395 LKY95111 ENDED - RC=0004

Query selection process: run using IDAA

1 J E S 2 J O B L O G -- S Y S T E M P R O D -- N O D E P R O D

0

09.26.30 JOB38290 ---- MONDAY, 27 MAY 2019 ----

09.26.30 JOB38290 IRR010I USERID S510981 IS ASSIGNED TO THIS JOB.

09.26.38 JOB38290 ICH70001I S510981 LAST ACCESS AT 09:26:12 ON MONDAY, MAY 27, 2019

09.26.38 JOB38290 $HASP373 S5109811 STARTED - WLM INIT - SRVCLASS BATNOR - SYS PROD

09.26.38 JOB38290 IEF403I S5109811 - STARTED - TIME=09.26.38

09.26.38 JOB38290 - --TIMINGS (MINS.)-- ----PAGING COUNTS---

09.26.38 JOB38290 -JOBNAME STEPNAME PROCSTEP RC EXCP CPU SRB CLOCK SERV PG PAGE SWAP VIO SWAPS

09.26.38 JOB38290 -S5109811 DELETE 00 4 .00 .00 .0 708 0 0 0 0 0

09.27.54 JOB38290 -S5109811 UNLOAD 04 138 .00 .00 1.2 34318 0 0 0 0 0

09.27.54 JOB38290 IEF404I S5109811 - ENDED - TIME=09.27.54

09.27.54 JOB38290 -S5109811 ENDED. NAME-DB2 UTILITY TOTAL CPU TIME= .00 TOTAL ELAPSED TIME= 1.2

09.27.54 JOB38290 $HASP395 S5109811 ENDED - RC=0004

*****

CPU: 0 HR 00 MIN 00.04 SEC SRB: 0 HR 00 MIN 00.00 SEC

1READY

DSN SYSTEM(DBP)

DSN

RUN PROGRAM(DSNTIAUL) PLAN(DSNTIAUL) PARM('SQL')

****

READY

END

1 DSNT490I SAMPLE DATA UNLOAD PROGRAM

0 DSNT505I DSNTIAUL OPTIONS USED: SQL

0 DSNT503I UNLOAD DATA SET SYSPUNCH RECORD LENGTH SET TO 80

0 DSNT504I UNLOAD DATA SET SYSPUNCH BLOCK SIZE SET TO 80

0

SET CURRENT QUERY ACCELERATION = ELIGIBLE

DSNT400I SQLCODE = 000, SUCCESSFUL EXECUTION
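The two job logs can be compared directly. Below is a small helper that computes the percentage reduction from the TOTAL ELAPSED TIME figures above: 546.9 minutes for the standard run and 1.2 minutes for the run using IDAA. The function name is illustrative. It compares only elapsed time on z/OS. CPU is not compared because the IDAA run reports .00 CPU minutes on z/OS.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after` (positive means improvement)."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before * 100


if __name__ == "__main__":
    # TOTAL ELAPSED TIME in minutes, taken from the two job logs above.
    standard_run_min = 546.9  # LKY95111, standard run on DB2
    idaa_run_min = 1.2        # S5109811, run using IDAA
    print(f"Elapsed time reduction: {pct_reduction(standard_run_min, idaa_run_min):.1f}%")
```

This prints an elapsed-time reduction of 99.8%.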


Using IDAA: monitoring

Critical Open MPS_DB2_IDAA_ThreadElapsed CC=C03020058T|CAN=0 DBN:CICS:DB2 DBN:CICS:DB2 10/05/21 15:46 19 Minutes 10/05/21 15:45 Sampled MPS_DB2_IDAA_ThreadElapsed

FirstOccurrence: 05/10/21 15:45

LastOccurrence: 05/10/21 15:45

AlertGroup: AP:ITM_DB2

Node: DBN:CICS:DB2

AlertKey: M_AP_DB2_AVA_0C_IdaaThreadElap

Summary: High elapsed time for a thread running on the Accelerator

Notification: 0

Service: DB2 System

Ack: No

Count: 2

Severity: Critical

AckOccurrence: 0


Using IDAA: monitoring


Db2ID MAX 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

DBA 4 1 0 0 1 4 3 3 3 3 2 2 3 3 3 2 3 3 3 2 3 3 3 3 3

DBMD 4 1 1 0 0 4 2 2 3 3 2 3 3 3 3 2 3 3 3 3 3 2 3 3 2

DBMP 4 1 0 0 1 4 3 3 3 3 2 2 3 3 3 3 3 3 2 3 3 3 3 3 2

DBN 4 2 0 0 1 3 3 3 4 3 2 3 3 3 3 3 3 3 3 2 3 3 3 3 3

DBP 4 1 0 0 0 4 2 2 3 3 2 3 3 3 3 2 3 3 3 3 3 3 3 3 2

Db2ID MAX 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

DBA 582.234 1.378 150 14.263 93.121 4.317 4.024 1.300 7.508 124.527 582.234 197.267 371.547 168.022 94.105 107.443 162.046 122.071 33.972 50.110 11.277 15.624 18.551 4.258 1.313

DBMD 570.277 1.840 184 16.238 90.782 4.670 6.407 1.049 7.368 122.775 570.277 215.962 375.557 165.177 94.107 102.480 160.139 139.482 50.808 53.677 12.844 19.474 15.671 4.299 1.066

DBMP 585.881 1.378 150 10.138 93.121 4.317 4.024 1.300 7.508 126.575 585.881 195.731 373.486 170.947 99.209 110.973 172.303 140.328 51.444 51.350 12.457 15.117 15.457 4.023 1.131

DBN 587.650 4.899 150 14.424 139.378 9.138 3.783 1.452 9.399 155.649 587.650 231.188 388.144 173.627 99.477 142.712 187.463 143.964 59.386 71.932 12.457 19.702 16.038 3.888 1.150

DBP 545.164 1.911 160 16.230 90.772 4.505 6.407 1.049 7.764 119.522 545.164 207.721 386.996 169.319 82.867 91.664 168.132 112.665 33.972 50.162 11.277 14.619 18.694 4.121 1.131

[Charts: IDAA02FI replication, Mon, 03 May 2021: ACCEL LATENCY and ACCEL ROWS INSERTED]

Using IDAA: monitoring


Business areas:

Improve response times and cut MIPS consumption for batch and OLTP workloads

- Wire transfers (Daily Batch)

- Pre-authorizations (Daily Batch)

- Installment Loans Management (Daily Batch)

- Marketing (Daily Batch)

- Funds (Daily Batch)

- Credit Monitoring (Daily Batch)

- Risk Matrix (Monthly Batch)

- Centralized Contracts Management (Weekly Batch)

- Credit Monitoring (OLTP)

- Monitoring and Major Incident (OLTP)

- Counter area application authorizations (OLTP)

Using IDAA: Batch

Results from the first set of jobs (application DI) processed by IDAA v.5

• TEST: elapsed time -96%, GP CPU -99%

• Production: elapsed time -95%, GP CPU -88%

Using IDAA for OLTP – InfoSphere CDC for z/OS

Incremental Update

• Syncs DB2 and Accelerator data in near real time

• Scope: row

• Based on the Change Data Capture (CDC) component of IBM InfoSphere Data Replication

• INSERT/UPDATE/DELETE statements are captured from DB2 log data and replicated to the Accelerator

- Default apply interval: approximately 10 seconds

- UPDATEs are decomposed into DELETEs and INSERTs

• Tables enabled for incremental update require either enforced uniqueness (primary key, unique index) or a defined informational constraint (via the ACCEL_ADD_TABLES stored procedure)

- Required for DELETEs

• Continuous replication

- Base table not locked while the table is initially loaded to the Accelerator

- Replication not stopped if the replication subscription is changed (tables added, removed, loaded, reloaded)
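The uniqueness requirement follows from the UPDATE decomposition. If each UPDATE is applied as a DELETE of the old row plus an INSERT of the new one, the DELETE must identify exactly one target row. The following is a minimal in-memory model of those apply semantics, not of how InfoSphere CDC is implemented. It assumes changes arrive as (operation, key, row) tuples, that the unique key itself is never updated, and that a dictionary keyed by that unique key stands in for the accelerator copy. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
# A captured change: (operation, unique key, new row image or None for DELETE).
Change = Tuple[str, int, Optional[Row]]


def decompose(changes: List[Change]) -> List[Change]:
    """Rewrite every UPDATE as a DELETE of the old row followed by an INSERT of the new one."""
    out: List[Change] = []
    for op, key, row in changes:
        if op == "UPDATE":
            out.append(("DELETE", key, None))
            out.append(("INSERT", key, row))
        elif op in ("INSERT", "DELETE"):
            out.append((op, key, row))
        else:
            raise ValueError(f"unsupported operation: {op}")
    return out


def apply_batch(target: Dict[int, Row], changes: List[Change]) -> None:
    """Apply one batch of captured changes (e.g. one apply interval) to the target copy.

    The target is keyed by the unique key, which is what lets a DELETE
    find exactly one row to remove.
    """
    for op, key, row in decompose(changes):
        if op == "DELETE":
            target.pop(key, None)
        else:
            assert row is not None
            target[key] = row  # sketch simplification: INSERT overwrites an existing row


if __name__ == "__main__":
    accel = {1: {"balance": 100}}
    apply_batch(accel, [("UPDATE", 1, {"balance": 80}), ("INSERT", 2, {"balance": 5})])
    print(accel)
```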

Using IDAA for OLTP – InfoSphere CDC for z/OS

[Figure: Incremental Update Architecture]

Using IDAA for OLTP – InfoSphere CDC for z/OS

Source system CPU resources are required for these processes:

• Capturing, decoding, and staging the changed data stored in the database log files. The product must process all records from the DB2 log for Units of Recovery, and the records for tables that have DATA CAPTURE CHANGES configured. The DB2 IFI interface must filter the entire database log at all times, even if most of the log contains out-of-scope data.

• Transmitting the captured change data to the target system over TCP/IP.

• Querying the database directly for %GETCOL or %SELECT functions.

Target system CPU resources are required for these processes:

• Receiving captured change data from the source system.

• Converting the captured change data to database operations.

• Committing the database operations to the target database.

• Execution of SQL operations by the target database during the apply process on the target system.

• Multiple indexes on tables in the target database often require additional CPU resources (not applicable on IDAA).

• Code page conversion.

Rule of thumb: 2 CPU seconds on z/OS for every 100 MB read from the DB2 log, and 5 CPU seconds on z/OS for every 100 MB sent to IDAA.

Actual CPU usage in Production: 100/170 MIPS per hour.

Please note: the cost of CDC log reading is incurred only once, i.e. it does not depend on the number of exploiters (but additional CPU is used for each exploiter sending data to the accelerator).
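The rule of thumb can be turned into a quick estimate. The following is a minimal sketch that charges the log-read cost once and the send cost once per exploiter, as the note above describes. The sample volumes are made up for illustration and are not production figures. Converting the resulting CPU seconds into MIPS depends on the processor rating and is not covered here.

```python
from typing import Sequence

# Rule of thumb from the slide: z/OS CPU seconds per 100 MB.
CPU_S_PER_100MB_LOG_READ = 2.0  # reading the DB2 log (paid once)
CPU_S_PER_100MB_SENT = 5.0      # sending data to IDAA (paid per exploiter)


def cdc_cpu_seconds(log_read_mb: float, sent_mb_per_exploiter: Sequence[float]) -> float:
    """Estimate the z/OS CPU seconds spent by CDC for given log-read and send volumes."""
    read_cost = log_read_mb / 100 * CPU_S_PER_100MB_LOG_READ
    send_cost = sum(mb / 100 * CPU_S_PER_100MB_SENT for mb in sent_mb_per_exploiter)
    return read_cost + send_cost


if __name__ == "__main__":
    # Hypothetical hour: 10,000 MB of log read, 1,000 MB sent by each of two exploiters.
    print(cdc_cpu_seconds(10_000, [1_000, 1_000]))  # 300.0
```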

Using IDAA for OLTP – Overall Report Class results

Query DDF APPMDNMC - Credit Monitoring: performance optimization of some Java calls

Using CDC

• From a z15 perspective: -700 MIPS GP, -900 MIPS zIIP

• Average response time: before ≈ 20 seconds, after ≈ 4 seconds

Using IDAA for OLTP – Java call detail

Query DDF APPMDNMC - Credit Monitoring: optimization of a new, complex and strategic Java call

Using CDC

• From a z15 perspective: -130 MIPS GP, -200 MIPS zIIP

• Average response time: before ≈ 70 seconds, after ≈ 3 seconds

Business Areas Achievements

Goal: performance improvement, i.e. reduced MIPS usage on z15 and a DRAMATIC reduction in the elapsed times of some batch and OLTP processes

Business areas:

- Direct Debit (Batch)

- Wire Transfers (Batch)

- Elise - Installment Loans Management (Batch)

- EF - Centralized Contracts Management (Weekly Batch)

- MK - Marketing (Batch)

- FZ - Funds (Batch)

- GA - Credit Trend Management (Batch)

- D3 - Pre-authorizations (Batch)

- YU - Risk Matrix (Monthly Batch)

- MD - Credit Monitoring (DDF OLTP)

- MH - Major Incident (DDF OLTP)

Results:

- Daily batch: -41,000 seconds ≈ -1,500 MIPS/hour in the batch window; elapsed time decreased by 90%

- Weekly batch: -7,000 seconds per execution

- Monthly batch: 55,000 seconds per execution

- OLTP prime time: -1,100 MIPS/hour; elapsed time decreased by 60% to 90%

- Total batch elapsed time: from 30 to 1.5 hours

Takeaways - moved from z15 to IDAA: 1,500 MIPS/hour in the daily batch window and 1,100 MIPS/hour in OLTP prime time