
Barclays use of TASM to manage Ad Hoc Business Users & Operational/Batch workloads

Ian Russell
Teradata Certified Master V2R5
Barclays Bank

(C) Teradata Universe Seoul 2008. All Rights Reserved.


Agenda

• Brief overview

> Barclays

> Teradata data warehouse environment

> Teradata data warehouse workload

• Workload management prior to TASM and issues

• How TASM was implemented

• Workload management under TASM

• Outstanding issues & successes


Brief overview of Barclays

• Barclays is one of the world’s largest financial services providers offering banking, investment banking and investment management services to more than 27 million customers in over 50 countries, and employing more than 123,000 people worldwide.

• 2006 pre-tax profits of £7,136m

> 2007 half-year profit of £4,101m

• Our ambition is to become one of the handful of universal banks leading the global financial services industry.

• Our strategy follows a simple premise: anticipate the needs of our customers and clients, then serve them by helping them achieve their goals.


Overview of the Teradata data warehouse environment

• Barclays has 5 Teradata servers with a total of 62 TB of storage, running V2R6.2

• These are connected to 5 Mainframe LPARs and many mid-tier servers

• Batch is executed primarily from the mainframe and, to a lesser extent, from UNIX

• Ab Initio is being introduced for batch work

• Archives of source system data going back over 10 years are held

• Users access the warehouse over a Wide Area Network with a wide variety of tools: OpenText BI, SQL Assistant, SAS, Business Objects, White Light, MS Access/Excel, Informatica, KXEN, etc.


Overview of the Teradata data warehouse workload

• 1,500 business users across the Barclays organisation, not all UK-based, including off-shore development/test

• Up to 1,500,000 ad hoc queries submitted daily

• Users have access to sand-pits and the ability to create/drop database objects and load/unload data

• Users have access to a variety of tools to access the DW

• The dictionary is updated regularly

• Batch/operational processing loads or builds 700 databases on a daily, weekly or monthly basis

• No integrated data model

• Priority is given to batch overnight and to users during the day

• Data is restored frequently for analysis

• TCRM application


Pre-TASM and Issues

• Response times to user queries vary widely

• Priority Scheduler Facility (PSF) split between BAU (user) and operational work

• Limited ability to push the batch critical path through

• The main Teradata server was at CPU capacity

• Poorly written queries could consume huge amounts of system resources and had to be identified and dealt with manually. Manual intervention was not available overnight, so batch could be impacted.

• Well-written short queries were being delayed along with the bad ones within TDQM

• Certain business areas felt they owned percentages of the platform owing to how it was funded.


Answer

• To overcome this, Barclays engaged Teradata to provide world-class workload management: TASM (Teradata Active System Management)

• Teradata Professional Services provided the following consultants:

> TASM Proof of Concept (March 2006): Paul Davies

> TASM Implementation (September 2006): Brian Middleton


Proof of Concept - Lessons learnt

• TASM was the direction Barclays wished to take

• Teradata Workload Analyser was immature

• Barclays specifics:

> Attempts to apportion a percentage of the machine to specific businesses by financial contribution can lead to inefficient use of the system

> Account strings needed cleaning up

> One central batch user with many account strings needed changing

> Profiles should be introduced

> Workloads needed to be decided on

> DBQL data needs to be in DBC format for analysis

> DBQL StepInfo needs activating

> DBQL TDWM columns need to be retained in DBQL history

> Colloquial knowledge of workloads does not match actual usage

• Further assistance from Teradata was required


Define Workloads

• Barclays had already split its workload into Operational work and user queries (BAU)

• Further workloads were identified:

> Tactical: guaranteed response times

> High, Standard & Low Service Level

> Batch

> DBA work, including BAR (backup, archive and restore)

> System

• A policy of rewarding short queries and penalising long queries was decided on

• Time periods during which specific workloads have priority were also decided upon


Workload Classifications (1/2)

Workload | Classification | Purpose
OP_Support_Instant | Account String = ‘$OPS$&D&H’, Estimated Processing <= 0.1 Second | Fast Operations Support queries
OP_Support | Account String = ‘$OPS$&D&H’ | Operations Support queries
BAU_Utility | Account String = ‘$BAU$&D&H’, Load Utility Type = ALL | Utility BAU queries
BAU_Instant, BAU_Short, BAU_Medium, BAU_Long | Account String = ‘$BAU$&D&H’, Estimated Processing <= 0.1 Second, <= 30 Seconds, <= 20 Minutes and > 20 Minutes respectively | Fast, short, medium and long-running BAU queries
HSL_Utility | Account String = ‘$HSL$&D&H’, Load Utility Type = ALL | Utility HSL queries
HSL_Instant, HSL_Short, HSL_Medium, HSL_Long | Account String = ‘$HSL$&D&H’, Estimated Processing <= 0.1 Second, <= 30 Seconds, <= 25 Minutes and > 25 Minutes respectively | Fast, short, medium and long-running HSL queries
OP_Support_Utility | Account String = ‘$OPS$&D&H’, Load Utility Type = ALL | Utility BAU Support queries
DBA_Low | User = Crashdumps, DBADMIN, USERADMIN, DBAXK, APPLADMIN | Low-priority DBA work (housekeeping)
Default_High | User = DBC, DBCMANAGER, TDWM | DBC, DBCMANAGER, TDWM
Tactical_AllAmp | Account String = ‘$TAC$&D&H’ | All-AMP tactical queries
DBA_High | User = DBADMIN, USERADMIN, Estimated Processing <= 10 Seconds | DBADMIN, USERADMIN
Tactical_Single | Account String = ‘$TAC$&D&H’, single or few AMPs | Single-AMP tactical queries
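The classification rules key on account strings such as ‘$BAU$&D&H’. In Teradata account string expansion, &D and &H are replaced by the request date (YYMMDD) and hour (HH), so the resulting string changes every hour while the leading $name$ performance-group prefix stays fixed. The Python sketch below only illustrates that effect; `expand_account_string` and `performance_group` are hypothetical helpers, not Teradata APIs, and the real substitution is done by the database itself.

```python
from datetime import datetime


def expand_account_string(template: str, when: datetime) -> str:
    """Illustrate expansion of the &D and &H account string variables.

    Hypothetical helper: &D becomes the date as YYMMDD and &H the hour as HH.
    Teradata performs the real substitution itself.
    """
    return (template
            .replace("&D", when.strftime("%y%m%d"))
            .replace("&H", when.strftime("%H")))


def performance_group(account: str) -> str:
    """Return the leading '$name$' prefix of an account string, or '' if none."""
    if not account.startswith("$"):
        return ""
    end = account.find("$", 1)
    return account[: end + 1] if end != -1 else ""


# Example timestamp chosen for illustration only.
expanded = expand_account_string("$BAU$&D&H", datetime(2008, 5, 12, 14, 30))
print(expanded)                     # $BAU$08051214
print(performance_group(expanded))  # $BAU$
```

Because the expanded string differs hour by hour, grouping on the fixed prefix is the stable way to tie usage back to a workload in this sketch, while the date/hour suffix lets usage be broken down by day and hour.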


Workload Classifications (2/2)

Workload | Classification | Purpose
Default | All queries not defined above | Catch all
LSL_Long | Account String = ‘$LSL$&D&H’ | Long-running LSL queries
OP_Archives | Account String = ‘$ARC$&D&H’ | Routine archives
OP_Restores | Account String = ‘$RST$&D&H’ | User-initiated restores
OP_High_Utility, OP_Medium_Utility, OP_Low_Utility | Account String = ‘$OPH$&D&H’, ‘$OPM$&D&H’ or ‘$OPL$&D&H’ respectively, Load Utility Type = ALL | High, medium & low priority batch utility jobs
LSL_Instant | Account String = ‘$LSL$&D&H’, Estimated Processing <= 0.1 Second | Fast LSL queries
ConsoleR, ConsoleH, ConsoleM, ConsoleL | None | System workloads
LSL_Utility | Account String = ‘$LSL$&D&H’, Load Utility Type = ALL | Utility BAU queries
OP_Low | Account String = ‘$OPL$&D&H’ | Low priority batch jobs
OP_Medium | Account String = ‘$OPM$&D&H’ | Medium priority batch jobs
OP_High | Account String = ‘$OPH$&D&H’ | High priority batch jobs
OP_Instant | Account String In ‘$OPH$&D&H’, ‘$OPM$&D&H’, Estimated Processing <= 0.1 Second | Fast OP queries
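The estimated-processing-time bands for BAU and HSL amount to a simple decision procedure. The Python sketch below assumes the optimizer's estimate is available in seconds and that the Load Utility rule is checked before the time bands; the actual TDWM rule evaluation order is not described on these slides, and `classify` and `BANDS` are hypothetical names.

```python
from typing import Optional

# Upper bounds in seconds on estimated processing time for the Instant, Short
# and Medium bands, as listed on the classification slides; above the Medium
# bound a query is Long.
BANDS = {
    "BAU": (0.1, 30.0, 20 * 60.0),
    "HSL": (0.1, 30.0, 25 * 60.0),
}


def classify(account_prefix: str, estimated_seconds: float,
             is_load_utility: bool = False) -> Optional[str]:
    """Hypothetical sketch: map a BAU or HSL request to its workload name."""
    service = account_prefix.strip("$")
    if service not in BANDS:
        return None  # other prefixes such as OPS and TAC have their own rules
    if is_load_utility:
        return f"{service}_Utility"  # assumption: utility rule checked first
    instant, short, medium = BANDS[service]
    if estimated_seconds <= instant:
        return f"{service}_Instant"
    if estimated_seconds <= short:
        return f"{service}_Short"
    if estimated_seconds <= medium:
        return f"{service}_Medium"
    return f"{service}_Long"
```

For example, a 1,500-second estimate (25 minutes) lands in HSL_Medium under the HSL bands but in BAU_Long under the BAU bands, which is one way the High Service Level gives its users more headroom before a query is treated as long-running.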


Workload Exceptions (1/2)

Workload | Exception Criteria | Exception Action
BAU_Long | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
BAU_Medium | CPU Time >= 2000 Seconds | Change to BAU_Long
BAU_Short | CPU Time >= 200 Seconds | Change to BAU_Medium
HSL_Medium | CPU Time >= 2000 Seconds | Change to HSL_Long
HSL_Long | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
HSL_Short | CPU Time >= 200 Seconds | Change to HSL_Medium
OP_Support_Instant | CPU Time >= 10 Seconds | Change to OP_Support
OP_Support | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
BAU_Utility | None | No Exception Monitoring
BAU_Instant | CPU Time >= 10 Seconds | Change to BAU_Short
HSL_Utility | None | No Exception Monitoring
HSL_Instant | CPU Time >= 10 Seconds | Change to HSL_Short
OP_Support_Utility | None | No Exception Monitoring
DBA_Low | None | No Exception Monitoring
Default_High | None | No Exception Monitoring
Tactical_AllAmp | CPU Time >= 10 Seconds | Continue and Log
DBA_High | CPU Time >= 100 Seconds | Change to DBA_Low
Tactical_Single | CPU Time >= 2 Seconds | Continue and Log


Workload Exceptions (2/2)

Workload | Exception Criteria | Exception Action
OP_Medium_Utility | None | No Exception Monitoring
OP_Low_Utility | None | No Exception Monitoring
Default | None | No Exception Monitoring
LSL_Long | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
OP_Archives | None | No Exception Monitoring
OP_Restores | None | No Exception Monitoring
OP_High_Utility | None | No Exception Monitoring
LSL_Instant | CPU Time >= 10 Seconds | Change to LSL_Long
ConsoleR, ConsoleH, ConsoleM, ConsoleL | None | No Exception Monitoring
LSL_Utility | None | No Exception Monitoring
OP_Low | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
OP_Medium | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
OP_High | CPU millisec per IO > 10 for 600 Seconds | Continue and Log
OP_Instant | CPU Time >= 10 Seconds | Change to OP_Medium
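The exception rules form a demotion chain: a BAU query that turns out to use more CPU than its estimate suggested is moved down one workload, while queries already in BAU_Long are only logged when CPU per IO stays high. The Python sketch below models one monitoring pass for the BAU chain only; `RunningQuery` and `check_exception` are hypothetical names, and treating the 600-second qualification as a simple accumulated duration is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# "Change to" exceptions for the BAU chain: workload -> (CPU seconds, new workload).
DEMOTIONS = {
    "BAU_Instant": (10.0, "BAU_Short"),
    "BAU_Short": (200.0, "BAU_Medium"),
    "BAU_Medium": (2000.0, "BAU_Long"),
}


@dataclass
class RunningQuery:
    workload: str
    cpu_seconds: float                    # CPU time consumed so far
    high_cpu_per_io_seconds: float = 0.0  # time spent with CPU ms per IO > 10


def check_exception(q: RunningQuery) -> Optional[str]:
    """One monitoring pass over a running query (hypothetical sketch)."""
    rule = DEMOTIONS.get(q.workload)
    if rule is not None:
        threshold, target = rule
        if q.cpu_seconds >= threshold:
            q.workload = target  # moves one step per pass
            return f"change to {target}"
        return None
    if q.workload == "BAU_Long" and q.high_cpu_per_io_seconds >= 600:
        return "continue and log"  # query keeps running; the event is only logged
    return None
```

Demoting rather than aborting matches the stated policy of rewarding short queries and penalising long ones: a mis-estimated query still completes, but in a workload with lower priority and a tighter throttle.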


Periods

• Weekday Daytime: Monday to Friday, 09:00–18:00

• Weekday Evening: Monday to Friday, 18:00–20:00

• Weekday Night: Monday to Friday, 20:00–08:00 (wraps around midnight)

• Default: Saturday and Sunday
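The only subtle part of the period definitions is the wrap around midnight: the small hours of a day belong to the previous evening's Weekday Night. The Python sketch below assumes that a night starting on Friday at 20:00 runs into Saturday morning, that Monday 00:00–08:00 follows Sunday and is therefore Default, and that the uncovered 08:00–09:00 hour on weekdays also falls back to Default; `period` is a hypothetical helper.

```python
from datetime import datetime, time

DAY_START, EVENING_START, NIGHT_START, NIGHT_END = time(9), time(18), time(20), time(8)


def period(ts: datetime) -> str:
    """Return the period name for a timestamp (hypothetical sketch)."""
    weekday = ts.weekday()  # Monday == 0 ... Sunday == 6
    t = ts.time()
    if weekday <= 4:  # Monday to Friday
        if DAY_START <= t < EVENING_START:
            return "Weekday Daytime"
        if EVENING_START <= t < NIGHT_START:
            return "Weekday Evening"
        if t >= NIGHT_START:
            return "Weekday Night"
    # Wrap around midnight: 00:00-08:00 still belongs to the previous day's
    # Weekday Night, provided that previous day was Monday to Friday.
    previous_day = (weekday - 1) % 7
    if t < NIGHT_END and previous_day <= 4:
        return "Weekday Night"
    return "Default"  # weekends, plus 08:00-09:00 on weekdays (assumption)
```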


Allocation Group Weights

Resource Partition | Allocation Group | Weekday Night (M-F, 20:00–08:00) | Weekday Evening (M-F, 18:00–20:00) | Weekday Daytime (M-F, 09:00–18:00) | Default
Default | H | 30 | 30 | 30 | 30
Default | M | 10 | 10 | 10 | 10
Default | L | 5 | 5 | 5 | 5
Default | R | 40 | 40 | 40 | 40
Standard | UserLow | 1 | 20 | 20 | 5
Standard | UserHigh | 8 | 40 | 30 | 10
Standard | OpLow | 20 | 20 | 1 | 20
Standard | OpHigh | 40 | 40 | 8 | 35
Standard | Priority | 50 | 80 | 50 | 50
Tactical | Tactical | Tactical | Tactical | Tactical | Tactical
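Allocation group weights are relative rather than absolute percentages: Teradata's Priority Scheduler compares each active group's weight with the weights of the other groups that currently have work. The Python sketch below shows that idea inside a single resource partition, using the Default partition's H, M, L and R weights; it ignores the partition-level weighting and the scheduler's enforcement details, and `partition_shares` is a hypothetical helper.

```python
from typing import Dict, Set


def partition_shares(weights: Dict[str, float], active: Set[str]) -> Dict[str, float]:
    """Fraction of a partition's CPU each active allocation group is entitled to.

    Simplified sketch: the weight of idle groups is shared pro rata among the
    active ones; partition-level weights are ignored.
    """
    total = sum(w for ag, w in weights.items() if ag in active)
    if total == 0:
        return {}
    return {ag: w / total for ag, w in weights.items() if ag in active}


# Default resource partition weights (the same in every period).
DEFAULT_RP = {"H": 30, "M": 10, "L": 5, "R": 40}

print(partition_shares(DEFAULT_RP, {"H", "M"}))  # {'H': 0.75, 'M': 0.25}
```

The same reasoning explains why changing a few weights per period is enough to swing the machine between users and batch: only the ratio between the groups that are busy at that time matters.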


Workload to Allocation Group Mapping

Resource Partition | Allocation Group | Workloads
Default | H | Default_High, ConsoleH, DBA_High
Default | M | Default, ConsoleM
Default | L | ConsoleL
Default | R | ConsoleR
Standard | UserLow | DBA_Low, BAU_Long, LSL_Long, BAU_Medium, BAU_Short, LSL_Utility, BAU_Utility
Standard | UserHigh | HSL_Utility, HSL_Long, HSL_Medium, HSL_Short
Standard | OpLow | OP_Medium_Utility, OP_Archives, OP_Low, OP_Low_Utility, OP_Medium, OP_Restores
Standard | OpHigh | OP_Support_Utility, OP_High, OP_Support, OP_High_Utility
Standard | Priority | LSL_Instant, OP_Instant, BAU_Instant, HSL_Instant, OP_Support_Instant
Tactical | Tactical | Tactical_Single, Tactical_AllAmp


Workload Throttles (Monitor and Tune)

Values are the maximum number of concurrent queries allowed in the workload; None means no throttle.

Workload | Weekday Night (M-F, 20:00–08:00) | Weekday Evening (M-F, 18:00–20:00) | Weekday Daytime (M-F, 09:00–18:00) | Default
OP_Support | 2 | 2 | 2 | 2
BAU_Medium | 2 | 3 | 6 | 3
BAU_Long | 1 | 1 | 1 | 1
BAU_Short | 6 | 12 | 20 | 12
HSL_Medium | 2 | 2 | 2 | 2
HSL_Long | 1 | 1 | 1 | 1
HSL_Short | 3 | 3 | 4 | 3

No throttle (None) in any period: OP_Support_Utility, OP_Support_Instant, BAU_Utility, BAU_Instant, HSL_Instant, HSL_Utility, Default_High, DBA_Low, Tactical_AllAmp, DBA_High, Tactical_Single


Workload Throttles (Monitor and Tune)

Workload | Weekday Night (M-F, 20:00–08:00) | Weekday Evening (M-F, 18:00–20:00) | Weekday Daytime (M-F, 09:00–18:00) | Default
OP_High | 4 | 4 | 3 | 4
OP_Medium | 20 | 8 | 2 | 20
OP_Low | 4 | 2 | 1 | 4
LSL_Long | 1 | 1 | 1 | 1

No throttle (None) in any period: OP_Low_Utility, OP_Instant, OP_Medium_Utility, OP_High_Utility, OP_Archives, OP_Restores, Default, LSL_Instant, LSL_Utility, ConsoleL, ConsoleM, ConsoleH, ConsoleR
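A workload throttle caps how many queries from that workload run at once; further queries wait in the delay queue until a running query finishes or a new period raises the limit. The Python sketch below models one throttle with a delay queue; `Throttle` is a hypothetical class, and releasing delayed queries in first-in, first-out order is an assumption made for illustration.

```python
from collections import deque
from typing import Deque, List, Optional, Set


class Throttle:
    """Concurrency throttle for one workload (hypothetical sketch)."""

    def __init__(self, limit: Optional[int]) -> None:
        self.limit = limit                 # None means no throttle
        self.running: Set[str] = set()
        self.delayed: Deque[str] = deque() # FIFO delay queue

    def _has_room(self) -> bool:
        return self.limit is None or len(self.running) < self.limit

    def submit(self, query_id: str) -> str:
        if self._has_room():
            self.running.add(query_id)
            return "running"
        self.delayed.append(query_id)
        return "delayed"

    def _release(self) -> List[str]:
        released = []
        while self.delayed and self._has_room():
            qid = self.delayed.popleft()
            self.running.add(qid)
            released.append(qid)
        return released

    def complete(self, query_id: str) -> List[str]:
        """Mark a query finished and return any delayed queries now released."""
        self.running.discard(query_id)
        return self._release()

    def set_limit(self, limit: Optional[int]) -> List[str]:
        """Apply a new period's limit and return any delayed queries released."""
        self.limit = limit
        return self._release()
```

Lowering the limit in this sketch never stops queries that are already running; it only affects new arrivals, which is the behaviour one would expect from a concurrency throttle as opposed to an exception action.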


Service Level Goals

1,080 Seconds

1,680 Seconds

1,080 Seconds

1,800 Seconds

240 Seconds

40 Seconds

2 Seconds

360 Seconds

600 Seconds

120 Seconds

20 Seconds

1 Seconds

360 Seconds

120 Seconds

300 Seconds

1 Seconds

0

0

Response Time Goal (95%)

0

0

0

0

0

0

0

0

0

0

0

0

0

500

350

800

0

0

Arrival Rate (QpH)

Nighttime

Throughput Goal (QpH)

Daytime

900 Seconds100BAU_Long

360 Seconds600BAU_Utility

51 Seconds20,000BAU_Instant

90 Seconds3,000BAU_Short

360 Seconds300BAU_Medium

1,080 Seconds10OP_Support_Utility

10

80

135

100

500

5,000

600

500

350

800

0

0

Arrival Rate (QpH)

Throughput Goal (QpH)

1,080 Seconds

1,680 Seconds

660 Seconds

300 Seconds

50 Seconds

15 Seconds

360 Seconds

120 Seconds

300 Seconds

1 Seconds

0

0

Response Time Goal (95%)

Tactical_Single

Tactical_AllAmp

Default_High

DBA_High

DBA_Low

HSL_Utility

HSL_Instant

OP_Support_Instant

OP_Support

HSL_Long

HSL_Medium

HSL_Short

Workload


Service Level Goals

0

660 Seconds

15 Seconds

360 Seconds

0

0

0

0

0

0

120 Seconds

540 Seconds

120 Seconds

600 Seconds

120 Seconds

540 Seconds

120 Seconds

Response Time Goal (95%)

0

0

0

0

0

0

0

0

0

0

100

200

3

1,500

100

200

3

Arrival Rate (QpH)

Nighttime

Throughput Goal (QpH)

Daytime

660 Seconds25LSL_Long

00ConsoleM

00ConsoleL

360 Seconds600LSL_Utility

15 Seconds5,000LSL_Instant

00Default

0

0

0

0

0

0

0

1,500

0

0

0

Arrival Rate (QpH)

Throughput Goal (QpH)

0

0

0

0

120 Seconds

540 Seconds

120 Seconds

600 Seconds

120 Seconds

540 Seconds

120 Seconds

Response Time Goal (95%)

OP_High_Utility

OP_Medium_Utility

OP_Low_Utility

OP_Instant

OP_High

OP_Medium

OP_Low

ConsoleH

ConsoleR

OP_Restores

OP_Archives

Workload
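The "Response Time Goal (95%)" columns express each goal as a percentile: the goal is met when at least 95% of a workload's queries complete within the stated time. A minimal Python sketch of that check follows; the function names and sample data are hypothetical, and measuring attainment over a fixed list of completed queries is an assumption about the reporting interval.

```python
from typing import Sequence


def slg_attainment(response_times: Sequence[float], goal_seconds: float) -> float:
    """Fraction of queries whose response time is within the goal."""
    if not response_times:
        raise ValueError("no completed queries in the interval")
    met = sum(1 for rt in response_times if rt <= goal_seconds)
    return met / len(response_times)


def meets_slg(response_times: Sequence[float], goal_seconds: float,
              target: float = 0.95) -> bool:
    """True if at least `target` of the queries met the response time goal."""
    return slg_attainment(response_times, goal_seconds) >= target


# Hypothetical sample: 19 of 20 queries finish within a 2-second goal.
sample = [0.8] * 19 + [3.5]
print(slg_attainment(sample, 2.0), meets_slg(sample, 2.0))  # 0.95 True
```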


Routine Monitoring

• Delay Queue Statistics

• Service Level Goals

• TDWM Events

• TDWM Exceptions

• TDWM Summary

• DBQL

• In-house MS Excel spreadsheets


WLD Usage – Partition CPU


WLD Usage - BAU OP CPU


WLD Usage – Bad Queries


WLD Reports – HSL Usage


WLD Reports - SLG


User Intranet page - current usage


DBA work

• Dynamic change of priority

• Release of delayed queries

• Rule set for catching up delayed batch

• Application of Throttles to Users

• Query investigation and allocation to workload


Outstanding Issues

• Business

> Matching workflow between Dev, OAT & Production: what works well in Dev/OAT can be impacted by TASM in Production

> Further time periods: first 5 working days of the month, first day of the month

> Users demanding access to HSL

> No take-up of the LSL

• TASM

> Zero estimated processing time

> SELECT DISTINCT

> Delay time is not shown in Sessions Delayed

> Cannot see the SQL of queries held in the delay queue

> Have to log on to TDWM as tdwm

> TASM Modeller


Pre-TASM and Issues

• Response times to user queries vary widely

• Limited ability to push the batch critical path through

• The main Teradata server was at CPU capacity

• Poorly written queries could consume huge amounts of system resources and had to be identified and dealt with manually. Manual intervention was not available overnight, so batch could be impacted.

• Well-written short queries were being delayed by the bad ones


Barclays TASM implementation

• Questions

• Contact [email protected]