Lost in the SMT world - GSE homepage: conferences.gse.org.uk/attachments/presentations/...

Lost in the SMT world

Danilo Gipponi

EPV Technologies

www.epvtech.com

Disclaimer, copyright and trademarks

Disclaimer:

THE INFORMATION CONTAINED IN THIS PRESENTATION HAS NOT BEEN SUBMITTED TO ANY FORMAL REVIEW AND IS DISTRIBUTED ON AN “AS IS” BASIS WITHOUT ANY WARRANTY EITHER EXPRESS OR IMPLIED. THE USE OF THIS INFORMATION OR THE IMPLEMENTATION OF ANY OF THESE TECHNIQUES IS A USER RESPONSIBILITY AND DEPENDS ON THE USER’S ABILITY TO EVALUATE AND INTEGRATE THEM INTO THE USER’S OPERATIONAL ENVIRONMENT. WHILE EACH ITEM MAY HAVE BEEN REVIEWED FOR ACCURACY IN A SPECIFIC SITUATION, THERE IS NO GUARANTEE THAT THE SAME OR SIMILAR RESULTS WILL BE OBTAINED ELSEWHERE. USERS ATTEMPTING TO ADAPT THESE TECHNIQUES TO THEIR OWN ENVIRONMENTS DO SO AT THEIR OWN RISK.

Copyright Notice:

© EPV Technologies. All rights reserved.

Trademarks:

All the trademarks mentioned here belong to their respective companies.

Agenda

• Introduction

• Terminology

• SMT overview

• Capacity Factors

• CORE productivity and utilization

• MT-1 equivalent time

• Conclusions

Introduction

• Simultaneous Multithreading (SMT) is already used on other platforms

• Currently available technologies can no longer deliver large improvements in processor speed, so IBM started introducing SMT on the mainframe with the z13 announcement

• Only for zIIP and IFL (for the moment)

• The reason for this prudent approach is that, from a Capacity Management point of view, this is a very critical change

Terminology

“The CPU Activity section reports on logical core and logical processor activity. For each processor, the report provides a set of calculations that are provided at a particular granularity that depends on whether multithreading is disabled or enabled ...”

RMF Report Analysis V2R2 SC34-2665-02

“If multithreading is disabled for a processor type, all calculations are at logical processor granularity.

If multithreading is enabled for a processor type, some calculations are provided at logical core granularity and some are provided at logical processor (thread) granularity.”

RMF Report Analysis V2R2 SC34-2665-02

• What do you mean by CPU if you are:

  PR/SM: Physical Processor - CP - CORE

  z/OS: Logical Processor - LCP - Logical CORE - Thread

  Application: Logical Processor - Thread

SMT terms: CORE, Logical CORE, Thread

SMT overview

• Mainframe cores process instructions in multiple pipes, each composed of a number of stages; each stage performs one step in the processing of an instruction, similar to an assembly line

• Without SMT, however, a core can operate on only a single instruction stream

• A large part of the core capacity is normally wasted when an instruction stream stalls waiting for a cache miss to be resolved

• With SMT, multiple instruction streams can be processed simultaneously; when a thread is waiting for a cache miss, the core can continue doing work on behalf of the other threads

• Unfortunately, the additional throughput from SMT does not scale very well with the number of threads

• This is because all the threads on a core share some limited resources (e.g. pipes, processor cache, TLB)

• To activate SMT on z/OS, you have to:

  define the PROCVIEW CORE option in LOADxx; if you do not want to use SMT you can omit the PROCVIEW parameter or specify PROCVIEW CPU, which is the default; an IPL is needed to change it

  set MT_ZIIP_MODE=2 in IEAOPTxx; it can be changed dynamically
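As a rough illustration of the two settings named on this slide, the fragment below shows what each member might contain. It is only a sketch: the xx member suffixes are site-specific placeholders, and the exact statement layout should be checked against the IBM documentation for LOADxx and IEAOPTxx before use.

```
LOADxx member (takes effect at the next IPL):
PROCVIEW CORE

IEAOPTxx member (can be activated dynamically, for example
with the SET OPT=xx operator command):
MT_ZIIP_MODE=2
```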

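• Yellow LPARs are really using zIIP SMT (in the table: CORE = Y and MT IIP = 2, i.e. SYS4 and SYS5)

• Green LPARs are ready to use it (in the table: CORE = Y but MT IIP = 1, i.e. SYS6 and SYS7)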

SYSTEM  SYSPLEX  OS LEVEL  GMTOFF  HDISP  CPUS  AAPS  IIPS  CORE  MT CPU  MT IIP
SYS1    SYS1PLX  ZV011300  2       Y      2     0     2     N     1       1
SYS2    SYS2PLX  ZV020100  2       Y      2     0     2     N     1       1
SYS3    SYS3PLX  ZV020100  2       Y      2     0     2     N     1       1
SYS4    SYS4PLX  ZV020100  2       Y      1     0     1     Y     1       2
SYS5    SYS5PLX  ZV020100  2       Y      1     0     1     Y     1       2
SYS6    SYS6PLX  ZV020100  2       Y      6     0     6     Y     1       1
SYS7    SYS7PLX  ZV020100  2       Y      6     0     6     Y     1       1
SYS8    SYS8PLX  ZV011300  2       Y      3     0     2     N     1       1
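The slide's color coding is not available in this transcript, but the same classification can probably be derived from the CORE and MT IIP columns. The sketch below assumes that "really using zIIP SMT" means CORE = Y with MT IIP = 2, and that "ready to use it" means CORE = Y with MT IIP = 1. The function name and the three labels are hypothetical.

```python
def smt_state(core, mt_iip):
    """Classify an LPAR from the CORE and MT IIP columns of the table.

    The three labels are hypothetical names chosen for this sketch.
    """
    if core == "Y" and mt_iip >= 2:
        return "using zIIP SMT"
    if core == "Y":
        return "ready for zIIP SMT"
    return "SMT not enabled"


# (SYSTEM, CORE, MT IIP) values copied from the table above.
ROWS = [
    ("SYS1", "N", 1), ("SYS2", "N", 1), ("SYS3", "N", 1),
    ("SYS4", "Y", 2), ("SYS5", "Y", 2),
    ("SYS6", "Y", 1), ("SYS7", "Y", 1),
    ("SYS8", "N", 1),
]

STATES = {system: smt_state(core, mt_iip) for system, core, mt_iip in ROWS}

for system, state in STATES.items():
    print(f"{system}: {state}")
```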

• MT-1 means that there is only 1 thread per CORE; this is the only possible option for standard CPs at the moment

• MT-2 means that there are 2 threads per CORE; you can activate it on zIIPs (or IFLs)

MT-1: faster execution, lower throughput

MT-2: slower execution, higher throughput

• Expected speed reduction when 2 threads are active:

  Similar to having more, slower engines

  In the 30-40% range

• Throughput variability:

  Throughput depends on workload (thread) characteristics

  On average, up to a 40% increase when 2 threads are active

  But it may also decrease
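The speed reduction and the throughput change are two views of the same effect. The sketch below uses a deliberately simple model that is an assumption of this example, not something stated in the slides: both threads are always active and each is slowed by the same fraction. Under that model a 30% slowdown corresponds to a 40% throughput gain, and any slowdown above 50% turns into a throughput loss.

```python
def mt2_relative_throughput(thread_slowdown):
    """Throughput of one MT-2 core with both threads busy, relative to MT-1.

    thread_slowdown is the fractional speed reduction of each thread
    (0.30 means each thread runs 30% slower than it would in MT-1).
    Simplified model: both threads are always active and equally slowed.
    """
    if not 0.0 <= thread_slowdown <= 1.0:
        raise ValueError("thread_slowdown must be between 0 and 1")
    return 2 * (1.0 - thread_slowdown)


for slowdown in (0.30, 0.40, 0.50, 0.60):
    change = (mt2_relative_throughput(slowdown) - 1.0) * 100
    print(f"slowdown {slowdown:.0%}: throughput change {change:+.0f}%")
```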

Capacity Factors

• The MT-2 Maximum Capacity Factor (Max CF) is the ratio of the maximum amount of work that can be accomplished using 2 threads to the amount of work that would have been accomplished with 1 thread

• MT-1 Max Capacity Factor is 1.0

• MT-2 Max Capacity Factor is workload dependent; its maximum theoretical value is 2.0

• The MT-2 Capacity Factor (CF) is the ratio of the amount of work that has actually been accomplished using 1 or 2 threads to the amount of work that would have been accomplished with multithreading disabled

• Thread Density (TD) represents the average number of active threads while a core is dispatched

• If TD is 1 most of the time, CF should be close to 1; if TD is 2 most of the time, CF should be close to MAX CF

Capacity Factors – Manual example

• In this RMF report snapshot you can see that:

  MT-1 is used for CP; MAX CF, CF and AVG TD are all 1

  MT-2 is used for zIIP; MAX CF is 1.804 and CF is 1.746

  zIIP CF and MAX CF are very close because TD is almost 2

Capacity Factors – Real case

• SMT throughput benefit of about 14% on this system

• Average thread density a bit less than 1.4

• New MT Diagnostic Counter set in z13

  Counter 448 – Cycle count with one thread active

  Counter 449 – Cycle count with two threads active

• HISMT API provided to get metrics even if the HIS address space is not active

• WLM and RMF can retrieve metrics for workload management and reporting

• Where do they come from?

• Some more information has recently been provided by IBM

• The base measurements are the instructions executed and the cycles used with 1 and 2 active threads

Capacity Factors – Thread Density

\[
\text{AVG TD} = \frac{\text{D448}}{\text{D448} + \text{D449}} \times 1 + \frac{\text{D449}}{\text{D448} + \text{D449}} \times 2
\]
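The thread density formula is a cycle-weighted average of 1 and 2. A minimal sketch, assuming D448 and D449 are the interval deltas of counters 448 and 449 (the function name and the sample counts are made up for illustration):

```python
def avg_thread_density(d448, d449):
    """Average thread density from the two MT diagnostic cycle counts.

    d448: cycles with one thread active (assumed interval delta of counter 448)
    d449: cycles with two threads active (assumed interval delta of counter 449)
    """
    total = d448 + d449
    if total == 0:
        raise ValueError("no cycles counted in the interval")
    return (d448 * 1 + d449 * 2) / total


# Hypothetical counts: 60% of the cycles with one thread, 40% with two.
print(avg_thread_density(600_000, 400_000))
```

The result always lies between 1 (only counter 448 advances) and 2 (only counter 449 advances).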

Capacity Factors – New formulas

Instructions per cycle:

\[
\mathrm{IPC} = \frac{I}{C} = \frac{I_1 + I_2}{C_1 + C_2}
\]

Instructions per cycle, 1 thread:

\[
\mathrm{IPC}_1 = \frac{I_1}{C_1}
\]

Instructions per cycle, 2 threads:

\[
\mathrm{IPC}_2 = \frac{I_2}{C_2}
\]

\[
\text{Productivity} = \frac{\mathrm{IPC}}{\mathrm{IPC}_2} = \frac{I_1 + I_2}{\mathrm{IPC}_2 \, (C_1 + C_2)}
\]

\[
\text{Max CF} = \frac{\mathrm{IPC}_2}{\mathrm{IPC}_1}
\]

\[
\mathrm{CF} = \text{Productivity} \times \text{Max CF} = \frac{I_1 + I_2}{\mathrm{IPC}_1 \, (C_1 + C_2)}
\]
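The formulas on these two slides can be followed step by step in a short sketch. It assumes I_1, C_1 are the instructions and cycles counted with one thread active and I_2, C_2 those counted with two threads active; the function name and the sample counts are hypothetical, and this is not presented as the way RMF or WLM compute the values internally.

```python
def capacity_factors(i_1, c_1, i_2, c_2):
    """Return the IPC values, Productivity, Max CF and CF from the base counts.

    i_1, c_1: instructions and cycles with one thread active
    i_2, c_2: instructions and cycles with two threads active
    Both c_1 and c_2 must be non-zero.
    """
    ipc = (i_1 + i_2) / (c_1 + c_2)
    ipc_1 = i_1 / c_1
    ipc_2 = i_2 / c_2
    productivity = ipc / ipc_2
    max_cf = ipc_2 / ipc_1
    cf = productivity * max_cf  # algebraically the same as ipc / ipc_1
    return {
        "ipc": ipc,
        "ipc_1": ipc_1,
        "ipc_2": ipc_2,
        "productivity": productivity,
        "max_cf": max_cf,
        "cf": cf,
    }


# Hypothetical counts, not taken from a real system.
result = capacity_factors(i_1=100, c_1=100, i_2=300, c_2=200)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Because IPC_2 cancels out, CF reduces to IPC / IPC_1: the overall instruction rate compared with the rate observed when only one thread was active.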

Capacity Factors – Speculations

• The previous formulas have issues if there is always 1 thread or always 2 threads active

• Some correction has to be applied in these extreme cases

• In real life, MAX CF and CF sometimes show strange values

• The next slides show two systems running the same workload
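The issue with the extreme cases can be seen directly in the formulas: IPC_1 = I_1 / C_1 cannot be evaluated when no cycles ran with one thread active, and IPC_2 = I_2 / C_2 cannot be evaluated when no cycles ran with two. The sketch below only flags these cases; it does not guess which correction is actually applied, since the slides do not say. The function name is hypothetical.

```python
def degenerate_case(c_1, c_2):
    """Describe which formulas cannot be evaluated, or return None.

    c_1: cycles with one thread active
    c_2: cycles with two threads active
    """
    if c_1 == 0 and c_2 == 0:
        return "no cycles counted: IPC, IPC_1 and IPC_2 are all undefined"
    if c_1 == 0:
        return "always 2 threads: IPC_1 is undefined, so Max CF and CF are too"
    if c_2 == 0:
        return "always 1 thread: IPC_2 is undefined, so Productivity and Max CF are too"
    return None


print(degenerate_case(0, 5_000))
print(degenerate_case(5_000, 0))
print(degenerate_case(3_000, 2_000))
```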

Capacity Factors – Strange values

CORE productivity and utilization

• MAX CF is an estimate of the maximum possible throughput

• It is also used to re-evaluate CPU utilization, which is no longer simply measured in MT-2

• This is needed to keep throughput and utilization proportional

• CORE productivity is the percentage of the maximum core capacity that was used while the logical core was dispatched to physical hardware

• If CORE productivity equals 100%, all threads on the core are executing work and all core resources are being used

• It can be calculated as the ratio of CF to MAX CF, by inverting the formula previously discussed:

CF = Productivity * Max CF

MT % PROD = 1.746 / 1.804 * 100 = 96.78
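The same arithmetic as a small sketch, using the CF and MAX CF values from the RMF snapshot shown earlier (the function name is hypothetical):

```python
def core_productivity_pct(cf, max_cf):
    """CORE productivity as a percentage: CF divided by MAX CF."""
    if max_cf <= 0:
        raise ValueError("max_cf must be positive")
    return cf / max_cf * 100.0


# zIIP values from the RMF snapshot: CF = 1.746, MAX CF = 1.804.
print(f"MT % PROD = {core_productivity_pct(1.746, 1.804):.2f}")  # prints MT % PROD = 96.78
```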

• LPAR busy simply tells you that the logical core is dispatched

• CORE utilization is supposed to be a more precise metric than LPAR busy; it should tell you how much more work the CORE can still execute

• CORE utilization is calculated by multiplying LPAR busy by CORE productivity

MT % UTIL = 76.08 * 96.48 / 100 = 73.40
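A minimal sketch of this calculation, with both inputs expressed as percentages as in the line above (the function name is hypothetical):

```python
def core_utilization_pct(lpar_busy_pct, productivity_pct):
    """CORE utilization: LPAR busy scaled by CORE productivity (both in %)."""
    return lpar_busy_pct * productivity_pct / 100.0


# Values from the example above: LPAR busy 76.08%, productivity 96.48%.
print(f"MT % UTIL = {core_utilization_pct(76.08, 96.48):.2f}")  # prints MT % UTIL = 73.40
```

When productivity is 100%, CORE utilization and LPAR busy coincide, which is always the case in MT-1.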

MT-1 equivalent time

• With SMT enabled, all accounting fields (SMF 30, 72, etc.) report the zIIP consumption of workloads as MT-1 Equivalent Time and Service Units

• MT-1 Equivalent Time is the zIIP time that it would have taken to run the same work in MT-1 mode

• MT-1 Equivalent Time is internally calculated as MAX CF * zIIP time
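The conversion stated in the last bullet, as a short sketch. The function name and the sample values are hypothetical; the formula is the one given above.

```python
def mt1_equivalent_time(ziip_time, max_cf):
    """MT-1 Equivalent Time = MAX CF * zIIP time (same time unit as the input)."""
    if max_cf < 1.0:
        raise ValueError("max_cf is expected to be at least 1.0")
    return max_cf * ziip_time


# Hypothetical example: 1,440 zIIP seconds with MAX CF = 1.25.
print(mt1_equivalent_time(1440.0, 1.25))  # prints 1800.0
```

With MAX CF = 1.0 (MT-1) the reported time is unchanged.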

• The most important consequence of MT-1 Equivalent Time measurements is that, when working in MT-2, you have to change the calculation of the capacity used by any workload

• Example of the old algorithm:

  Workload A used 1,800 zIIP seconds in 1 hour

  1 CORE is rated at 1,000 MIPS

  used COREs = 1,800 / 3,600 = 0.5

  used MIPS = 1,000 * 0.5 = 500

• Example of the new algorithm if MT=2 and MAX CF is 1.25:

  Workload A used 1,800 zIIP seconds in 1 hour

  1 CORE is rated at 1,000 MIPS

  used COREs = 1,800 / ( 3,600 * 1.25 ) = 0.4

  used MIPS = 1,000 * 0.4 = 400
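The old and new algorithms differ only in the MAX CF term, so a single function can cover both. This is a sketch with hypothetical names; leaving max_cf at 1.0 reproduces the old calculation.

```python
def used_capacity(ziip_seconds, interval_seconds, mips_per_core, max_cf=1.0):
    """Return (used COREs, used MIPS) for a workload.

    ziip_seconds: zIIP time reported for the workload (MT-1 Equivalent Time
                  when SMT is enabled)
    max_cf:       1.0 for MT-1 (old algorithm), the measured MAX CF for MT-2
    """
    used_cores = ziip_seconds / (interval_seconds * max_cf)
    return used_cores, used_cores * mips_per_core


old_cores, old_mips = used_capacity(1800, 3600, 1000)
new_cores, new_mips = used_capacity(1800, 3600, 1000, max_cf=1.25)

print(f"old algorithm: {old_cores:.2f} COREs, {old_mips:.0f} MIPS")
print(f"new algorithm: {new_cores:.2f} COREs, {new_mips:.0f} MIPS")
```

With the numbers from the two examples, the uncorrected calculation reports 500 MIPS where the corrected one reports 400.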

• If you don’t correct your reports, you can get strange results

Conclusions

• The introduction of SMT changed an important part of mainframe terminology

• SMT adds new metrics, which have to be clearly understood in order to perform Capacity Management activities correctly

• Most of the currently used accounting formulas should be reviewed, especially if SMT is extended to standard CPs