Lost in the SMT world
Danilo Gipponi
EPV Technologies
www.epvtech.com
Disclaimer, copyright and trademarks
Disclaimer:
THE INFORMATION CONTAINED IN THIS PRESENTATION HAS NOT BEEN SUBMITTED TO ANY FORMAL REVIEW AND IS DISTRIBUTED ON AN “AS IS” BASIS WITHOUT ANY WARRANTY EITHER EXPRESS OR IMPLIED. THE USE OF THIS INFORMATION OR THE IMPLEMENTATION OF ANY OF THESE TECHNIQUES IS A USER RESPONSIBILITY AND DEPENDS ON THE USER’S ABILITY TO EVALUATE AND INTEGRATE THEM INTO THE USER’S OPERATIONAL ENVIRONMENT. WHILE EACH ITEM MAY HAVE BEEN REVIEWED FOR ACCURACY IN A SPECIFIC SITUATION, THERE IS NO GUARANTEE THAT THE SAME OR SIMILAR RESULTS WILL BE OBTAINED ELSEWHERE. USERS ATTEMPTING TO ADAPT THESE TECHNIQUES TO THEIR OWN ENVIRONMENTS DO SO AT THEIR OWN RISK.
Copyright Notice:
© EPV Technologies. All rights reserved.
Trademarks:
All the trademarks mentioned here belong to their respective companies.
Agenda
• Introduction
• Terminology
• SMT overview
• Capacity Factors
• CORE productivity and utilization
• MT-1 equivalent time
• Conclusions
Introduction
• Simultaneous Multithreading (SMT) is already used on other
platforms
• Currently available technologies can’t deliver further large
improvements in processor speed, so IBM started introducing SMT
on the Mainframe with the z13 announcement
• Only for zIIP and IFL (for the moment)
• The reason for this prudent approach is that, from a Capacity
Management point of view, this is a very critical change
Terminology
“The CPU Activity section reports on logical core and
logical processor activity. For each processor, the report
provides a set of calculations that are provided at a
particular granularity that depends on whether
multithreading is disabled or enabled ...”
RMF Report Analysis V2R2 SC34-2665-02
Terminology
“If multithreading is disabled for a processor type, all
calculations are at logical processor granularity.
If multithreading is enabled for a processor type, some
calculations are provided at logical core granularity and
some are provided at logical processor (thread)
granularity.”
RMF Report Analysis V2R2 SC34-2665-02
Terminology
• What do you mean by CPU if you are:
  PR/SM: Physical Processor - CP - CORE
  z/OS: Logical Processor - LCP - Logical CORE - Thread
  Application: Logical Processor - Thread
  (CORE, Logical CORE and Thread are the SMT terms)
SMT overview
• Mainframe cores process instructions in multiple pipes,
each composed of a number of stages that each perform one step
in the processing of an instruction, similar to an assembly line
• But without SMT a core can operate on only a single instruction stream
• A big part of the core capacity is normally wasted when an
instruction stream stalls waiting for a cache miss to be
resolved
SMT overview
• With SMT, multiple instruction streams can be processed
simultaneously; when a thread is waiting for a cache miss the
core can continue doing work on behalf of the other threads
• Unfortunately, the additional throughput from SMT does not
scale very well with the number of threads
• This is because all the threads on a core share some limited
resources (e.g. pipes, processor cache, TLB)
SMT overview
• To activate SMT on z/OS, you have to:
  specify the PROCVIEW CORE option in LOADxx; if you do
  not want to use SMT, you can omit the PROCVIEW
  parameter or specify PROCVIEW CPU, which is the default;
  an IPL is needed to change it
  set MT_ZIIP_MODE=2 in IEAOPTxx; this can be changed
  dynamically
SMT overview
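• Yellow LPARs are really using zIIP SMT (CORE = Y and MT IIP = 2: SYS4 and SYS5 in the table)
• Green LPARs are ready to use it (CORE = Y but MT IIP = 1: SYS6 and SYS7 in the table)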
SYSTEM  SYSPLEX  OS LEVEL  GMTOFF  HDISP  CPUS  AAPS  IIPS  CORE  MT CPU  MT IIP
SYS1    SYS1PLX  ZV011300  2       Y      2     0     2     N     1       1
SYS2    SYS2PLX  ZV020100  2       Y      2     0     2     N     1       1
SYS3    SYS3PLX  ZV020100  2       Y      2     0     2     N     1       1
SYS4    SYS4PLX  ZV020100  2       Y      1     0     1     Y     1       2
SYS5    SYS5PLX  ZV020100  2       Y      1     0     1     Y     1       2
SYS6    SYS6PLX  ZV020100  2       Y      6     0     6     Y     1       1
SYS7    SYS7PLX  ZV020100  2       Y      6     0     6     Y     1       1
SYS8    SYS8PLX  ZV011300  2       Y      3     0     2     N     1       1
SMT overview
• MT-1 means that there is only 1 thread per CORE; this
is the only possible option for standard CPs at the
moment
• MT-2 means that there are 2 threads per CORE; you
can activate it on zIIPs (or IFLs)
SMT overview
• Expected speed reduction when 2 threads are active:
  Similar to having more, but slower, engines
  In the 30-40% range
SMT overview
• Throughput variability:
  Throughput depends on workload (threads) characteristics
  On average, up to a 40% increase when 2 threads are active
  But it may also decrease
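The speed reduction and the throughput increase quoted above are linked by simple arithmetic: if a core running 2 threads delivers a throughput gain g over MT-1, each thread runs at (1 + g) / 2 of the MT-1 speed. The following is a minimal sketch, assuming both threads are always active and share the core equally (an idealisation). The function name and the 20% figure are illustrative assumptions, not values from the presentation.

```python
def per_thread_speed(throughput_gain: float, threads: int = 2) -> float:
    """Speed of each thread relative to an MT-1 core (1.0 = same speed).

    Assumes all threads are always active and share the core equally.
    """
    return (1.0 + throughput_gain) / threads


# 0.40 is the "up to 40%" gain quoted above; 0.20 is an illustrative lower value.
for gain in (0.20, 0.40):
    slowdown = 1.0 - per_thread_speed(gain)
    print(f"throughput +{gain:.0%} -> each thread about {slowdown:.0%} slower")
```

Under these assumptions, throughput gains between 20% and 40% correspond to a per-thread speed reduction between 40% and 30%.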
Capacity Factors
• The MT-2 Maximum Capacity Factor (Max CF) is the ratio of
the maximum amount of work that can be accomplished using
2 threads to the amount of work that would have been
accomplished with 1 thread
• MT-1 Max Capacity Factor is 1.0
• MT-2 Max Capacity Factor is workload dependent; max
theoretical value is 2
Capacity Factors
• The MT-2 Capacity Factor (CF) is the ratio of the amount of
work that has actually been accomplished using 1 or 2
threads to the amount of work that would have been
accomplished with multithreading disabled
• Thread Density (TD) represents the average number of active
threads when a core is dispatched
• If most of the time TD is 1, CF should be close to 1; if most of
the time TD is 2, CF should be close to MAX CF
Capacity Factors – Manual example
• In this RMF report snapshot you can note that:
  MT-1 is used for CP; MAX CF, CF and AVG TD are all 1
  MT-2 is used for zIIP; MAX CF is 1.804 and CF is 1.746
  zIIP CF and MAX CF are very close because TD is almost 2
Capacity Factors – Real case
• SMT throughput benefit is about 14% on this system
• Average thread density is a bit less than 1.4
Capacity Factors
• New MT Diagnostic Counter set in z13:
  Counter 448 – Cycle count with one thread active
  Counter 449 – Cycle count with two threads active
• The HISMT API is provided to get the metrics even if the HIS
address space is not active
• WLM and RMF can retrieve the metrics for workload management
and reporting
Capacity Factors
• Where do they come from?
• Some more information recently provided by IBM
• Instructions performed and cycles used with 1 and 2 active
threads are the base measurements
Capacity Factors – Thread Density
$$\text{AVG TD} = \frac{\text{D448}}{\text{D448} + \text{D449}} \cdot 1 + \frac{\text{D449}}{\text{D448} + \text{D449}} \cdot 2$$
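As a small illustration of the thread density formula, the sketch below computes AVG TD from two cycle counts. It assumes that D448 and D449 denote the interval deltas of counters 448 and 449; the function name and the sample values are made up for the example.

```python
def average_thread_density(d448: int, d449: int) -> float:
    """Average number of active threads while the core is dispatched.

    d448: cycles with one thread active (assumed delta of counter 448)
    d449: cycles with two threads active (assumed delta of counter 449)
    """
    total = d448 + d449
    if total == 0:
        raise ValueError("no cycles counted in the interval")
    # Weighted average: 1 thread for d448 cycles, 2 threads for d449 cycles.
    return (d448 * 1 + d449 * 2) / total


# Made-up deltas: 60% of the cycles with 1 thread, 40% with 2 threads.
print(f"AVG TD = {average_thread_density(600, 400):.2f}")
```

The result is always between 1 (only counter 448 advances) and 2 (only counter 449 advances).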
Capacity Factors – New formulas
Instructions per cycle:
$$\text{IPC} = \frac{I}{C} = \frac{I_1 + I_2}{C_1 + C_2}$$
Instructions per cycle, 1 thread:
$$\text{IPC}_1 = \frac{I_1}{C_1}$$
Instructions per cycle, 2 threads:
$$\text{IPC}_2 = \frac{I_2}{C_2}$$
Capacity Factors – New formulas
$$\text{Productivity} = \frac{\text{IPC}}{\text{IPC}_2} = \frac{I_1 + I_2}{\text{IPC}_2 \cdot (C_1 + C_2)}$$
$$\text{Max CF} = \frac{\text{IPC}_2}{\text{IPC}_1}$$
$$\text{CF} = \text{Productivity} \cdot \text{Max CF} = \frac{\text{IPC}}{\text{IPC}_1} = \frac{I_1 + I_2}{\text{IPC}_1 \cdot (C_1 + C_2)}$$
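The formulas on these two slides can be put together in a short Python sketch. It assumes that I_1, C_1, I_2 and C_2 are the instructions and cycles measured with 1 and 2 active threads over one interval; the class, the function and the sample numbers are hypothetical and only illustrate the arithmetic. Note that IPC_1 is undefined when C_1 is zero and IPC_2 when C_2 is zero, so the sketch rejects those inputs.

```python
from dataclasses import dataclass


@dataclass
class MtCounters:
    i1: float  # instructions executed while 1 thread was active
    c1: float  # cycles with 1 thread active
    i2: float  # instructions executed while 2 threads were active
    c2: float  # cycles with 2 threads active


def capacity_factors(m: MtCounters) -> dict:
    """Return IPC values, productivity, Max CF and CF for one interval."""
    if m.c1 <= 0 or m.c2 <= 0 or m.i1 <= 0 or m.i2 <= 0:
        # With only 1 thread or only 2 threads observed the ratios are undefined.
        raise ValueError("need activity with both 1 and 2 threads")
    ipc = (m.i1 + m.i2) / (m.c1 + m.c2)
    ipc1 = m.i1 / m.c1
    ipc2 = m.i2 / m.c2
    productivity = ipc / ipc2
    max_cf = ipc2 / ipc1
    cf = productivity * max_cf  # algebraically equal to ipc / ipc1
    return {"ipc": ipc, "ipc1": ipc1, "ipc2": ipc2,
            "productivity": productivity, "max_cf": max_cf, "cf": cf}


# Made-up sample: IPC_1 = 1.0, IPC_2 = 1.5, equal cycles in both states.
sample = capacity_factors(MtCounters(i1=1000, c1=1000, i2=1500, c2=1000))
for name, value in sample.items():
    print(f"{name:>12} = {value:.3f}")
```

With this sample, Max CF is 1.5 and CF is 1.25: the core spent half of its cycles with a single thread, so only part of the potential MT-2 gain was realised.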
Capacity Factors – Speculations
• The previous formulas have issues if there is always 1 thread or
always 2 threads active (C_2 or C_1 is then zero)
• Some correction should be applied in these extreme cases
• In real life sometimes MAX CF and CF show strange values
• Next slides show two systems with the same workload running
CORE productivity and utilization
• MAX CF is an estimated value of the maximum possible
throughput
• It is also used to re-evaluate CPU utilization, which in MT-2
can no longer be simply measured
• This is needed to maintain a proportion between Throughput
and Utilization
CORE productivity and utilization
• CORE productivity is the percentage of the maximum core
capacity that has been used while the logical core was
dispatched to physical hardware
• If CORE productivity equals 100% all threads on the core are
executing work and all core resources are being used
• It can be calculated as the ratio of CF to MAX CF by
inverting the formula previously discussed:
  CF = Productivity * Max CF
CORE productivity and utilization
• LPAR busy simply tells you that the logical core is dispatched
• CORE utilization is supposed to be a more precise metric than
LPAR busy; it should tell you how much work the CORE can still
execute
• CORE utilization is calculated by multiplying LPAR busy and
CORE productivity
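The two relationships above (productivity as CF / MAX CF, and CORE utilization as LPAR busy times productivity) can be sketched in a few lines of Python. The CF and MAX CF values are the zIIP figures from the manual example earlier in the presentation; the 80% LPAR busy value and the function names are assumptions made only for illustration.

```python
def core_productivity(cf: float, max_cf: float) -> float:
    """Fraction of the maximum core capacity used while dispatched."""
    return cf / max_cf


def core_utilization(lpar_busy: float, productivity: float) -> float:
    """CORE utilization as LPAR busy multiplied by CORE productivity."""
    return lpar_busy * productivity


productivity = core_productivity(cf=1.746, max_cf=1.804)
utilization = core_utilization(lpar_busy=0.80, productivity=productivity)  # 0.80 is made up

print(f"CORE productivity = {productivity:.1%}")
print(f"CORE utilization  = {utilization:.1%}")
```

Because productivity cannot exceed 100%, CORE utilization is never higher than LPAR busy; the gap between the two is the capacity the dispatched core could still deliver.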
MT-1 equivalent time
• With SMT enabled all accounting fields (SMF 30, 72, etc)
report zIIP consumption of workloads as MT-1 Equivalent Time
and Service Units
• MT-1 Equivalent Time is the zIIP time that it would have taken to
run the same work in MT-1 mode
• MT-1 Equivalent Time is internally calculated as
MAX CF * zIIP time
MT-1 equivalent time
• Most important consequence of MT-1 Equivalent Time
measurements is that when working in MT-2 you have to
change the calculation of the capacity used by any workload
• Example of the old algorithm:
  Workload A used 1,800 zIIP seconds in 1 hour
  1 CORE is rated at 1,000 MIPS
  used COREs = 1,800 / 3,600 = 0.5
  used MIPS = 1,000 * 0.5 = 500
MT-1 equivalent time
• Example of the new algorithm if MT=2 and MAX CF is 1.25:
  Workload A used 1,800 zIIP seconds in 1 hour
  1 CORE is rated at 1,000 MIPS
  used COREs = 1,800 / ( 3,600 * 1.25 ) = 0.4
  used MIPS = 1,000 * 0.4 = 400
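The old and the new algorithm differ only in the MAX CF divisor, so both can be expressed with one function. The sketch below is a minimal illustration using the numbers from the two examples; the function and parameter names are hypothetical.

```python
def used_mips(ziip_seconds: float, interval_seconds: float,
              core_mips: float, max_cf: float = 1.0) -> float:
    """Capacity used by a workload, in MIPS.

    max_cf = 1.0 reproduces the old (MT-1) algorithm; in MT-2 the
    reported time is MT-1 Equivalent Time, so it is divided by MAX CF.
    """
    used_cores = ziip_seconds / (interval_seconds * max_cf)
    return used_cores * core_mips


old = used_mips(1800, 3600, 1000)               # old algorithm
new = used_mips(1800, 3600, 1000, max_cf=1.25)  # MT-2 with MAX CF = 1.25

print(f"old algorithm: {old:.0f} MIPS")
print(f"new algorithm: {new:.0f} MIPS")
```

This reproduces the 500 and 400 MIPS of the two examples: the same reported zIIP seconds correspond to less used capacity once MAX CF is taken into account.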
Conclusions
• The introduction of SMT changed an important part of the
Mainframe terminology
• With SMT new metrics have been added which have to be
clearly understood in order to perform correct Capacity
Management activities
• Most of the currently used accounting formulas should be
reviewed, especially if SMT is extended to standard CPs