Post on 02-Aug-2020

Page 1: How to Transform RMF & SMF into Availability Intelligence · 2020-02-20 · 2 How to Transform RMF & SMF into Availability Intelligence Session Abstract: It is time for a new, more

Welcome to today's webinar:

How to Transform RMF & SMF into Availability Intelligence

The presentation will begin shortly

2

How to Transform RMF & SMF into Availability Intelligence

Session Abstract:

It is time for a new, more intelligent approach to interpreting RMF & SMF data: one that provides a dramatically different result that you can easily verify on your own data.

RMF & SMF produce the world’s richest source of machine-generated data about enterprise infrastructure performance and configuration. But even the best-run shops are not able to use this data to avoid incidents that cause unavailability.

To outsmart unavailability, you have to automatically “crawl” through all the workload data every day at a very granular level. This data needs to be enriched and constantly evaluated against detailed expert knowledge about the infrastructure. Statistical analysis (the primary method in other new Analytics solutions) is not enough.

Using expert knowledge in this kind of process, you can see, for the first time, how much risk there is that your infrastructure cannot handle your peak workloads, and how that risk is changing over time. This new visibility gives you warning before your online monitors can even detect any disruption to service levels.

3

Availability on z/OS Systems

• What does the “z” stand for?

“zero downtime”

• What is your availability?

• z/OS vs. end-user experience

4

z/OS Infrastructure Areas

• Many necessary for availability:

‒ Processor, WLM Goals, etc.

‒ Channels

‒ Coupling Facility

‒ XCF

‒ FICON

‒ Disk Storage

‒ Replication / DR

‒ Tape / Virtual Tape Storage

5

Incidents Leading to Application Unavailability

[Chart labels: Predictable, Unpredictable]

Response for Unpredictable:

• Find the problem earlier

• Accelerate the problem fix

Response for Predictable:

• Avoid incident with proactive action

6

Increasing the Predictable Portion

Predictable

Unpredictable

What would be the impact on:

1. Your IT staff?

2. Your Employees?

3. Your Customers?

7

Seeing Threats to Continuous Availability

• Question: Which has better intelligence to avoid outages:

‒ A 20-thousand-dollar automobile; or

‒ A 20-million-dollar mainframe?

8

IT Infrastructure Availability Monitoring Today

[Chart labels: Time, Response Time, SLA, Performance]

Your existing monitors look at symptoms here, only after users experience problems

Easy to get, but is an effect, not a cause

© IntelliMagic 2014

9

Monitoring with Availability Intelligence

[Chart labels: Time, Response Time, Sub-component Saturation, SLA, Performance]

Availability Intelligence identifies risk here, before response time suffers

Requires evaluating every data point with expert domain knowledge about every component

Easy to get, but is an effect, not a cause

10

Changing the Outcome - Avoiding Disruptions

[Chart labels: Time, Response Time, Sub-component Saturation, SLA, Performance]

Most infrastructure “fires” can be prevented by intervening here

11

Maintaining IT Availability Today: Two States

[Diagram labels: Focus Level, Brain States, Little, Full, Panic, Engaged, Disengaged, Free]

12

With Availability Intelligence: A New 3rd State

[Diagram labels: Focus Level, Brain State, Little, Full, Panic, Engaged, Disengaged, Free]

13

What is Availability Intelligence?

What: Foreknowledge about hidden threats to availability

Why: To better protect continuous availability at primary site by

1. Avoiding incidents (make more of them predictable)

2. Accelerating the resolution (reduce MTTR)

How: Use built-in expert domain knowledge in automatic analysis of the performance and configuration data

14

Expert Knowledge & How to Use it

• For Availability Intelligence, it is not enough to have:

‒ Easier, nicer graphs

‒ Statistical analysis (as is common with IT Operations Analytics)

• Instead, it requires:

‒ Detailed knowledge about the specific hardware components in use

‒ Best practices to configure and manage infrastructure components

‒ New, meaningful metrics calculated from the raw data

‒ Good or bad? How to assess and rate the risk in the infrastructure

‒ How to visualize the risk and problems in the infrastructure

15

Example: Foreknowledge of Hidden Threats Inside the Storage Arrays

Lag Measure:

• Storage Array Response Times

Lead Measures:

• Imbalance? (Within Array, Between Arrays)

• Application Workloads

• Config or Failure Changes?

• Disk Device Loads

• FW Bypass, etc.

• Back-end, Cache

• Adapter Utilization

• FICON Errors

• Front-end

16

7 Key Areas to Apply Expert Knowledge to SMF/RMF

Machine-Generated Data

Domain Knowledge, Expertise

Availability Intelligence Automation

1. Collect

2. Normalize

3. Enrich

4. Assess

5. Rate

6. Recommend

7. Visualize

Infrastructure knowledge and expertise about HW/SW is applied in each step

Availability Intelligence

Benefits:

1. Avoid Incidents

2. Accelerate fixes

Sample actions:

• Rebalance work

• Fix lost redundancy

• Isolate change

• Correct error

• Hardware upgrade
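
The seven steps can be read as a chain of small transformations. The sketch below is only an illustration of that idea, not IntelliMagic's implementation: every function name, the `busy` field, the per-device limits, and the 0.8 warning cutoff are assumptions made for the example, and the two input records are made-up stand-ins for RMF/SMF data.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    device: str      # hypothetical device name
    busy_pct: float  # hypothetical utilization metric, 0-100


def collect():
    # Step 1: stand-in for reading machine-generated records (made-up values).
    return [{"dev": "ARRAY1", "busy": "42.0"}, {"dev": "ARRAY2", "busy": "91.5"}]


def normalize(raw):
    # Step 2: convert raw fields into typed, uniformly named values.
    return [Sample(device=r["dev"], busy_pct=float(r["busy"])) for r in raw]


def enrich(samples, limits):
    # Step 3: attach domain knowledge, here an assumed limit per device.
    return [(s, limits[s.device]) for s in samples]


def assess(enriched):
    # Step 4: compare each measurement with its limit.
    return [(s, s.busy_pct / limit) for s, limit in enriched]


def rate(assessed):
    # Step 5: turn the ratio into a coarse rating (cutoffs are assumptions).
    def label(ratio):
        if ratio >= 1.0:
            return "exception"
        if ratio >= 0.8:
            return "warning"
        return "good"
    return [(s.device, label(ratio)) for s, ratio in assessed]


def recommend(rated):
    # Step 6: map a rating to a sample action.
    actions = {"exception": "rebalance work", "warning": "watch growth"}
    return [(dev, lab, actions.get(lab, "none")) for dev, lab in rated]


def visualize(recs):
    # Step 7: render one text line per device.
    return [f"{dev}: {lab} -> {action}" for dev, lab, action in recs]


def run_pipeline(limits):
    return visualize(recommend(rate(assess(enrich(normalize(collect()), limits)))))


report = run_pipeline({"ARRAY1": 80.0, "ARRAY2": 90.0})
```

Keeping each step as its own function makes it easy to see where expert knowledge enters: in this sketch it is the `limits` table and the rating cutoffs, which would be far richer in a real product.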

17

Automating the Application of Expert Knowledge

• Assessing risk every interval, for every device, in every data center

• Automated application of expert knowledge to the data using all 7 areas is the only way to continually execute the ITIL v3 definition of Capacity Management:

– The Process responsible for ensuring that the Capacity of IT Services and the IT Infrastructure is able to deliver agreed Service Level Targets in a Cost Effective and timely manner… considers all Resources required to deliver the IT Service...

18

IntelliMagic

• Industry Leadership in “Availability Intelligence” Solutions:

‒ Provides new visibility of threats to continuous availability using built-in expert knowledge to interpret the data

• More than 20 years of solutions for deep infrastructure analysis

• Privately held, financially independent

• Customer centric, responsive

• Solutions used daily in some of the world’s largest data centers

19

IntelliMagic Vision for z/OS: 3 Modules

1. z/OS Systems

‒ Processors, WLM, Coupling Facility, XCF, Jobs/Datasets

2. z/OS Disk

‒ Supports every Disk vendor and configuration

‒ FICON, Replication, Jobs, Datasets, Storage groups, GDPS…

3. z/OS Tape/Virtual Tape

‒ IBM TS7700, Oracle StorageTek VSM

‒ Next year: EMC DLm

20

Availability Intelligence: a Good Fit for SaaS

• Frequently updated hardware knowledge

• Very quick time to results (~24 hours)

• Okay for security - no PII in infrastructure measurement data

• Easy dissemination of intelligence reports

• Easy access to expert consultants

21

Data Center Rollups of Key Risk Indicators

Disk Storage Systems

Performance Metrics

Key Risk Indicators

Highest Rating for this Dashboard

Consolidate individual ratings on infrastructure resources into data center views to see risk across enterprise at a glance
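
One simple way to picture such a rollup is to carry the highest individual rating up to each higher-level view. The sketch below assumes that a higher rating means more risk and that the rollup is a plain maximum; both are assumptions for illustration, and the function names, resource names, and rating values are hypothetical.

```python
def roll_up(ratings_by_resource):
    """Return (resource, rating) for the highest rating, or None if empty."""
    if not ratings_by_resource:
        return None
    worst = max(ratings_by_resource, key=ratings_by_resource.get)
    return worst, ratings_by_resource[worst]


def data_center_view(data_centers):
    """Roll individual resource ratings up to one entry per data center."""
    return {dc: roll_up(ratings) for dc, ratings in data_centers.items()}


# Made-up ratings for two hypothetical data centers.
view = data_center_view({
    "DC-EAST": {"DSS1": 0.00, "DSS2": 0.49},
    "DC-WEST": {"DSS3": 0.10},
})
```

Taking the maximum rather than an average keeps a single at-risk resource from being hidden by many healthy ones, which fits the goal of seeing risk at a glance.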

22

Visualizing Risk to Continuous Availability

What does the data mean for your infrastructure availability?

Automatic rating of key metrics according to built-in expert knowledge, to obtain intelligence about threats you can use to protect availability

No Border, No Rating

Green Border, Good

Yellow Border, Early Warning

Red Border, Performance Exceptions

23

Rating the Risk using Expert Domain Knowledge

Based on straight thresholds where appropriate (like hardware limits)

Based on dynamic thresholds where the limits also depend on workload characteristics
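
The difference between the two kinds of threshold can be sketched in a few lines. This is an illustration under stated assumptions, not the product's rating algorithm: the linear 0-to-1 rating scale, the function names, and the idea that the allowed response time grows with average transfer size (including the `ms_per_kb` factor and base values) are all hypothetical.

```python
def rate_static(value, warn, limit):
    """Rating is 0.0 up to `warn`, then rises linearly to 1.0 at `limit`."""
    if value <= warn:
        return 0.0
    return min(1.0, (value - warn) / (limit - warn))


def dynamic_thresholds(avg_transfer_kb, base_warn_ms=1.0, base_limit_ms=2.0,
                       ms_per_kb=0.01):
    """Hypothetical rule: larger transfers are allowed more response time."""
    extra = ms_per_kb * avg_transfer_kb
    return base_warn_ms + extra, base_limit_ms + extra


def rate_dynamic(response_ms, avg_transfer_kb):
    """Rate a response time against thresholds derived from the workload."""
    warn, limit = dynamic_thresholds(avg_transfer_kb)
    return rate_static(response_ms, warn, limit)


# Static: a fixed hardware limit, e.g. utilization in percent.
static_rating = rate_static(70.0, warn=60.0, limit=80.0)

# Dynamic: the same 2.0 ms is rated differently for small and large transfers.
small_io_rating = rate_dynamic(2.0, avg_transfer_kb=4.0)
large_io_rating = rate_dynamic(2.0, avg_transfer_kb=64.0)
```

With these made-up parameters, the same measured value rates noticeably worse for the small-transfer workload, which is the point of letting the limit follow the workload.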

24

DASD Infrastructure Example: Avoiding disruption to production service levels

25

Disk Storage System Dashboard [rating: 0.49]

Rating based on DSS data using DSS Thresholds

Response Time on first storage array is rated green – no discernible problem to end-users yet.

But a threat to availability exists in an underlying metric (back-end disk drive read response rate)

26

Response Time (ms) [rating: 0.00]

Rating based on DSS data using DSS Thresholds

Response time is a lag measure

But seeing it plotted against the dynamic thresholds (grey backgrounds) is useful to have an idea of what can be expected for that type of workload on that particular array configuration

27

Breakdown of Response Time Components (ms)

Breakdown of response time into its components allows identification of the largest contributors
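
A minimal sketch of such a breakdown, assuming the four standard z/OS DASD response time components (IOSQ, Pending, Connect, Disconnect) add up to the total. The function name is hypothetical and the sample values are made up for illustration, not taken from the charts in this deck.

```python
def breakdown(components_ms):
    """Return total response time, each component's share, and the largest one."""
    total = sum(components_ms.values())
    if total <= 0:
        raise ValueError("total response time must be positive")
    shares = {name: ms / total for name, ms in components_ms.items()}
    largest = max(shares, key=shares.get)
    return total, shares, largest


# Made-up interval averages in milliseconds.
sample = {"IOSQ": 0.25, "Pending": 0.25, "Connect": 0.5, "Disconnect": 1.0}
total_ms, shares, largest = breakdown(sample)
```

In this made-up interval, Disconnect accounts for half of the response time, so that is the component to investigate first.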

28

Disconnect (ms) [rating: 0.00]

Rating based on DSS data using DSS Thresholds

Overall, Disconnect Time is not yet out of range for this array

29

Disconnect time components (ms)

Built-in knowledge enables a further breakdown of disconnect time into its components

30

Drive Read Response (ms) [rating: 0.49]

Rating based on DSS data using DSS Thresholds

What was identified on the exception report is a deeper issue: Back-end drives are starting to become saturated.

With minimal workload growth, this will soon show up in response time and impact production users
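
Why a little growth matters so much near saturation can be illustrated with the textbook single-server queueing approximation R = S / (1 - U), where S is the service time and U the utilization. This is a generic model chosen for illustration, not the model used by the product, and the 5 ms service time and utilization values are made-up.

```python
def mm1_response_ms(service_ms, utilization):
    """Approximate response time for a single-server queue: S / (1 - U)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


# The same 10% workload growth at two different starting points.
at_50 = mm1_response_ms(5.0, 0.50)        # about 10 ms
at_55 = mm1_response_ms(5.0, 0.55)        # about 11.1 ms
at_80 = mm1_response_ms(5.0, 0.80)        # about 25 ms
at_88 = mm1_response_ms(5.0, 0.88)        # about 41.7 ms
```

In this simplified model, 10% growth adds roughly 1 ms at 50% utilization but roughly 17 ms at 80%, which is why a saturating back-end component is a lead measure worth watching before response time moves.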

31

Cost Effective Remediation Example: Holistic Evaluation (CPU vs. IO)

32

Using and Delay components per Service Class(%) (top 20) for all Service Classes by Service Class

Faster job execution is required.

Question: For the selected service class(es), is it cheaper to obtain the needed performance win with upgraded CPU or storage?

33

Is the time spent waiting on DASD already the best in class, or is there room for improvement?

[Chart: Average Response Time Components for Entire Subsystem. Vertical axis: ms, 0 to 4. Horizontal axis: time, 0:30 to 2:30. Series: IOSQ, Pending, Connect, Disconnect.]

Approx 65% of Time is Using/Waiting on DASD

34

Comparing Options for Run Time Improvement

Results of Modeling:

1. upgrading CPU to best available

vs.

2. upgrading storage to next generation

Option               CPU Using  CPU Delay  DASD Using & Delay  Total Seconds  Run Time Savings
Before                    1196       1523                3915           6634  n/a
1. CPU Upgrade             416        265                3915           4596  31%
2. Storage Upgrade        1196       1523                1027           3746  44%
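
The totals and savings in this comparison can be checked with simple arithmetic. The sketch below assumes that run time savings is defined as (before - after) / before on Total Seconds; that definition and the function names are assumptions, while the input seconds are the figures from the slide. Applied to both options, this formula gives roughly 31% for the CPU upgrade and 44% for the storage upgrade.

```python
def total_seconds(cpu_using, cpu_delay, dasd_using_delay):
    """Total run time as the sum of the three modeled components."""
    return cpu_using + cpu_delay + dasd_using_delay


def savings_pct(before, after):
    """Run time savings as a percentage of the original run time."""
    return 100.0 * (before - after) / before


before = total_seconds(1196, 1523, 3915)
cpu_upgrade = total_seconds(416, 265, 3915)
storage_upgrade = total_seconds(1196, 1523, 1027)

cpu_savings = savings_pct(before, cpu_upgrade)
storage_savings = savings_pct(before, storage_upgrade)
```

The comparison works because DASD Using & Delay is the largest single component before any upgrade, so reducing it has more leverage than reducing both CPU components together.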

35

Conclusion

Availability intelligence uses expert knowledge in interpretation of the data

Offers new protection of continuous availability at the primary site to:

1. Avoid Service Disruptions

2. Accelerate Fixes

Fast and easy to prove at your site with a low commitment contract for IntelliMagic Vision as a Service

“Any sufficiently advanced technology is indistinguishable from Magic”

Arthur C. Clarke, 1962

36

Join us in San Antonio for the 2015 CMG Conference!

Save the dates:

November 2nd to 5th at The St. Anthony in downtown San Antonio

3 blocks to both the Alamo and the Riverwalk