
Post on 13-Jul-2015


Managing Performance with APM Acceptance Criteria

Michael Sydor

DOX08S #CAWorld

CA Technologies Service Assurance

DevOps

2 © 2014 CA. ALL RIGHTS RESERVED.

Abstract

Reliable processes for identifying appropriate metrics, and for validating them in collaboration with stakeholders across the application lifecycle, remain unaddressed. APM visibility exposes these metrics and directly supports their aggregation into transaction profiles and performance baselines. This establishes a framework for reliable acceptance criteria and defines the roles and responsibilities through which stakeholders can collaborate, both to validate the monitoring configuration and to align with business objectives.

Full paper in CATX 2014: Beyond Deployment Automation - Realizing Dev/Ops Metrics and Collaboration through APM Visibility

Michael Sydor

CA Technologies

Sr. Engineering Services Architect


Agenda

1. THE CHALLENGE FOR DEVOPS INITIATIVE

2. NON-FUNCTIONAL REQUIREMENTS AND KPIS

3. ACCEPTANCE CRITERIA LIFECYCLE

4. CLOSING POINTS

The Challenge for DevOps: Simply deploying faster can make things worse!

[Figure: the Dev -> QA -> UAT -> Prod pipeline at different release cadences. Cadence labels: Quarterly, Weekly, Daily. Count labels: 50, 140 total, 300+ total.]

We Know What Happens From Agile Sprints

Issues encountered and resolved while new functionality is introduced.

Where are Performance Problems Identified?

[Figure: where performance problems are identified (Production, Pre-production, QA) with increasing performance testing maturity. Milestone labels: FireFighting practice, Performance Visibility introduced, UAT established, Performance test established.]

NFRs and KPIs

Non-functional requirements and key performance indicators

Putting teeth into a DevOps initiative


Who Cares About NFRs?

1 - WebSphere Commerce V5.4 Handbook: Architecture and Integration Guide

2 - Mastering the Requirements Process - Getting Requirements Right

REQUIREMENTS (WEBSPHERE, 1) | REQUIREMENTS (MASTERING REQUIREMENTS, 2) | APM VISIBILITY

General business understanding and objectives | Cultural, look-and-feel, usability and humanity, legal

Applications used in the solution | Operational, environmental, maintainability, support

Security | Security

Performance | Performance

Capacity planning

Scalability

Availability

Testing

Customer (end-user) metrics


Some Terms

Baselines

Configuration – Do we have a valid monitoring configuration?

Application – Do we have visibility into the key transactions?

Performance – Can we identify KPIs for availability, performance and capacity?

KPIs

Suspect – significant because of frequency of execution

Validated – known to correlate with performance issues

Lifecycle Visibility Achieved

[Chart: potential metrics and KPIs (suspect KPI, validated KPI) at each lifecycle stage: Unit test, Functional test, Stress test, UAT, Production, Triage. Potential-metric labels: 10,000; 5,000; 3,000; 1,500; 2,500; 2,500. KPI labels: 20, 30, 50, 10, 30, 35; 60, 40, 55, 45; 15.]

Production-only visibility

[Chart: same axes and stages. Potential-metric labels: 5,500; 4,500. KPI labels: 40, 100; 35, 75.]

KPI Management Maturity

[Chart: diagnostic value (vertical axis) against KPI maturity (horizontal axis)]

SGCM (Platform): Stalls, GC settings, Concurrency, Memory management trends

APC (Application): Availability, Performance, Capacity

EKB (Transaction): Errors, Key resource performance, Business transaction survey

KPI Evolution

PLATFORM: Coarse information... but not really APM

Application, transactions, resources: The APM Advantage

GOOD | BETTER (ADDITIONAL) | BEST (ADDITIONAL)

Stalls | Availability – connected status | Errors

GC settings | Availability – metric count | Key resource performance

Concurrency | Suspect performance | Business transaction survey

Memory management (graph) | Suspect capacity |

How to Find KPIs


Capacity KPIs – “Tree Rings”


Performance KPI – Trap


Performance KPI – Volume Adjusted


Performance KPIs – Summary

High volume + significant response time
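One way to read this rule is as a filter over per-metric statistics. The sketch below is a minimal illustration, assuming each metric reports an invocation count and an average response time for some observation window. The metric names, the two thresholds, and the ranking by volume times response time are all hypothetical choices, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class MetricSample:
    name: str               # hypothetical metric or transaction name
    invocations: int        # executions in the observation window
    avg_response_ms: float  # average response time in the window


def suspect_performance_kpis(samples, min_invocations=1000, min_response_ms=100.0):
    """Return metrics that combine high volume with significant response time.

    Both thresholds are illustrative assumptions, not values from the talk.
    Results are ordered by total time consumed (volume x response time).
    """
    candidates = [
        s for s in samples
        if s.invocations >= min_invocations and s.avg_response_ms >= min_response_ms
    ]
    return sorted(candidates, key=lambda s: s.invocations * s.avg_response_ms, reverse=True)


# Made-up sample data for illustration only.
samples = [
    MetricSample("checkout", 12_000, 450.0),       # high volume, slow
    MetricSample("nightly_report", 3, 90_000.0),   # slow but rarely executed
    MetricSample("health_ping", 500_000, 2.0),     # frequent but trivial
    MetricSample("search", 80_000, 120.0),         # high volume, moderately slow
]
suspects = [s.name for s in suspect_performance_kpis(samples)]
```

Here `nightly_report` and `health_ping` drop out because each satisfies only one of the two conditions, which is the point of adjusting for volume.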


Validation of KPIs

[Figure: a 2 hour window around a confirmed incident, from 90 minutes before to 30 minutes after. KPI labels: uncorrelated, degraded, correlated.]
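The 2 hour window can be applied mechanically once an incident time is confirmed. The sketch below is one possible interpretation, assuming a KPI counts as correlated when any sample inside the window (90 minutes before to 30 minutes after the incident) is degraded relative to its baseline. The `degraded_factor` threshold, the sample data, and the incident timestamp are hypothetical.

```python
from datetime import datetime, timedelta

BEFORE = timedelta(minutes=90)  # window start, relative to the confirmed incident
AFTER = timedelta(minutes=30)   # window end, relative to the confirmed incident


def classify_kpi(samples, baseline, incident_time, degraded_factor=1.5):
    """Classify a suspect KPI as 'correlated' or 'uncorrelated' with an incident.

    samples: list of (timestamp, value) pairs for one KPI.
    baseline: the KPI's normal value (same unit as the samples).
    degraded_factor: hypothetical multiplier over baseline that counts as degraded.
    """
    start, end = incident_time - BEFORE, incident_time + AFTER
    in_window = [value for ts, value in samples if start <= ts <= end]
    degraded = any(value > baseline * degraded_factor for value in in_window)
    return "correlated" if degraded else "uncorrelated"


# Made-up incident time and samples for illustration only.
incident = datetime(2014, 11, 10, 14, 0)
response_time = [
    (incident - timedelta(minutes=120), 210.0),  # outside the window, ignored
    (incident - timedelta(minutes=60), 480.0),   # degraded inside the window
    (incident + timedelta(minutes=15), 520.0),
]
gc_time = [
    (incident - timedelta(minutes=45), 31.0),
    (incident + timedelta(minutes=10), 29.0),
]
response_result = classify_kpi(response_time, baseline=200.0, incident_time=incident)
gc_result = classify_kpi(gc_time, baseline=30.0, incident_time=incident)
```

A suspect KPI that comes back correlated across several incidents is a candidate for promotion to a validated KPI.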


Baseline – Reporting

Acceptance Criteria Lifecycle


Baselines

[Figure: baselines available at each level of test maturity.
Test maturity: No test, Smoke test, Use case test, Performance/Stress test, Load-to-failure.
Baselines: None, Smoke test, Configuration, Application (transactions), Performance, Capacity forecast.
Annotations: Ineffective; Often leads to a QA practice – Functional; Often leads to a performance practice.]

Baselines – Summary

Foundation for any significant benefit from APM

You need to establish ‘normal’ before you can consistently triage. Or you need very capable staff and a LOT of experience.

You need to report on what is significant, not simply provide hundreds of metrics and “... Just go figure it out!”

Absence of baselines will reinforce a “why bother with QA” and “test-in-production” mentality.

Danger signs

Focus on availability but no performance or capacity interest.

Lots of metrics, metric groupings and dashboards but no report templates.

You still can’t triage production incidents effectively.
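To make "establish normal, then report only what is significant" concrete, here is a minimal sketch. It assumes a baseline is simply the mean and standard deviation of each metric over earlier runs, and that "significant" means more than a chosen number of standard deviations from that mean. The metric names, history values, and `tolerance` cutoff are hypothetical.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize 'normal' per metric as (mean, standard deviation)."""
    return {name: (mean(values), stdev(values)) for name, values in history.items()}


def significant_deviations(baseline, current, tolerance=3.0):
    """Return only the metrics whose current value departs from normal.

    tolerance is a hypothetical cutoff, in standard deviations.
    Each reported entry maps the metric name to (current value, baseline mean).
    """
    report = {}
    for name, value in current.items():
        avg, spread = baseline[name]
        if spread > 0 and abs(value - avg) > tolerance * spread:
            report[name] = (value, avg)
    return report


# Made-up history from earlier test runs, for illustration only.
history = {
    "login_response_ms": [100, 110, 90, 105, 95],
    "heap_used_mb": [512, 520, 508, 515, 510],
}
baseline = build_baseline(history)
report = significant_deviations(
    baseline, {"login_response_ms": 240, "heap_used_mb": 514}
)
```

The report contains only `login_response_ms`. The heap metric is within its normal range and stays out of the way, rather than becoming one more number to "go figure out".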


Acceptance Criteria – KPIs

[Figure: acceptance-criteria KPIs available at each level of test maturity.
Test maturity: No test, Smoke test, Use case test, Performance/Stress test, Load-to-failure.
Acceptance criteria: None, Package assembly, Stalls, Errors, Memory profile, Concurrency, Response time.
Annotations: Ineffective; Often leads to a QA practice – Functional; Often leads to a performance practice.]

Acceptance Criteria – Summary

Foundation for any pre-production review

You will need to ‘phase-in’ acceptance criteria.

App server configuration tuning

Performance advisory

“We saw __X__. It is a potential concern and we will confirm in production.”

Performance exception

“We saw __Y__. It is a problem and you need signoff to continue to production.”

Performance requirement

“We saw __Z__. You cannot continue to production.”

Danger signs

Lots of criteria but no process for remediation prior to production or confirmation in production.
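The three levels above map naturally onto a release gate. The sketch below is an illustration under assumptions: findings arrive as (KPI name, level) pairs, and signoff is tracked as a set of KPI names. The function name, the return strings, and the example KPI names are hypothetical.

```python
from enum import Enum


class Level(Enum):
    ADVISORY = 1     # potential concern, to be confirmed in production
    EXCEPTION = 2    # a problem, signoff needed to continue to production
    REQUIREMENT = 3  # cannot continue to production


def release_decision(findings, signed_off=frozenset()):
    """Decide whether a release may proceed, given (kpi_name, Level) findings.

    signed_off: names of KPIs whose exceptions already have signoff.
    """
    if any(level is Level.REQUIREMENT for _, level in findings):
        return "blocked"
    pending = [
        name for name, level in findings
        if level is Level.EXCEPTION and name not in signed_off
    ]
    if pending:
        return "needs signoff"
    return "proceed"


# Made-up findings for illustration only.
advisory_only = release_decision([("stalls", Level.ADVISORY)])
exception_open = release_decision([("response_time", Level.EXCEPTION)])
exception_signed = release_decision(
    [("response_time", Level.EXCEPTION)], signed_off={"response_time"}
)
hard_stop = release_decision(
    [("memory_profile", Level.REQUIREMENT), ("stalls", Level.ADVISORY)]
)
```

Phasing in then amounts to introducing a criterion as an advisory and promoting it to an exception or a requirement once production confirms that it matters.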

25 © 2014 CA. ALL RIGHTS RESERVED.

Pre-production Checklist

[Figure labels: Configuration baseline, Application baseline, Performance baseline; NFRs, FRs, Use cases; Compatible APM configurations; Suspect KPIs; Security; Scalability; Capacity plan; Stress test; Certification; Hierarchical dashboards; Baseline report; Management module]

Overhead absent

Excess metrics absent

Suspect KPIs identified

Availability alert defined

Acceptance criteria evaluated (performance)

Saturation alert defined (scalability)

Capacity alert defined

Failover capability assessed

Security certification

Overview

Architecture/operations

Triage view

Resource view

Visibility assessed (transaction trace completeness)

Business transaction definition

Pre-production Checklist

Overhead validated

Excess metrics absent validated

Suspect KPIs validated

Availability alert validated

Acceptance criteria validated

Saturation alert validated

Capacity alert validated

Failover capability validated

Security certification

Overview validated

Architecture/operations validated

Triage view validated

Resource view validated

Visibility validated (transaction trace completeness)

Business transaction definition validated
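Tracking which checklist items have moved to "validated" can be as simple as a set difference. The sketch below is a minimal illustration: the item names are shortened from the slide, and the function name and the example set of validated items are hypothetical.

```python
# Checklist items taken from the slide; the status wording is simplified.
CHECKLIST = [
    "Overhead",
    "Excess metrics",
    "Suspect KPIs",
    "Availability alert",
    "Acceptance criteria",
    "Saturation alert",
    "Capacity alert",
    "Failover capability",
]


def outstanding_items(validated):
    """Return checklist items, in checklist order, not yet validated."""
    return [item for item in CHECKLIST if item not in validated]


# Made-up review state for illustration only.
validated_so_far = {"Overhead", "Suspect KPIs", "Availability alert"}
remaining = outstanding_items(validated_so_far)
```

An empty result would mean every item on this shortened list has been validated.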

Pre-production review

Operational period

Incident

Triage and root-cause

Post-production review

Validation

Application audit

Closing Points


Resources

Community site

Cookbook: APM HealthCheck

Understanding which metrics matter (KPI discussion)

Cookbook: Application audit

More details on the baseline techniques and process

Blog entries

Redefine triage by learning the golden nuggets of APM...

What are KPIs and how can I get some quick?!

Big Data – What does it mean for APM????

Why does ABA find anomalies when there is nothing wrong in production?

APM best practices – Realizing Application Performance Management

available on Amazon.com and Apress.com

Baselines, test plans, app audits, triage, firefighting

Organizational models, service catalogs


Summary

A Few Words to Review

Key topics

You cannot expect that deploying more quickly will, by itself, improve app quality.

APM gives you visibility into NFRs and KPIs.

Acceptance criteria are how you will harness DevOps deployment acceleration.

Findings

APM documents NFRs and KPIs.

Pre-production acceptance criteria allow for truly proactive management of the app lifecycle.

Experiences

Agile techniques show what will happen without viable acceptance criteria.

KPIs are easy to find and manage via baselines.

Baselines make reporting and triage more effective.


For More Information

To learn more about DevOps, please visit:

http://bit.ly/1wbjjqX


For Informational Purposes Only

© 2014 CA. All rights reserved. All trademarks referenced herein belong to their respective companies.

This presentation provided at CA World 2014 is intended for information purposes only and does not form any type of warranty. Some of the specific slides with customer references relate to customer's specific use and experience of CA products and solutions so actual results may vary.

Terms of this Presentation