© 2013 CPT Global Limited
Independent Experience
CIO Forum
Savoy Hotel 5th December 2013 CPT Global – Production Assurance
Alan Sloan – General Manager, CPT Global (Europe)
Alan Mackenzie – CTO, CPT Global
December 2013 CPT Production Assurance
About CPT Global
• Melbourne
• Sydney
• Canberra
• London
• Munich
• Paris
• New York
• Toronto
• Singapore
• Sao Paulo
Founded in Melbourne 1993, listed 2000 (ASX:CGO)
Over 170 specialist consultants
Annual turnover AUD 40 million+
Operations on five continents
Australia since 1993
Europe since 1998
North America since 2002
South America since 2012
Asia since 2012
What do we do?
CPT's services are based around our core disciplines of
Capacity Planning
Performance Management
Testing Services
MIT
Our key services have been created to provide
IT Cost Reductions and efficiency programs
Production Assurance
Test Effectiveness and Management
Independent client-side consulting
This session and CPT
This session will focus on Production Assurance, its importance
in IT estates and what we have learned
Our technical consultants are IT fire-fighters specialising in the
resolution of performance and stability issues, often when
The problem has reached crisis point
Internal and vendor efforts to resolve it have failed
BUT preferably as early as possible, so that the work can be pro-active
We consequently get exposed to diverse
Problems
Clients and environments
Technology combinations
Production Assurance Background
In over 20 years as a provider of specialist technical services, CPT has worked
with a broad range of clients, all trying to address common problems and
concerns:
Will my application actually deliver to the expectations of the business
stakeholders?
Why is my application now not meeting SLAs?
Will my application scale effectively for business growth?
The cost of my application / system is escalating
CPT has been brought in both where an application is live and not performing,
and where concerns raised during development or testing have led to our
involvement to solve the problems.
Production Assurance - Cause
Typically, we find problems are caused by one or more of the following:
Business pressure resulting in
A rush to get new applications and changes into production
Minimal or no design considerations for performance and scalability
Limited understanding of the characteristics of underpinning technologies
Inadequate Non-Functional Testing (Stress / Performance etc.)
Poor capacity planning and modelling
Poor communication between groups
Conflicts internally and externally
Silos of activity
Can be particularly prevalent when either application development or infrastructure services are outsourced
The following slides cover the typical approaches to Production Assurance
Assurance Framework: Benefits v Cost
Reactive: Intervention here at risk of dealing with the symptoms and not the root cause(s).
[Figure: $ Cost and Risk across the SDLC phases: Analysis, Design, Code, Test, Implement]
The later in the SDLC an Assurance Framework is implemented, the greater the
investment required and the less sustainable the improvements will be.
Bad practices earlier within the SDLC will remain untreated and will continue.
Whilst improvements in production will be more immediate, assurance
intervention activities will be more intensive, which is likely to slow down the
delivery process through increased prevention techniques.
The earlier in the SDLC an Assurance Framework is implemented, the more
sustainable the framework becomes.
Whilst the ROI may take longer to realise, potential risks can be identified
and mitigated earlier.
Reactive: Service Restoration
Proactive: Service Assurance
Predictive: Risk Avoidance
Without an assurance framework, the cost of mitigating and remedying risks increases progressively in the SDLC.
Proactive: Intervention here at established quality gates will begin to identify
risks and mitigation approaches earlier in the SDLC.
Predictive: A mature assurance framework underpinned by established quality gates inserted into the SDLC with
accompanying processes. The objective: identify the risk, assess its nature and determine its size/impact as
early as possible in the SDLC, early enough to implement mitigation solutions prior to implementation.
Situation and response
Firefighting – Reactive
Reactive Preventative
This is often where CPT gets involved
[Figure axes: Performance Risk, Stability Risk]
Fire-Fighting: Service Restoration
Situation:
Re-Active: Restore a production service back to acceptable levels of performance, stability or
cost.
Approach:
Assess – Fact based analysis – seeking to define the exact nature of the “problem” as experienced
by the users
Analysis of the service/application current state and future state goals including:
The technical environment
Hardware configuration, System software, Database configuration
System utilisation metrics covering CPU, Memory, Storage, Network, Database utilisation and activity metrics
Requires access to system, middleware and application logs, etc.
Plan - Intensive triage and diagnosis activities leading to proposed solutions, recommendations
and prioritisation of each issue found.
Implement – implement recommendations through the change management framework.
Review - Post implementation measurement (% improvement) and review.
Typically an iterative process involving testing and measurement activities in a non-production
environment to mitigate risks by proving the benefit of proposed changes.
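The percentage improvement measured in the Review step can be computed from before-and-after measurements of the same metric. The following is a minimal sketch, not CPT's tooling; the function name and the sample figures are illustrative assumptions.

```python
def percent_improvement(before: float, after: float) -> float:
    """Percentage reduction in a lower-is-better metric (e.g. elapsed time).

    Positive means the change helped; negative means a regression.
    """
    if before <= 0:
        raise ValueError("baseline measurement must be positive")
    return (before - after) / before * 100.0


# Illustrative figures only: a job that took 200 minutes before tuning
# and 50 minutes after it.
print(f"{percent_improvement(200.0, 50.0):.1f}% improvement")
```

Measuring the same metric in a non-production environment before and after each proposed change gives the evidence needed to support it through change management.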
Fire-Fighting: Service Restoration
Short-Term Desired Outcomes:
Recommendations identified to address the observed “problems”.
Implementation of “quick win” recommendations through to production, within
the governance and change management framework balanced against the impact
of the problems and the nature and supporting argument for the proposed
change.
Medium-Term Desired Outcomes
A report documenting a series of tactical and strategic recommendations that
will resolve the current issues and prevent other problems occurring in the
future. The recommendations should be categorised by priority, benefit and
ease of implementation.
Ensure monitoring and gating processes are in place for future growth and
change, and where applicable, a continuous improvement program is included.
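One way to hold recommendations categorised by priority, benefit and ease of implementation is a small record type that can be sorted and filtered for "quick wins". This is a sketch under assumptions: the 1-to-5 scales, the field names, the quick-win thresholds and the sample entries are all hypothetical and are not taken from CPT's reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    title: str
    priority: int  # hypothetical scale: 1 = most urgent, 5 = least urgent
    benefit: int   # hypothetical scale: 1 = low benefit, 5 = high benefit
    ease: int      # hypothetical scale: 1 = hard to implement, 5 = easy


def rank(recs: list[Recommendation]) -> list[Recommendation]:
    """Order by priority first, then by higher benefit, then by higher ease."""
    return sorted(recs, key=lambda r: (r.priority, -r.benefit, -r.ease))


def quick_wins(recs: list[Recommendation]) -> list[Recommendation]:
    """Hypothetical rule: a quick win is both high benefit and easy."""
    return [r for r in rank(recs) if r.benefit >= 4 and r.ease >= 4]


# Hypothetical sample entries, loosely modelled on the kinds of change
# named in this deck.
recs = [
    Recommendation("Redesign application component", priority=2, benefit=5, ease=1),
    Recommendation("Recode slow SQL statement", priority=1, benefit=5, ease=4),
    Recommendation("Reconfigure backup processing", priority=2, benefit=4, ease=4),
]

for r in rank(recs):
    print(r.priority, r.title)
print("Quick wins:", [r.title for r in quick_wins(recs)])
```

Keeping the three categories as separate fields, rather than one combined score, lets the same list serve both the short-term "quick win" view and the longer-term strategic report.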
Examples of Fire-Fighting - Reactive
Upgrading infrastructure produced widely varying response times – batch
run went from 40 hours to 147 hours, risking missed end-of-year processing
deadlines.
Oracle bug
Recode SQL
Rogue application taking 4 hours to perform a data extract was reduced to
less than 4 minutes
Java Application design
A new Settlements application was not going to scale effectively. It needed a
complete re-design of 75% of the application, which provided a
6-fold improvement in throughput whilst reducing the hardware
requirements.
Examples of Fire-Fighting - Reactive
Batch over-runs in a funds trading system of a major European Bank.
Application had been ported from the IBM Mainframe to Unix / Oracle and
Service Provider said there was nothing that could be done except to buy
more Hardware or re-write the application. CPT was asked to review and try
to resolve the issues:
Changes recommended to the Oracle database to prevent death by random I/O
Reconfigured Back-up processing
Changes made to memory allocation between batch and online processing to
exploit the free memory overnight and avoid I/Os
Some minor SQL changes also recommended
Batch times almost halved, ensuring SLAs could be met consistently and a
level of contingency provided, existing resources better utilised and
hardware upgrade avoided
Pro-Active: Service Assurance
Situation:
Pro-Active: Part of the overall design/development/testing/tuning lifecycle to
ensure that the application will perform as anticipated once it is implemented
into production.
Approach:
Insertion of key resources into capability areas where deficiencies may exist. In
past engagements, CPT has typically found expertise gaps across:
Test Environment Management
Test Management
Performance Engineering and Performance Tuning
Stress, Volume and Performance Testing
Capacity Planning
Governance
Pro-Active: Service Assurance
Outcomes:
A combination of services provided and specific outcomes and artefacts
that will attest that the system will perform as expected on the
production infrastructure, supporting the anticipated volume of users
and transactions, with all known risks documented.
The services and outcomes/artefacts can encompass all or some of the
following, dependent upon the system requirements:
Test Management: Management, co-ordination and oversight, Test Strategy, Test Planning,
Stakeholder Reporting
Test Environment Management: Environment Strategy, Test Data Strategy, Environment
Build services, Environment Co-ordination, Environment Technical Support
Performance engineering and Performance tuning
Stress, Volume and Performance Testing: Strategy, Planning, Scripting, Data Preparation,
Execution, Tuning and Reporting
Capacity Planning: Capacity Plan / Infrastructure Capacity Assessment
Predictive: Risk Avoidance
Situation:
Pre-Emptive: Undertake business and technical focused risk assessments
of the applications and underlying infrastructure.
Approach:
Ensure application and infrastructure designs are signed off for
Performance, Scalability and Stability during the initial planning stages,
as well as reviewing key components during development
Expertise must include at a minimum the following capabilities:
Capacity Planning
Architecture
Performance Tuning
Performance, Stress and Volume Testing
Predictive: Risk Avoidance
Outcomes:
A performance sign-off for each step of the SDLC.
Typically a Capacity Plan / Infrastructure Capacity Assessment, dependent upon
the application requirements. Input to the report would draw on current
and planned production user / transaction volumes combined with Stress,
Volume and Performance Testing outcomes.
Pro-active identification of infrastructure components at risk and the required
capacity and timeframes by which the additional capacity should be provisioned.
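The timeframe by which additional capacity should be provisioned can be estimated, in the simplest case, by projecting current utilisation forward. The sketch below assumes linear growth, which real workloads often do not follow; the function name, the 80% planning threshold and the sample component figures are hypothetical.

```python
import math


def months_until_threshold(current_pct: float,
                           growth_pct_per_month: float,
                           threshold_pct: float = 80.0) -> float:
    """Months until utilisation reaches the threshold, assuming linear growth.

    Returns 0.0 if already at or over the threshold and math.inf if
    utilisation is flat or falling.
    """
    if current_pct >= threshold_pct:
        return 0.0
    if growth_pct_per_month <= 0:
        return math.inf
    return (threshold_pct - current_pct) / growth_pct_per_month


# Hypothetical components: (current utilisation %, growth in % points per month).
components = {
    "CPU": (62.0, 3.0),
    "Storage": (71.0, 1.5),
    "Network": (35.0, 0.0),
}

for name, (current, growth) in components.items():
    months = months_until_threshold(current, growth)
    print(f"{name}: threshold reached in {months:.1f} months")
```

In practice the growth figure would come from planned production volumes and from Stress, Volume and Performance Testing outcomes rather than from a fixed monthly increment.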
Situation and response
Service restoration
Application Assessment & Tuning
Monitoring & Design/Change considerations
Availability Management & Planning
[Figure axes: Performance Risk, Stability Risk]
Case Study – Fire-Fighting to Pre-emptive
Background – Major Telco in North America
Introducing a Sales and Service Portal (SSP), planned to replace all existing
portals for shops, booths and home customers, covering wireless, cable, etc.
5 pilot users in a couple of stores were using the new product, but not under any stress
Feedback on user experience was ridiculous, with dreadful response times –
this was supposed to scale initially to 10,000+ users
Management confidence very low
Two compelling events
iPhone 5C launch - September 2013 – high demand for online activation
Back to school 2013 - 'Cable Christmas' – high demand for cable as mums and
dads pay for college students' cable in their new student accommodation
Working backwards over a period of 6 months
Identify and fix immediate performance problems and get pilot to run better
Put in place SLAs to ensure future success criteria were established
First increase in confidence levels
Root cause analysis of performance problems
Performance testing leading to:
Capacity planning and modelling to assess scalability to 10,000 users
Recommendations
Applications design
Infrastructure improvements
Intensive monitoring of SSP through a 'war room' over 'Back to School 2013'. SSP processed its highest volumes ever
iPhone Launch – similar approach – successful, uninterrupted launch
Embedded a coherent approach which encompassed ongoing performance tuning, capacity planning, embargo planning, incident management, production monitoring, command centre management
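A first-order check on scalability to 10,000 users can use the interactive response time law from operational analysis, N = X(R + Z), where N is the number of concurrent users, X the throughput, R the response time and Z the think time. This is a textbook relation, not a description of the modelling CPT performed; the response time and think time below are hypothetical, and only the 10,000-user target comes from the case study.

```python
def required_throughput(users: int, response_s: float, think_s: float) -> float:
    """Requests per second needed to sustain `users` concurrent users.

    Rearranges the interactive response time law N = X * (R + Z).
    """
    cycle_s = response_s + think_s
    if cycle_s <= 0:
        raise ValueError("response time plus think time must be positive")
    return users / cycle_s


# 10,000 users is the target from the case study; the 2 s response time
# and 18 s think time are hypothetical inputs.
print(required_throughput(10_000, response_s=2.0, think_s=18.0))  # prints 500.0
```

Comparing a figure like this with the throughput observed in performance testing gives an early indication of whether design or infrastructure changes are needed before volumes arrive.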
Case Study – Fire-Fighting to Pro-Active
Flow-on Effects from Production Assurance
Some of the flow-on effects of moving towards more pro-active Production Assurance include:
Cost Reductions through immediate and ongoing consumption reduction
Depending on commercial structures in place – In-Sourced / Out-Sourced, MLC etc.
Cost Avoidance through reducing or negating the need for HW Upgrades
Lower consumption means a smaller footprint is required as the application is rolled out or business volumes increase.
Improved Scalability costs
Faster Testing cycles
Better managed test environments will mean less hardware and faster testing turnaround.
Better focused testing and analysis of results
Improved relations with business sponsors
SLAs clearly defined, and monitored to ensure they are being met, and if they are at risk, pro-actively tackling the problem before it impacts the business.
Questions?
IT Management Consulting:
ICT Strategic Planning (O)
ITIL Implementation (S, O)
IT Effectiveness and Change (O)
Business Measurements of IT Services (O)
Selective Sourcing Strategies (O)
Enterprise Architecture (S, O)
Program and Project Management (S)
Program and Project Lifecycle Reviews (S, O)
Business Case Development and Reviews (S)
Development of Business Requirements for Market Testing (S)
Package Assessments and Evaluations (S)
Troubled Project Reviews and Remediation (S, O)
Production Assurance – CPT Service Lines
Capacity & Performance Services:
Capacity Planning
Capability Reviews (S, O)
Frameworks and roadmaps (S, O)
Reviews and assessments (T)
Plans: Application / Infrastructure / Enterprise (T)
Modelling and forecasting (T)
As a service / on demand / hybrid (T, S, O)
“Cost of Running” modelling and reporting (T)
Performance Management (T, S, O)
Performance Tuning Services
Application / infrastructure specific (T)
Corporate wide programmes (S, O)
Performance Diagnosis (T)
Technical Support Resources (T)
Database Administrators
Systems Programmers
Systems Administrators
Analyst/Programmers
Architecture Services (T, S)
Technical Architecture and Design Reviews
Technical Test Environment Support and Build Resources (T, S)
Testing Services:
Capability and Effectiveness
Testing Capability and Maturity Reviews (S, O)
Testing Lifecycle Effectiveness (S)
Test Strategies (S)
Service Validation and Testing (S)
Test Data Extraction, Reduction and Masking (S)
Testing Tools and Techniques (S)
Test Coverage (S)
Test Reporting and Metrics (S)
Operational
Test Management (T)
Stress and Volume/Performance Testing (T)
Non-functional Testing (T)
Specialist Test Execution (T)
Test Automation Techniques, Strategies and Execution (T, S)
Test Environment Management and Optimisation (T, S)
Test Data Extraction, Reduction and Masking (T, S)
Production Assurance Services Legend: T = Tactical, S = Strategic, O = Organisational
About CPT Global
CPT Global is a specialised consultancy with two focus areas. Its Technical Consulting services enhance the
control, quality, stability, efficiency and reliability of all technology platforms with core offerings of
Capacity Planning, Performance Management and Testing. Its Management Consulting services review and
improve the business processes associated with Information Technology, with offerings that include Program
and Project Management, IT Governance Reviews, Strategic Sourcing Strategies and Technology Transition
Planning.
AUSTRALASIA
Melbourne
Level 1, 4 Riverside Quay
Southbank VIC 3006
Telephone +61 3 9684 7900
Facsimile +61 3 9684 7999
Sydney
Suite 3, Level 5, 80 Clarence St
Sydney, NSW 2000
Telephone +61 2 8234 7400
Facsimile +61 2 8234 7499
Canberra
Level 4, 161 London Circuit
Canberra City ACT 2601
Telephone +61 2 6206 9700
Facsimile +61 2 6206 9799
Singapore
10 Anson Road, #32/15
International Plaza,
Singapore 079903
Telephone: +65 6226 2555
AMERICAS
New York
410 Park Avenue, 15th Floor,
New York NY 10022
Telephone +1 917 210 8668
Facsimile +1 917 210 8182
Toronto
100 King Street West, Suite 3700
Toronto, Ontario M5X 1C9
Telephone +1 416 642 2886
Facsimile +1 416 644 8801
Sao Paulo
Al. Europa 1206 – Santana de
Parnaiba - SP – CEP 06543-325
Telephone: +55 11 8454-0869
EUROPE
London
Parkshot House, 5 Kew Road
Richmond, Surrey, TW9 2PR
Telephone +44 20 8334 8085
Facsimile +44 20 8334 8541
Munich
Landsberger Str. 302
D-80687 Munich
Telephone +49 89 9040 5955
Facsimile +49 89 9040 5965
Paris
140 bis rue de Rennes
75006 Paris
Telephone +33 1 70 38 23 21
Facsimile +33 1 70 38 23 00
www.cptglobal.com