Phase 1 - Creating a coordinated and reliable AWS ... · A lean manufacturing mental model was ...

2016

Abstract : This case study shows how Front Five worked with an insurance client on a ‘high-stakes challenge’ [1]: improving the client’s AWS environment provisioning capability so that 20+ catalogued Java / Oracle development and test environments could be provisioned on AWS by a fixed deadline. These new environments would substitute for decommissioned on-premise environments, and we took the client’s AWS environment provisioning capability into a stable, coordinated BAU state. In a second phase Front Five worked with the team to create a set of self-service environment provisioning and environment BAU utilities implemented on a Jenkins dashboard.

Working with an initially siloed Service Delivery department to understand the conflicting demands on its functional teams, Front Five analysed the as-is AWS provisioning operation and explored an improved future state. The strategic analysis was done using a Systems Thinking approach, namely the creation of human and technical activity models [2] illustrating the provisioning activities. Goldratt’s Theory of Constraints [3] was then used to identify the bottlenecks needing immediate attention. A lean manufacturing mental model was established by influencing key programme management stakeholders through the use of manufacturing metaphors when describing environment provisioning. Kanban [4] was used to operationalise a coordinated environment provisioning operation, and Stafford Beer’s Viable Systems Model [5] to diagnose the provisioning operation and arrange it so that it would remain sustainable over time.

In the second phase Front Five worked with the Service Delivery department to establish a cross-departmental DevOps stakeholder group run as a SCRUM team. Using an emergent strategy development technique, SODA - Strategic Options Development and Analysis [6], Front Five interviewed 14 stakeholders and created analysable cognitive maps of their views, getting their buy-in for delivery. Front Five then worked with those stakeholders to select from the maps the actions that informed the direction of the SCRUM backlog. A Front Five lead then worked with Front Five engineers and the client’s team to deliver a self-service environment provisioning and BAU utility platform hosted on Jenkins.

Introduction : Our insurance client, a FTSE 250 customer, was completing a data centre migration programme which involved moving internal and external digital services from their active on-premise data centre to a new on-premise data centre. As part of this programme the client intended to decommission their on-premise Solaris development and test environments and had a project in flight to replace them with AWS environments. The client wanted to use the AWS environments as a direct substitute in order to continue the development and testing of their monolithic application. AWS was also to be used in a digital transformation stream for technology innovation - the introduction of a microservices architecture.

Front Five was engaged to work with the client to ensure that the substitute AWS environments were provisioned and operational in time for the business’s ongoing development, test and release activities. It was important that these activities not be interrupted once the on-premise development and test environments were decommissioned. Operationally, there were concerns about the efficacy of the Puppet deployments and other activities used to create the AWS development and test environments. These concerns stemmed from the long environment build lead times that development teams experienced when the client’s Service Delivery department initially tried to deliver catalogued environment builds comprising a number of Java and Oracle technologies.



Phase one of the engagement would see Front Five working across a large multi-function Service Delivery department and interacting with a number of project development teams to analyse the situation, develop a strategy and then execute a set of coherent actions [1] to substantially improve the AWS environment provisioning, BAU coordination and management capabilities.

Prior to the migration of development and test environments to AWS, the client’s development teams had the advantage of a number of automated provisioning and BAU self-service capabilities. Over time these had become core rigidities [7] coupled to their on-premise, Solaris-hosted deployment environments. When the migration to AWS happened, much of this self-service competitive advantage became obsolete. Activities that were previously automated and self-service now had to be executed manually within the new AWS context, much to the frustration of the development stakeholders, as the fulfilment time was greatly increased.

A second phase of consultancy would see Front Five using an emergent strategy to analyse the extent of the problem and then building an agile team, comprising a number of the client’s employees and contractors working alongside Front Five engineers, to reinstate the self-service provisioning and BAU automation capabilities.

Case Study Presentation : The case study below uses the following format for both phase 1 and phase 2 :

- It describes the problem situation at the beginning of the phase.
- It optionally illustrates the problem situation at the beginning of the phase.
- It describes the creation of a methodology and the implementation of an intervention using a resource based view [8], describing what was changed across the client’s people, processes, technology, organisational routines and management systems.
- It describes how praxis was achieved, i.e. how the intervening team learnt together, and any significant reflections during the intervention.
- It describes the transformed situation at the end of the phase.
- It includes a table which systematically lists the post transformation benefits.


Phase 1 - Creating a coordinated and reliable AWS environment provisioning operation.

Problem Situation: A sub-optimal AWS environment provisioning / BAU capability in a large ‘Service Delivery’ department comprising the following functional teams :

- Systems Administration (SysOps)
- Web and Middleware support
- Release Management
- DBA
- Batch
- Service Operations
- Networks

The department’s resources are conflicted due to four streams of concurrent work.

1. An extremely resource intensive data centre migration that must be completed by a hard date deadline.

2. The support of development activities on a number of monolithic software change sets which are expected to go into production imminently, i.e. shortly after the data centre migration is complete.

3. Platform and operations incident support on live services.

4. Maturing and operationalising the AWS development and test environment provisioning capabilities.

The environment provisioning and deployment mechanism (4) is said to be unreliable, with long lead times and manual fixes. The internal development and test customers raise environment provisioning requests with the Service Delivery department by email, and are often frustrated by the lack of a timely response, of tracking communications and of a coordinated operation to fulfil their requests.


Problem situation illustration :

The methodology and intervention : Our initial actions were to interview managers and engineers in the Service Delivery department to understand the ‘as is’ state of environment provisioning. We used part of a Systems Thinking approach developed by the management and systems thinker Peter Checkland to create what are referred to as ‘Purposeful Activity Models’ (Soft Systems Methodology). This analysis resulted in a number of activity models illustrating the activity steps of the department’s environment provisioning capability. The models showed us how things were currently working and also allowed us to think about how they might work in a future state.

Using the models we identified the bottlenecks that needed to be managed. We applied Goldratt’s Theory of Constraints, looking for the bottlenecks that were most problematic to the environment provisioning System, and managed the team to tackle those first. The meta activity model is shown below. In the original work there are similar sub-system models that allow one to ‘drill down’ into the System to better understand its mechanism and problems.

One of the key constraints was database provisioning, and we worked with the SysOps and DBA teams to manage the creation of enabling Oracle database AMIs and reference data updates. Other bottlenecks that needed immediate attention included inadequate Puppet module versioning and a confusing Puppet hiera data taxonomy.
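The bottleneck-first reasoning of the Theory of Constraints can be illustrated with a small sketch. It assumes the provisioning chain is serial and that each step has a measurable weekly capacity. The step names echo the text, but every figure and the helper names (`Step`, `find_constraint`, `elevate`) are hypothetical, since the case study publishes no measurements.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Step:
    """One activity in the provisioning chain."""
    name: str
    capacity_per_week: float  # hypothetical: environments the step can complete per week


def find_constraint(steps: list[Step]) -> Step:
    """Return the step with the lowest capacity, i.e. the constraint of the chain."""
    return min(steps, key=lambda step: step.capacity_per_week)


def chain_throughput(steps: list[Step]) -> float:
    """A serial chain can go no faster than its slowest step."""
    return find_constraint(steps).capacity_per_week


def elevate(steps: list[Step], name: str, new_capacity: float) -> list[Step]:
    """Return a copy of the chain with one step's capacity raised."""
    return [
        replace(step, capacity_per_week=new_capacity) if step.name == name else step
        for step in steps
    ]


# Illustrative figures only.
provisioning_chain = [
    Step("IaaS build", 6.0),
    Step("Puppet deployment", 4.0),
    Step("Oracle database provisioning", 1.5),
    Step("Batch environment sanity", 3.0),
]

constraint = find_constraint(provisioning_chain)
print(f"Constraint: {constraint.name} ({constraint.capacity_per_week}/week)")

# Once the first constraint is relieved, the constraint moves elsewhere.
improved_chain = elevate(provisioning_chain, "Oracle database provisioning", 5.0)
print(f"Next constraint: {find_constraint(improved_chain).name}")
```

The point of the sketch is the iteration: improving any step other than the current constraint leaves the throughput of the whole chain unchanged, which is why the most problematic bottlenecks were tackled first.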


The provisioning of built platform environments can be approached with a lean manufacturing mindset. After our initial activity model analysis and remedial work, we turned to working with the programme management team to help them see the value of a lean manufacturing mental model for environment provisioning. We used car manufacturing metaphors in conversation with the programme manager, the Service Delivery functional teams and the development consumers. These conversations laid the groundwork for having them see the environment provisioning activities as an interconnected chain with a need for centralised coordination.


Once the lean manufacturing mental model had been accepted by the key stakeholders, we mapped the environment provisioning activities onto a Jira Kanban board. We established the department’s Release Management (RM) team as the single point of contact for environment provisioning and environment BAU service requests. Following a hub and spoke model (shown below), the RM team was made responsible for the centralised coordination of fulfilment activities by the Service Delivery functional teams, and the workflow was implemented on the Jira Kanban board. At this stage we moved the development and test customers from email service requests to trackable Jira service requests.
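The hub and spoke workflow can be sketched as a small state model. This is an illustration under assumptions, not the client’s Jira configuration: the column names, the `ServiceRequest` and `ReleaseManagementHub` classes and the request key are all hypothetical. The only ideas taken from the text are that the RM team is the single hub and that several functional teams contribute to each request.

```python
from dataclasses import dataclass, field

# Hypothetical board columns; the case study does not list the real Jira workflow.
COLUMNS = ["Requested", "Prioritised", "In Fulfilment", "Verification", "Done"]


@dataclass
class ServiceRequest:
    key: str
    summary: str
    column: str = "Requested"
    # Functional teams (the spokes) that still have work outstanding on this request.
    pending_teams: set[str] = field(default_factory=set)


class ReleaseManagementHub:
    """Single point of contact that coordinates the functional teams."""

    def __init__(self) -> None:
        self.board: dict[str, ServiceRequest] = {}

    def raise_request(self, key: str, summary: str, teams: list[str]) -> ServiceRequest:
        """Customers raise requests with the hub only, never with a spoke team."""
        request = ServiceRequest(key, summary, pending_teams=set(teams))
        self.board[key] = request
        return request

    def team_done(self, key: str, team: str) -> None:
        """A spoke team reports back to the hub that its part is complete."""
        self.board[key].pending_teams.discard(team)

    def advance(self, key: str) -> str:
        """Move a request one column to the right, if its current column allows it."""
        request = self.board[key]
        if request.column == "In Fulfilment" and request.pending_teams:
            waiting = ", ".join(sorted(request.pending_teams))
            raise ValueError(f"{key} is still waiting on: {waiting}")
        index = COLUMNS.index(request.column)
        if index < len(COLUMNS) - 1:
            request.column = COLUMNS[index + 1]
        return request.column


hub = ReleaseManagementHub()
hub.raise_request("ENV-1", "Provision catalogued test environment", ["SysOps", "DBA"])
hub.advance("ENV-1")  # Requested -> Prioritised
hub.advance("ENV-1")  # Prioritised -> In Fulfilment
hub.team_done("ENV-1", "SysOps")
hub.team_done("ENV-1", "DBA")
print(hub.advance("ENV-1"))  # moves on only once every spoke team has reported back
```

Keeping the outstanding teams on the request itself is what makes the work trackable: anyone looking at the board can see which spoke a request is waiting on.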


We faced the challenge of conflicted fulfilment teams (such as the web support and batch teams) that were often subject to time-consuming and overriding production incident workloads, and we engaged with their management to carve out dedicated engineering capacity for environment provisioning. There were staffing challenges, which we talked through, advising on approaches that would create a sustainable environment provisioning channel. We integrated into the provisioning operation the QA resources that the client made available for the project to provision the substitute development and test environments. We also developed a triage and assignment methodology for coordinating the resolution of provision-time incidents across the functional teams of the Service Delivery group.

Using a diagnostic Systems Thinking approach, Stafford Beer’s Viable Systems Model, we analysed how we might create a sustainably coordinated environment provisioning operation. We decided to use the Release Management (RM) team as a demand regulator that meets daily with a senior management group to make prioritisation decisions on the incoming environment provisioning and BAU operations requests. The RM team then coordinates the environment provisioning operation, which includes orchestrating fulfilment activities by many of the Service Delivery functional teams. As already mentioned, we operationalised this using a hub and spoke model for rapid feedback from the functional teams to the RM team.
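The demand regulator role can be sketched as a priority queue with a cap on work in progress. The priorities stand in for the decisions made at the daily prioritisation meeting. The work-in-progress limit is a common Kanban practice and an assumption here, not something the case study states. The `DemandRegulator` class and the request names are hypothetical.

```python
import heapq
import itertools


class DemandRegulator:
    """Queues incoming requests and releases them up to a work-in-progress limit."""

    def __init__(self, wip_limit: int) -> None:
        self.wip_limit = wip_limit  # assumed Kanban-style cap, not from the case study
        self._queue: list[tuple[int, int, str]] = []
        self._sequence = itertools.count()  # tie-breaker: first come, first served
        self.in_progress: list[str] = []

    def submit(self, request: str, priority: int) -> None:
        """Record a request with the priority agreed at the daily meeting (1 = highest)."""
        heapq.heappush(self._queue, (priority, next(self._sequence), request))

    def release(self) -> list[str]:
        """Hand the highest-priority requests to the fulfilment teams while capacity allows."""
        released = []
        while self._queue and len(self.in_progress) < self.wip_limit:
            _, _, request = heapq.heappop(self._queue)
            self.in_progress.append(request)
            released.append(request)
        return released

    def complete(self, request: str) -> None:
        """Free up capacity when a request has been fulfilled."""
        self.in_progress.remove(request)


regulator = DemandRegulator(wip_limit=2)
regulator.submit("Pump test data into test environment", priority=3)
regulator.submit("Provision new test environment", priority=1)
regulator.submit("Publish line of content to dev environment", priority=2)

first_batch = regulator.release()
regulator.complete("Provision new test environment")
second_batch = regulator.release()
print(first_batch)
print(second_batch)
```

Requests that arrive while the teams are at capacity wait in the queue rather than interrupting work already under way, which is the protective effect a single point of contact gives the fulfilment teams.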


We established a daily Kanban standup to manage the environment provisioning operation. Using Goldratt’s Theory of Constraints in conjunction with a lean problem management process, we worked across the Service Delivery group to manage the most pressing problems affecting environment provisioning. We started provisioning the substitute environments in small batches, with the team stopping frequently to reflect on the process, fix or improve on problems, and carry the learnings forward so that problem situations were not repeated in later batches. In this way we delivered the 20+ substitute environments that were required before the hard exit date for the on-premise data centre. A substantial achievement by the team!

Knowledge of the new lean approach was codified and diffused across the Service Delivery group by means of an operations guide and coaching. Once the operation had matured, we ran extensive workshops with the numerous development, test and other business stakeholders to diffuse the changed service offering across the business and onboard them into the operation. The environment provisioning operation then went into a coordinated BAU phase and we turned our attention to our next significant challenge - phase 2. Phase 2 would involve removing waste from the environment provisioning value stream, the engineering work required to better automate the value stream, and building a dashboard of self-service utilities that operate the automation.

Post transformation benefits - Phase 1 :

1 The environment provisioning operation can now be visualised by the stakeholders in terms of the manufacturing metaphor and with the Kanban visualisation.

2 We consolidated fulfilment communications off of email and onto Jira.

3 A single point of contact was established for development and test customers who no longer had to send fire and delay emails into the various functional teams within the Service Delivery operation.

4 The Service Delivery fulfillment teams are protected from direct contact by the development and test customers and can focus on getting on with the job and taking the overarching and coordinated instructions from the Release Management (RM) team.

5 Coordination of the operation has improved substantially as the Release Management (RM) team regulate the demand and coordinate using the hub and spoke model.

6 Major technical constraints impeding environment provisioning have been identified and timeously managed to resolution by using Goldratt’s Theory of Constraints.

7 Workshops have been done with development and test customers and they have been onboarded into the coordinated environment provisioning operation.

8 The environment provisioning operation is understood by senior stakeholders, is trackable and is predictable in its service levels.


Phase 2 : Automating the fulfilment parts of the provisioning operation and creating a self-service portal

Problem Situation: Although now coordinated and predictable, environment provisioning is still a resource-intensive activity across multiple functional teams in a large Service Delivery department. The demand on the central orchestrator, the Release Management (RM) team, is high because the team is now designed to be the single point of contact for provisioning and BAU requests from the development and test customers. Although much of the provisioning and deployment value stream has been automated with Puppet, large blocks still require manual activity, particularly decommissioning, provisioning and deployment orchestration, batch environment sanity checks and database reference data updates. These manual activities are time-consuming, at times tacit and can be error-prone.

In addition, many BAU service requests which had been self-service prior to the AWS switch are now manual. Examples are requests from developers to publish a refreshed line of content into a development environment, or requests from testers to pump test data into a test environment. The absence of enabling automation across the entire provisioning value stream, and of a self-service dashboard of utilities, concentrates pressure on the coordinating Release Management team and frustrates the functional fulfilment teams with repetitive manual activities.

The methodology and intervention : In order to avoid a siloed approach to the delivery of improvements, Front Five designed a SCRUM team which would include members from across the Service Delivery department’s functional teams (SysOps, Web Support, Release Management, DBA, Batch) and a member from a high-profile development team.

We then used an emergent strategy making technique, Eden’s SODA (Strategic Options Development and Analysis), to interview a wide stakeholder group that included the SCRUM team members, other lead developers and project managers. Fourteen individuals in total were interviewed in relation to two broad strategic objectives :

1. How do we reduce the environment provisioning time?

2. How do we improve BAU operations?

This strategy making approach allowed perspectives on how to meet the objectives to bubble to the surface, and we captured them as individual cognitive maps, each one mapping the perspective expressed by one of the fourteen interviewees. We then used these maps in facilitated meetings to select a set of directions that had emerged for meeting objectives 1 and 2 above. In these collaborative meetings the cognitive SODA maps were put up on a projector, which allowed the stakeholder team to clarify their understanding and absorb the perspectives of the others in a highly structured way. The team then collectively negotiated and decided which actions the SCRUM team should focus on to meet the objectives. The group sessions secured significant buy-in from a wide DevOps stakeholder group for the delivery process that would follow. The DevOps SCRUM team design and a sample of the cognitive SODA interview maps that were used in the facilitated strategy making meetings are shown on the next page.
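One simplified way to picture how individual cognitive maps can be analysed together is to treat each map as a directed graph of "this may lead to that" links and to count how many interviewees connect a given option to an objective. The sketch below assumes that simplification and is not the SODA method itself, which relies on facilitated discussion rather than counting. The concept names and the `leads_to` and `rank_options` helpers are hypothetical.

```python
from collections import Counter

# A cognitive map as a dict: concept -> set of concepts it is believed to lead to.
CognitiveMap = dict[str, set[str]]


def leads_to(cmap: CognitiveMap, start: str, goal: str) -> bool:
    """Depth-first search: is there a chain of links from `start` to `goal` in one map?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(cmap.get(node, ()))
    return False


def rank_options(maps: list[CognitiveMap], goal: str) -> list[tuple[str, int]]:
    """Count, per concept, how many interviewees' maps link it to the goal."""
    votes: Counter[str] = Counter()
    for cmap in maps:
        for concept in cmap:
            if concept != goal and leads_to(cmap, concept, goal):
                votes[concept] += 1
    return votes.most_common()


goal = "reduce environment provisioning time"

# Two invented interviewee maps, for illustration only.
interviewee_a: CognitiveMap = {
    "automate decommissioning": {goal},
    "version puppet modules": {"fewer failed builds"},
    "fewer failed builds": {goal},
}
interviewee_b: CognitiveMap = {
    "automate decommissioning": {"free up engineers"},
    "free up engineers": {goal},
}

ranking = rank_options([interviewee_a, interviewee_b], goal)
for concept, count in ranking:
    print(f"{count} interviewee(s): {concept}")
```

A count like this would only be a starting point for the facilitated meetings: it shows where perspectives overlap, while the negotiation over which actions to take remains a human activity.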


A Front Five delivery lead then worked with the product owner and SCRUM team to create the backlog for four sprints. In summary, the backlog comprised the areas of technical work listed below, and the sprints resulted in a Jenkins job execution dashboard which can be used by the Service Delivery department and development teams for self-provisioning and automated BAU environment management activities. Reflections and learnings by the team at SCRUM retrospectives were used to adjust the approaches taken in later sprints and to consider the trajectory of the work.

Technical Work :

- A dashboard of tabs within Jenkins to create an intuitive, ordered view of automation jobs
- Decommission jobs that
  - Terminate EC2 instances on a pattern match
  - Delete EBS volumes on termination
  - Clean up an environment’s Puppet control branches and hiera branches
  - Remove certificates for the deleted environment
  - Remove environment host entries from monit
  - Remove DNS entries
  - Remove TWS extended agents
  - Other decom activities
- Provisioning jobs that
  - Create scalable AWS EFS file storage for hosts
  - Create a Puppet control branch cloned from the production version
  - Create randomised passwords for an environment and store them in a safe
  - Clone the release history of the prod database to the environment database
  - Use Terraform to deploy the IaaS components of the environment
  - Deploy, start and smoke test applications in an orchestrated way to create catalogued development and test environments
  - Sanity check and initialise the deployed batch environment
  - Create database restore points of the environment’s Oracle database
  - Refresh reference data from the production database clone to the environment database
  - Other provisioning jobs
- Jenkins pipeline plugin jobs that
  - Connect the individual jobs into a pipeline so that decommissioning and provisioning activities can be run end to end using a pipeline view
- BAU management jobs that
  - Publish a line of content to a target development environment
  - Create database dump files
  - Pump test data into environments
  - Other BAU management jobs
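Two ideas from the list above, selecting hosts on a pattern match and chaining individual jobs so they run end to end, can be sketched in plain Python. This is an illustration under assumptions rather than the delivered Jenkins jobs: the inventory records, host names, stage names and stage order are hypothetical, and the stages only write to a log instead of calling AWS, Puppet or DNS.

```python
from fnmatch import fnmatchcase
from typing import Callable

# Hypothetical inventory records; a real job would query the AWS API instead.
Instance = dict[str, str]
Stage = tuple[str, Callable[[str], None]]


def select_instances(inventory: list[Instance], pattern: str) -> list[Instance]:
    """Pick the instances whose name matches a shell-style pattern such as 'dev07-*'."""
    return [instance for instance in inventory if fnmatchcase(instance["name"], pattern)]


def run_pipeline(stages: list[Stage], environment: str) -> list[str]:
    """Run the stages in order and stop at the first failure, as a pipeline view would show."""
    completed = []
    for name, job in stages:
        try:
            job(environment)
        except Exception as error:
            print(f"stage '{name}' failed: {error}")
            break
        completed.append(name)
    return completed


audit_log: list[str] = []


def logging_job(action: str) -> Callable[[str], None]:
    """Build a stand-in job that records what it would do instead of doing it."""
    def job(environment: str) -> None:
        audit_log.append(f"{action} for {environment}")
    return job


inventory = [
    {"id": "i-0001", "name": "dev07-web-01"},
    {"id": "i-0002", "name": "dev07-db-01"},
    {"id": "i-0003", "name": "dev08-web-01"},
]

# Assumed ordering of decommission steps, for illustration only.
decommission_stages: list[Stage] = [
    ("terminate-ec2", logging_job("terminate EC2 instances")),
    ("clean-puppet-branches", logging_job("remove Puppet control and hiera branches")),
    ("remove-certificates", logging_job("remove certificates")),
    ("remove-dns", logging_job("remove DNS entries")),
]

targets = select_instances(inventory, "dev07-*")
print([instance["name"] for instance in targets])
print(run_pipeline(decommission_stages, "dev07"))
```

Stopping at the first failed stage keeps a half-decommissioned environment visible at the point of failure, so that an engineer can fix the problem and re-run from there.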

A snapshot of the Jenkins dashboard showing a selection of the jobs and an executing decommission pipeline is shown on the next page.


Post transformation benefits - Phase 2 :

1 A large variety of strategic directions for reducing the environment provisioning time and improving BAU operations (and the options/actions required to realise these objectives) have been explored by a broad DevOps stakeholder group.

2 The alternatives have been discussed and negotiated in meetings by members of the broad DevOps stakeholder group in order to clarify understandings and to agree which streams of work the SCRUM team will focus on. This process has led to participative buy in by the stakeholders for the delivery project and this includes the SCRUM team implementers.

3 A collaborative and engaged SCRUM team comprising both Development and Service Delivery members is now established, and the SCRUM story backlog is being informed by the SODA maps of the 14 contributors and other stakeholders.

4 Four sprints have been completed and a Jenkins job execution dashboard with backing automation has been created. This dashboard will be used by the Release Management (RM) and Development / Test groups for self-service environment provisioning and automated BAU operations.

Conclusion : This Front Five case study demonstrates that seizing [9] and operationalising cloud technology as a substitute for on-premise development and test platform environments can be successful if a strategy for the reconfiguration of the service provider’s resources, skills, routines and management systems [8] is properly considered and then realised. In our case, developing the new capability involved a methodology that started by analysing and understanding the service provider’s current provisioning activity system and then addressing and managing its most pressing constraints. Later we switched the programme management’s mental model to one of manufacturing and worked with an environment provisioning team to create a regulated provisioning operation coordinated via Kanban. By provisioning the substitute platform environments in small batches we bootstrapped an experiential learning approach, which was then used to address the cloud engineering problems. Later, a contingent strategy, emergent from the first phase, was used to explore the options for next actions. SCRUM was then used to create a cross-functional delivery team that built a self-service environment provisioning automation platform and other utilities that can be used by development team customers.

Patrick Hyland, Director - Strategic Delivery, Front Five

References :

[1] Rumelt (2011), ‘Good Strategy, Bad Strategy’
[2] Checkland (1999), ‘Systems Thinking. Systems Practice’
[3] Goldratt (2004), ‘The Goal, A Process of Ongoing Improvement’
[4] Ohno (1988), ‘Toyota Production System: Beyond Large-Scale Production’
[5] Beer (1995), ‘Diagnosing the System for Organisations’
[6] Reynolds (2010), ‘Systems Approaches to Managing Change: A Practical Guide’
[7] Leonard-Barton, D. (1992), ‘Core Capabilities and Core Rigidities: A Paradox in Managing New Product Development’, Strategic Management Journal 13, pp. 111-125.
[8] Wernerfelt, B. (1984), ‘A Resource-Based View of the Firm’, Strategic Management Journal, Vol. 5, No. 2 (Apr. - Jun.), pp. 171-180.
[8] Wernerfelt, B. (1995), ‘The Resource-Based View of the Firm: Ten Years After’, Strategic Management Journal, Vol. 16, No. 3 (Mar.), pp. 171-174.
[9] Teece, D.J. (2011), ‘Dynamic capabilities: A guide for managers’, Ivey Business Journal Online, p. 1.