Introduction to DevOps
Len Bass
copyright 2015 Len Bass 1
Overview
• DevOps: What and why
• Architectural impact of different categories of DevOps practices
DevOps: What and why
Over the wall development
[Figure: timeline along a Time axis: Board has idea, then Developers implement, then Operators place in production]
Where Does the Time Go?
• As software engineers, our view is that there are the following activities in software development
– Requirements
– Design
– Implementation
– Test
• Code Complete
• Different methodologies will organize these activities in different ways.
• Agile focuses on getting to Code Complete faster than with other methods.
[Figure: timeline segment "Developers implement"]
What is wrong?
• Code Complete is not Code in Production
• Between the completion of the code and the placing of the code into production is a step called: Deployment
• Deploying completed code can be very time consuming because of concern about errors that could occur.
What is the workflow for code from a multiteam development effort?
• You develop and test your code in isolation
• Your code is integrated with code developed by other teams to see if an executable can be constructed.
• The built system is tested for correctness
• The built system is tested for performance and other qualities (staging)
• The built system is placed into production
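The workflow above can be read as a sequence of gated stages. The sketch below is illustrative only: the `run_pipeline` helper and the pass/fail checks are assumptions made for this example, not part of any particular tool.

```python
from typing import Callable, List, Tuple

# Each stage is a (name, check) pair; the check returns True when the gate passes.
# Stage names mirror the workflow above; the checks are hypothetical stand-ins.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first gate that fails.

    Returns the names of the stages that passed.
    """
    passed = []
    for name, check in stages:
        if not check():
            break  # a failed gate blocks every later stage
        passed.append(name)
    return passed


stages = [
    ("develop and test in isolation", lambda: True),
    ("integrate and build", lambda: True),
    ("test for correctness", lambda: True),
    ("staging (performance and other qualities)", lambda: False),  # simulated failure
    ("production", lambda: True),
]

print(run_pipeline(stages))
```

In this run the simulated staging failure keeps the build out of production, which is the point of having a gate at each step.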
What can go wrong – Integration
• Not all portions of the system are available
– Portions developed by other teams
– Portions developed by 3rd party
– Names and signatures of methods from other software are inconsistent
• Sequencing errors
– Other teams do not follow the contract with your code in terms of sequence of method calls
• Version incompatibility
– Your team assumed version A of 3rd party software but the build downloads version B
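One way to catch the version A versus version B problem early is to compare the versions a team assumed with the versions the build actually resolved. This is a minimal sketch; the function name and the dependency data are hypothetical.

```python
from typing import Dict, List


def find_version_mismatches(expected: Dict[str, str], resolved: Dict[str, str]) -> List[str]:
    """Report every dependency whose resolved version differs from the expected one."""
    problems = []
    for name, want in sorted(expected.items()):
        got = resolved.get(name)
        if got is None:
            problems.append(f"{name}: expected {want}, but it was not resolved")
        elif got != want:
            problems.append(f"{name}: expected {want}, build resolved {got}")
    return problems


# Hypothetical data: the team assumed version A, the build downloaded version B.
expected = {"thirdparty-lib": "A"}
resolved = {"thirdparty-lib": "B"}

for problem in find_version_mismatches(expected, resolved):
    print(problem)
```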
What can go wrong – integration 2
• Data problems
– Database data is not refreshed for each test
– Data does not flow correctly to your code
• Configuration problems
– Configuration parameter settings for code developed by different teams are incompatible
– Configuration parameters are not specified
– External services are not reachable for security or configuration reasons
• Etc
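Two of the configuration problems above (unspecified parameters, and incompatible settings across teams) can be checked mechanically before integration. The sketch below assumes each team's configuration is a flat name-to-value mapping; the parameter and team names are made up.

```python
from typing import Dict, List


def check_config(required: List[str], team_configs: Dict[str, Dict[str, str]]) -> List[str]:
    """Find unspecified parameters and parameters that teams set to different values."""
    problems = []
    merged: Dict[str, str] = {}
    for team, config in sorted(team_configs.items()):
        for key, value in sorted(config.items()):
            if key in merged and merged[key] != value:
                problems.append(f"{key}: {team} sets {value!r}, conflicts with {merged[key]!r}")
            merged.setdefault(key, value)  # the first value seen is kept
    for key in required:
        if key not in merged:
            problems.append(f"{key}: not specified")
    return problems


# Hypothetical settings from two teams.
team_configs = {
    "team_a": {"db_url": "db://test", "timeout_s": "5"},
    "team_b": {"timeout_s": "30"},
}
required = ["db_url", "timeout_s", "api_key"]

for problem in check_config(required, team_configs):
    print(problem)
```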
What can go wrong – staging
• Configuration problems
– External services not available because of lack of permissions
– Inconsistent configuration settings
– Leaking into production environment
• Data problems
– Database is not representative of production database
– Stale database after tests
• Etc
What can go wrong – production
• Configuration problems
– Requires authentication and authorization
– Keys must be kept securely
– Inconsistent configurations
• Performance problems
– Under actual load, system may not have adequate performance
• Logical problems
– May require new version to be rolled back
– Database may have been corrupted
• Etc
Time is passing
• Every error must either be corrected or prevented.
• Preventing errors can be done through some combination of
– Process
– Architecture
– Tooling
– Coordination among teams.
• Coordination takes time.
• Correcting errors takes time.
How much time?
• Historically, releases are scheduled for once a quarter or once a year to give time to coordinate and adequately test.
• This means there may be a delay of months before a new concept or feature is added to a system.
• This delay has become more and more unacceptable.
• Weekly or daily releases are becoming the norm.
Goal of DevOps
• The goal of DevOps is to reduce the time to market without compromising quality by
– Reducing the number of errors that occur during the workflow of placing your code into production
– Reducing the time for correcting errors that occur
– Minimizing the necessity for coordination among teams
What is DevOps?
DevOps is a set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while ensuring high quality.*
• DevOps practices involve developers’ and operators’ processes, architectures, and tools.
• DevOps is also a movement – like agile.
*DevOps: A Software Architect’s Perspective
TEAR DOWN THAT WALL!!
5 Categories of DevOps Practices
1. Treat operators as first class citizens
2. Make Dev more responsible for incident handling
3. Enforce deployment practices uniformly across both dev and ops
4. Use continuous deployment
5. Develop infrastructure code using same processes as application code
Overview
• DevOps: What and why
• Architectural impact of different categories of DevOps practices
Treat Operators as First Class Citizens
Operators will add requirements
• Type and characteristics of error messages
• Type and characteristics of logs
• Expose performance information
Incident handling
Goal of incident handling
• An incident is something out of the ordinary.
– Failure
– Performance problem
– Abnormal activity
– Erroneous output of a system
• Goal is
– Get system back on track as soon as possible (mitigate)
– Understand root cause to prevent repetition.
Normal Incident handling process
• Incident is reported to operations by
– Developer (client of one of the software elements)
– Customer
– Internal user of software
– Monitoring software
• Operator may be paged if high priority
• Operations personnel have analysis tools that help them determine probable cause and diagnosis tests
• If a problem is related to a developer’s code, then it is escalated to the developer
• Process is managed by a ticketing system.
DevOps incident handling
• Incident is reported to operations or developers by
– Developer (client of one of the software elements) – reported to development team
– Customer – reported to operations
– Internal user of software – reported to operations
– Monitoring software – reported to developer or operations depending on type of alert
• Developers wear pagers to ensure fast response.
Architectural Implications
• Make application level data available to a monitoring system
• Collect performance information and make it available to a monitoring system
• Have test or diagnostic mode in code so that reliability engineer can run tests specific to the incident
• Ensure error messages and logs contain context information.
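As one possible illustration of logs that carry context, the sketch below uses Python's standard `logging.LoggerAdapter` to attach the same fields to every record. The field names (`service`, `request_id`) and their values are assumptions for the example; a real system would choose its own.

```python
import io
import logging

# Capture output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s service=%(service)s request_id=%(request_id)s %(message)s"
))

base = logging.getLogger("orders")
base.setLevel(logging.INFO)
base.addHandler(handler)
base.propagate = False  # keep the record out of the root logger

# LoggerAdapter attaches the same context (hypothetical fields) to every record.
log = logging.LoggerAdapter(base, {"service": "orders", "request_id": "req-42"})
log.error("payment lookup failed")

print(stream.getvalue(), end="")
```

With the context in every line, an operator can filter by request or service without reading the code.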
Developing infrastructure code
Goal
• Reduce error rate in infrastructure code.
• A large number of errors are created during operational activities
– Upgrade
– Reconfiguration
– Race conditions
Techniques
• Use software engineering principles in the development of infrastructure code
– Modularization
– Test driven development
– Version control/configuration management
• Difficult to test infrastructure code
– Hard to create environment that mimics real environment
– Many errors are caused by cloud and not by code.
• No impact from this set of practices on application architecture
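A small sketch of what modular, test-driven infrastructure code might look like: a pure function that renders a configuration fragment, plus tests that run without any real environment. The `render_hosts_file` helper and the addresses are hypothetical. Tests like these only cover the rendering logic; they do not address the harder problem, noted above, of mimicking the real environment.

```python
from typing import Dict, List


def render_hosts_file(hosts: Dict[str, str]) -> str:
    """Render a hosts-file fragment from a name -> IP mapping (hypothetical helper)."""
    lines: List[str] = []
    for name, ip in sorted(hosts.items()):
        if not name or " " in name:
            raise ValueError(f"invalid host name: {name!r}")
        lines.append(f"{ip}\t{name}")
    return "\n".join(lines) + "\n"


def test_render_is_sorted_and_newline_terminated() -> None:
    out = render_hosts_file({"web": "10.0.0.2", "db": "10.0.0.1"})
    assert out == "10.0.0.1\tdb\n10.0.0.2\tweb\n"


def test_rejects_bad_names() -> None:
    try:
        render_hosts_file({"bad name": "10.0.0.3"})
    except ValueError:
        return
    raise AssertionError("expected ValueError")


test_render_is_sorted_and_newline_terminated()
test_rejects_bad_names()
```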
Continuous Deployment
Goal
• Allow developers to deploy to production without the necessity for coordination
Technique
• Base your system on “microservice architecture” style.
• Organization of material
– What is a microservice architecture?
– How does it cut down on coordination?
– What are its properties?
Definition
• A microservice architecture is
– A collection of independently deployable processes
– Packaged as services
– Communicating only via messages
• It is a stripped down version of Service Oriented Architecture (SOA)
~2002 Amazon instituted the following design rules - 1
• All teams will henceforth expose their data and functionality through service interfaces.
• Teams must communicate with each other through these interfaces.
• There will be no other form of inter-process communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.
Amazon design rules - 2
• It doesn’t matter what technology they [services] use.
• All service interfaces, without exception, must be designed from the ground up to be externalizable.
• Amazon is providing the specifications for the “Microservice Architecture”.
In Addition
• Amazon has a “two pizza” rule.
• No team should be larger than can be fed with two pizzas (~7 members).
• Each (micro) service is the responsibility of one team
• This means that microservices are small and intra team bandwidth is high
• Large systems are made up of many microservices.
• There may be as many as 140 in a typical Amazon page.
Services can have multiple instances
• The elasticity of the cloud will adjust the number of instances of each service to reflect the workload.
• Requests are routed through a load balancer for each service
• This leads to
– Lots of load balancers
– Overhead for each request.
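A minimal sketch of one load balancer per service, using round-robin selection. Round-robin is just one simple policy chosen for illustration; the service and instance names are made up, and the sketch does not model elasticity changing the instance list.

```python
import itertools
from typing import Dict, Iterator, List


class LoadBalancer:
    """Round-robin balancer for the instances of a single service."""

    def __init__(self, instances: List[str]) -> None:
        if not instances:
            raise ValueError("a service needs at least one instance")
        self._cycle: Iterator[str] = itertools.cycle(instances)

    def route(self) -> str:
        """Return the instance that should handle the next request."""
        return next(self._cycle)


# One balancer per service; service and instance names are hypothetical.
balancers: Dict[str, LoadBalancer] = {
    "catalog": LoadBalancer(["catalog-1", "catalog-2"]),
    "cart": LoadBalancer(["cart-1"]),
}

picked = [balancers["catalog"].route() for _ in range(3)]
print(picked)
```

Every request passes through `route` before reaching an instance, which is the per-request overhead the slide refers to.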
Micro service architecture
• Each user request is satisfied by some sequence of services.
• Most services are not externally available.
• Each service communicates with other services through service interfaces.
• Service depth may be
– Shallow (large fan out)
– Deep (small fan out, more dependent services)
How does microservice architecture reduce requirements for coordination?
• Coordination decisions can be made
– incrementally as system evolves or
– be built into the architecture.
• Microservice architecture builds most coordination decisions into architecture
• Consequently they only need to be made once for a system, not once per release.
Seven Decision Categories
• Architectures can be categorized by means of seven categories
1. Allocation of functionality
2. Coordination model
3. Data model
4. Management of resources
5. Mapping among architectural elements
6. Binding time decisions
7. Technology choices
Design decisions made or delegated by choice of microservice architecture
• Microservice architecture either specifies or delegates to the development team five out of the seven categories of design decisions.
1. Allocation of responsibilities
2. Coordination model
3. Data model
4. Management of resources
5. Mapping among architectural elements
6. Binding time decisions
7. Choice of technology
Roadmap for next several slides
• Micro service oriented architectural style will either specify or allow delegation of five different categories of design decisions.
• Each decision category will be discussed separately.
Decision 1 – allocation of responsibilities
• This decision is not delegated to the team or specified.
• Development teams must coordinate to divide responsibilities for features that are to be added.
• Typically this happens at the beginning of each iteration cycle.
Decision 2 - coordination model
• Elements of service interaction
– Services communicate asynchronously through message passing
– Each service could (in principle) be deployed anywhere on the net.
• Latency requirements will probably force particular deployment location choices.
• Services must discover location of dependent services.
– State must be managed
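Discovery of dependent services is often handled by a registry that maps a service name to the addresses of its instances. The in-memory class below is a sketch of that idea only; the class, method names, and addresses are assumptions, and a production registry would be a separate, replicated service.

```python
from typing import Dict, List


class ServiceRegistry:
    """In-memory registry mapping a service name to its instance addresses."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def register(self, service: str, address: str) -> None:
        self._locations.setdefault(service, []).append(address)

    def deregister(self, service: str, address: str) -> None:
        addresses = self._locations.get(service, [])
        if address in addresses:
            addresses.remove(address)

    def lookup(self, service: str) -> List[str]:
        """Return a copy of the known addresses, or fail if there are none."""
        addresses = self._locations.get(service, [])
        if not addresses:
            raise LookupError(f"no known instance of {service!r}")
        return list(addresses)


registry = ServiceRegistry()
registry.register("inventory", "10.0.0.5:8080")  # hypothetical address
registry.register("inventory", "10.0.0.6:8080")

print(registry.lookup("inventory"))
```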
State management
• Services can be stateless or stateful
– Stateless services
• Allow arbitrary creation of new instances for performance and availability
• Allow messages to be routed to any instance
• State must be provided to stateless services
– Stateful services
• Require clients to communicate with same instance
• Reduces overhead necessary to acquire state
Where to keep the state?
• Persistent state is kept in a database
– Modern database management systems (relational) provide replication functionality
– Some NoSQL systems may be replicated. Others will require manual replication.
• Transient small amounts of state can be kept consistent across instances by using tools such as Memcached or Zookeeper. This is a mechanism for making a stateful service stateless.
• Instances may cache state for performance reasons. It may be necessary to purge the cache before bringing down an instance.
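The idea of making a stateful service stateless by moving its state into a shared store can be sketched as follows. `SharedStore` is a plain dictionary standing in for a tool such as Memcached or Zookeeper; it is not their API, and the cart service is a made-up example.

```python
from typing import Dict


class SharedStore:
    """Stand-in for an external store such as Memcached; here just a dict."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def get(self, key: str, default: int = 0) -> int:
        return self._data.get(key, default)

    def set(self, key: str, value: int) -> None:
        self._data[key] = value


class CartInstance:
    """A service instance that keeps no session state of its own."""

    def __init__(self, store: SharedStore) -> None:
        self._store = store

    def add_item(self, session_id: str) -> int:
        count = self._store.get(session_id) + 1
        self._store.set(session_id, count)
        return count


store = SharedStore()
first, second = CartInstance(store), CartInstance(store)

first.add_item("session-1")
print(second.add_item("session-1"))  # prints 2: a different instance sees the same state
```

Because neither instance holds the count, a request can be routed to any of them. The read-then-write in `add_item` is not atomic; a real shared store would need an atomic increment or similar.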
Decision 3 – Data model
• Schema based database system (relational). Requires coordination.
– Development teams must coordinate when schema is defined or modified.
– Schema definition happens once when the architecture is defined. Schema modification should be a rare occurrence. Schema extensions (new fields or tables) do not cause problems.
• NoSQL systems. Will still require coordination over semantics of data.
– Data written by one service is typically read by others, so they must agree on semantics.
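The claim that schema extensions do not cause problems can be illustrated with SQLite from the Python standard library: a reader that names its columns keeps working after another team adds a column. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
conn.execute("INSERT INTO orders (item) VALUES ('book')")


def read_items(connection: sqlite3.Connection) -> list:
    """A reader written against the original schema; it names its columns."""
    return connection.execute("SELECT id, item FROM orders ORDER BY id").fetchall()


before = read_items(conn)

# Schema extension by another team: a new nullable column.
conn.execute("ALTER TABLE orders ADD COLUMN gift_note TEXT")
conn.execute("INSERT INTO orders (item, gift_note) VALUES ('pen', 'Happy birthday')")

after = read_items(conn)
print(before, after)
```

The existing reader is unaffected because it selects columns by name. Renaming or removing a column would break it, which is why schema modification still requires coordination.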
Decision 4 – Resource Management
• Each instance of a service can process a certain workload.
– Could be expressed in terms of requests
– Could be expressed in terms of resource requirements, e.g. CPU
• Each client instance will require resources from the service to process its requests.
• Service Level Agreements (SLAs) are a means for automating the resource assumptions of the clients and the resource requirements of the service.
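If workload is expressed in requests per second, the arithmetic behind resource management is a ceiling division, and an agreement can be checked by summing what clients expect to send. Both functions and all numbers below are illustrative assumptions, not a definition of what an SLA contains.

```python
import math
from typing import Dict


def instances_needed(expected_rps: float, per_instance_rps: float) -> int:
    """Instances required so that total capacity covers the expected load."""
    if per_instance_rps <= 0:
        raise ValueError("per-instance capacity must be positive")
    if expected_rps < 0:
        raise ValueError("expected load cannot be negative")
    return max(1, math.ceil(expected_rps / per_instance_rps))


def within_agreement(client_rps: Dict[str, float], agreed_total_rps: float) -> bool:
    """True when the clients' combined load stays inside the agreed total."""
    return sum(client_rps.values()) <= agreed_total_rps


# Hypothetical numbers: two clients sending 300 and 150 requests per second,
# each service instance handling 100 requests per second.
clients = {"web": 300.0, "mobile": 150.0}
total = sum(clients.values())

print(instances_needed(total, 100.0))      # ceil(450 / 100) = 5
print(within_agreement(clients, 400.0))    # 450 > 400, so False
```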
Decision 5 – Mapping among architectural elements
• Decisions about packaging modules into processes and processes into a service are delegated to the service development team.
• Decisions about deployment of a service will be discussed later.
Decision 6 – Binding time
• Configuration information binding time is decided during the development of architecture and the deployment pipeline.
• Other binding time decisions are delegated to the service development team.
Decision 7 – Technology choices
• All technology choices are delegated to the service development team.
Quality Analysis of Microservice Architecture
• Deployability
• Availability
• Reusability
• Security
• Modifiability
• Performance
Deployability
• The microservice architecture style is designed to make it easy to deploy by reducing the requirement for coordination.
• There may be dependencies among the services or their versions.
Availability
• If an instance fails, another instance will be created through elasticity.
– Stateless instances need no additional mechanisms
– Stateful instances can keep copy of state in Memcached or Zookeeper. Need to ensure that failure of a single instance does not delete state maintained in Memcached or Zookeeper
• Clients must have rapid timeout and reissue requests that fail.
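A client-side sketch of "rapid timeout and reissue": the request is tried with a short timeout and reissued when it times out. The `call_with_retry` helper and the simulated service are assumptions for illustration; reissuing is only safe when the request is idempotent.

```python
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(request: Callable[[float], T], timeout_s: float, attempts: int) -> T:
    """Issue the request with a short timeout and reissue it if it times out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[TimeoutError] = None
    for _ in range(attempts):
        try:
            return request(timeout_s)
        except TimeoutError as error:
            last_error = error  # the next attempt may reach a healthy instance
    raise TimeoutError(f"gave up after {attempts} attempts") from last_error


# Simulated service: the first call times out, the second succeeds.
calls: Dict[str, int] = {"count": 0}


def flaky_request(timeout_s: float) -> str:
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("instance did not answer in time")
    return "ok"


print(call_with_retry(flaky_request, timeout_s=0.2, attempts=3))
```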
Reusability
• Small grained reuse
– Teams are independent and do not coordinate or share code. Small grained reuse does not happen using a microservice architecture.
• Large grained reuse
– Large grained reuse is embodied in the architecture and is treated as a service.
Security
• Security tokens (a la Kerberos) can be passed from client to service.
• Tokens contain information about access privileges.
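The sketch below shows the general shape of a token that carries access privileges and can be verified by the receiving service. It uses a simple HMAC-signed payload, which is much simpler than Kerberos and is not the Kerberos protocol; the secret, claim names, and helper functions are all hypothetical, and the token has no expiry.

```python
import base64
import hashlib
import hmac
import json
from typing import List, Optional

SECRET = b"demo-secret"  # hypothetical shared key; never hard-code a real one


def issue_token(user: str, privileges: List[str]) -> str:
    """Encode the claims and append an HMAC-SHA256 signature."""
    payload = json.dumps({"user": user, "privileges": privileges}, sort_keys=True).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + signature


def verify_token(token: str) -> Optional[dict]:
    """Return the token's claims, or None if the token is malformed or forged."""
    try:
        encoded, signature = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None
    return json.loads(payload)


token = issue_token("alice", ["read:orders"])
claims = verify_token(token)
print(claims["privileges"])
```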
Modifiability
• Microservice architecture is modular, and its coordination mechanisms prevent the side effects of a change to one service from affecting another
• Special provisions affect evolution of services.
• Managing all of the services and understanding what each service does is complicated because of the proliferation of services.
Performance
• The main performance issue is message traffic.
• Each user request may involve many messages.
• Monitoring of services will also add to message traffic.
• Microservice architecture is not designed for high transaction volume because of the amount of message traffic.
Enforce Deployment Process
What problem is being attacked?
• When application code is deployed, it goes through several verification steps with gates at each step
• This is not necessarily so with code deployed by operators.
• Security patches, for example, may be deployed directly.
• Operators may SSH into a VM to perform some action.
• The goal of enforcing the deployment process on operators is to reduce errors caused by incorrect operator actions on VMs.
Traceability
• For every portion of an executing system, it should be possible to know
– What version of what components are included in the executing code
– What version of the configuration parameters was used in invoking the system
– What version of what script was used to create the system
– What version of what script was used to create the environment in which the system is executing
• Without this knowledge it becomes difficult to determine root causes of errors
Architectural implications
• As a portion of initialization, a system should
– verify that it was deployed using a deployment tool
– Record in a log file its pedigree and the pedigree of the configuration parameters, creation script, and environment.
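One way a system might do this at initialization is sketched below. It assumes, purely for illustration, that the deployment tool passes pedigree information through environment-style variables; the variable names and version values are hypothetical.

```python
import json
import logging
from typing import Dict, Mapping

# Hypothetical variable names that a deployment tool is assumed to set.
PEDIGREE_KEYS = (
    "DEPLOY_TOOL",
    "APP_VERSION",
    "CONFIG_VERSION",
    "CREATION_SCRIPT_VERSION",
    "ENVIRONMENT_SCRIPT_VERSION",
)


def collect_pedigree(env: Mapping[str, str]) -> Dict[str, str]:
    """Fail fast unless every pedigree item is present."""
    missing = [key for key in PEDIGREE_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(
            "not deployed through the deployment tool; missing: " + ", ".join(missing)
        )
    return {key: env[key] for key in PEDIGREE_KEYS}


def log_pedigree(env: Mapping[str, str], logger: logging.Logger) -> Dict[str, str]:
    """Record the pedigree as one structured log line and return it."""
    pedigree = collect_pedigree(env)
    logger.info("pedigree %s", json.dumps(pedigree, sort_keys=True))
    return pedigree


# A sample environment; a real service would pass os.environ instead.
sample_env = {
    "DEPLOY_TOOL": "example-deployer",
    "APP_VERSION": "1.4.2",
    "CONFIG_VERSION": "17",
    "CREATION_SCRIPT_VERSION": "3",
    "ENVIRONMENT_SCRIPT_VERSION": "8",
}

logging.basicConfig(level=logging.INFO)
log_pedigree(sample_env, logging.getLogger("startup"))
```

A check like this shows only that the expected values were supplied. It does not prove which tool performed the deployment.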
Summary
• DevOps is a movement driven by the need to reduce time to market
• It involves a variety of different practices each of which has its own architectural implications
• Continuous deployment can be done using a microservice architecture but the movement to a microservice architecture has multiple dimensions
More Information
Contact [email protected]
DevOps: A Software Architect’s Perspective is available from your favorite bookseller