2016 - 10 questions you should answer before building a new microservice

Creating a Microservice? Answer These 10 Questions First. Brian Kelly, VP Engineering, Datawire DevOpsDays Austin, May 2nd 2016 @brikelly [email protected]

Upload: devopsdaysaustin

Post on 16-Apr-2017

TRANSCRIPT

Creating a Microservice? Answer These 10 Questions First.

Brian Kelly, VP Engineering, Datawire

DevOpsDays Austin, May 2nd 2016

@brikelly [email protected]

datawire.io

Hi!

Me
* Working in distributed systems most of my career
* Built a number of middleware and messaging products
* Strangled a SaaS monolith with microservices

Datawire
* Based in Boston and San Francisco
* We provide technology for companies adopting microservices
* We’ve spent a lot of time with the master microservices practitioners from high-growth technology companies

Microservices and DevOps: A Perfect Match

Microservices increase development velocity

DevOps increases release velocity

For organizations scaling rapidly, doing one without the other is… “suboptimal”

A microservice is deliberately simple on the inside

It’s what’s outside that’s hard


“There are only two hard problems in distributed systems:

1. Exactly-once delivery
2. Guaranteed order of messages
1. Exactly-once delivery”

@mathiasverraes

Why Ask These 10 Questions?

Force awareness in your teams of latent concerns
* For example, potential future issues with scalability and reliability

It’s OK to not have sophisticated answers for each question
* But asking them is important!

Categories

* Organization
* Development
* Architecture

Organization

1. Have you invested enough in developer infrastructure?


Developer Infrastructure Teams

The dev infrastructure team focuses on developer education, core infrastructure, and driving standards through a great DX.


Investing in the core infrastructure necessary for independent iteration is key

* Continuous delivery workflow
* Loosely coupled services
* Application resilience

2. How will your new service be deployed and upgraded?

Organization


Continuous Delivery Ecosystem for Microservices: Our Model

[Diagram: the stages of the workflow, spanning Development and DevOps, with example tools for each stage]

* Define the contract (API, data format, protocol): Datawire Quark, Finagle / Thrift, HTTP / JSON, gRPC / Protobuf
* Code the business logic
* Build and package the code/contract into a source artifact: Circle CI, Go.cd, JFrog, Jenkins, Travis
* Bake the application & dependencies into a deployable artifact: Docker, Packer
* Deploy the artifact to run on the appropriate compute resources: AWS, Cloud Foundry, Docker, GCP, Kubernetes, Mesos, Microsoft Azure
* Connect the microservice to other microservices: Datawire Connect, Homegrown, Hystrix / Ribbon, SmartStack
* Monitor the health of the deployed microservice: AppDynamics, DataDog, InfluxData, Nagios, New Relic, SignalFX, Sysdig, Wavefront, Zipkin

Artifacts: GitHub / Source; JAR, Gem, npm; AMI, Container, VM; Microservice

Automated DevOps workflow: Spinnaker

Continuous delivery workflow

1. Workflow needs to be defined but does not need to be fully automated. Increase automation as the number of microservices grows.

2. Need to have service running in production in order to fully test.

Quickly move from commit to customer


Upgrading your Service

Each upgrade is an opportunity to break the contract between your new service and any other dependent services

Plenty of techniques exist for mitigating the chance of failure:
* Well-specified structural and behavioral service contracts
* Dark launching for examining the effect of prod traffic without risk
* Response diff’ing for ensuring contract compliance
* Canary testing for progressive rollout
* Blue/Green deployment for fast rollback
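Response diffing, one of the techniques listed above, can be sketched in a few lines. This is only an illustration and not a tool named in the talk: `diff_responses` and its `ignore` parameter are hypothetical names, and the two dicts stand in for decoded JSON bodies returned by the current and the candidate version of a service for the same request.

```python
from typing import Any


def diff_responses(old: Any, new: Any, path: str = "", ignore: frozenset = frozenset()) -> list:
    """Return human-readable differences between two decoded JSON bodies."""
    if isinstance(old, dict) and isinstance(new, dict):
        diffs = []
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else key
            if child in ignore:
                continue  # fields expected to differ, e.g. version stamps
            if key not in new:
                diffs.append(f"{child}: missing in new response")
            elif key not in old:
                diffs.append(f"{child}: added in new response")
            else:
                diffs.extend(diff_responses(old[key], new[key], child, ignore))
        return diffs
    if old != new:
        return [f"{path or '<root>'}: {old!r} != {new!r}"]
    return []


# Hypothetical responses: the candidate silently changed a number into a string.
current = {"id": 7, "total": 12.5, "served_by": "v1"}
candidate = {"id": 7, "total": "12.5", "served_by": "v2"}
print(diff_responses(current, candidate, ignore=frozenset({"served_by"})))
# ["total: 12.5 != '12.5'"]
```

An empty list for replayed production requests suggests the candidate still honors the structural contract. It says nothing about latency or side effects.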

3. How will it be monitored and measured?

Organization


Monitoring and Measuring your Service

Ways of monitoring your service’s health:

OK:
* Health check from monitor to service (GET /health from an ELB)

Better:
* “Call Home” health check from service to monitor (APM approach)

Best:
* The client’s experience calling real APIs on the service
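The “OK” level above, a monitor polling `GET /health`, might look like the sketch below. It uses only the Python standard library. `HealthHandler` and `check_health` are hypothetical names, and the JSON body is an assumption, since the slides do not specify a response format.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Service side: answer GET /health with 200 while the process is up."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def check_health(host: str, port: int, timeout: float = 2.0) -> bool:
    """Monitor side: poll /health and treat any error or timeout as unhealthy."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/health")
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print("healthy:", check_health("127.0.0.1", server.server_address[1]))
    server.shutdown()
    server.server_close()
```

A check like this only shows that the process can answer a trivial request. It cannot tell whether real API calls are succeeding for clients, which is why the slide ranks it lowest.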

Diagnosis

Which service is introducing the maximum latency into a request?

Which service is the root cause of a cascade failure?

Monitor the traffic, not just the services
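One way to answer the latency question from traffic data is to subtract the time each call spent waiting on its own downstream calls. The sketch below assumes trace spans are already collected (for example by a tracing system such as Zipkin). The `Span` fields are hypothetical, and the calculation assumes child calls run sequentially rather than in parallel.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span of the request
    service: str
    duration_ms: float


def self_time_by_service(spans):
    """Time spent inside each service, excluding time waiting on child calls."""
    child_total = defaultdict(float)
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] += s.duration_ms
    totals = defaultdict(float)
    for s in spans:
        totals[s.service] += max(s.duration_ms - child_total[s.span_id], 0.0)
    return dict(totals)


def slowest_service(spans):
    """Name of the service that contributed the most latency of its own."""
    totals = self_time_by_service(spans)
    return max(totals, key=totals.get)


# Made-up trace: gateway -> orders -> inventory
trace = [
    Span("a", None, "gateway", 120),
    Span("b", "a", "orders", 100),
    Span("c", "b", "inventory", 85),
]
print(slowest_service(trace))  # inventory
```

The gateway has the longest span, but most of its time is spent waiting. Per-service dashboards alone would point at the wrong service.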

4. How will it be tested?

Development


Testing

Unit testing a single service is the easy part

What’s harder: testing the entire system

How will a developer verify that their changes to a single microservice will not break other parts of the system?

Staging environments bring a little comfort, but add significant cost, complexity, and distractions

Microservice Testing Is Required on Both Sides of Deployment

Test before launch: reduce probability of failure
* Mock services
* Sophisticated deployment workflows
* Automated regression tests

Test after launch: reduce impact of failure
* Dark launch
* Canary testing
* Blue / green deployment
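As an illustration of the “mock services” item, the sketch below tests one service’s logic against a mocked downstream dependency using Python’s `unittest.mock`. `InventoryClient` and `can_place_order` are hypothetical names invented for this example.

```python
from unittest import mock


class InventoryClient:
    """Hypothetical client for a downstream inventory service."""

    def in_stock(self, sku: str) -> bool:
        raise NotImplementedError("would make a network call in production")


def can_place_order(inventory: InventoryClient, sku: str) -> bool:
    """Logic under test: refuse orders when stock cannot be confirmed."""
    try:
        return inventory.in_stock(sku)
    except TimeoutError:
        return False  # an unresponsive dependency means "cannot confirm stock"


def test_order_logic_with_mocked_inventory():
    # create_autospec keeps the mock's method signatures in line with the real class
    inventory = mock.create_autospec(InventoryClient, instance=True)

    inventory.in_stock.return_value = True
    assert can_place_order(inventory, "sku-1") is True
    inventory.in_stock.assert_called_once_with("sku-1")

    # Simulate the dependency getting sick instead of answering.
    inventory.in_stock.side_effect = TimeoutError("inventory timed out")
    assert can_place_order(inventory, "sku-1") is False
```

A mock verifies only what the author believes the dependency does. That is why the slide pairs it with techniques that run after launch against real traffic.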

5. How will it be secured?

Development


Prioritize Potential Attack Vectors

Most likely types of attack vector:
* Exploitation of OWASP Top 10 vulnerabilities in your web application
* Internal staff with existing access
* Social engineering

Less likely type of attack vector:
* Attacker gains access behind your perimeter, logs on to your containers, reverse-engineers your internal service APIs, sends fake requests to and from each microservice

6. How will it be configured?

Development


Configuration

“Configuration” can be categorized:

• Static configuration (log file locations, ports to listen on, …)

• Runtime configuration (thread pool sizes, JVM heap size, …)

• Behavioral configuration (feature flags, request routing rules, …)

Configuration

Prevent arbitrary static configuration changes to production systems
* Instead, deploy those changes into new immutable, copy-on-write containers

Strive for adaptive, elastic services that require zero dynamic configuration changes at runtime to stay healthy

Reserve behavioral configuration for progressive rollouts, dark launching, routing
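A behavioral configuration value used for a progressive rollout could be as small as a percentage. The sketch below shows one common way to apply it, hashing the user into a stable bucket. `in_rollout` is a hypothetical helper and the flag name is made up.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Deterministically place a user inside or outside a percentage rollout.

    The same user always lands in the same bucket for a given flag, so raising
    `percent` only ever adds users to the rollout and never removes them.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return bucket < percent


# The only thing that changes at runtime is the percentage.
enabled = sum(in_rollout("new-checkout", f"user-{i}", 25) for i in range(1000))
print(f"{enabled} of 1000 users see the new code path")
```

Hashing the flag name together with the user id keeps buckets independent across flags, so the same users are not always first in line for every rollout.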

7. How will it be consumed by the rest of the system?

Architecture


Your microservice needs a contract

Your new microservice will provide new value to the rest of the system

But will it offer an SLA for its latency, uptime, and reliability?

Those who consume it will appreciate it:
• They can specify timeouts and trip circuit breakers when response latency is high
• They will know which operations are idempotent
• They could cache some responses for large queries
• They can spot uptime SLA discrepancies

Datawire’s Quark is an IDL that captures both structure and behavior
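To make the consumer’s side concrete, the sketch below derives client settings from behavioral facts a contract could state. It is plain Python and does not represent Quark syntax. `OperationContract`, `client_policy`, their fields, and the numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationContract:
    name: str
    p99_latency_ms: int       # latency the service commits to
    idempotent: bool          # safe to repeat without extra side effects
    cacheable_for_s: int = 0  # how long a response may be reused


def client_policy(op: OperationContract, timeout_margin: float = 2.0) -> dict:
    """Derive client-side settings from what the contract promises."""
    return {
        "timeout_ms": int(op.p99_latency_ms * timeout_margin),
        "max_retries": 2 if op.idempotent else 0,  # never blindly retry non-idempotent calls
        "cache_ttl_s": op.cacheable_for_s,
    }


charge = OperationContract("charge_card", p99_latency_ms=400, idempotent=False)
search = OperationContract("search_catalog", p99_latency_ms=150, idempotent=True, cacheable_for_s=60)
print(client_policy(charge))  # {'timeout_ms': 800, 'max_retries': 0, 'cache_ttl_s': 0}
print(client_policy(search))  # {'timeout_ms': 300, 'max_retries': 2, 'cache_ttl_s': 60}
```

Without the stated latency and idempotency, a consumer has to guess at timeouts and retry rules.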

Structural vs. Behavioral Contracts

Structural: Intended for Tools

Behavioral: Intended for Humans

8. How will it be discovered?

Architecture


Service Discovery

The simpler your discovery system, the less flexibility it offers.

* DNS schemes: very simple, but they don’t take availability into account, and they make the developer experience difficult
* Strongly consistent datastores (e.g. Zookeeper): more flexible, but don’t handle network partitions at all
* Eventually consistent datastores with pub/sub (e.g. Datawire Connect): very flexible, handle partitions well, and leave clients and services unaffected even when they can’t reach the discovery system
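Clients can stay unaffected when the discovery system is unreachable if they resolve from a local view that is updated by pushed events. The sketch below shows that idea only. It is not the Datawire Connect API, and `DiscoveryCache`, `on_update`, and `resolve` are hypothetical names.

```python
import random


class DiscoveryCache:
    """Client-side view of service endpoints, fed by pub/sub style updates."""

    def __init__(self):
        self._endpoints = {}  # service name -> set of "host:port" strings

    def on_update(self, service: str, endpoint: str, up: bool) -> None:
        """Apply one event pushed by the discovery system."""
        known = self._endpoints.setdefault(service, set())
        if up:
            known.add(endpoint)
        else:
            known.discard(endpoint)

    def resolve(self, service: str) -> str:
        """Pick an endpoint from the local view; never calls the discovery system."""
        known = self._endpoints.get(service)
        if not known:
            raise LookupError(f"no known endpoints for {service!r}")
        return random.choice(sorted(known))


cache = DiscoveryCache()
cache.on_update("orders", "10.0.0.5:8080", up=True)
cache.on_update("orders", "10.0.0.6:8080", up=True)
# If the discovery system now becomes unreachable, resolve() keeps working
# from the last view it received; the view is merely stale.
print(cache.resolve("orders"))
```

The trade-off is staleness: during a partition the client may still pick an endpoint that has since gone away, so it needs timeouts and fallbacks on the call itself.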

9. How will it scale?

Architecture


9. How will it fail to scale?

Architecture

Knowing your Chokepoint Sequence

What will be the sequence of failures in the event of a large increase in traffic?
* Example sequence: First the database maxes out, then RAM, then CPU, then file descriptors, then ELBs, then NICs

Knowing the likely failure sequence tells you how much headroom you have and helps you build a plan for capacity growth

[Diagram: Node, HAProxy, and Cassandra instances]
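A rough first estimate of the chokepoint sequence can come from current usage and known limits. The sketch below assumes, simplistically, that usage of every resource grows linearly with traffic. The function name and all numbers are made up for illustration.

```python
def chokepoint_sequence(usage: dict, capacity: dict) -> list:
    """Order resources by the traffic multiple at which each would saturate.

    Returns (resource, multiple) pairs, earliest chokepoint first. A multiple
    of 1.25 means the resource is exhausted at 1.25x today's traffic.
    """
    headroom = {name: capacity[name] / usage[name] for name in usage}
    return sorted(headroom.items(), key=lambda item: item[1])


# Hypothetical measurements at today's peak traffic.
usage = {"db_connections": 80, "ram_gb": 24, "cpu_cores": 6, "file_descriptors": 2000}
capacity = {"db_connections": 100, "ram_gb": 48, "cpu_cores": 16, "file_descriptors": 65536}

for resource, multiple in chokepoint_sequence(usage, capacity):
    print(f"{resource}: saturates at about {multiple:.2f}x current traffic")
```

With these numbers the order is database connections, then RAM, then CPU, then file descriptors. Real systems rarely scale linearly, so load testing is still needed to confirm the order.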


10. How will dependency failures be handled?

Architecture


Dependency Failures

A microservice architecture is, by its nature, a highly distributed system

That means failures will occur, and on a frequent basis

Upstream and Downstream Dependencies

[Diagram: Upstream Microservices and Downstream Microservices, connected by Request / Response arrows]

Downstream Dependency Failure

Any microservice calling another must handle downstream failure, with:
* Timeouts
* Circuit breakers to prevent cascading failure
* Backpressure
* Default response values
* Caching prior responses
* Retries
* Fallback to alternative endpoints

Don’t assume that downstream failures manifest as dead endpoints
* Services get sick more often than they die!
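Two of the items above, circuit breakers and default response values, can be combined in a small wrapper. This is a minimal sketch and not the implementation used by Hystrix or Datawire Connect. The class name, thresholds, and injectable `clock` are assumptions, and the timeout is assumed to be enforced inside the wrapped call (for example by the HTTP client’s own timeout setting).

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn, default):
        """Call fn(); on failure, or while the circuit is open, return default."""
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                return default  # fail fast; do not add load to a sick service
            # Half-open: let one trial call through; one more failure reopens.
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            return default
        self._failures = 0
        return result
```

A caller might wrap a hypothetical downstream call as `breaker.call(lambda: fetch_recommendations(user), default=[])`, so that a slow or failing recommendations service degrades the page instead of taking it down.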

Failing to Serve Upstream Dependencies

Understand what it means for the rest of the system when (not if) your service fails

A non-critical service (e.g. a logging service invoked asynchronously over UDP) can fail without causing upstream disruption, at the expense of log data loss

A critical synchronous service (e.g. a credit card payment service invoked over RPC) will require careful use by upstream components if transactions fail mid-stream
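The non-critical case can be sketched as a fire-and-forget UDP sender: if the logging service is down, the caller carries on and the log line is lost. `UdpLogger` is a hypothetical class written for this example, not part of any library mentioned in the talk.

```python
import socket


class UdpLogger:
    """Best-effort log shipping: never blocks or raises into the caller."""

    def __init__(self, host: str, port: int):
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._sock.setblocking(False)

    def log(self, message: str) -> bool:
        """Send one datagram; return False if it could not be handed to the OS."""
        try:
            self._sock.sendto(message.encode("utf-8"), self._addr)
            return True
        except OSError:
            return False  # drop the log line rather than disrupt the request

    def close(self) -> None:
        self._sock.close()
```

A `True` result only means the datagram was handed to the operating system; UDP gives no delivery guarantee. That is the trade the slide describes: no upstream disruption, at the cost of possible log data loss.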

Demo:

Resilient services with Datawire Connect


Trying Datawire Connect

It’s free and OSS!

https://github.com/datawire/datawire-connect

We work in a public Slack channel - feel free to join to ask questions about microservices in general, or about our tech (link on the GitHub page)

Watch the talks from our recent Microservices Practitioner Summit (speakers from Facebook, Netflix, Uber, Google, Yelp, New Relic…) on microservices.com

And like every other organization in here, we’re hiring!

Thank you!

Any questions?

@brikelly [email protected]