Patterns of Resilience
How to build robust, scalable & responsive systems
Uwe Friedrichsen (codecentric AG) – GOTO Night Amsterdam – 18. May 2015
@ufried Uwe Friedrichsen | http://slideshare.net/ufried | http://ufried.tumblr.com
Resilience? Never heard of it …
re•sil•ience (rɪˈzɪl yəns) also re•sil′ien•cy, n.
1. the power or ability to return to the original form, position, etc., after being bent, compressed, or stretched; elasticity.
2. ability to recover readily from illness, depression, adversity, or the like; buoyancy.
Random House Kernerman Webster's College Dictionary, © 2010 K Dictionaries Ltd. Copyright 2005, 1997, 1991 by Random House, Inc. All rights reserved.
http://www.thefreedictionary.com/resilience
What’s all the fuss about?
It’s all about production!
Business
Production
Availability
Availability ≔ MTTF / (MTTF + MTTR)
MTTF: Mean Time To Failure
MTTR: Mean Time To Recovery
How can I maximize availability?
Traditional stability approach
Availability ≔ MTTF / (MTTF + MTTR)
Maximize MTTF
reliability: degree to which a system, product or component performs specified functions under specified conditions for a specified period of time
ISO/IEC 25010:2011(en)
https://www.iso.org/obp/ui/#iso:std:iso-iec:25010:ed-1:v1:en
Underlying assumption
What’s the problem?
(Almost) every system is a distributed system
Chas Emerick
The Eight Fallacies of Distributed Computing
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
Peter Deutsch
https://blogs.oracle.com/jag/resource/Fallacies.html
A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.
Leslie Lamport
Failures in today’s complex, distributed and interconnected systems are not the exception.
• They are the normal case
• They are not predictable
… and it’s getting “worse”
• Cloud-based systems
• Microservices
• Zero Downtime
• IoT & Mobile
• Social
→ Ever-increasing complexity and connectivity
Do not try to avoid failures. Embrace them.
Resilience approach
Availability ≔ MTTF / (MTTF + MTTR)
Minimize MTTR
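A small worked example may help make the formula concrete. The sketch below plugs assumed numbers (1000 h MTTF, 1 h MTTR, both purely illustrative) into the availability formula. Arithmetically, halving MTTR yields the same availability as doubling MTTF, so recovery time is an equally strong lever.

```java
final class AvailabilityExample {
    // Availability = MTTF / (MTTF + MTTR), both given in the same time unit.
    static double availability(double mttf, double mttr) {
        return mttf / (mttf + mttr);
    }

    public static void main(String[] args) {
        double mttfHours = 1000.0; // assumed value, for illustration only
        double mttrHours = 1.0;    // assumed value, for illustration only

        System.out.printf("baseline:     %.6f%n", availability(mttfHours, mttrHours));
        System.out.printf("MTTF doubled: %.6f%n", availability(2 * mttfHours, mttrHours));
        System.out.printf("MTTR halved:  %.6f%n", availability(mttfHours, mttrHours / 2));
    }
}
```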
resilience (IT): the ability of a system to handle unexpected situations
- without the user noticing it (best case)
- with a graceful degradation of service (worst case)
Designing for resilience
A small pattern language
Pattern overview: Isolation
Isolation
• System must not fail as a whole
• Split the system into parts and isolate the parts from each other
• Avoid cascading failures
• Requires set of measures to implement
Pattern overview: Isolation · Bulkheads
Bulkheads
• Core isolation pattern
• a.k.a. “failure units” or “units of mitigation”
• Used as units of redundancy
• Pure design issue
Pattern overview: Isolation · Bulkheads · Complete Parameter Checking
Complete Parameter Checking
• As obvious as it sounds, yet often neglected
• Protection from broken/malicious calls (and return values)
• Pay attention to Postel’s law
• Consider specific data types
Complete Parameter Checking
    // How to design request parameters

    // Worst variant – requires tons of checks
    String buySomething(Map<String, String> params);

    // Still a bad variant – still a lot of checks required
    String buySomething(String customerId, String productId, int count);

    // Much better – only null checks required
    PurchaseStatus buySomething(Customer buyer, Article product, Quantity count);
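The "much better" variant works because the specific data types carry their own validation. A minimal sketch of that idea follows; the `Quantity` class here, its upper bound and the simplified `buySomething` signature are hypothetical and not taken from the slides.

```java
import java.util.Objects;

final class ParameterCheckingExample {

    /** Hypothetical value type: a purchase quantity that is always within bounds. */
    static final class Quantity {
        private static final int MAX = 1_000; // assumed business limit
        private final int value;

        Quantity(int value) {
            // All range checks live here, once, instead of in every caller.
            if (value < 1 || value > MAX) {
                throw new IllegalArgumentException("quantity out of range: " + value);
            }
            this.value = value;
        }

        int value() {
            return value;
        }
    }

    /** Hypothetical service method: only the null check remains at the boundary. */
    static String buySomething(Quantity count) {
        Objects.requireNonNull(count, "count");
        return "ordered " + count.value();
    }

    public static void main(String[] args) {
        System.out.println(buySomething(new Quantity(3)));
    }
}
```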
Pattern overview: Isolation · Bulkheads · Complete Parameter Checking · Loose Coupling
Loose Coupling
• Complements isolation
• Reduce coupling between failure units
• Avoid cascading failures
• Different approaches and patterns available
Pattern overview: Isolation · Bulkheads · Loose Coupling · Complete Parameter Checking · Asynchronous Communication
Asynchronous Communication
• Decouples sender from receiver
• Sender does not need to wait for receiver’s response
• Useful to prevent cascading failures due to failing/latent resources
• Breaks up the call stack paradigm
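One way to picture the decoupling with standard library means is a `CompletableFuture`. In the sketch below, `slowReceiver` and `sendAsync` are hypothetical names and the 100 ms delay is only simulated latency. The sender gets a future back immediately and is not blocked while the receiver works.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class AsyncCommunicationExample {

    // Hypothetical receiver: stands in for a slow downstream call.
    static String slowReceiver(String request) {
        try {
            Thread.sleep(100); // simulated latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "reply to " + request;
    }

    // The sender hands the request over and gets a future back immediately.
    static CompletableFuture<String> sendAsync(String request, ExecutorService receiverPool) {
        return CompletableFuture.supplyAsync(() -> slowReceiver(request), receiverPool);
    }

    public static void main(String[] args) {
        ExecutorService receiverPool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> reply = sendAsync("ping", receiverPool);
            System.out.println("sender continues while the receiver works");
            // The reply is consumed by a callback; join() only keeps the demo alive until then.
            reply.thenAccept(System.out::println).join();
        } finally {
            receiverPool.shutdown();
        }
    }
}
```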
Pattern overview: Isolation · Bulkheads · Loose Coupling · Asynchronous Communication · Complete Parameter Checking · Location Transparency
Location Transparency
• Decouples sender from receiver
• Sender does not need to know receiver’s concrete location
• Useful to implement redundancy and failover transparently
• Usually implemented using load balancers or middleware
Pattern overview: Isolation · Bulkheads · Loose Coupling · Asynchronous Communication · Location Transparency · Complete Parameter Checking · Event-Driven
Event-Driven
• Popular asynchronous communication style
• Without a broker, the location dependency is reversed
• With a broker, location transparency is easily achieved
• Very different from request-response paradigm
Request/response (Sender depends on receiver)
Diagram labels: Sender, Receiver, Lookup, Request/Response
    // from sender
    receiver = lookup()
    // from sender
    result = receiver.call()

Event-driven without broker (Receiver depends on sender)
Diagram labels: Sender, Receiver, Subscribe, Send, Receive
    // from sender
    queue.send(msg)
    // from receiver
    queue = sender.subscribe()
    msg = queue.receive()

Event-driven with broker (Sender and receiver decoupled)
Diagram labels: Sender, Receiver, Broker, Lookup, Subscribe, Send, Receive
    // from sender
    broker = lookup()
    broker.send(msg)
    // from receiver
    queue = broker.subscribe()
    msg = queue.receive()
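The broker variant can be sketched in a few lines of plain Java. The in-memory `Broker` class below is hypothetical and only illustrates the dependency structure; a real system would typically use messaging middleware instead.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

final class BrokerExample {

    /** Hypothetical in-memory broker, for illustration only. */
    static final class Broker {
        private final List<BlockingQueue<String>> subscriptions = new CopyOnWriteArrayList<>();

        // Receiver side: each subscriber gets its own queue.
        BlockingQueue<String> subscribe() {
            BlockingQueue<String> queue = new LinkedBlockingQueue<>();
            subscriptions.add(queue);
            return queue;
        }

        // Sender side: the sender only knows the broker, never the receivers.
        void send(String msg) {
            for (BlockingQueue<String> queue : subscriptions) {
                queue.offer(msg);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Broker broker = new Broker();                     // stands in for lookup()
        BlockingQueue<String> queue = broker.subscribe(); // from receiver
        broker.send("order-created");                     // from sender
        System.out.println(queue.take());                 // from receiver
    }
}
```

Neither side holds a reference to the other, so receivers can be added, moved or replaced without touching the sender.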
Pattern overview: Isolation · Bulkheads · Loose Coupling · Asynchronous Communication · Event-Driven · Location Transparency · Complete Parameter Checking · Stateless
Stateless
• Supports location transparency (amongst other patterns)
• Service relocation is hard with state
• Service failover is hard with state
• Very fundamental resilience and scalability pattern
Pattern overview: Isolation · Bulkheads · Loose Coupling · Asynchronous Communication · Event-Driven · Location Transparency · Stateless · Complete Parameter Checking · Relaxed Temporal Constraints
Relaxed Temporal Constraints
• Strict consistency requires tight coupling of the involved nodes
• Any single failure immediately compromises availability
• Use a more relaxed consistency model to reduce coupling
• The real world is not ACID, it is BASE!
Pattern overview: Isolation · Bulkheads · Loose Coupling · Asynchronous Communication · Event-Driven · Relaxed Temporal Constraints · Location Transparency · Stateless · Complete Parameter Checking · Idempotency
Idempotency
• Non-idempotency is complicated to handle in distributed systems
• (Usually) increases coupling between participating parties
• Use idempotent actions to reduce coupling between nodes
• Very fundamental resilience and scalability pattern
Unique request token (schematic)
    // Client/Sender part
    // Create request with unique request token (e.g., via UUID)
    token = createUniqueToken()
    request = createRequest(token, payload)
    // Send request until successful
    while (!successful)
        send(request, timeout) // Do not forget failure handling

    // Server/Receiver part
    // Receive request
    request = receive()
    // Process request only if token is unknown
    if (!lookup(request.token)) // needs to be implemented in a CAS way to be safe
        process(request)
        store(request.token) // Store token for lookup (can be garbage collected eventually)
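The receiver side of the schematic could look roughly like this in Java. This is a sketch under assumptions: the class and method names are hypothetical, the token store is an in-memory set, and the token is claimed atomically before processing (one possible reading of the "CAS way" remark), which means a failed processing step would need extra handling that is left out here.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class IdempotentReceiverExample {
    // Tokens already seen; add() on this concurrent set is an atomic check-and-store.
    private final Set<String> seenTokens = ConcurrentHashMap.newKeySet();
    private final AtomicInteger processed = new AtomicInteger();

    /** Returns true if the request was processed, false if it was a duplicate. */
    boolean receive(String token, String payload) {
        if (!seenTokens.add(token)) {
            return false; // duplicate delivery, ignore it
        }
        processed.incrementAndGet(); // stands in for the real side effect
        return true;
    }

    int processedCount() {
        return processed.get();
    }

    public static void main(String[] args) {
        IdempotentReceiverExample receiver = new IdempotentReceiverExample();
        String token = UUID.randomUUID().toString(); // the client creates the token once
        receiver.receive(token, "buy 1 book");
        receiver.receive(token, "buy 1 book");       // retry after a lost acknowledgement
        System.out.println("processed: " + receiver.processedCount());
    }
}
```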
Pattern overview: Isolation · Bulkheads · Loose Coupling · Asynchronous Communication · Event-Driven · Idempotency · Relaxed Temporal Constraints · Location Transparency · Stateless · Complete Parameter Checking · Self-Containment
Self-Containment
• Services are self-contained deployment units
• No dependencies on other runtime infrastructure components
• Reduces coupling at deployment time
• Improves isolation and flexibility
Use a framework …
Spring Boot
Dropwizard
Jackson
…
Metrics
… or do it yourself
Pattern overview: Isolation · Bulkheads · Loose Coupling · Asynchronous Communication · Event-Driven · Idempotency · Self-Containment · Relaxed Temporal Constraints · Location Transparency · Stateless · Complete Parameter Checking · Latency Control
Latency control
• Complements isolation
• Detection and handling of non-timely responses
• Avoid cascading temporal failures
• Different approaches and patterns available
Pattern overview: Isolation · Latency Control · Bulkheads · Loose Coupling · Asynchronous Communication · Event-Driven · Idempotency · Self-Containment · Relaxed Temporal Constraints · Location Transparency · Stateless · Complete Parameter Checking · Timeouts
Timeouts
• Preserve responsiveness independent of downstream latency
• Measure response time of downstream calls
• Stop waiting after a pre-determined timeout
• Take alternate action if timeout was reached
Timeouts with standard library means
    // Wrap blocking action in a Callable
    Callable<MyActionResult> myAction = <My Blocking Action>

    // Use a simple ExecutorService to run the action in its own thread
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<MyActionResult> future = executor.submit(myAction);
    MyActionResult result = null;

    // Use Future.get() method to limit time to wait for completion
    try {
        result = future.get(TIMEOUT, TIMEUNIT);
        // Action completed in a timely manner – process results
    } catch (TimeoutException e) {
        // Handle timeout (e.g., schedule retry, escalate, alternate action, …)
    } catch (InterruptedException | ExecutionException e) {
        // Handle other exceptions that can be thrown by Future.get()
    } finally {
        // Make sure the callable is stopped even in case of a timeout
        future.cancel(true);
    }
Pattern overview: Isolation · Latency Control · Timeouts · Bulkheads · Loose Coupling · Asynchronous Communication · Event-Driven · Idempotency · Self-Containment · Relaxed Temporal Constraints · Location Transparency · Stateless · Complete Parameter Checking · Circuit Breaker
Circuit Breaker
• Probably most often cited resilience pattern
• Extension of the timeout pattern
• Takes downstream unit offline if calls fail multiple times
• Specific variant of the fail fast pattern
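To make the mechanics tangible, here is a deliberately small, count-based sketch of the pattern. It is not how Hystrix or any other library implements it. The class name, the threshold, the open interval and the injected clock are all assumptions, and the guarded action is expected to enforce its own timeout.

```java
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;

final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clock; // injected so that tests can control time
    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    /** Runs the action unless the breaker is open; returns the fallback on failure or when open. */
    <T> T call(Callable<T> action, T fallback) {
        synchronized (this) {
            if (state == State.OPEN) {
                if (clock.getAsLong() - openedAt < openMillis) {
                    return fallback; // fail fast, the downstream unit is considered offline
                }
                state = State.HALF_OPEN; // open interval elapsed: let trial calls through
            }
        }
        try {
            T result = action.call();
            onSuccess();
            return result;
        } catch (Exception e) {
            onFailure();
            return fallback;
        }
    }

    private synchronized void onSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

A caller would wrap each downstream call, for example `breaker.call(() -> client.fetch(), cachedValue)`, where `client` and `cachedValue` are placeholders for the guarded resource and its fallback.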
    // Hystrix “Hello world”
    import com.netflix.hystrix.HystrixCommand;
    import com.netflix.hystrix.HystrixCommandGroupKey;

    public class HelloCommand extends HystrixCommand<String> {
        private static final String COMMAND_GROUP = "Hello"; // Not important here
        private final String name;

        // Request parameters are passed in as constructor parameters
        public HelloCommand(String name) {
            super(HystrixCommandGroupKey.Factory.asKey(COMMAND_GROUP));
            this.name = name;
        }

        @Override
        protected String run() throws Exception {
            // Usually here would be the resource call that needs to be guarded
            return "Hello, " + name;
        }
    }

    // Usage of a Hystrix command – synchronous variant
    @Test
    public void shouldGreetWorld() {
        String result = new HelloCommand("World").execute();
        assertEquals("Hello, World", result);
    }
Source: https://github.com/Netflix/Hystrix/wiki/How-it-Works
Pattern overview: Isolation · Latency Control · Circuit Breaker · Timeouts · Bulkheads · Loose Coupling · Asynchronous Communication · Event-Driven · Idempotency · Self-Containment · Relaxed Temporal Constraints · Location Transparency · Stateless · Complete Parameter Checking · Fail Fast
Fail Fast
• “If you know you’re going to fail, you better fail fast”
• Avoid foreseeable failures
• Usually implemented by adding checks in front of costly actions
• Enhances probability of not failing
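A check in front of a costly action might look like the following sketch. The `Resource` interface and the report scenario are hypothetical; the point is only that the cheap checks run before any expensive work starts.

```java
final class FailFastExample {

    /** Hypothetical precondition source: the downstream resource reports whether it is usable. */
    interface Resource {
        boolean isAvailable();
    }

    static String expensiveReport(Resource database, int rows) {
        // Cheap checks first: fail before any costly work has started.
        if (rows <= 0) {
            throw new IllegalArgumentException("rows must be positive");
        }
        if (!database.isAvailable()) {
            throw new IllegalStateException("database unavailable, not starting report");
        }
        // The costly work would happen here.
        return "report with " + rows + " rows";
    }

    public static void main(String[] args) {
        System.out.println(expensiveReport(() -> true, 5));
    }
}
```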
Pattern overview: Isolation · Latency Control · Fail Fast · Circuit Breaker · Timeouts · Bulkheads · Loose Coupling · Asynchronous Communication · Event-Driven · Idempotency · Self-Containment · Relaxed Temporal Constraints · Location Transparency · Stateless · Complete Parameter Checking · Fan out & quickest reply
Fan out & quickest reply
• Send request to multiple workers
• Use quickest reply and discard all other responses
• Reduces probability of latent responses
• Tradeoff is “waste” of resources
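With standard library means, `ExecutorService.invokeAny` comes close to this pattern: it returns the result of a task that completed successfully and cancels the tasks that have not finished. The workers below are hypothetical stand-ins that merely sleep for a given time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class FanOutExample {

    // Hypothetical worker: replies with its name after the given delay.
    static Callable<String> worker(String name, long delayMillis) {
        return () -> {
            Thread.sleep(delayMillis);
            return name;
        };
    }

    static String quickestReply(List<Callable<String>> workers) throws Exception {
        // One thread per worker so that all requests really run in parallel.
        ExecutorService pool = Executors.newFixedThreadPool(workers.size());
        try {
            // Returns a successfully completed result; unfinished tasks are cancelled.
            return pool.invokeAny(workers);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(quickestReply(Arrays.asList(
                worker("replica-a", 500),
                worker("replica-b", 20))));
    }
}
```

The "waste" mentioned above is visible here: every request occupies one thread per worker, even though only one reply is used.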
Pattern overview: Isolation · Latency Control · Fail Fast · Circuit Breaker · Timeouts · Bulkheads · Loose Coupling · Asynchronous Communication · Event-Driven · Idempotency · Self-Containment · Relaxed Temporal Constraints · Location Transparency · Stateless · Complete Parameter Checking · Bounded Queues · Fan out & quickest reply
Bounded Queues
• Limit request queue sizes in front of highly utilized resources
• Avoids latency due to overloaded resources
• Introduces pushback on the callers
• Another variant of the fail fast pattern
Bounded Queue Example
    // Executor service runs with up to 6 worker threads simultaneously
    // When thread pool is exhausted, up to 4 tasks will be queued -
    // additional tasks are rejected triggering the PushbackHandler
    final int POOL_SIZE = 6;
    final int QUEUE_SIZE = 4;

    // Set up a thread pool executor with a bounded queue and a PushbackHandler
    ExecutorService executor =
        new ThreadPoolExecutor(POOL_SIZE, POOL_SIZE,   // Core pool size, max pool size
                               0, TimeUnit.SECONDS,    // Timeout for unused threads
                               new ArrayBlockingQueue<>(QUEUE_SIZE),
                               new PushbackHandler());

    // PushbackHandler - implements the desired pushback behavior
    public class PushbackHandler implements RejectedExecutionHandler {
        @Override
        public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
            // Implement your pushback behavior here
        }
    }
![Page 65: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/65.jpg)
Isolation
Latency Control
Fail Fast
Circuit Breaker
Timeouts
Fan out & quickest reply
Bounded Queues
Bulkheads
Loose Coupling
Asynchronous Communication
Event-Driven
Idempotency
Self-Containment
Relaxed Temporal Constraints
Location Transparency
Stateless
Complete Parameter Checking
Shed Load
![Page 66: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/66.jpg)
Shed Load
• Upstream isolation pattern
• Avoid becoming overloaded due to too many requests
• Install a gatekeeper in front of the resource
• Shed requests based on resource load
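A gatekeeper of this kind could be sketched as follows. The sketch assumes that the number of requests currently in flight is an acceptable proxy for resource load; a real gatekeeper might look at CPU, queue length or response times instead. LoadSheddingGatekeeper, tryHandle and maxInFlight are hypothetical names introduced for this illustration.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical gatekeeper placed in front of a resource.
// Assumption: "load" is approximated by the number of in-flight requests.
class LoadSheddingGatekeeper {
    private final Semaphore permits;

    LoadSheddingGatekeeper(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Runs the request if capacity is left; otherwise sheds it at once
    // and returns Optional.empty(). The request must not return null.
    <T> Optional<T> tryHandle(Supplier<T> request) {
        if (!permits.tryAcquire()) {
            return Optional.empty();   // shed: do not queue, do not wait
        }
        try {
            return Optional.of(request.get());
        } finally {
            permits.release();
        }
    }
}

class ShedLoadDemo {
    public static void main(String[] args) {
        LoadSheddingGatekeeper gatekeeper = new LoadSheddingGatekeeper(1);
        // While the outer request holds the only permit, the inner one is shed.
        Optional<Boolean> innerAccepted =
                gatekeeper.tryHandle(() -> gatekeeper.tryHandle(() -> "inner").isPresent());
        System.out.println("inner request accepted: " + innerAccepted.get());
    }
}
```

The important property is that a shed request costs almost nothing: it neither waits nor occupies the protected resource.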
![Page 67: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/67.jpg)
Isolation
Latency Control
Fail Fast
Circuit Breaker
Timeouts
Fan out & quickest reply
Bounded Queues
Shed Load
Bulkheads
Loose Coupling
Asynchronous Communication
Event-Driven
Idempotency
Self-Containment
Relaxed Temporal Constraints
Location Transparency
Stateless
Complete Parameter Checking
Supervision
![Page 68: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/68.jpg)
Supervision
• Provides failure handling beyond the means of a single failure unit
• Detect unit failures
• Provide means for error escalation
• Different approaches and patterns available
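As one of the many possible approaches, a supervisor can detect a unit's failure, restart the unit a limited number of times and escalate once that budget is used up. The following is a minimal sketch under those assumptions; Supervisor, supervise, maxRestarts and the escalation callback are hypothetical names, not an API from the talk.

```java
import java.util.concurrent.Callable;
import java.util.function.Consumer;

// Hypothetical supervisor: runs a unit, restarts it on failure and
// escalates the last failure once the restart budget is exhausted.
class Supervisor<T> {
    private final int maxRestarts;
    private final Consumer<Exception> escalation;

    Supervisor(int maxRestarts, Consumer<Exception> escalation) {
        this.maxRestarts = maxRestarts;
        this.escalation = escalation;
    }

    // Returns the unit's result, or null after the failure was escalated.
    T supervise(Callable<T> unit) {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRestarts; attempt++) {
            try {
                return unit.call();
            } catch (Exception e) {
                last = e;   // failure detected; the loop restarts the unit
            }
        }
        escalation.accept(last);   // beyond the means of this supervisor
        return null;
    }
}

class SupervisionDemo {
    public static void main(String[] args) {
        Supervisor<String> supervisor = new Supervisor<>(
                1, e -> System.out.println("escalated: " + e.getMessage()));
        String result = supervisor.supervise(() -> {
            throw new IllegalStateException("unit is down");
        });
        System.out.println("result: " + result);
    }
}
```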
![Page 69: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/69.jpg)
Isolation
Latency Control
Fail Fast
Circuit Breaker
Timeouts
Fan out & quickest reply
Bounded Queues
Shed Load
Bulkheads
Loose Coupling
Asynchronous Communication
Event-Driven
Idempotency
Self-Containment
Relaxed Temporal Constraints
Location Transparency
Stateless
Supervision
Complete Parameter Checking
Monitor
![Page 70: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/70.jpg)
Monitor
• Observe unit behavior and interactions from the outside
• Automatically respond to detected failures
• Part of the system – complex failure handling strategies possible
• Outside the system – more robust against system level failures
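One simple way to observe units from the outside is a heartbeat monitor: units report that they are alive, and the monitor responds automatically when a report is overdue. The sketch below assumes that design; HeartbeatMonitor, heartbeat, check and the onFailure callback are hypothetical names, and time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical monitor that watches units via heartbeats.
class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Consumer<String> onFailure;
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    HeartbeatMonitor(long timeoutMillis, Consumer<String> onFailure) {
        this.timeoutMillis = timeoutMillis;
        this.onFailure = onFailure;
    }

    // Called by (or on behalf of) a unit to signal that it is alive.
    void heartbeat(String unit, long nowMillis) {
        lastSeen.put(unit, nowMillis);
    }

    // Called periodically. Returns every unit whose heartbeat is overdue
    // and triggers the automatic response for each of them.
    List<String> check(long nowMillis) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                failed.add(entry.getKey());
            }
        }
        for (String unit : failed) {
            lastSeen.remove(unit);    // stop tracking until the unit reports again
            onFailure.accept(unit);   // e.g. restart the unit or alert an operator
        }
        return failed;
    }
}

class MonitorDemo {
    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(
                1000, unit -> System.out.println("responding to failure of " + unit));
        monitor.heartbeat("billing", 0);
        monitor.heartbeat("search", 900);
        System.out.println("failed units: " + monitor.check(1500));
    }
}
```

Whether such a monitor runs inside the system or as a separate process is exactly the trade-off the slide names: inside, it can use richer strategies; outside, it survives failures of the system itself.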
![Page 71: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/71.jpg)
Isolation
Latency Control
Fail Fast
Circuit Breaker
Timeouts
Fan out & quickest reply
Bounded Queues
Shed Load
Bulkheads
Loose Coupling
Asynchronous Communication
Event-Driven
Idempotency
Self-Containment
Relaxed Temporal Constraints
Location Transparency
Stateless
Supervision
Monitor
Complete Parameter Checking
Error Handler
![Page 72: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/72.jpg)
Error Handler
• Units often don’t have enough time or information to handle errors
• Separate business logic and error handling
• Business logic just focuses on getting the task done (quickly)
• Error handler has sufficient time and information to handle errors
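The separation can be sketched with a hand-off queue: the business logic records a failure and returns at once, while a separate error handler works through the recorded failures later. This is only an illustration under that assumption; FailedRequest, OrderService, ErrorHandler and handlePending are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Record handed from the business logic to the error handler.
class FailedRequest {
    final String request;
    final Exception cause;

    FailedRequest(String request, Exception cause) {
        this.request = request;
        this.cause = cause;
    }
}

// Business logic: get the task done, hand off failures, return quickly.
class OrderService {
    private final BlockingQueue<FailedRequest> errors;

    OrderService(BlockingQueue<FailedRequest> errors) {
        this.errors = errors;
    }

    boolean process(String request) {
        try {
            if (request.isEmpty()) {
                throw new IllegalArgumentException("empty request");
            }
            // ... the actual work would go here ...
            return true;
        } catch (RuntimeException e) {
            errors.offer(new FailedRequest(request, e));   // no handling here
            return false;
        }
    }
}

// Error handler: has the time and context to decide what to do.
class ErrorHandler {
    private final BlockingQueue<FailedRequest> errors;

    ErrorHandler(BlockingQueue<FailedRequest> errors) {
        this.errors = errors;
    }

    // Drains all pending failures and returns one report line per failure.
    List<String> handlePending() {
        List<FailedRequest> batch = new ArrayList<>();
        errors.drainTo(batch);
        List<String> report = new ArrayList<>();
        for (FailedRequest failed : batch) {
            report.add("request '" + failed.request + "' failed: " + failed.cause.getMessage());
        }
        return report;
    }
}

class ErrorHandlerDemo {
    public static void main(String[] args) {
        BlockingQueue<FailedRequest> errors = new LinkedBlockingQueue<>();
        OrderService service = new OrderService(errors);
        service.process("order-1");
        service.process("");
        System.out.println(new ErrorHandler(errors).handlePending());
    }
}
```

In a running system the handler would typically sit on its own thread and block on the queue; draining on demand keeps the sketch short.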
![Page 73: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/73.jpg)
Isolation
Latency Control
Fail Fast
Circuit Breaker
Timeouts
Fan out & quickest reply
Bounded Queues
Shed Load
Bulkheads
Loose Coupling
Asynchronous Communication
Event-Driven
Idempotency
Self-Containment
Relaxed Temporal Constraints
Location Transparency
Stateless
Supervision
Monitor
Error Handler
Complete Parameter Checking
Escalation
![Page 74: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/74.jpg)
Escalation
• Units often don’t have enough time or information to handle errors
• Escalation peer with more time and information needed
• Often multi-level hierarchies
• Pure design issue
![Page 75: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/75.jpg)
Escalation implementation using Worker/Supervisor
[Diagram: workers (W) arranged along the flow / process, with supervisors (S) stacked in several levels along the escalation direction]
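A multi-level worker/supervisor hierarchy can be sketched as a chain of levels, each of which either handles an error or passes it to its parent. The sketch assumes that every level can decide locally whether it is able to handle a given error; EscalationLevel, canHandle and the level names are hypothetical and only serve this illustration.

```java
import java.util.function.Predicate;

// Hypothetical level in a worker/supervisor hierarchy.
class EscalationLevel {
    private final String name;
    private final Predicate<Exception> canHandle;
    private final EscalationLevel parent;   // null at the top of the hierarchy

    EscalationLevel(String name, Predicate<Exception> canHandle, EscalationLevel parent) {
        this.name = name;
        this.canHandle = canHandle;
        this.parent = parent;
    }

    // Returns the name of the level that handled the error,
    // walking up the hierarchy as far as needed.
    String handle(Exception error) {
        if (canHandle.test(error)) {
            return name;
        }
        if (parent == null) {
            throw new IllegalStateException("unhandled at top level " + name, error);
        }
        return parent.handle(error);   // escalate
    }
}

class EscalationDemo {
    public static void main(String[] args) {
        EscalationLevel root = new EscalationLevel("root supervisor", e -> true, null);
        EscalationLevel team = new EscalationLevel(
                "team supervisor", e -> e instanceof IllegalStateException, root);
        EscalationLevel worker = new EscalationLevel(
                "worker", e -> e instanceof IllegalArgumentException, team);

        System.out.println(worker.handle(new IllegalArgumentException("bad input")));
        System.out.println(worker.handle(new IllegalStateException("broken state")));
        System.out.println(worker.handle(new RuntimeException("unknown")));
    }
}
```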
![Page 76: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/76.jpg)
Isolation
Latency Control
Fail Fast
Circuit Breaker
Timeouts
Fan out & quickest reply
Bounded Queues
Shed Load
Bulkheads
Loose Coupling
Asynchronous Communication
Event-Driven
Idempotency
Self-Containment
Relaxed Temporal Constraints
Location Transparency
Stateless
Supervision
Monitor
Complete Parameter Checking
Error Handler
Escalation
![Page 77: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/77.jpg)
… and there is more
• Recovery & mitigation patterns
• More supervision patterns
• Architectural patterns
• Anti-fragility patterns
• Fault treatment & prevention patterns
A rich pattern family
![Page 78: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/78.jpg)
Wrap-up
• Today’s systems are distributed ...
• … and it’s getting “worse”
• Failures are the normal case
• Failures are not predictable
• Resilient software design needed
• Rich pattern language
• Isolation is a good starting point
![Page 79: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/79.jpg)
Do not avoid failures. Embrace them!
![Page 80: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/80.jpg)
@ufried Uwe Friedrichsen | [email protected] | http://slideshare.net/ufried | http://ufried.tumblr.com
![Page 81: Resilience? Never heard of it · Circuit Breaker • Probably most often cited resilience pattern • Extension of the timeout pattern • Takes downstream unit offline if calls fail](https://reader035.vdocuments.net/reader035/viewer/2022071213/6041851a660bc722276db0d8/html5/thumbnails/81.jpg)