architecting for failure - 4developers 2015

Architecting for failure: building fault-tolerant systems. Jakub Derda, Warsaw, 2015

Upload: jakub-derda

Post on 12-Apr-2017


TRANSCRIPT

Page 1: Architecting for failure - 4developers 2015

Architecting for failure: building fault-tolerant systems

Jakub Derda

Warsaw, 2015

Page 2: Architecting for failure - 4developers 2015

‘Tree’ component – overview

Page 3: Architecting for failure - 4developers 2015

‘Tree’ component – detailed view

Page 4: Architecting for failure - 4developers 2015

‘Tree’ component – detailed view

client

network connection

server

Page 5: Architecting for failure - 4developers 2015

‘Tree’ component – detailed view

human factor, software, client library

ISP, protocol stack, network

load balancers, OS, power source

client

network connection

server

Page 6: Architecting for failure - 4developers 2015

Your component – detailed view

What is a fault?

Page 7: Architecting for failure - 4developers 2015

What is not a fault?

The service is not working on our side*

* Caused by, e.g., technical failures, outages, corrupted data, or attacks

Page 8: Architecting for failure - 4developers 2015

What is a fault?

The real fault is when we don’t deliver value to customers.

Page 9: Architecting for failure - 4developers 2015

Delivering value without a working system

Bring your own wine, we’re waiting for license.

Last election in Poland

Page 10: Architecting for failure - 4developers 2015

What is fault tolerance not?

It’s NOT making sure your system never goes down.

It (eventually) will.

Page 11: Architecting for failure - 4developers 2015

What is fault tolerance?

It’s making sure that the system can recover quickly and/or the client is not impacted.
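
One common way to keep a short outage from reaching the client is to retry the failed call with exponential backoff and jitter. The sketch below is only an illustration of that idea, not something taken from the talk: `call_with_retry` and `FlakyService` are hypothetical names, and the sleep function is injectable so the example runs instantly.

```python
import random
import time


def call_with_retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` calls have failed.

    Waits base_delay * 2**n plus random jitter between tries, so that many
    clients retrying at once do not hit the server in lockstep.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))


class FlakyService:
    """Hypothetical dependency that fails a fixed number of times, then recovers."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("temporary outage")
        return "tree data"


service = FlakyService(failures=2)
# The sleep is stubbed out so the example does not actually wait.
result = call_with_retry(service.fetch, sleep=lambda seconds: None)
print(result, service.calls)
```

The client sees a normal result even though the first two calls failed. If the outage outlasts the retry budget, the error is still raised, so retries complement rather than replace the other measures in this talk.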

Page 12: Architecting for failure - 4developers 2015

How to solve it?

Page 13: Architecting for failure - 4developers 2015

Solving – redundancy

Hot/warm replicas

Caches

Geographical distribution, CDNs

Hardware redundancy

Alternative systems and procedures
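
As a rough sketch of how replicas and caches can work together, the example below tries each replica in turn and falls back to the last cached value when all of them are down. The names (`read_with_failover`, `ReplicaDown`, `healthy`, `broken`) are hypothetical, and the replicas are plain functions standing in for real servers.

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot serve the request."""


def read_with_failover(key, replicas, cache):
    """Try each replica in order; fall back to the last cached value."""
    for replica in replicas:
        try:
            value = replica(key)
        except ReplicaDown:
            continue  # this copy is unavailable, try the next one
        cache[key] = value  # remember the last good answer
        return value, "replica"
    if key in cache:
        return cache[key], "cache"  # possibly stale, but the client is served
    raise ReplicaDown(f"no replica or cached value for {key!r}")


def healthy(key):
    return f"value-of-{key}"


def broken(key):
    raise ReplicaDown("replica is offline")


cache = {}
first = read_with_failover("node-1", [broken, healthy], cache)
second = read_with_failover("node-1", [broken, broken], cache)
print(first)
print(second)
```

The second read succeeds from the cache even though every replica is down. The price is that the answer may be stale, which is one of the trade-offs between consistency and reliability.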

Page 14: Architecting for failure - 4developers 2015

Solving – design

Stateless

Auditing

Idempotent requests

Uniqueness / randomness

Asynchrony and decoupling

EIPs (Enterprise Integration Patterns)

Commands, not data

Break the rules
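
To illustrate how idempotent requests and unique identifiers fit together, here is a minimal sketch in which the client attaches a random request ID and the server applies each ID at most once. `PaymentService` and its `deposit` method are hypothetical, and the processed-request table lives in memory only. A real system would need to persist it.

```python
import uuid


class PaymentService:
    """Hypothetical server that applies each request ID at most once."""

    def __init__(self):
        self.balance = 0
        self.processed = {}  # request_id -> result of the first attempt

    def deposit(self, request_id, amount):
        if request_id in self.processed:
            # Duplicate delivery: replay the stored result, change nothing.
            return self.processed[request_id]
        self.balance += amount
        self.processed[request_id] = self.balance
        return self.balance


service = PaymentService()
request_id = str(uuid.uuid4())  # random, unique ID chosen by the client

first = service.deposit(request_id, 100)
retry = service.deposit(request_id, 100)  # e.g. the first response was lost
print(first, retry, service.balance)
```

Because a repeated request is harmless, the client can safely retry after a timeout without knowing whether the first attempt reached the server.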

Page 15: Architecting for failure - 4developers 2015

Solving – procedures

Backup creation, cleanup and restore

QA & potential problems

Continuous integration

Deployment
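
The backup procedure (creation, cleanup and restore) is a natural candidate for automation. The sketch below assumes backups are plain directory copies with zero-padded labels that sort oldest first. The function names and directory layout are made up for the example.

```python
import shutil
import tempfile
from pathlib import Path


def create_backup(source, backup_dir, label):
    """Copy the whole `source` directory into a new, labelled backup."""
    target = Path(backup_dir) / f"backup-{label}"
    shutil.copytree(source, target)
    return target


def cleanup_backups(backup_dir, keep):
    """Delete all but the `keep` newest backups (labels sort oldest first)."""
    backups = sorted(Path(backup_dir).glob("backup-*"))
    for old in backups[:max(len(backups) - keep, 0)]:
        shutil.rmtree(old)
    return [p.name for p in sorted(Path(backup_dir).glob("backup-*"))]


def restore_backup(backup, destination):
    """Restore into a fresh directory and return the restored path."""
    shutil.copytree(backup, destination)
    return Path(destination)


with tempfile.TemporaryDirectory() as workdir:
    root = Path(workdir)
    data = root / "data"
    data.mkdir()
    backups_dir = root / "backups"
    backups_dir.mkdir()

    # Take three backups while the data keeps changing.
    for version in ("001", "002", "003"):
        (data / "tree.txt").write_text(f"version {version}")
        latest = create_backup(data, backups_dir, version)

    remaining = cleanup_backups(backups_dir, keep=2)

    # A backup only counts once a restore from it has been checked.
    restored = restore_backup(latest, root / "restored")
    restored_text = (restored / "tree.txt").read_text()

print(remaining, restored_text)
```

Running the restore step as part of the same routine is the important design choice here: it is the only way to find out that a backup is unusable before it is actually needed.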

Page 16: Architecting for failure - 4developers 2015

Solving – observe

Dive deep, post-mortems

Identify bottlenecks

Observe key metrics

Verify assumptions

Predict traffic
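
As one possible way to observe a key metric, the sketch below tracks the error rate over the most recent requests and flags when it crosses a threshold. `ErrorRateMonitor`, the window size and the threshold are assumptions made for the example, not values from the talk.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the share of failed requests among the last `window` requests."""

    def __init__(self, window=100, threshold=0.05):
        self.outcomes = deque(maxlen=window)  # oldest outcome drops off automatically
        self.threshold = threshold

    def record(self, success):
        self.outcomes.append(bool(success))

    def error_rate(self):
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def should_alert(self):
        return self.error_rate() > self.threshold


monitor = ErrorRateMonitor(window=10, threshold=0.25)
for ok in [True] * 8 + [False] * 2:
    monitor.record(ok)
alert_before = monitor.should_alert()  # 2 failures out of 10

monitor.record(False)  # pushes out the oldest success: 3 failures out of 10
alert_after = monitor.should_alert()
print(alert_before, alert_after)
```

A sliding window reacts to recent behaviour rather than to the lifetime average, which makes it easier to spot a bottleneck or an assumption that has stopped holding.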

Page 17: Architecting for failure - 4developers 2015

Tradeoffs - simple

cost

time

1/scope

QUALITY

Page 18: Architecting for failure - 4developers 2015

Tradeoffs - real

cost

durability

time

consistency

trust

audit (traceability)

complexity

security

scalability

functionality

stability

reliability

extensibility

performance

maintainability

manageability

Page 19: Architecting for failure - 4developers 2015

Summary

Learn to live with crashes

Page 20: Architecting for failure - 4developers 2015

Summary

Automate procedures

Page 21: Architecting for failure - 4developers 2015

Summary

Don’t be afraid to cross the line

Page 22: Architecting for failure - 4developers 2015

Fault tolerance is not a property of a design, it’s a process.

Page 23: Architecting for failure - 4developers 2015

Questions?

Contact:

[email protected]