How Did I Get Here? Building Confidence in a Distributed Stream Processor

Upload: sean-t-allen

Post on 22-Feb-2017

TRANSCRIPT

Page 1: How did I get here? Building confidence in a distributed stream processor

How Did I Get Here? Building Confidence in a Distributed Stream Processor

Page 2: How did I get here? Building confidence in a distributed stream processor

Sean T. Allen

Page 3: How did I get here? Building confidence in a distributed stream processor

T

Page 4: How did I get here? Building confidence in a distributed stream processor

T

Page 9: How did I get here? Building confidence in a distributed stream processor

Experience Report

Page 10: How did I get here? Building confidence in a distributed stream processor

Stream Processor

Page 11: How did I get here? Building confidence in a distributed stream processor
Page 12: How did I get here? Building confidence in a distributed stream processor

Prototype: Started January 2016

Page 14: How did I get here? Building confidence in a distributed stream processor

Production: Started April 2016

Page 16: How did I get here? Building confidence in a distributed stream processor

America is all about speed.

Hot, nasty, bad-ass speed. — Eleanor Roosevelt

Page 17: How did I get here? Building confidence in a distributed stream processor

High Throughput

Buffy: Goals

Page 18: How did I get here? Building confidence in a distributed stream processor

Low Latency

Buffy: Goals

Page 19: How did I get here? Building confidence in a distributed stream processor

Less Hardware

Buffy: Goals

Page 20: How did I get here? Building confidence in a distributed stream processor

America is all about data quality.

Quiet, demure data quality. — Andrew Jackson

Page 21: How did I get here? Building confidence in a distributed stream processor

High Fidelity

Buffy: Goals

Page 22: How did I get here? Building confidence in a distributed stream processor

Stream Processing

Page 23: How did I get here? Building confidence in a distributed stream processor

Message at a time

Page 24: How did I get here? Building confidence in a distributed stream processor

Never ending

Page 25: How did I get here? Building confidence in a distributed stream processor

Failure

Page 26: How did I get here? Building confidence in a distributed stream processor

Machine Failure

Page 27: How did I get here? Building confidence in a distributed stream processor

Slow Machine

Page 28: How did I get here? Building confidence in a distributed stream processor

Segfaulting Process

Page 29: How did I get here? Building confidence in a distributed stream processor

GC Pause

Page 30: How did I get here? Building confidence in a distributed stream processor

Network Error

Page 31: How did I get here? Building confidence in a distributed stream processor

Failure Happens

Page 32: How did I get here? Building confidence in a distributed stream processor

Delivery Guarantees

Page 33: How did I get here? Building confidence in a distributed stream processor

At-Most-Once

Page 34: How did I get here? Building confidence in a distributed stream processor

At-Most-Once: Best Effort

Page 35: How did I get here? Building confidence in a distributed stream processor

At-Least-Once

Page 36: How did I get here? Building confidence in a distributed stream processor

At-Least-Once: ACK or resend

Page 37: How did I get here? Building confidence in a distributed stream processor

Exactly-Once

Page 38: How did I get here? Building confidence in a distributed stream processor

Exactly-Once: At-Least-Once + Idempotence

Page 39: How did I get here? Building confidence in a distributed stream processor

Exactly-Once
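
The idea that exactly-once is at-least-once plus idempotence can be sketched in a few lines. This is a minimal illustration, not Buffy's implementation: the message ids, the `IdempotentReceiver` class, the `send_at_least_once` helper, and the simulated loss rate are all assumptions made for the example.

```python
import random


class IdempotentReceiver:
    """Applies each message at most once by remembering message ids."""

    def __init__(self):
        self.seen = set()
        self.total = 0

    def deliver(self, msg_id, value):
        # A duplicate (a resend of something already applied) is ignored,
        # but it is still acknowledged so the sender can stop resending.
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.total += value
        return True  # ACK


def send_at_least_once(receiver, messages, rng, loss_rate=0.5):
    """Resend each message until its ACK makes it back to the sender."""
    attempts = 0
    for msg_id, value in messages:
        while True:
            attempts += 1
            if rng.random() < loss_rate:
                continue  # message lost on the way: no ACK, so resend
            ack = receiver.deliver(msg_id, value)
            if ack and rng.random() >= loss_rate:
                break  # ACK arrived
            # Otherwise the ACK was lost: the receiver already applied
            # the message, and the sender resends it anyway.
    return attempts


rng = random.Random(42)  # seeded so repeated runs behave the same
receiver = IdempotentReceiver()
attempts = send_at_least_once(receiver, [(1, 10), (2, 20), (3, 30)], rng)
print("attempts:", attempts, "total:", receiver.total)
```

However many resends the simulated losses cause, each value is added to the total once, which is the effect an exactly-once guarantee is after.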

Page 62: How did I get here? Building confidence in a distributed stream processor

Confidence

Page 63: How did I get here? Building confidence in a distributed stream processor

Black Box Testing

Page 68: How did I get here? Building confidence in a distributed stream processor

System Under Test

Black Box Testing

Page 69: How did I get here? Building confidence in a distributed stream processor

Input Source

Black Box Testing

Page 70: How did I get here? Building confidence in a distributed stream processor

Output Receiver

Black Box Testing

Page 71: How did I get here? Building confidence in a distributed stream processor

because Unit Testing isn't enough

Black Box Testing

Page 72: How did I get here? Building confidence in a distributed stream processor

because Integration Testing isn't enough

Black Box Testing

Page 73: How did I get here? Building confidence in a distributed stream processor

because composed components have interesting new failure modes

Black Box Testing

Page 74: How did I get here? Building confidence in a distributed stream processor

Test The Entire System end to end and verify your expectations

Black Box Testing

Page 77: How did I get here? Building confidence in a distributed stream processor

Wesley: Expectation verification for Buffy

Page 78: How did I get here? Building confidence in a distributed stream processor

Wesley

Page 79: How did I get here? Building confidence in a distributed stream processor

Wesley

Input

Page 80: How did I get here? Building confidence in a distributed stream processor

Wesley

Output

Page 81: How did I get here? Building confidence in a distributed stream processor

Wesley

Input Output

Page 82: How did I get here? Building confidence in a distributed stream processor

Input Source

Wesley

Page 83: How did I get here? Building confidence in a distributed stream processor

Input Source

Wesley

Output Receiver

Page 84: How did I get here? Building confidence in a distributed stream processor

Wesley

Input Source

Records sent data

1,2,3,4

Page 85: How did I get here? Building confidence in a distributed stream processor

Wesley

Input Source

Records sent data: 1,2,3,4

Output Receiver

Records received data: 2,4,6,8

Page 86: How did I get here? Building confidence in a distributed stream processor

Wesley

Page 87: How did I get here? Building confidence in a distributed stream processor

Wesley

Analyze!
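
The analyze step can be pictured as a comparison between what the input source recorded and what the output receiver recorded. The sketch below is an assumption about the shape of such a check, not Wesley's actual code; `verify` and `double` are hypothetical names, and the doubling function stands in for the application under test.

```python
def verify(sent, received, expected_fn):
    """Compare what the output receiver recorded with what we expected."""
    expected = [expected_fn(x) for x in sent]
    return {
        "passed": received == expected,
        "expected": expected,
        # Expected values that never showed up at the receiver.
        "missing": [x for x in expected if x not in received],
        # Values the receiver saw that no input should have produced.
        "unexpected": [x for x in received if x not in expected],
    }


def double(x):
    return x * 2


report = verify([1, 2, 3, 4], [2, 4, 6, 8], double)
print("It passes!" if report["passed"] else "It fails!")

lossy = verify([1, 2, 3], [4, 6], double)
print("It passes!" if lossy["passed"] else "It fails!", lossy["missing"])
```

Because the check only looks at recorded input and output, it treats the system under test as a black box.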

Page 96: How did I get here? Building confidence in a distributed stream processor

Wesley

It Works!

Page 97: How did I get here? Building confidence in a distributed stream processor

Spike: Fault injection for Buffy

Page 98: How did I get here? Building confidence in a distributed stream processor

Fault Injection

Page 99: How did I get here? Building confidence in a distributed stream processor

Lineage-driven fault injection

Page 100: How did I get here? Building confidence in a distributed stream processor

Start from a good result

Spike: LDFI

Page 101: How did I get here? Building confidence in a distributed stream processor

Input

Spike: LDFI

Page 102: How did I get here? Building confidence in a distributed stream processor

Output

Spike: LDFI

Page 103: How did I get here? Building confidence in a distributed stream processor

Figure out what can go wrong

Spike: LDFI

Page 104: How did I get here? Building confidence in a distributed stream processor

Spike: LDFI

Each "wrong" is a possible Nemesis

Page 105: How did I get here? Building confidence in a distributed stream processor

Spike: LDFI

Our first nemesis: The Network

Page 106: How did I get here? Building confidence in a distributed stream processor

Determinism is key

Spike

Page 107: How did I get here? Building confidence in a distributed stream processor

Repeated runs with different results

==

Mostly Useless

Spike

Page 108: How did I get here? Building confidence in a distributed stream processor

Spike

Page 109: How did I get here? Building confidence in a distributed stream processor

Spike

Inject failures as informed by TCP

Page 110: How did I get here? Building confidence in a distributed stream processor

Spike

TCP Guarantees:

• Per connection in order delivery

• Per connection duplicate detection

• Per connection retransmission of lost data

Page 114: How did I get here? Building confidence in a distributed stream processor

TCP in Pony: Event Driven

Page 119: How did I get here? Building confidence in a distributed stream processor

Useless Notifier

Page 122: How did I get here? Building confidence in a distributed stream processor

Nemesis #1: Dropped Connections

Page 123: How did I get here? Building confidence in a distributed stream processor

Spike: Drop Connection

• Incoming connection accepted

• Attempting outgoing connection

• Connection established

• Data sent

• Data received
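
The five points above are where a drop can be injected. The original slides showed Pony code that is not preserved in this transcript, so the following Python sketch only illustrates the idea: a wrapper sits in front of the application's notifier and, at each notifier call, asks a seeded random number generator whether to drop the connection. All class names, the `notify` method, and the `run` driver are hypothetical.

```python
import random

EVENTS = ["accepted", "connecting", "connected", "sent", "received"]


class Connection:
    def __init__(self):
        self.open = True

    def close(self):
        self.open = False


class RecordingNotifier:
    """Stand-in for the application's real notifier; it only logs events."""

    def __init__(self):
        self.events = []

    def notify(self, event, conn):
        self.events.append(event)


class DropConnection:
    """Wraps a notifier and may drop the connection at any notifier call."""

    def __init__(self, wrapped, seed, probability):
        self.wrapped = wrapped
        self.rng = random.Random(seed)  # same seed, same decisions
        self.probability = probability
        self.dropped_at = None

    def notify(self, event, conn):
        # The generator is consulted exactly once per notifier call.
        if self.rng.random() < self.probability:
            conn.close()
            self.dropped_at = event
            return
        self.wrapped.notify(event, conn)


def run(seed, probability=0.3):
    conn = Connection()
    app = RecordingNotifier()
    spike = DropConnection(app, seed, probability)
    for event in EVENTS:
        if not conn.open:
            break
        spike.notify(event, conn)
    return app.events, spike.dropped_at


print(run(7))
print(run(7) == run(7))  # the same seed reproduces the same failure
```

The reproducibility depends on the sequence of notifier calls being identical from run to run, since each call consumes one random draw.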

Page 133: How did I get here? Building confidence in a distributed stream processor

Integrating Spike: "Double and Halve" app

• Easy to verify

• Messages cross process boundary

• Messages cross network boundary

Page 149: How did I get here? Building confidence in a distributed stream processor

Integrating Spike

• Double and Halve App

• No Spiking

• Test, Test, Test

• Wesley: It passes! It passes! It passes!

Page 153: How did I get here? Building confidence in a distributed stream processor

Integrating Spike

• Double and Halve App

• Spike with “drop connection”

• Test, Test, Test

• Wesley: It fails! It fails! It fails!

Page 154: How did I get here? Building confidence in a distributed stream processor

Integrating Spike

Page 155: How did I get here? Building confidence in a distributed stream processor

Integrating Spike

== Session Recovery!

Page 159: How did I get here? Building confidence in a distributed stream processor

Integrating Spike

• Double and Halve App

• Spike with “drop connection”

• Test, Test, Test

• Wesley: It passes! It passes! It passes!

Page 160: How did I get here? Building confidence in a distributed stream processor

Repeated runs with different results

==

Mostly Useless

Spike

Page 161: How did I get here? Building confidence in a distributed stream processor

Determinism & Spike

Page 162: How did I get here? Building confidence in a distributed stream processor

It's easy to get wrong

Determinism & Spike

Page 163: How did I get here? Building confidence in a distributed stream processor

Determinism & Spike

TCP delivery is not deterministic

Page 164: How did I get here? Building confidence in a distributed stream processor

Determinism & Spike

TCP guarantees:

• Per connection in order delivery

• Per connection duplicate detection

• Per connection retransmission of lost data

but it doesn't guarantee determinism

Page 172: How did I get here? Building confidence in a distributed stream processor

Determinism & Spike

TCP delivery is not deterministic

Per method call Spiking won't work unless we make it work…

Page 173: How did I get here? Building confidence in a distributed stream processor

Determinism & Spike

TCP message framing

Page 182: How did I get here? Building confidence in a distributed stream processor

Determinism & Spike

Expect in action

Page 192: How did I get here? Building confidence in a distributed stream processor

Determinism & Spike

Expect makes received deterministic

Page 199: How did I get here? Building confidence in a distributed stream processor

Determinism & Spike

Received gets called with

Page 200: How did I get here? Building confidence in a distributed stream processor

Determinism & Spike

then…

Page 201: How did I get here? Building confidence in a distributed stream processor

Determinism & Spike

and then another…

Page 202: How did I get here? Building confidence in a distributed stream processor

Determinism & Spike

and finally…

Page 203: How did I get here? Building confidence in a distributed stream processor

Determinism & Spike

Same number of notifier method calls

no matter how the data arrives
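
A small sketch can show why expect plus message framing gives the same number of notifier calls regardless of how TCP chunks the bytes. The Pony code from the slides is not preserved here, so this is a Python illustration under assumptions: a 4-byte big-endian length header, and the hypothetical names `ExpectBuffer`, `FramedNotifier`, `frame`, and `deliver`.

```python
import struct

HEADER_SIZE = 4  # assumed framing: 4-byte big-endian payload length


class ExpectBuffer:
    """Buffers incoming bytes and hands them over in exactly-sized pieces."""

    def __init__(self, notifier, initial_expect):
        self._notifier = notifier
        self._expect = initial_expect
        self._buffer = bytearray()

    def expect(self, size):
        self._expect = size

    def data_arrived(self, chunk):
        # Chunks can be any size; received() only ever sees the number
        # of bytes that was last asked for via expect().
        self._buffer.extend(chunk)
        while len(self._buffer) >= self._expect:
            piece = bytes(self._buffer[:self._expect])
            del self._buffer[:self._expect]
            self._notifier.received(self, piece)


class FramedNotifier:
    """Alternates between reading a length header and reading a payload."""

    def __init__(self):
        self.calls = []
        self._reading_header = True

    def received(self, conn, data):
        self.calls.append(data)
        if self._reading_header:
            (length,) = struct.unpack(">I", data)
            conn.expect(length)
        else:
            conn.expect(HEADER_SIZE)
        self._reading_header = not self._reading_header


def frame(payload):
    return struct.pack(">I", len(payload)) + payload


def deliver(stream, chunk_size):
    notifier = FramedNotifier()
    conn = ExpectBuffer(notifier, HEADER_SIZE)
    for i in range(0, len(stream), chunk_size):
        conn.data_arrived(stream[i:i + chunk_size])
    return notifier.calls


stream = frame(b"hello") + frame(b"stream processing")
print(deliver(stream, 1) == deliver(stream, 7) == deliver(stream, len(stream)))
```

Whether the bytes arrive one at a time or all at once, the notifier sees header, payload, header, payload. That fixed call sequence is what lets per-method-call fault injection land at the same point on every run.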

Page 204: How did I get here? Building confidence in a distributed stream processor

Determinism & Spike

Drop Connection & Expect: fast deterministic friends

Page 205: How did I get here? Building confidence in a distributed stream processor

Nemesis #2: Slow Connections

Page 206: How did I get here? Building confidence in a distributed stream processor

Spike: Delay

Delay overrides expect and controls the flow of bytes to maintain determinism

Page 213: How did I get here? Building confidence in a distributed stream processor

Spike: Delay

[Diagram slides; only the labels "TCP" and "Spike" survive]

Page 223: How did I get here? Building confidence in a distributed stream processor

Early Results

Found…

• Bugs in Session Recovery

• Bug in Pony standard library

• Bugs in Spike

• And more bugs…

Page 228: How did I get here? Building confidence in a distributed stream processor

Early Results

Found…

Determinism is key but hard to achieve

Page 230: How did I get here? Building confidence in a distributed stream processor

Data Lineage

Page 231: How did I get here? Building confidence in a distributed stream processor

WARNING!!! Vaporware ahead

Page 232: How did I get here? Building confidence in a distributed stream processor

Output

Data Lineage

How did I get here?

Page 233: How did I get here? Building confidence in a distributed stream processor

Output

Data Lineage

Page 234: How did I get here? Building confidence in a distributed stream processor

Data Lineage

Input: 1,2,3

Expect: 2,4,6

Get: 4,6

How did we get here? These are not our beautiful results

Page 238: How did I get here? Building confidence in a distributed stream processor

Data Lineage

Input: 1,2,3

Expect: 2,4,6

Get: 2,6,12

¯\_(ツ)_/¯

Page 242: How did I get here? Building confidence in a distributed stream processor

Data Lineage to the Rescue!

Page 243: How did I get here? Building confidence in a distributed stream processor

Data Lineage

Externally verify determinism: is it REALLY deterministic?

Page 245: How did I get here? Building confidence in a distributed stream processor

Data Lineage

Find incorrect executions: bugs in Buffy

Page 247: How did I get here? Building confidence in a distributed stream processor

Data Lineage

Input: 1

Expected: 2

Got: 4

¯\_(ツ)_/¯

Page 248: How did I get here? Building confidence in a distributed stream processor

Data Lineage

Execution path was…

when it should have been
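
The execution paths on this slide were images and are not preserved in the transcript. As a hedged illustration of what lineage buys you, the sketch below carries the list of applied steps along with each value. The `Traced` class is hypothetical, and the "processed twice" bug is an invented cause for the "Input: 1, Expected: 2, Got: 4" result, not something the talk states.

```python
class Traced:
    """A value together with the names of the steps that produced it."""

    def __init__(self, value, path=()):
        self.value = value
        self.path = tuple(path)

    def apply(self, step_name, fn):
        # Each step returns a new Traced value with a longer path.
        return Traced(fn(self.value), self.path + (step_name,))


def double(x):
    return x * 2


def correct_run(x):
    return Traced(x).apply("double", double)


def buggy_run(x):
    # Hypothetical bug: the message is processed a second time,
    # for example after a resend that was not deduplicated.
    return Traced(x).apply("double", double).apply("double", double)


expected = correct_run(1)
got = buggy_run(1)
print("Expected:", expected.value, "Got:", got.value)
print("Execution path was", got.path, "when it should have been", expected.path)
```

With the path attached to the output, a wrong result points at where the execution diverged instead of ending in a shrug.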

Page 250: How did I get here? Building confidence in a distributed stream processor

Data Lineage

Useful outside of development

Page 251: How did I get here? Building confidence in a distributed stream processor

Data Lineage

Production Debugging: how did I get here?

Page 253: How did I get here? Building confidence in a distributed stream processor

Data Lineage

Audit Log: why did you do that?

Page 255: How did I get here? Building confidence in a distributed stream processor

Data Lineage

Hindsight Machine

Page 256: How did I get here? Building confidence in a distributed stream processor

Building Confidence is difficult

Page 257: How did I get here? Building confidence in a distributed stream processor

and frustrating

Page 258: How did I get here? Building confidence in a distributed stream processor
Page 259: How did I get here? Building confidence in a distributed stream processor

Don't be this dog

Page 260: How did I get here? Building confidence in a distributed stream processor

Be this dog

Page 261: How did I get here? Building confidence in a distributed stream processor
Page 262: How did I get here? Building confidence in a distributed stream processor

Peter Alvaro

@palvaro

Lineage-driven Fault Injection: http://www.cs.berkeley.edu/~palvaro/molly.pdf

Outwards from the Middle of the Maze: https://www.youtube.com/watch?v=ggCffvKEJmQ

Page 263: How did I get here? Building confidence in a distributed stream processor

Kyle Kingsbury

@aphyr

Jepsen: https://aphyr.com/tags/Jepsen

Page 264: How did I get here? Building confidence in a distributed stream processor

Will Wilson

Testing Distributed Systems w/ Deterministic Simulation: https://www.youtube.com/watch?v=4fFDFbi3toc

Page 265: How did I get here? Building confidence in a distributed stream processor

Caitie McCaffrey

@caitie

The Verification of a Distributed System: A practitioner's guide to increasing confidence in system correctness

http://queue.acm.org/detail.cfm?ref=rss&id=2889274

2:55 PM Tomorrow in Salon E

Page 266: How did I get here? Building confidence in a distributed stream processor

Inés Sombra

@randommood

Testing in a Distributed World: https://www.youtube.com/watch?v=KSdNYi55kjg

Page 267: How did I get here? Building confidence in a distributed stream processor

Chaos Engineering

Principles of Chaos Engineering: http://principlesofchaos.org

Page 268: How did I get here? Building confidence in a distributed stream processor
Page 269: How did I get here? Building confidence in a distributed stream processor

Thanks

Peter Alvaro Sylvan Clebsch

Zeeshan Lakhani John Mumm Rob Roland

Andrew Turley

Page 270: How did I get here? Building confidence in a distributed stream processor

@SeanTAllen

Note: The 'T' is very important