not less, not more: exactly once, large-scale stream processing in action

91
Exactly Once, Large-Scale Stream Processing in Action Not Less, Not More Paris Carbone Committer @ Apache Flink PhD Candidate @ KTH @FOSDEM 2017 @SenorCarbone

Upload: paris-carbone

Post on 22-Jan-2018

548 views

Category:

Data & Analytics


0 download

TRANSCRIPT

Page 1: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once, Large-Scale Stream Processing in Action

Not Less, Not More

Paris CarboneCommitter @ Apache Flink

PhD Candidate @ KTH

@FOSDEM 2017

@SenorCarbone

Page 2: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Data Stream Processors

Data Stream Processor

can set up any data pipeline for you

http://edge.alluremedia.com.au/m/l/2014/10/CoolingPipes.jpg

Page 3: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

With Data Stream Processors

Page 4: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

• sub-second latency and high throughput finally coexist

With Data Stream Processors

Page 5: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

• sub-second latency and high throughput finally coexist

• late data* is handled gracefully

With Data Stream Processors

* http://dl.acm.org/citation.cfm?id=2824076

Page 6: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

• sub-second latency and high throughput finally coexist

• late data* is handled gracefully

• and btw data stream pipelines run 365/24/7 consistently without any issues in general.

With Data Stream Processors

* http://dl.acm.org/citation.cfm?id=2824076

Page 7: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

• sub-second latency and high throughput finally coexist

• late data* is handled gracefully

• and btw data stream pipelines run 365/24/7 consistently without any issues in general.

• wait…what?

With Data Stream Processors

* http://dl.acm.org/citation.cfm?id=2824076

Page 8: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

So…what about

Page 9: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

So…what about

handling failures!

application updates

reconfiguring/upgrading the system

adding more/less workers

Page 10: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

So…what about

handling failures!

application updates

reconfiguring/upgrading the system

adding more/less workers

is it realistic to expect the thing run forever correctly?

Page 11: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

we cannot eliminate entropy

Page 12: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

we cannot eliminate entropy

but in a fail-recovery model

…we can turn back time and try again

Page 13: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Let’s talk about Guarantees

Page 14: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Let’s talk about Guaranteesguaranteed tuple processing

exactly once!

transactional processing

processing

output

idempotent writes

end-to-end

resilient statehigh availability

deterministic processing

delivery

at-least once

at-most once

Page 15: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

What Guarantees

Page 16: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

1) Processing• Output• Delivery• End-to-End

2)

system

outside world

What Guarantees

Page 17: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Processing Guarantees

Page 18: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Processing Guarantees

• Why should we care?

• Processing creates side effects inside the system’s internal state.

• Less or more processing sometimes means incorrect internal state.

Page 19: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Processing Guarantees

Page 20: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Processing Guarantees

• At-Most Once: the system might process less (e.g., ignoring data if overloaded)

• At-Least Once: the system might process more (e.g., replaying input)

• Exactly Once: the system behaves as if input data are processed exactly once

Page 21: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

At-Least Once Processing

• Useful when repetition can be tolerated.

• Already offered by logs (e.g., Kafka, Kinesis)

• Manual Logging & Bookkeeping (Storm < v.2)

Page 22: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

At-Least Once Processing

Page 23: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

At-Least Once Processing

1 1 1

1

Page 24: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

At-Least Once Processing

1 1

1

Page 25: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

At-Least Once Processing

1 1

1

Page 26: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

At-Least Once Processing

12 2

2

Page 27: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once Processing

Page 28: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once Processing

It is a bit trickier. We need to make sure that

1. Data leaves sides effects only once

2. Failure Recovery/Re-Scaling do not impact the correct execution of the system

Page 29: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once ProcessingA fine-grained solution…

Maintain a log for each operation*

persistentstore

* http://dl.acm.org/citation.cfm?id=2536229

1

1 2

1 2

3

Page 30: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once ProcessingA fine-grained solution…

Maintain a log for each operation*

Page 31: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once ProcessingA fine-grained solution…

Maintain a log for each operation*

• It allows for fine grained failure recovery and trivial reconfiguration.

• Can be optimised to batch writes

However:

• It requires a finely-tuned performant store

• Can cause aggressive write/append congestion

Page 32: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Remember cassettes?

Page 33: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Remember cassettes?

Page 34: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Remember cassettes?

Page 35: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Remember cassettes?

durable logs similarly allow you to rollback input

Page 36: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Remember cassettes?

durable logs similarly allow you to rollback input

Page 37: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Remember cassettes?

durable logs similarly allow you to rollback inputin parallel

Page 38: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Remember cassettes?

durable logs similarly allow you to rollback inputin parallel

from specific offsets

Page 39: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Remember cassettes?

durable logs similarly allow you to rollback inputin parallel

from specific offsets

Page 40: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Remember cassettes?

durable logs similarly allow you to rollback inputin parallel

from specific offsets

Page 41: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once ProcessingNow a more coarse-grained approach…Turn continuous computation into a series of transactions

Page 42: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once ProcessingNow a more coarse-grained approach…Turn continuous computation into a series of transactions

part 1

Page 43: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once ProcessingNow a more coarse-grained approach…Turn continuous computation into a series of transactions

part 1 part 2

Page 44: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once ProcessingNow a more coarse-grained approach…Turn continuous computation into a series of transactions

part 1 part 2 part 3 part 4

Page 45: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once ProcessingNow a more coarse-grained approach…Turn continuous computation into a series of transactions

part 1 part 2 part 3 part 4

each part either completes or repeats

Page 46: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once ProcessingNow a more coarse-grained approach…Turn continuous computation into a series of transactions

part 1 part 2 part 3 part 4

each part either completes or repeats

also got to capture the global state of the system( ) after processing each part to resume completely if needed

Page 47: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Coarse Grained Fault Tolerance - Illustrated

Type 1Discrete Execution

(micro-batch)

System State Store

snap

Snap after each part has being processed

Page 48: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Coarse Grained Fault Tolerance - Illustrated

Type 1Discrete Execution

(micro-batch)prepare part 1

System State Store

snap

Snap after each part has being processed

Page 49: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Coarse Grained Fault Tolerance - Illustrated

Type 1Discrete Execution

(micro-batch)prepare part 1

System State Store

snap

Snap after each part has being processed

Page 50: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Coarse Grained Fault Tolerance - Illustrated

Type 1Discrete Execution

(micro-batch)prepare part 1

System State Store

snap

Snap after each part has being processed

Page 51: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Coarse Grained Fault Tolerance - Illustrated

Type 1Discrete Execution

(micro-batch)prepare part 2

System State Store

snap

Snap after each part has being processed

Page 52: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Coarse Grained Fault Tolerance - Illustrated

Type 1Discrete Execution

(micro-batch)prepare part 2

System State Store

snap

Snap after each part has being processed

Page 53: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Coarse Grained Fault Tolerance - Illustrated

Type 1Discrete Execution

(micro-batch)prepare part 2

System State Store

snap

Snap after each part has being processed

Page 54: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once Processing

*http://shivaram.org/drafts/drizzle.pdf

Page 55: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once Processing

Micro-batching:

• A fine example of discretely emulating continuous processing as a series of transactions.

• However:

• It enforces a somewhat inconvenient think-like-a-batch logic for a continuous processing programming model.

• Causes unnecessarily high periodic scheduling latency (can be traded over higher reconfiguration latency by pre-scheduling multiple micro-batches* )

*http://shivaram.org/drafts/drizzle.pdf

Page 56: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Type 2Long-Running Synchronous

System State Store

snap

Coarse Grained Fault Tolerance - Illustrated

Snap while each part is being processed

Page 57: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Type 2Long-Running Synchronous

System State Store

snap

halt & snap!

Coarse Grained Fault Tolerance - Illustrated

Snap while each part is being processed

Page 58: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Type 2Long-Running Synchronous

System State Store

snap

+

halt & snap!

in-transitevents

to replay

Coarse Grained Fault Tolerance - Illustrated

Snap while each part is being processed

Page 59: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Type 2Long-Running Synchronous

System State Store

snap

+

halt & snap!

in-transitevents

to replay

Coarse Grained Fault Tolerance - Illustrated

Snap while each part is being processed

Page 60: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action
Page 61: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

we want to capturedistributed state without

✴ enforcing it in the API or✴ disrupting the execution

Page 62: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

we want to capturedistributed state without

✴ enforcing it in the API or✴ disrupting the execution

also, do we really need those in-transit events?

Page 63: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

“The global-state-detection algorithm is to be superimposed on the underlying computation:

it must run concurrently with, but not alter, this underlying computation”

Leslie Lamport

Page 64: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Type 3Long Running Pipelined*

System State Store

snap

* https://arxiv.org/abs/1506.08603

Coarse Grained Fault Tolerance - Illustrated

Snap just in time!

Page 65: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Type 3Long Running Pipelined*

System State Store

snap

insert markers

* https://arxiv.org/abs/1506.08603

Coarse Grained Fault Tolerance - Illustrated

Snap just in time!

Page 66: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Type 3Long Running Pipelined*

System State Store

snap

* https://arxiv.org/abs/1506.08603

Coarse Grained Fault Tolerance - Illustrated

Snap just in time!

Page 67: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Type 3Long Running Pipelined*

System State Store

snap

* https://arxiv.org/abs/1506.08603

Coarse Grained Fault Tolerance - Illustrated

Snap just in time!

Page 68: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Type 3Long Running Pipelined*

System State Store

snap

* https://arxiv.org/abs/1506.08603

align to prioritise records of part 1

Coarse Grained Fault Tolerance - Illustrated

Snap just in time!

Page 69: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Type 3Long Running Pipelined*

System State Store

snap

* https://arxiv.org/abs/1506.08603

align to prioritise records of part 1

Coarse Grained Fault Tolerance - Illustrated

Snap just in time!

Page 70: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Type 3Long Running Pipelined*

System State Store

snap

* https://arxiv.org/abs/1506.08603

Coarse Grained Fault Tolerance - Illustrated

Snap just in time!

Page 71: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Type 3Long Running Pipelined*

System State Store

snap

* https://arxiv.org/abs/1506.08603

Coarse Grained Fault Tolerance - Illustrated

Snap just in time!

Page 72: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Type 3Long Running Pipelined*

System State Store

snapgot a full snapshot!

with no records in-transit

* https://arxiv.org/abs/1506.08603

Coarse Grained Fault Tolerance - Illustrated

Snap just in time!

Page 73: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Facts about Flink’s Snapshotting

*http://lamport.azurewebsites.net/pubs/chandy.pdf

Page 74: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Facts about Flink’s Snapshotting• It pipelines naturally with the data-flow (respecting

back-pressure etc.)

• We can get at-least-once processing guarantees by simply dropping aligning (try it)

• Tailors Chandy-Lamport’s original approach* to dataflow graphs (with minimal snapshot state & messages)

• It can also work for cycles (with a minor modification)

*http://lamport.azurewebsites.net/pubs/chandy.pdf

Page 75: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Supporting Cycles

Page 76: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Supporting CyclesProblem: we cannot wait indefinitely for records in cycles

Page 77: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Supporting CyclesProblem: we cannot wait indefinitely for records in cycles

Solution: log those records as part of the snapshot. Replay upon recovery.

https://github.com/apache/flink/pull/1668

Page 78: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Output Guarantees

Page 79: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Is this a thing?

Output Guarantees

Page 80: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

1. Can’t, it’s distributed

Is this a thing?

Output Guarantees

Page 81: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

1. Can’t, it’s distributed

2. Yep easy

Is this a thing?

Output Guarantees

Page 82: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

1. Can’t, it’s distributed

2. Yep easy

3. It depends ;)

Is this a thing?

Output Guarantees

Page 83: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

1. Can’t, it’s distributed

2. Yep easy

3. It depends ;)

Is this a thing?

Output Guarantees

Page 84: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

• Idempontency ~ repeated operations give the same output result. (e.g., Flink’s Cassandra sink*)

• Rolling Files ~ Pipeline output is bucketed and committed when a checkpoint is complete otherwise we roll it back. (see Flink’s HDFS RollingSink**)

*https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/cassandra.html**https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/filesystem_sink.html

in-progress pending pending committed

Exactly Once Output

Page 85: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

so no design flaws possible… right?

Sir, about that Job Manager…

shootheretodetonate

Page 86: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

>AbortMission!TheyhaveHA

Page 87: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

High Availability

ZookeeperState

• Current Leader (elected) • Pending Pipeline Metadata • State Snapshot Metadata

zab

JM JM JM

Page 88: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Perks of using Flink today (v1.2)

• Key-space partitioning and key group allocation

• Job Rescaling - from snapshots ;)

• Async state snapshots in Rocksdb

• Managed State Structures - ListState (append only), ValueState, ReducingState

• Externalised Checkpoints for custom cherry picking to rollback.

• Adhoc checkpoints (savepoints)

Page 89: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Coming up next

Autoscaling Incremental Snapshots

Durable Iterative Processing

Page 90: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Acknowledgements

• Stephan Ewen, Ufuk Celebi, Aljoscha Krettek (and more folks at dataArtisans)

• Gyula Fóra (King.com)

and all contributors who have put code, effort and thought to build this unique state management system.

Page 91: Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action

Exactly Once, Large-Scale Stream Processing in Action

Not Less, Not More

@SenorCarbone