understand storm in pictures

Storm In Pictures http://zqhxuyuan.github.io/ 2016-7-15

Uploaded by: zqhxuyuan

Posted on 11-Apr-2017



Storm's Building Blocks (What Makes Storm)

[Figure: tuples flowing along a stream through a DAG of spouts and bolts]

Topology: a network of spouts and bolts, forming a DAG.

Stream: an unbounded sequence of tuples.

Spout: the source of a stream.

Bolt: processes input streams and produces new streams; a bolt at the end of the DAG acts as a sink.
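To make these four terms concrete, here is a minimal plain-Java sketch of the word-count topology that the following slides use. It deliberately avoids Storm's API: SketchSpout, SketchBolt and the Consumer-based "stream" are hypothetical stand-ins that only model how tuples flow from a spout through bolts along a DAG.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-ins for the spout and bolt roles; this is NOT Storm's API.
interface SketchSpout {
    // Emits zero or more tuples into the downstream stream.
    void nextTuple(Consumer<List<Object>> out);
}

interface SketchBolt {
    // Processes one input tuple and may emit new tuples downstream.
    void execute(List<Object> input, Consumer<List<Object>> out);
}

class SentenceSpout implements SketchSpout {
    private final String sentence;

    SentenceSpout(String sentence) { this.sentence = sentence; }

    public void nextTuple(Consumer<List<Object>> out) {
        out.accept(Arrays.<Object>asList(sentence));
    }
}

class SplitSentenceBolt implements SketchBolt {
    public void execute(List<Object> input, Consumer<List<Object>> out) {
        for (String word : ((String) input.get(0)).split(" ")) {
            out.accept(Arrays.<Object>asList(word));
        }
    }
}

class WordCountBolt implements SketchBolt {
    private final Map<String, Integer> counts = new HashMap<>();

    public void execute(List<Object> input, Consumer<List<Object>> out) {
        String word = (String) input.get(0);
        int n = counts.merge(word, 1, Integer::sum);
        out.accept(Arrays.<Object>asList(word, n)); // e.g. ["the", 2]
    }
}

class WordCountPipeline {
    // Wires spout -> split bolt -> count bolt (a linear DAG) and collects every ["word", count] tuple.
    static List<List<Object>> run(String sentence) {
        List<List<Object>> sink = new ArrayList<>();
        SplitSentenceBolt split = new SplitSentenceBolt();
        WordCountBolt count = new WordCountBolt();
        new SentenceSpout(sentence).nextTuple(
            sentenceTuple -> split.execute(sentenceTuple,
                wordTuple -> count.execute(wordTuple, sink::add)));
        return sink;
    }

    public static void main(String[] args) {
        // [[the, 1], [cow, 1], [jumped, 1], [over, 1], [the, 2], [moon, 1]]
        System.out.println(run("the cow jumped over the moon"));
    }
}
```

In a real topology the wiring would be declared with Storm's TopologyBuilder and each component would run in its own tasks; here they are simply chained method calls.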

Message/Tuple Transform

[Figure: animation of tuples moving from the spout through the bolts, being transformed at each step]

The lifecycle of a tuple:
1. It is emitted by a spout.
2. It flows along a stream.
3. It is processed by a bolt.
4. The bolt emits new tuples based on it.
5. Those tuples enter the stream again.
6. This continues until the tuple has been fully processed.

[Figure: the same flow, with some tuples failing along the way (marked ✖️)]

Guaranteeing Message Processing
1. At least once: Acker
2. Exactly once: Trident

If processing a message fails, how does Storm make sure the message gets processed again?

Example: Word Count

[Figure: Sentence Spout → Split Sentence Bolt → Word Count Bolt]

Sentence Spout emits: ["the cow jumped over the moon"]
Split Sentence Bolt emits: ["the"], ["cow"], ["jumped"], ["over"], ["the"], ["moon"]
Word Count Bolt emits: ["the",1], ["cow",1], ["jumped",1], ["over",1], ["the",2], ["moon",1]

Storm considers a tuple coming off a spout "fully processed" when the tuple tree has been exhausted and every message in the tree has been processed.

Tuple tree 🐂 (a pun on the Chinese idiom 牛气冲天, "the ox soars to the sky", echoing the cow that jumped over the moon)
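The "fully processed" rule is recursive: a spout tuple is done only when it and every tuple below it in its tree have been processed. A minimal sketch of that definition, assuming a hypothetical TupleNode class that stores its children and a processed flag (Storm itself does not materialize the tree like this, as the Acker slides explain):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory tuple tree node; Storm does not keep such a tree.
class TupleNode {
    final String values;
    final List<TupleNode> children = new ArrayList<>();
    boolean processed;

    TupleNode(String values) { this.values = values; }

    // Records a new tuple emitted (anchored) while processing this one.
    TupleNode emitChild(String childValues) {
        TupleNode child = new TupleNode(childValues);
        children.add(child);
        return child;
    }

    // Fully processed = this tuple and every tuple below it have been processed.
    boolean fullyProcessed() {
        if (!processed) return false;
        for (TupleNode child : children) {
            if (!child.fullyProcessed()) return false;
        }
        return true;
    }
}

class TupleTreeDemo {
    // Builds sentence -> words -> word counts; optionally leaves the last leaf unprocessed.
    static boolean buildAndCheck(boolean processLastLeaf) {
        TupleNode sentence = new TupleNode("[the cow jumped over the moon]");
        sentence.processed = true;
        String[] words = {"the", "cow", "jumped", "over", "the", "moon"};
        int theCount = 0;
        TupleNode lastLeaf = null;
        for (String w : words) {
            TupleNode word = sentence.emitChild("[" + w + "]");
            word.processed = true;
            int n = w.equals("the") ? ++theCount : 1;
            lastLeaf = word.emitChild("[" + w + "," + n + "]");
            lastLeaf.processed = true;
        }
        lastLeaf.processed = processLastLeaf;
        return sentence.fullyProcessed();
    }

    public static void main(String[] args) {
        System.out.println(buildAndCheck(true));  // true: every node processed
        System.out.println(buildAndCheck(false)); // false: ["moon",1] still pending
    }
}
```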

Tuple Lifecycle (API Layer)

[Figure: the word-count topology from the previous slide]

A tuple coming off of a spout:

collector.emit("split", new Values("the cow jumped over the moon"), 1)

"split" is the stream id: the tuple is emitted to one of the spout's output streams.
1 is the msgId: it is used to identify the tuple later.

If the tuple tree is fully processed, Storm calls ack(1) on the spout that emitted the tuple. (We'll talk about how Storm detects this later.)

If the tuple tree is not fully processed before it times out, the tree is considered failed (marked ×), and Storm calls fail(1) on the spout.
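On the spout side this boils down to remembering each msgId until its tree either completes or times out. The sketch below is a simplified, hypothetical model (PendingTracker, with the clock passed in as milliseconds), not Storm's spout executor; the 30-second timeout mirrors Storm's default for Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical model of a spout's pending msgIds; not Storm's internal implementation.
class PendingTracker {
    private final long timeoutMillis;
    private final Map<Object, Long> emittedAt = new HashMap<>();
    final List<String> events = new ArrayList<>();

    PendingTracker(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    // An emit with a non-null msgId starts tracking that spout tuple.
    void emitted(Object msgId, long nowMillis) { emittedAt.put(msgId, nowMillis); }

    // The whole tuple tree finished in time: report ack(msgId) once.
    void treeCompleted(Object msgId) {
        if (emittedAt.remove(msgId) != null) events.add("ack(" + msgId + ")");
    }

    // Anything still pending once the timeout has elapsed is failed.
    void checkTimeouts(long nowMillis) {
        Iterator<Map.Entry<Object, Long>> it = emittedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Object, Long> e = it.next();
            if (nowMillis - e.getValue() >= timeoutMillis) {
                events.add("fail(" + e.getKey() + ")");
                it.remove();
            }
        }
    }

    static List<String> demo() {
        PendingTracker spout = new PendingTracker(30_000);
        spout.emitted(1, 0);         // "the cow jumped over the moon", msgId 1
        spout.emitted(2, 0);         // another sentence, msgId 2
        spout.treeCompleted(1);      // tree for msgId 1 fully processed
        spout.checkTimeouts(30_000); // msgId 2 never completed
        spout.treeCompleted(2);      // too late: already failed, ignored
        return spout.events;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [ack(1), fail(2)]
    }
}
```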

Tuple Lifecycle (State Machine)

[Figure: Kestrel/Kafka queue → Sentence Spout → Split Sentence Bolt → Word Count Bolt]

The tuple's msgId is 1.

ack(1): the tuple tree was fully processed, so the spout takes the message off the queue.

fail(1): the tuple tree failed (×), so the spout puts the message back on the queue to be replayed.
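The state machine maps onto a queue with reservations. This hypothetical QueueBackedSpout keeps an in-memory deque as a stand-in for Kestrel/Kafka (no client library is used): nextTuple() moves a message into a pending map under a fresh msgId, ack(msgId) drops it for good, and fail(msgId) puts it back at the head of the queue so it is emitted again.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical queue-backed spout; the deque is an in-memory stand-in for Kestrel/Kafka.
class QueueBackedSpout {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Map<Integer, String> pending = new HashMap<>();
    private int nextMsgId = 1;

    QueueBackedSpout(String... messages) {
        for (String m : messages) queue.addLast(m);
    }

    // Reserves the next message and returns its msgId (or -1 if the queue is empty).
    // A real spout would call collector.emit(values, msgId) at this point.
    int nextTuple() {
        String msg = queue.pollFirst();
        if (msg == null) return -1;
        int msgId = nextMsgId++;
        pending.put(msgId, msg);
        return msgId;
    }

    // Tuple tree fully processed: take the message off the queue permanently.
    void ack(int msgId) { pending.remove(msgId); }

    // Tuple tree failed or timed out: put the message back so it is replayed.
    void fail(int msgId) {
        String msg = pending.remove(msgId);
        if (msg != null) queue.addFirst(msg);
    }

    int queued() { return queue.size(); }
    int inFlight() { return pending.size(); }

    public static void main(String[] args) {
        QueueBackedSpout spout = new QueueBackedSpout("the cow jumped over the moon");
        int first = spout.nextTuple();  // msgId 1, now in flight
        spout.fail(first);              // back on the queue
        int replay = spout.nextTuple(); // same sentence, new msgId 2
        spout.ack(replay);              // done for good
        System.out.println(spout.queued() + " queued, " + spout.inFlight() + " in flight"); // 0 queued, 0 in flight
    }
}
```

Whether a replay reuses the original msgId or gets a new one depends on the spout; this sketch simply issues a new one.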

Tuple Lifecycle (Program Layer)

[Figure: Kestrel/Kafka → Sentence Spout → Split Sentence Bolt → Word Count Bolt; every output tuple is linked ("anchored") to the input tuple it was produced from]

YOU:
1. Tell Storm whenever you're creating a new link in the tree of tuples (anchoring).
2. Tell Storm when you have finished processing an individual tuple (acking).

Storm:
1. Can detect when the tree of tuples is fully processed.
2. Can ack or fail the spout tuple appropriately.

Split Sentence Bolt: the input tuple is the spout tuple ["the cow jumped over the moon"] and the output tuples are the word tuples. Each word tuple is anchored to the sentence tuple.

Word Count Bolt: the input tuple is a word tuple and the output tuples are word-count tuples. Each word-count tuple is anchored to its word tuple.
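In Storm's bolt API, anchoring is the difference between OutputCollector.emit(Tuple anchor, List<Object> values) and the unanchored emit(List<Object> values). The sketch below models only the bookkeeping, with hypothetical classes SketchTuple and SketchCollector: an anchored output inherits the spout root ids of its input and so joins that tuple tree, while an unanchored output carries no root id, so its fate can never cause the spout tuple to be replayed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical tuple: carries the ids of the spout tuples (roots) whose trees it belongs to.
class SketchTuple {
    final List<Object> values;
    final Set<Long> rootIds;

    SketchTuple(List<Object> values, Set<Long> rootIds) {
        this.values = values;
        this.rootIds = rootIds;
    }
}

// Hypothetical collector that models only the anchoring bookkeeping.
class SketchCollector {
    final List<SketchTuple> emitted = new ArrayList<>();

    // Anchored emit: the new tuple joins every tree its anchor belongs to.
    SketchTuple emit(SketchTuple anchor, List<Object> values) {
        SketchTuple out = new SketchTuple(values, new HashSet<>(anchor.rootIds));
        emitted.add(out);
        return out;
    }

    // Unanchored emit: the new tuple belongs to no tree.
    SketchTuple emit(List<Object> values) {
        SketchTuple out = new SketchTuple(values, new HashSet<Long>());
        emitted.add(out);
        return out;
    }

    public static void main(String[] args) {
        long spoutRootId = 42L; // stand-in for the random 64-bit root id
        SketchTuple sentence = new SketchTuple(
            Arrays.<Object>asList("the cow jumped over the moon"),
            new HashSet<>(Arrays.asList(spoutRootId)));
        SketchCollector collector = new SketchCollector();
        SketchTuple anchored = collector.emit(sentence, Arrays.<Object>asList("the"));
        SketchTuple unanchored = collector.emit(Arrays.<Object>asList("cow"));
        System.out.println(anchored.rootIds + " " + unanchored.rootIds); // [42] []
    }
}
```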

[Figure: the same tuple tree, acked one tuple at a time]

The Word Count Bolt acks each word tuple once it has processed it: ack word tuple ["the"], then ["cow"], and so on up to ["moon"].

The Split Sentence Bolt acks the sentence tuple ["the cow jumped over the moon"]: the input tuple is acked after all the word tuples have been emitted.

[Figure: every tuple in the tree has been acked]

The tuple tree is fully processed, so Storm calls ack(msgId=1) on the Sentence Spout, which can now take the sentence off the Kestrel/Kafka queue.

[Figure: a word tuple fails downstream (××)]

Since the word tuple is anchored, the spout tuple at the root of the tree will be replayed later on if the word tuple fails to be processed downstream. A bolt reports such a failure explicitly with:

this.collector.fail(tuple)

The tuple tree has failed, so Storm calls fail(msgId=1) on the spout.

[Figure, two slides: after the failure, the sentence ["the cow jumped over the moon"] comes from Kestrel/Kafka again and flows through the Sentence Spout, Split Sentence Bolt and Word Count Bolt; the failed tuples are marked ×]

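Multi-anchoring

[Figure: input tuples tuple1 and tuple2 → output tuple tuple3]

An output tuple can be anchored to more than one input tuple. tuple3 is a multi-anchored tuple: it is anchored to both tuple1 and tuple2, so it belongs to both of their tuple trees. If tuple3 fails, the spout tuples behind both tuple1 and tuple2 are replayed.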
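In Storm's API a tuple is multi-anchored by passing a collection of anchors, e.g. collector.emit(Arrays.asList(tuple1, tuple2), values). The hypothetical MultiAnchorDemo below models only the consequence described above: the output's root ids are the union of its anchors' root ids, so failing it means every one of those spout tuples must be replayed.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: a tuple knows the spout root ids of every tree it belongs to.
class MultiAnchorDemo {
    static final class Node {
        final Set<Long> rootIds;
        Node(Set<Long> rootIds) { this.rootIds = rootIds; }
    }

    // Multi-anchored emit: the output joins the union of its anchors' trees.
    static Node emit(Collection<Node> anchors) {
        Set<Long> roots = new HashSet<>();
        for (Node anchor : anchors) roots.addAll(anchor.rootIds);
        return new Node(roots);
    }

    // Failing a tuple fails every spout tuple whose tree contains it (sorted for stable output).
    static Set<Long> spoutTuplesToReplay(Node failed) {
        return new TreeSet<>(failed.rootIds);
    }

    public static void main(String[] args) {
        Node tuple1 = new Node(new HashSet<>(Arrays.asList(1L))); // from the spout tuple with root id 1
        Node tuple2 = new Node(new HashSet<>(Arrays.asList(2L))); // from the spout tuple with root id 2
        Node tuple3 = emit(Arrays.asList(tuple1, tuple2));
        System.out.println(spoutTuplesToReplay(tuple3)); // [1, 2]
    }
}
```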

One More Thing

A very common bolt pattern is:
+ reading an input tuple,
+ emitting tuples based on it,
+ and then acking the tuple at the end of execute().

Every tuple you process must be acked or failed. Storm uses memory to track each tuple, so if you don't ack/fail every tuple, the task will eventually run out of memory (OOM).

For this pattern, Storm does it for you (the IBasicBolt interface, e.g. BaseBasicBolt): you don't need to pay attention to anchoring and acking anymore ✅
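Storm packages this pattern as IBasicBolt (typically by extending BaseBasicBolt): tuples emitted through its BasicOutputCollector are anchored to the current input automatically, and the input is acked once execute() returns; throwing FailedException fails it instead. The plain-Java imitation below uses hypothetical names (AutoAckRunner, BasicLogic, FailedSketchException) and only mirrors that control flow; it is not Storm's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for Storm's FailedException.
class FailedSketchException extends Exception {
    FailedSketchException(String msg) { super(msg); }
}

// Hypothetical "basic" bolt logic: user code only emits; anchoring and acking are automatic.
interface BasicLogic {
    void execute(String input, Consumer<String> emit) throws FailedSketchException;
}

class AutoAckRunner {
    final List<String> log = new ArrayList<>();

    void run(String input, BasicLogic logic) {
        try {
            // Every value emitted during execute() is anchored to the current input.
            logic.execute(input, value -> log.add("emit(anchor=" + input + ", " + value + ")"));
            log.add("ack(" + input + ")"); // acked after execute() returns normally
        } catch (FailedSketchException e) {
            log.add("fail(" + input + ")"); // explicit failure: the tuple is failed instead
        }
    }

    public static void main(String[] args) {
        AutoAckRunner runner = new AutoAckRunner();
        runner.run("the cow", (sentence, emit) -> {
            for (String word : sentence.split(" ")) emit.accept(word);
        });
        runner.run("", (sentence, emit) -> { throw new FailedSketchException("empty sentence"); });
        // [emit(anchor=the cow, the), emit(anchor=the cow, cow), ack(the cow), fail()]
        System.out.println(runner.log);
    }
}
```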

Acker

When a spout (the data source) emits a tuple, when does that tuple count as fully processed?

[Figure: Spout → Bolt1 → Bolt2 → Bolt3, starting with tuple1]

[Figure: Tuple Tree 🌲. SentenceSpout emits tuple1 ["the cow jumped..."]; SplitBolt emits tuple2 to tuple4 (["the"], ["cow"], ["jumped"]); WordCountBolt emits tuple5 to tuple7 (["the",1], ["cow",1], ["jumped",1]), which flow to PrintBolt]

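When a spout emits a new source tuple, it can assign a MessageId to it. Several source tuples may share the same MessageId, meaning they form a single message unit and are placed in the same tuple tree.

collector.emit(new Values(tuple1), Message1); collector.emit(new Values(tuple2), Message1);
(tuple1 and tuple2 share Message1: one message unit)

collector.emit(new Values(tuple1), Message1); collector.emit(new Values(tuple2), Message2);
(different MessageIds: separate tuple trees)

[Figure: Tuple Trees 🌲🌲🌲. Under Message1 the Spout emits tuple1 and tuple2; Bolt1 and Bolt2 turn them into tuple3 and tuple4; Bolt3 and Bolt4 produce tuple5 and tuple6; both flow into Bolt5]

Note: this grouping is the slides' conceptual model. In Storm's spout implementation each emit() with a message ID appears to get its own random root id and is tracked as a separate tree, so ack()/fail() may be called once per emit even when two emits share a MessageId.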

[Figure (animated over seven slides): Spout → Bolt1/Bolt2 → Bolt3/Bolt4 → Bolt5, highlighting each step below]

1. In the spout, Message1 is bound to tuple1 and tuple2 (the same MessageId).
2. tuple1 is sent to Bolt1, and tuple2 is sent to Bolt2.
3. Bolt1 processes tuple1 and produces tuple3; Bolt2 processes tuple2 and produces tuple4.
4. tuple3 from Bolt1 flows to Bolt3; tuple4 from Bolt2 flows to Bolt4.
5. Bolt3 processes tuple3 and produces tuple5; Bolt4 processes tuple4 and produces tuple6.
6. tuple5 from Bolt3 and tuple6 from Bolt4 both flow to the same Bolt5.
7. Once Bolt5 has processed tuple5 and tuple6, Message1 has been fully processed.

Message1 ✅

Fully processed: the source tuple, and every tuple derived from it, has been processed by every bolt in the topology that it is supposed to reach.

[Figure: Spout → Bolt1 → Bolt2 → Bolt3, passing tuple1, tuple2 and tuple3]

spout-tuple-1 processed table (fully processed only when every entry is Y):
Spout: emits tuple1
Bolt1: receives tuple1, processes tuple1, emits tuple2
Bolt2: receives tuple2, processes tuple2, emits tuple3
Bolt3: receives tuple3, processes tuple3

[Figure: the table filling in as the tuples move along; completed entries are marked ✅ and pending ones ×]

[Figure: a much larger tree over Spout → Bolt1 → Bolt2 → Bolt3, where tuple1 fans out into many tuples (tuple21 to tuple27, tuple31 to tuple35, and so on)]

What would spout-tuple-1's processing table look like?

A REALLY LARGE/HUGE TABLE!!!
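A deliberately naive tracker makes the problem visible. This hypothetical ProcessedTable keeps one Y/N row per tuple in a single spout tuple's tree and reports completion only when every row is Y; rowsFor() shows how quickly the table grows for a tree with a fixed fan-out at every level.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical naive tracker: one row per tuple in a single spout tuple's tree.
class ProcessedTable {
    private final Map<String, Boolean> rows = new LinkedHashMap<>();

    void emitted(String tupleId) { rows.put(tupleId, false); }  // new row, marked N
    void processed(String tupleId) { rows.put(tupleId, true); } // row marked Y

    // Fully processed only when every row is Y.
    boolean fullyProcessed() {
        return !rows.isEmpty() && !rows.containsValue(false);
    }

    int size() { return rows.size(); }

    // Rows needed for one root with `fanOut` children per tuple, `depth` levels below the root.
    static int rowsFor(int fanOut, int depth) {
        ProcessedTable table = new ProcessedTable();
        int levelSize = 1, id = 0;
        for (int level = 0; level <= depth; level++) {
            for (int i = 0; i < levelSize; i++) table.emitted("t" + (id++));
            levelSize *= fanOut;
        }
        return table.size();
    }

    public static void main(String[] args) {
        ProcessedTable t = new ProcessedTable();
        t.emitted("tuple1"); t.emitted("tuple2"); t.emitted("tuple3");
        t.processed("tuple1"); t.processed("tuple2");
        System.out.println(t.fullyProcessed()); // false: tuple3 is still N
        t.processed("tuple3");
        System.out.println(t.fullyProcessed()); // true
        System.out.println(rowsFor(5, 3));      // 156 = 1 + 5 + 25 + 125 rows
    }
}
```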

[Figure: Spout → Bolt1 → Bolt2 → Bolt3 emitting tuple1 to tuple3; the emits and acks are reported to the AckerBolt, which keeps an ack_value and decides whether the spout gets ack() or fail()]

The Acker component tracks the tuple tree 🌲 of every tuple emitted by a spout, based on two kinds of events:
1. emit(tuple, …)
2. ack(tuple)

Solution 1: chained ("zipper-style") tracking

[Figure (animated over seven slides): the Spout, Bolt1, Bolt2, Bolt3 and AckerBolt diagram, with the AckerBolt's record of the tuple tree 🌲 updated after each emit and ack]

Solution 2: incremental tracking

[Figure (animated over seven slides): the same diagram, with the AckerBolt updating its ack_value incrementally as tuples are emitted and acked]

How Storm Implements the Acker

How does Storm implement reliability in an efficient way?

A Storm topology has a set of special "acker" tasks that track the DAG of tuples for every spout tuple. When an acker sees that a DAG is complete, it sends a message to the spout task that created the spout tuple to ack the message.

1. The acker can run as many tasks, just like spouts and bolts.
2. The DAG of tuples is a tuple tree.
3. The tree is rooted at a tuple emitted by a spout (by one of the spout's tasks).
4. That spout tuple is associated with a MessageId.
5. When all tuples in the tuple tree have been fully processed, the acker sends a message to the spout task from step 3.
6. The spout then acks the message associated with that tuple.

The best way to understand how Storm implements reliability is to look at the lifecycle of tuples and tuple DAGs. When a tuple is created in a topology, whether in a spout or a bolt, it is given a random 64-bit id. These ids are used by ackers to track the tuple DAG for every spout tuple.

Every tuple knows the ids of all the spout tuples in whose tuple trees it exists (the root id of a given tuple tree never changes). When you emit a new tuple in a bolt, the spout tuple ids from the tuple's anchors are copied into the new tuple. When a tuple is acked, it sends a message to the appropriate acker tasks with information about how the tuple tree changed. In particular it tells the acker "I am now completed within the tree for this spout tuple, and here are the new tuples in the tree that were anchored to me".

For example, if tuples "D" and "E" were created based on tuple "C", here's how the tuple tree changes when "C" is acked:

[Figure: C is removed from the tree while D and E are added to it]

Since "C" is removed from the tree at the same time that "D" and "E" are added to it, the tree can never be prematurely completed.

1. A bolt does not send a message to the acker when it emits; it only does so when it acks.
2. At ack time, the bolt knows the id of the input tuple being acked and the ids of all output tuples it emitted.
3. So when acking, the bolt first combines the input tuple id with all the emitted output tuple ids, and only then sends a single message to the acker.
4. When the acker receives the bolt's ack message, it combines its current ack val with the message; the result reflects how the tuple tree changed.
5. Once a bolt has acked its input tuple, no information about the tuples from that input tuple back up to the root tuple needs to be kept; the acker only needs to keep the newly emitted output tuples.

Why is there no need to record ancestor tuple ids (not just the spout tuple id, but also the ids of upstream input tuples)?
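The claim that the tree "can never be prematurely completed" can be checked with a small model. The hypothetical SetAckerEntry below tracks the tree as an explicit set of pending ids (easier to read than the XOR value Storm actually keeps). It contrasts a combined ack message (C out, D and E in, in one step) with a hypothetical protocol where C's completion is reported before its children are registered.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical acker entry for one spout tuple that keeps an explicit set of
// pending tuple ids. Storm does not store this set; it keeps a single XOR value.
class SetAckerEntry {
    private final Set<String> pending = new HashSet<>();

    SetAckerEntry(String firstTupleId) { pending.add(firstTupleId); }

    void completed(String ackedId) { pending.remove(ackedId); }

    void added(List<String> childIds) { pending.addAll(childIds); }

    // One combined ack message: "I am completed, and here are the new tuples anchored to me".
    void ack(String ackedId, List<String> childIds) {
        added(childIds);
        completed(ackedId);
    }

    boolean complete() { return pending.isEmpty(); }

    // Hypothetical split protocol: C's completion arrives before D and E are registered.
    static boolean completesEarlyWithSplitMessages() {
        SetAckerEntry tree = new SetAckerEntry("C");
        tree.completed("C");
        boolean early = tree.complete(); // true: looks finished although D and E exist
        tree.added(Arrays.asList("D", "E"));
        return early;
    }

    // Combined messages, as on the slide: C out, D and E in, in one step.
    static boolean completesOnlyAtEndWithCombinedMessages() {
        SetAckerEntry tree = new SetAckerEntry("C");
        tree.ack("C", Arrays.asList("D", "E"));
        if (tree.complete()) return false;
        tree.ack("D", Arrays.<String>asList());
        if (tree.complete()) return false;
        tree.ack("E", Arrays.<String>asList());
        return tree.complete();
    }

    public static void main(String[] args) {
        System.out.println(completesEarlyWithSplitMessages());        // true
        System.out.println(completesOnlyAtEndWithCombinedMessages()); // true
    }
}
```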

[Figure (animated over six slides): Spout → Bolt1 → Bolt2 → Bolt3 emitting tuple1, tuple2 and tuple3; as each bolt acks, the AckerBolt crosses out (×) the completed parent tuple in its tuple tree 🌲 and tracks the new child tuple instead]

×: the parent tuple has completed, and the acker now needs to track its child tuples.

A Little Algebra

XOR truth table: 0 ^ 0 = 0, 0 ^ 1 = 1, 1 ^ 0 = 1, 1 ^ 1 = 0

A value XORed (^) with itself is always 0:
0000 ^ 0000 = 0000
0001 ^ 0001 = 0000
0010 ^ 0010 = 0000
0011 ^ 0011 = 0000
0100 ^ 0100 = 0000
010100110110010011 ^ 010100110110010011 = 000000000000000000

Two different values (not a value with itself) never XOR to 0:
0000 ^ 0001 = 0001
0001 ^ 1001 = 1000
0010 ^ 0110 = 0100
0011 ^ 0010 = 0001
1100 ^ 0100 = 1000
010100110110010011 ^ 010100111110010011 = 000000001000000000

So is there any way to get 0 from a series of different values?

0000 ^ 0001 = 0001
0001 ^ 1100 = 1101
1101 ^ 0010 = 1111
1111 ^ 1001 = 0110
0110 ^ 0110 = 0000

In general: 0 ^ X1 = X1, X1 ^ X2 = X3, X3 ^ X4 = X5, X5 ^ X6 = X7, X7 ^ X7 = 0
(here X1 = 0001, X2 = 1100, X3 = 1101, X4 = 0010, X5 = 1111, X6 = 1001, X7 = 0110)

Because a value XORed with itself is always 0, XORing in X1, X2, X4, X6 and finally X7 (the accumulated value itself) brings the result back to 0000:
X1 X2 X4 X6 X7
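Both XOR facts and the chain above can be checked mechanically. In Java, ^ on integers is bitwise XOR, and Integer.parseInt(s, 2) reads the binary strings used on the slides; XorFacts is just a throwaway helper name.

```java
// Checks the XOR facts from the slides using Java's bitwise ^ operator.
class XorFacts {
    static int bits(String binary) { return Integer.parseInt(binary, 2); }

    // Formats a value as at least 4 binary digits, e.g. 1 -> "0001".
    static String fourBits(int value) {
        String s = Integer.toBinaryString(value);
        while (s.length() < 4) s = "0" + s;
        return s;
    }

    // XORs each value into an accumulator that starts at 0000 and records every intermediate result.
    static String chain(String... values) {
        int acc = 0;
        StringBuilder trace = new StringBuilder("0000");
        for (String v : values) {
            acc ^= bits(v);
            trace.append(" -> ").append(fourBits(acc));
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(fourBits(bits("0011") ^ bits("0011"))); // 0000: a value with itself
        System.out.println(fourBits(bits("1100") ^ bits("0100"))); // 1000: two different values
        // X1=0001, X2=1100, X4=0010, X6=1001, X7=0110
        System.out.println(chain("0001", "1100", "0010", "1001", "0110"));
        // 0000 -> 0001 -> 1101 -> 1111 -> 0110 -> 0000
    }
}
```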

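[Figure: Spout → Bolt1 → Bolt2 → Bolt3; tuple1 (id 0001) goes from Spout to Bolt1, tuple2 (id 1010) from Bolt1 to Bolt2, and tuple3 (id 0011) from Bolt2 to Bolt3]

Whenever a spout or bolt emits a tuple, it generates an id for that tuple. Every tuple a spout or bolt emits downstream must be received by some bolt. When the last bolt emits no tuple, processing through the topology is finished.

Each id is XORed in twice, once when its tuple is emitted and once when it is received:
0001 ^ 1010 ^ 0011 ^ 0001 ^ 1010 ^ 0011 = (0001 ^ 0001) ^ (1010 ^ 1010) ^ (0011 ^ 0011) = 0000 ^ 0000 ^ 0000 = 0000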

Step by step, the acker's ack_val for this spout tuple evolves as follows:

1. The spout emits a tuple with id 0001, and the acker starts tracking this spout tuple. ack_val = 0001
2. Bolt1 receives the input tuple emitted by the spout but has not processed it yet, so it does not communicate with the acker. ack_val = 0001
3. Bolt1 emits a new tuple, 1010, and acks its input tuple (tuple1); this is when it communicates with the acker. ack_val = 0001 ^ (0001 ^ 1010) = 1010
4. The acker now keeps only the id of the newly created child tuple (tuple2); ancestor tuple ids are not recorded.
5. Bolt2 receives tuple2, processes it, emits a child tuple (tuple3), and calls ack(tuple2). ack_val = 1010 ^ (1010 ^ 0011) = 0011
6. The acker now keeps only the id of the newly created child tuple (tuple3); ancestor tuple ids are not recorded.
7. Bolt3 receives tuple3, processes it, emits no new tuple, and calls ack(tuple3). ack_val = 0011 ^ 0011 = 0000
8. With no newly created tuples left, the acker's ack_val = 0, meaning the tuple tree has been fully processed ✅
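The same walkthrough in a few lines of Java, using the slides' 4-bit ids (real ids are random 64-bit values). The hypothetical AckValWalkthrough folds each bolt's ack into one message, the input id XOR the ids of the children it emitted, as the steps above describe; it follows the slides' simplified model rather than Storm's source code.

```java
// Replays the slides' example: tuple1=0001 (Spout), tuple2=1010 (Bolt1), tuple3=0011 (Bolt2).
class AckValWalkthrough {
    private long ackVal; // the acker's value for this one spout tuple

    // The spout emits the root tuple: the acker starts tracking it.
    void spoutEmitted(long rootId) { ackVal = rootId; }

    // A bolt acks its input: one message = input id XOR the ids of every child it emitted.
    void boltAcked(long inputId, long... childIds) {
        long message = inputId;
        for (long child : childIds) message ^= child;
        ackVal ^= message;
    }

    static String fourBits(long value) {
        return String.format("%4s", Long.toBinaryString(value)).replace(' ', '0');
    }

    static String run() {
        long tuple1 = 0b0001, tuple2 = 0b1010, tuple3 = 0b0011;
        AckValWalkthrough acker = new AckValWalkthrough();
        StringBuilder trace = new StringBuilder();
        acker.spoutEmitted(tuple1);
        trace.append(fourBits(acker.ackVal));
        acker.boltAcked(tuple1, tuple2); // Bolt1: ack(tuple1), emitted tuple2
        trace.append(' ').append(fourBits(acker.ackVal));
        acker.boltAcked(tuple2, tuple3); // Bolt2: ack(tuple2), emitted tuple3
        trace.append(' ').append(fourBits(acker.ackVal));
        acker.boltAcked(tuple3);         // Bolt3: ack(tuple3), emitted nothing
        trace.append(' ').append(fourBits(acker.ackVal));
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // 0001 1010 0011 0000
    }
}
```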

The acker keeps one pair per spout tuple: (spout-tuple-id, tmp-ack-val).

tmp-ack-val = spout-tuple-id ^ (child-tuple-id1 ^ child-tuple-id2 ...)

In general, what an ack contributes to tmp-ack-val is the XOR of the id of the tuple being acked and the ids of all tuples newly created from it; the formula above is the case where the acked tuple is the spout tuple.

Example: the spout produces spout-tuple-id (tuple1), Bolt1 produces bolt1-tuple-id (tuple2), Bolt2 produces bolt2-tuple-id (tuple3), and Bolt3 produces no tuple.

The spout emits tuple1, and the acker records tuple1's id to track the spout tuple:

tmp-ack-val = spout-tuple-id

Bolt1 processes the spout's tuple1, emits tuple2, and acks tuple1:

tmp-ack-val = spout-tuple-id ^ (spout-tuple-id ^ bolt1-tuple-id) = (spout-tuple-id ^ spout-tuple-id) ^ bolt1-tuple-id = 0 ^ bolt1-tuple-id = bolt1-tuple-id

Bolt2 processes Bolt1's tuple2, emits tuple3, and acks tuple2:

tmp-ack-val = spout-tuple-id ^ (spout-tuple-id ^ bolt1-tuple-id) ^ (bolt1-tuple-id ^ bolt2-tuple-id) = (spout-tuple-id ^ spout-tuple-id) ^ (bolt1-tuple-id ^ bolt1-tuple-id) ^ bolt2-tuple-id = 0 ^ 0 ^ bolt2-tuple-id = bolt2-tuple-id

Bolt3 processes Bolt2's tuple3, emits no tuple, and acks tuple3:

tmp-ack-val = spout-tuple-id ^ (spout-tuple-id ^ bolt1-tuple-id) ^ (bolt1-tuple-id ^ bolt2-tuple-id) ^ bolt2-tuple-id = (spout-tuple-id ^ spout-tuple-id) ^ (bolt1-tuple-id ^ bolt1-tuple-id) ^ (bolt2-tuple-id ^ bolt2-tuple-id) = 0 ^ 0 ^ 0 = 0
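The derivation does not rely on a linear chain or on the order in which acks arrive, because XOR is commutative and associative. The hypothetical RandomIdAckerDemo below builds the word-count tree (one sentence, six word tuples, and six count tuples that a downstream bolt acks) with random 64-bit ids from java.util.Random, shuffles the acker messages, and records ack_val after each one. The value ends at 0; an unfinished tree could only reach 0 by coincidence, which with 64-bit random ids is astronomically unlikely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical XOR acker run on the word-count tree with random 64-bit tuple ids.
class RandomIdAckerDemo {
    // Returns ack_val after each acker message is applied; the last entry is the final value.
    static List<Long> run(long seed) {
        Random random = new Random(seed);
        List<Long> messages = new ArrayList<>();

        long sentenceId = random.nextLong();
        messages.add(sentenceId);           // spout: start tracking the sentence tuple

        long splitAck = sentenceId;         // Split Sentence Bolt's ack(sentence) message
        for (int i = 0; i < 6; i++) {       // one word tuple per word of the sentence
            long wordId = random.nextLong();
            long countId = random.nextLong();
            splitAck ^= wordId;             // word tuple anchored to the sentence
            messages.add(wordId ^ countId); // Word Count Bolt: ack(word), emitted one count tuple
            messages.add(countId);          // downstream bolt: ack(count), emitted nothing
        }
        messages.add(splitAck);

        // Messages from different tasks may reach the acker in any order.
        Collections.shuffle(messages, random);

        List<Long> trace = new ArrayList<>();
        long ackVal = 0;
        for (long message : messages) {
            ackVal ^= message;
            trace.add(ackVal);
        }
        return trace;
    }

    public static void main(String[] args) {
        List<Long> trace = run(42);
        System.out.println("messages: " + trace.size());                     // 14
        System.out.println("final ack_val: " + trace.get(trace.size() - 1)); // 0
        System.out.println("hit 0 early: " + trace.subList(0, trace.size() - 1).contains(0L));
    }
}
```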

[Figure (four slides): the Spout → Bolt1 → Bolt2 → Bolt3 pipeline with three acker tasks (Acker Task1, Acker Task2, Acker Task3); each acker task tracks the tuple trees of different spout tuples (tuple11 to tuple13, tuple21 to tuple23, tuple31 to tuple33)]

With several acker tasks, two questions arise:
1. When a tuple needs to be acked, which acker task should it send that information to?
2. How does an acker know which spout task each spout tuple should be handed back to?
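Storm's Guaranteeing Message Processing documentation answers both questions: Storm uses mod hashing to map a spout tuple id to an acker task, and since every tuple carries the spout tuple ids of all the trees it belongs to, any task can work out which acker to talk to; when a spout task emits a new tuple, it tells that acker its own task id, so the acker knows which spout task to notify. The hypothetical AckerRouting class below models just that; using Math.floorMod on the raw id is an assumption standing in for Storm's actual hash function.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of how ack traffic could be routed; not Storm's actual code.
class AckerRouting {
    private final int numAckers;
    // Per acker task index: spout tuple (root) id -> id of the spout task to notify.
    private final Map<Integer, Map<Long, Integer>> ownerByAcker = new HashMap<>();

    AckerRouting(int numAckers) { this.numAckers = numAckers; }

    // Mod hashing on the root id: every message about one spout tuple lands on the same acker task.
    int ackerFor(long rootId) {
        return (int) Math.floorMod(rootId, (long) numAckers);
    }

    // A spout task emitted a new spout tuple: tell the owning acker "task spoutTaskId is responsible".
    void spoutEmitted(long rootId, int spoutTaskId) {
        ownerByAcker.computeIfAbsent(ackerFor(rootId), k -> new HashMap<>()).put(rootId, spoutTaskId);
    }

    // When the tree completes (or fails), the owning acker knows which spout task to notify.
    int spoutTaskToNotify(long rootId) {
        return ownerByAcker.get(ackerFor(rootId)).get(rootId);
    }

    public static void main(String[] args) {
        AckerRouting routing = new AckerRouting(3); // three acker tasks, indexed 0..2
        routing.spoutEmitted(-7L, 5);               // spout task 5 emitted root id -7
        routing.spoutEmitted(10L, 6);               // spout task 6 emitted root id 10
        System.out.println(routing.ackerFor(-7L) + " " + routing.spoutTaskToNotify(-7L)); // 2 5
        System.out.println(routing.ackerFor(10L) + " " + routing.spoutTaskToNotify(10L)); // 1 6
    }
}
```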

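What We Should Do When We Want to Use the Reliability of Storm's Acker

1. Set the number of ackers to 1 or more (the slide calls this Config.TOPOLOGY_ACKERS; in Storm 1.x the key is Config.TOPOLOGY_ACKER_EXECUTORS, which Config.setNumAckers(n) sets). By default there is one acker per worker.
2. When emitting a tuple from the spout, specify a messageId so that this particular spout tuple is tracked.
3. If you care that every tuple in a tuple tree is processed successfully, anchor those tuples when you emit them.

Note that the spout's ack(msgId) is different from a bolt's ack(tuple): Storm calls the spout's ack(msgId) once the whole tuple tree has been processed, whereas a bolt calls collector.ack(tuple) to report that it has finished with a single input tuple.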

References

http://blog.csdn.net/zhangzhebjut/article/details/38467145

http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html

http://www.cnblogs.com/foreach-break/p/storm_at_least_once.html

http://blog.jassassin.com/2014/10/22/storm/storm-ack/