Understand Storm in Pictures
Storm's Basic Building Blocks (What Makes Storm)
[Figure: spouts and bolts connected by streams of tuples, forming a DAG]
Topology: a network of spouts and bolts (a DAG)
Stream: an unbounded sequence of tuples
Spout: a source of streams
Bolt: processes input streams and produces new streams; a bolt can act as a sink or as a message/tuple transform
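The lifecycle of a tuple:
1. It is emitted by a spout.
2. It flows through a stream.
3. It is processed by a bolt.
4. The bolt emits it onward.
5. It re-enters the message stream.
6. This continues until it is fully processed.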
Guaranteeing Message Processing
1. At Least Once: Acker
2. Exactly Once: Trident
If a message fails to be processed, how does Storm make sure it is processed again?
[Figure: Sentence Spout -> Split Sentence Bolt -> Word Count Bolt. The spout emits ["the cow jumped over the moon"]; the split bolt emits ["the"], ["cow"], ["jumped"], ["over"], ["the"], ["moon"]; the count bolt emits ["the",1], ["cow",1], ["jumped",1], ["over",1], ["the",2], ["moon",1].]
Storm considers a tuple coming off a spout "fully processed" when the tuple tree has been exhausted
and every message in the tree has been processed
This tree of messages is called the tuple tree.
collector.emit("split", new Values("the cow jumped over the moon"), 1)
In this call, "split" is the stream-id (the tuple is emitted to one of the spout's output streams) and 1 is the msgId (used to identify the tuple later).
Tuple Lifecycle (API Layer)
A tuple comes off a spout.
Outcome: the tuple tree is fully processed (we'll talk about how later).
Outcome: the tuple tree failed (time-out).
[Figure: a message travels from the Kestrel/Kafka queue through Sentence Spout, Split Sentence Bolt, and Word Count Bolt]
Tuple Lifecycle (State Machine)
Success: ack(1) is called on the spout (the tuple's msgId is 1), and the spout takes the message off the queue.
Failure: fail(1) is called on the spout (the tuple's msgId is 1), and the spout puts the message back on the queue.
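The ack/fail state machine above can be sketched without any Storm dependency. The class below is a hypothetical stand-in for a reliable spout's bookkeeping, not a Storm class: the names `ReplayingSourceSketch`, `emitNext`, `queued`, and `pendingCount` are assumptions made for illustration, and the in-memory deque only plays the role of Kestrel/Kafka.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a reliable spout's bookkeeping; not a Storm class.
class ReplayingSourceSketch {
    private final Deque<String> queue = new ArrayDeque<>();        // stands in for Kestrel/Kafka
    private final Map<Integer, String> pending = new HashMap<>();  // emitted, outcome still unknown
    private int nextMsgId = 1;

    void offer(String message) {
        queue.addLast(message);
    }

    // Plays the role of nextTuple(): returns the msgId attached to the
    // emitted tuple, or -1 when the queue is empty.
    int emitNext() {
        String message = queue.pollFirst();
        if (message == null) {
            return -1;
        }
        int msgId = nextMsgId++;
        pending.put(msgId, message);
        return msgId;
    }

    // Tuple tree fully processed: forget the message for good.
    void ack(int msgId) {
        pending.remove(msgId);
    }

    // Tuple tree failed or timed out: put the message back so it is replayed.
    void fail(int msgId) {
        String message = pending.remove(msgId);
        if (message != null) {
            queue.addFirst(message);
        }
    }

    int queued() {
        return queue.size();
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ReplayingSourceSketch spout = new ReplayingSourceSketch();
        spout.offer("the cow jumped over the moon");
        int first = spout.emitNext();   // msgId 1
        spout.fail(first);              // message goes back on the queue
        int second = spout.emitNext();  // replayed under msgId 2
        spout.ack(second);              // message is finally dropped
        System.out.println("queued=" + spout.queued() + " pending=" + spout.pendingCount());
    }
}
```

A replayed message gets a fresh msgId in this sketch; a real spout is free to reuse the original one. Either way, the message leaves the system only after an ack, which is what gives at-least-once delivery.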
Tuple Lifecycle (Program Layer)
YOU:
1. Tell Storm whenever you're creating a new link in the tree of tuples (anchoring).
2. Tell Storm when you have finished processing an individual tuple (acking).
Storm:
1. Can detect when the tree of tuples is fully processed.
2. Can ack or fail the spout tuple appropriately.
[Figure: the word-count topology; each of the six word tuples is anchored to the sentence tuple, which is the spout tuple]
Each word tuple is anchored to the sentence tuple.
In the Split Sentence Bolt, the sentence tuple is the input tuple and the word tuples are the output tuples. In the Word Count Bolt, each word tuple is the input tuple and the word-count tuple is the output tuple.
Each word-count tuple is anchored to its word tuple.
The word tuples are acked one by one: ack word tuple ["the"], ack word tuple ["cow"], and so on through ack word tuple ["moon"].
Ack sentence tuple ["the cow jumped over the moon"]: the input tuple is acked after all the word tuples are emitted.
The tuple tree is fully processed, so ack(msgId=1) is called on the spout.
Since the word tuple is anchored, the spout tuple at the root of the tree will be replayed later on if the word tuple fails to be processed downstream.
A bolt reports a failure with this.collector.fail(tuple). The tuple tree has then failed, and fail(msgId=1) is called on the spout.
[Figure: after the failure, the sentence tuple ["the cow jumped over the moon"] is replayed from Kestrel/Kafka through Sentence Spout, Split Sentence Bolt, and Word Count Bolt]
[Figure: multi-anchored tuple. The output tuple tuple3 is anchored to the input tuples tuple1 and tuple2; when tuple3 fails, the replay covers both of them.]
ONE MORE THING
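Many bolts follow a common pattern:
+ reading an input tuple,
+ emitting tuples based on it,
+ and then acking the tuple at the end of execute().
Every tuple you process must be acked or failed. Storm uses memory to track each tuple, so if you don't ack or fail every tuple, the task will eventually run out of memory.
STORM CAN DO IT FOR YOU! With Storm's BasicBolt interface (IBasicBolt), emitted tuples are anchored to the input tuple and the input tuple is acked automatically, so you no longer need to handle anchoring and acking yourself ✅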
Acker
When a spout emits a tuple from its data source, what counts as that tuple being fully processed?
[Figure: Tuple Tree. SentenceSpout emits tuple1 (["the cow jumped.."]); SplitBolt splits it into tuple2, tuple3, tuple4 (["the"], ["cow"], ["jumped"]); WordCountBolt emits tuple5, tuple6, tuple7 (["the",1], ["cow",1], ["jumped",1]) to PrintBolt.]
When a spout emits a new source tuple, it can assign a MessageId to that tuple. Several source tuples can share the same MessageId, which means they form a single message unit and are placed in the same tuple tree.
[Figure: Spout emits tuple1 to Bolt1 and tuple2 to Bolt2; Bolt1 emits tuple3 to Bolt3; Bolt2 emits tuple4 to Bolt4; Bolt3 emits tuple5 and Bolt4 emits tuple6, both to Bolt5]
Same MessageId (one tuple tree, Message1):
collector.emit(new Values(tuple1), Message1);
collector.emit(new Values(tuple2), Message1);
Different MessageIds (separate tuple trees):
collector.emit(new Values(tuple1), Message1);
collector.emit(new Values(tuple2), Message2);
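1. In the Spout, Message1 is bound to tuple1 and tuple2 (same MessageId).
2. tuple1 is sent to Bolt1, and tuple2 is sent to Bolt2.
3. Bolt1 processes tuple1 and produces tuple3; Bolt2 processes tuple2 and produces tuple4.
4. tuple3 from Bolt1 flows to Bolt3; tuple4 from Bolt2 flows to Bolt4.
5. Bolt3 processes tuple3 and produces tuple5; Bolt4 processes tuple4 and produces tuple6.
6. tuple5 from Bolt3 and tuple6 from Bolt4 both flow to the same Bolt5.
7. Once Bolt5 has processed tuple5 and tuple6, Message1 is fully processed.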
Message1 ✅ (fully processed)
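Fully processed: the source tuple and every tuple derived from it have been processed by each bolt in the topology that they are supposed to reach.
[Figure: Spout -> Bolt1 -> Bolt2 -> Bolt3, carrying tuple1, tuple2, tuple3]
Spout emits tuple1. Bolt1 receives tuple1, processes tuple1, emits tuple2.
Bolt2 receives tuple2, processes tuple2, emits tuple3.
Bolt3 receives tuple3, processes tuple3.
spout-tuple-1 processed table: the tuple tree is fully processed only when every entry is Y.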
[Figure: partially filled processed tables for spout-tuple-1; ✅ marks a processed entry and × marks one that is still missing]
[Figure: tuple1 fans out into many tuples (tuple21 ... tuple27, ...), which in turn fan out into many more (tuple31 ... tuple35, ...)]
What would the spout-tuple-1 processed table look like?
A REALLY LARGE TABLE!!!
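Acker component: tracks the tuple tree 🌲 of every tuple emitted by a spout.
[Figure: Spout -> Bolt1 -> Bolt2 -> Bolt3 carrying tuple1, tuple2, tuple3, with 1. emit(tuple, …) and 2. ack(tuple) arrows to an AckerBolt that holds an ack_value and decides whether to call ack() or fail() on the Spout]
Solution 1: zipper style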
Solution 2: incremental style
How Storm Implements the Acker
How does Storm implement reliability in an efficient way?
A Storm topology has a set of special "acker" tasks that track the DAG of tuples for every spout tuple. When an acker sees that a DAG is complete, it sends a message to the spout task that created the spout tuple to ack the message.
1. The acker can have many tasks, just like a spout or a bolt.
2. The DAG of tuples is a tuple tree,
3. which grows from a spout tuple (emitted by one of the spout tasks).
4. The spout tuple is associated with a MessageId.
5. When all tuples in the tuple tree are fully processed,
6. the acker sends a message to the spout task from #3.
7. The spout can then ack the message that belongs to that spout tuple.
In short: when a tuple is created, whether by a spout or a bolt, it is given a 64-bit id, and the acker uses these ids to track all tuples. Every tuple knows the id of its ancestor (the id of the tuple emitted by the spout; the root tuple id of a tuple tree is fixed), and whenever you emit a new tuple, the ancestor id is passed on to it. When a tuple is acked, it sends the acker a message describing how the tuple tree has changed. Concretely, it tells the acker: "I am done, and these are my child tuples; please track them."
The best way to understand Storm's reliability implementation is to look at the lifecycle of tuples and tuple DAGs. When a tuple is created in a topology, whether in a spout or a bolt, it is given a random 64-bit id. These ids are used by ackers to track the tuple DAG for every spout tuple.
Every tuple knows the ids of all the spout tuples for which it exists in their tuple trees. When you emit a new tuple in a bolt, the spout tuple ids from the tuple's anchors are copied into the new tuple. When a tuple is acked, it sends a message to the appropriate acker tasks with information about how the tuple tree changed. In particular it tells the acker "I am now completed within the tree for this spout tuple, and here are the new tuples in the tree that were anchored to me".
For example, if tuples "D" and "E" were created based on tuple "C", then acking "C" removes "C" from the tuple tree and adds "D" and "E" to it in a single step. Since "C" is removed from the tree at the same time that "D" and "E" are added to it, the tree can never be prematurely completed.
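The "C", "D", "E" example can be traced with XOR arithmetic. This is a minimal sketch under stated assumptions: the 4-bit ids are made up for readability (Storm uses random 64-bit ids), and `AckValFanOutSketch`, `ackMessage`, and `trace` are hypothetical names, not Storm code.

```java
// Hypothetical sketch of how one ack both removes "C" and adds "D" and "E".
class AckValFanOutSketch {
    // The value a bolt sends when it acks: the acked tuple's id XORed with
    // the ids of all tuples that were anchored to it.
    static long ackMessage(long ackedId, long... childIds) {
        long value = ackedId;
        for (long child : childIds) {
            value ^= child;
        }
        return value;
    }

    // Returns the acker's value before any ack, then after acking C, D, and E.
    static long[] trace() {
        long c = 0b0101L;  // made-up id for tuple "C"
        long d = 0b0011L;  // made-up id for tuple "D"
        long e = 0b1000L;  // made-up id for tuple "E"

        long start = c;                               // only C is outstanding
        long afterC = start ^ ackMessage(c, d, e);    // C cancels out, D and E appear
        long afterD = afterC ^ ackMessage(d);         // D emitted nothing
        long afterE = afterD ^ ackMessage(e);         // E emitted nothing
        return new long[] {start, afterC, afterD, afterE};
    }

    public static void main(String[] args) {
        for (long value : trace()) {
            System.out.println(Long.toBinaryString(value));
        }
    }
}
```

The value is nonzero after acking "C" because "D" and "E" enter in the same XOR that removes "C"; it reaches 0 only once "D" and "E" have been acked as well.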
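1. A bolt does not send a message to the Acker when it emits; it only does so when it acks.
2. At ack time, the bolt knows the id of the input tuple being acked and the ids of all output tuples created by its emits.
3. So at ack time it can first combine the input tuple id with all the output tuple ids, and only then send a message to the Acker.
4. When the Acker receives the bolt's ack message, it combines its current ack val with the received value; the result reflects how the tuple tree has changed.
5. Once a bolt has acked an input tuple, nothing on the path from that input tuple back to the root tuple needs to be kept any longer. The Acker only needs to keep the most recently emitted output tuples.
Why is there no need to record ancestor tuple ids (not only the spout tuple id, but also the upstream input tuples)?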
[Figure: Spout -> Bolt1 -> Bolt2 -> Bolt3 with an AckerBolt, animated step by step. ×: the parent tuple is complete, and the Acker now needs to track its child tuples. In the last frame (✅) nothing is left to track.]
A little algebra
Bitwise XOR (^): 0 ^ 0 = 0, 0 ^ 1 = 1, 1 ^ 0 = 1, 1 ^ 1 = 0
XORing a value with itself always gives 0:
0000 ^ 0000 = 0000
0001 ^ 0001 = 0000
0010 ^ 0010 = 0000
0011 ^ 0011 = 0000
0100 ^ 0100 = 0000
010100110110010011 ^ 010100110110010011 = 000000000000000000
XORing two different values never gives 0:
0000 ^ 0001 = 0001
0001 ^ 1001 = 1000
0010 ^ 0110 = 0100
0011 ^ 0010 = 0001
1100 ^ 0100 = 1000
010100110110010011 ^ 010100111110010011 = 000000001000000000
So is there a way to get back to 0?
0000 ^ 0001 = 0001
0001 ^ 1100 = 1101
1101 ^ 0010 = 1111
1111 ^ 1001 = 0110
0110 ^ 0110 = 0000
In general: 0 ^ X1 = X1, X1 ^ X2 = X3, X3 ^ X4 = X5, X5 ^ X6 = X7, X7 ^ X7 = 0
Put together: X1 ^ X2 ^ X4 ^ X6 ^ X7 = 0001 ^ 1100 ^ 0010 ^ 1001 ^ 0110 = 0000, because X7 equals X1 ^ X2 ^ X4 ^ X6 and a value XORed with itself is always 0.
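The same property holds regardless of the order in which the values arrive, which matters because acks reach the acker in no particular order. The sketch below reuses the 4-bit values from the slides; `XorFoldSketch` and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: XOR is commutative and associative, so order is irrelevant.
class XorFoldSketch {
    // XOR all values together, starting from 0.
    static int xorAll(List<Integer> values) {
        int acc = 0;
        for (int value : values) {
            acc ^= value;
        }
        return acc;
    }

    // Put every id into the list twice, shuffle, and fold with XOR.
    static int shuffledPairsFold(long seed) {
        List<Integer> ids = Arrays.asList(0b0001, 0b1100, 0b0010, 0b1001);
        List<Integer> twice = new ArrayList<>(ids);
        twice.addAll(ids);
        Collections.shuffle(twice, new Random(seed));
        return xorAll(twice);
    }

    public static void main(String[] args) {
        // X1 ^ X2 ^ X4 ^ X6, which the slides call X7
        int x7 = xorAll(Arrays.asList(0b0001, 0b1100, 0b0010, 0b1001));
        System.out.println("X7 = " + Integer.toBinaryString(x7));
        System.out.println("pairs folded = " + shuffledPairsFold(42L));
    }
}
```

Whatever the shuffle order, every id meets its twin exactly once, so the fold ends at 0.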
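[Figure: Spout -> Bolt1 -> Bolt2 -> Bolt3; tuple1 has id 0001, tuple2 has id 1010, tuple3 has id 0011]
Whenever a spout or bolt emits a tuple, an id is generated for that tuple. Every tuple emitted downstream must be received by some bolt. The last bolt emits no tuple, which marks the end of the topology.
Each id therefore shows up twice, once when the tuple is emitted and once when it is received:
0001 ^ 0001 ^ 1010 ^ 1010 ^ 0011 ^ 0011
= (0001 ^ 0001) ^ (1010 ^ 1010) ^ (0011 ^ 0011)
= 0000 ^ 0000 ^ 0000
= 0000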
Step by step:
1. The Spout emits a tuple with id=0001; the Acker starts tracking this spout tuple.
2. Bolt1 receives the input tuple emitted by the Spout but has not processed it yet, so it does not talk to the Acker.
3. Bolt1 emits a new tuple, 1010, and acks its input tuple (tuple1); this is when it talks to the Acker.
4. The Acker keeps only the id of the newly created child tuple (tuple2); ancestor tuple ids are not recorded.
5. Bolt2 receives tuple2, processes it, emits the child tuple tuple3, and calls ack(tuple2).
6. The Acker keeps only the id of the newly created child tuple (tuple3); ancestor tuple ids are not recorded.
7. Bolt3 receives tuple3, processes it, emits no new tuple, and calls ack(tuple3).
8. No new tuple was created, so the Acker's ack_val is 0, which means the tuple tree is fully processed ✅
Each ack sends the Acker a pair (spout-tuple-id, tmp-ack-val), where
tmp-ack-val = tuple-id ^ (child-tuple-id1 ^ child-tuple-id2 ...)
That is, tmp-ack-val is the XOR of the id of the tuple being acked with the ids of all tuples newly created from it. The Acker XORs every tmp-ack-val it receives into the ack_val it keeps for that spout tuple.
Example: the Spout produces spout-tuple-id (tuple1), Bolt1 produces bolt1-tuple-id (tuple2), Bolt2 produces bolt2-tuple-id (tuple3), and Bolt3 produces no tuple.
The Spout emits tuple1, and the Acker records tuple1's id in order to track the spout tuple:
ack_val = spout-tuple-id
Bolt1 processes the Spout's tuple1, emits tuple2, and acks tuple1:
ack_val = spout-tuple-id ^ (spout-tuple-id ^ bolt1-tuple-id)
= (spout-tuple-id ^ spout-tuple-id) ^ bolt1-tuple-id
= 0 ^ bolt1-tuple-id
= bolt1-tuple-id
Bolt2 processes Bolt1's tuple2, emits tuple3, and acks tuple2:
ack_val = spout-tuple-id ^ (spout-tuple-id ^ bolt1-tuple-id) ^ (bolt1-tuple-id ^ bolt2-tuple-id)
= (spout-tuple-id ^ spout-tuple-id) ^ (bolt1-tuple-id ^ bolt1-tuple-id) ^ bolt2-tuple-id
= 0 ^ 0 ^ bolt2-tuple-id
= bolt2-tuple-id
Bolt3 processes Bolt2's tuple3, emits no tuple, and acks tuple3:
ack_val = spout-tuple-id ^ (spout-tuple-id ^ bolt1-tuple-id) ^ (bolt1-tuple-id ^ bolt2-tuple-id) ^ bolt2-tuple-id
= (spout-tuple-id ^ spout-tuple-id) ^ (bolt1-tuple-id ^ bolt1-tuple-id) ^ (bolt2-tuple-id ^ bolt2-tuple-id)
= 0 ^ 0 ^ 0
= 0
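The derivation above can be replayed with the 4-bit ids from the slides (0001, 1010, 0011). This is a minimal sketch, not Storm's acker: `AckerSketch`, `track`, `onAck`, and `replayChain` are hypothetical names. Like the slides, it uses the spout tuple's own id as both the key and the initial value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory acker: one XOR-ed value per spout tuple.
class AckerSketch {
    private final Map<Long, Long> ackVals = new HashMap<>();  // spout-tuple-id -> ack_val

    // Called when the spout emits: start tracking the tree.
    void track(long spoutTupleId) {
        ackVals.put(spoutTupleId, spoutTupleId);
    }

    // Called for each bolt ack; returns true once the tree is fully processed.
    boolean onAck(long spoutTupleId, long tmpAckVal) {
        long updated = ackVals.merge(spoutTupleId, tmpAckVal, (a, b) -> a ^ b);
        if (updated == 0L) {
            ackVals.remove(spoutTupleId);
            return true;
        }
        return false;
    }

    long ackVal(long spoutTupleId) {
        return ackVals.getOrDefault(spoutTupleId, 0L);
    }

    // Replays the Spout -> Bolt1 -> Bolt2 -> Bolt3 chain and returns each ack_val.
    static long[] replayChain() {
        long tuple1 = 0b0001L, tuple2 = 0b1010L, tuple3 = 0b0011L;
        AckerSketch acker = new AckerSketch();
        acker.track(tuple1);
        long afterSpout = acker.ackVal(tuple1);
        acker.onAck(tuple1, tuple1 ^ tuple2);  // Bolt1 acks tuple1, emitted tuple2
        long afterBolt1 = acker.ackVal(tuple1);
        acker.onAck(tuple1, tuple2 ^ tuple3);  // Bolt2 acks tuple2, emitted tuple3
        long afterBolt2 = acker.ackVal(tuple1);
        boolean done = acker.onAck(tuple1, tuple3);  // Bolt3 acks tuple3, emitted nothing
        long afterBolt3 = done ? 0L : acker.ackVal(tuple1);
        return new long[] {afterSpout, afterBolt1, afterBolt2, afterBolt3};
    }

    public static void main(String[] args) {
        for (long value : replayChain()) {
            System.out.println(Long.toBinaryString(value));
        }
    }
}
```

The acker holds a single fixed-size value per spout tuple no matter how large the tree grows, which is why no processed table is needed.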
[Figure: three Acker tasks. Acker Task1 tracks the tuple tree tuple11 -> tuple12 -> tuple13, Acker Task2 tracks tuple21 -> tuple22 -> tuple23, and Acker Task3 tracks tuple31 -> tuple32 -> tuple33.]
1. When a tuple needs to be acked, which Acker should it send the information to?
2. How does the Acker know which Spout task each spout tuple should be handed back to?
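The Storm documentation answers both questions briefly. Storm uses mod hashing to map a spout tuple id to an acker task. When a spout task emits a new tuple, it tells the appropriate acker that its task id is responsible for that spout tuple. The sketch below illustrates those two ideas only. `AckerRoutingSketch` and its methods are hypothetical names, and the use of `Math.floorMod` is an assumption, since the documentation does not spell out the exact hash.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of acker routing; not Storm's implementation.
class AckerRoutingSketch {
    // Question 1: every task that holds the spout tuple id picks the same acker.
    static int ackerTaskFor(long spoutTupleId, int numAckerTasks) {
        // floorMod keeps the result in [0, numAckerTasks) even for negative ids
        return (int) Math.floorMod(spoutTupleId, (long) numAckerTasks);
    }

    // Question 2: the acker remembers which spout task owns each spout tuple.
    private final Map<Long, Integer> ownerBySpoutTuple = new HashMap<>();

    void onSpoutEmit(long spoutTupleId, int spoutTaskId) {
        ownerBySpoutTuple.put(spoutTupleId, spoutTaskId);
    }

    // Returns the owning spout task id, or null if the tuple is not tracked.
    Integer spoutTaskFor(long spoutTupleId) {
        return ownerBySpoutTuple.get(spoutTupleId);
    }

    public static void main(String[] args) {
        long spoutTupleId = 7L;
        int acker = ackerTaskFor(spoutTupleId, 3);
        AckerRoutingSketch ackerState = new AckerRoutingSketch();
        ackerState.onSpoutEmit(spoutTupleId, 4);  // spout task 4 emitted this tuple
        System.out.println("acker task " + acker + ", owner spout task "
                + ackerState.spoutTaskFor(spoutTupleId));
    }
}
```

Because every tuple carries the spout tuple ids of the trees it belongs to, any bolt can compute the same acker task without coordination.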
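What We Should Do to Use the Reliability of Storm's Acker
1. Set Config.TOPOLOGY_ACKERS to 1 or more; by default there is one Acker per Worker.
2. Specify a messageId when emitting a tuple from the spout, so that this particular spout tuple is tracked.
3. If you care whether every tuple in a tuple tree succeeds, anchor the tuples when you emit them.
Note that a spout's ack(msgId) is different from a bolt's ack(tuple).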
References
http://blog.csdn.net/zhangzhebjut/article/details/38467145
http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html
http://www.cnblogs.com/foreach-break/p/storm_at_least_once.html
http://blog.jassassin.com/2014/10/22/storm/storm-ack/