storm 2012 03-29
DESCRIPTION
Detailed design for a robust counter, as well as a design for a completely on-line multi-armed bandit implementation that uses the new Bayesian Bandit algorithm. By Ted Dunning.
TRANSCRIPT
![Page 1: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/1.jpg)
Real-time and long-time
Fun with Hadoop + Storm
![Page 2: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/2.jpg)
The Challenge
• Hadoop is great at processing vats of data
  – But sucks for real-time (by design!)
• Storm is great for real-time processing
  – But lacks any way to deal with batch processing
• It sounds like there isn’t a solution
  – Neither fashionable solution handles everything
![Page 3: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/3.jpg)
This is not a problem.
It’s an opportunity!
![Page 4: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/4.jpg)
Hadoop is Not Very Real-time
[Figure: timeline (t) up to "now". Labels: Fully processed; Latest full period; Unprocessed data; Hadoop job takes this long for this data]
![Page 5: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/5.jpg)
Need to Plug the Hole in Hadoop
• We have real-time data with limited state
  – Exactly what Storm does
  – And what Hadoop does not
• Can Storm and Hadoop be combined?
![Page 6: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/6.jpg)
Real-time and Long-time together
[Figure: timeline (t) up to "now". Labels: Hadoop works great back here; Storm works here; Blended view]
![Page 7: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/7.jpg)
An Example
• I want to know how many queries I get
  – Per second, minute, day, week
• Results should be available
  – within <2 seconds 99.9+% of the time
  – within 30 seconds almost always
• History should last >3 years
• Should work for 0.001 q/s up to 100,000 q/s
• Failure tolerant, yadda, yadda
![Page 8: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/8.jpg)
Rough Design – Data Flow
[Diagram components: Search Engine; Query Event Spout (x2); Counter Bolt (x2); Logger Bolt (x3); Raw Logs; Semi Agg; Hadoop Aggregator; Snap; Long agg]
![Page 9: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/9.jpg)
Counter Bolt Detail
• Input: Labels to count
• Output: Short-term semi-aggregated counts
  – (time-window, label, count)
• Non-zero counts emitted if
  – event count reaches threshold (typical 100K)
  – time since last count reaches threshold (typical 1s)
• Tuples acked when counts emitted
• Double count probability is > 0 but very small
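The counting and flushing rules above can be sketched in a few lines. This is a minimal illustration only, not the design from the talk: it uses plain Python rather than the Storm API, and the class name `SemiAggregatingCounter`, the `emit` and `ack` callbacks, and the `window_s` parameter are all hypothetical.

```python
import time
from collections import Counter


class SemiAggregatingCounter:
    """Buffers label counts and flushes them as (time_window, label, count)."""

    def __init__(self, emit, ack, window_s=60, max_events=100_000,
                 max_age_s=1.0, clock=time.monotonic):
        self.emit, self.ack = emit, ack   # hypothetical output callbacks
        self.window_s = window_s          # assumed time-window length
        self.max_events = max_events      # event-count threshold
        self.max_age_s = max_age_s        # time-since-last-flush threshold
        self.clock = clock
        self.counts = Counter()
        self.pending = []                 # tuple ids waiting to be acked
        self.last_flush = clock()

    def add(self, tuple_id, label, event_time):
        window = int(event_time // self.window_s)
        self.counts[(window, label)] += 1
        self.pending.append(tuple_id)
        if len(self.pending) >= self.max_events:
            self.flush()

    def tick(self):
        """Call periodically; flushes once the time threshold is reached."""
        if self.pending and self.clock() - self.last_flush >= self.max_age_s:
            self.flush()

    def flush(self):
        # Only keys that received events are present, so counts are non-zero.
        for (window, label), n in self.counts.items():
            self.emit((window, label, n))
        # Ack only after the counts have been emitted.
        for tuple_id in self.pending:
            self.ack(tuple_id)
        self.counts.clear()
        self.pending.clear()
        self.last_flush = self.clock()
```

Because a flush can happen many times inside one time window, the same (time-window, label) key may be emitted repeatedly with partial counts; a downstream reader is expected to sum them.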
![Page 10: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/10.jpg)
Counter Bolt Counterintuitivity
• Counts are emitted for same label, same time window many times
  – these are semi-aggregated
  – this is a feature
  – tuples can be acked within 1s
  – time windows can be much longer than 1s
• No need to send same label to same bolt
  – speeds failure recovery
![Page 11: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/11.jpg)
Design Flexibility
• Counter can persist short-term transaction log
  – counter can recover state on failure
  – log is normally burn-after-write
• Count flush interval can be extended without extending tuple timeout
  – Decreases currency of counts
  – System is still real-time at a longer time-scale
• Total bandwidth for log is typically not huge
![Page 12: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/12.jpg)
Counter Bolt No-nos
• Cannot accumulate entire period in-memory
  – Tuples must be ack’ed much sooner
  – State must be persisted before ack’ing
  – State can easily grow too large to handle without disk access
• Cannot persist entire count table at once
  – Incremental persistence required
![Page 13: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/13.jpg)
Guarantees
• Counter output volume is small-ish
  – the greater of k tuples per 100K inputs or k tuples/s
  – 1 tuple/s/label/bolt for this exercise
• Persistence layer must provide guarantees
  – distributed against node failure
  – must have either readable flush or closed-append
  – HDFS is distributed, but no guarantees
  – MapRfs is distributed, provides both guarantees
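A quick back-of-the-envelope check of the output-volume bound is sketched below. It assumes that k means the number of distinct (time-window, label) keys written per flush, which the slide does not state, and the function name `max_output_rate` is hypothetical.

```python
def max_output_rate(input_rate, k=1, event_threshold=100_000,
                    time_threshold_s=1.0):
    """Upper bound on emitted tuples per second for one counter bolt.

    Assumption: k is the number of distinct keys written per flush.
    """
    # A flush is triggered by whichever threshold is reached more often.
    flushes_per_s = max(input_rate / event_threshold, 1.0 / time_threshold_s)
    # Every emitted count is non-zero, so output cannot exceed input.
    return min(k * flushes_per_s, input_rate)


for rate in (0.001, 100_000, 1_000_000):
    print(rate, max_output_rate(rate))
```

Under these assumptions one label at 100,000 q/s yields about 1 tuple/s per bolt, in line with the figure quoted for this exercise.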
![Page 14: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/14.jpg)
Failure Modes
• Bolt failure
  – buffered tuples will go un’acked
  – after timeout, tuples will be resent
  – timeout ≈ 10s
  – if failure occurs after persistence, before acking, then double-counting is possible
• Storage (with MapR)
  – most failures invisible
  – a few continue within 0-2s, some take 10s
  – catastrophic cluster restart can take 2-3 min
  – logger can buffer this much easily
![Page 15: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/15.jpg)
Presentation Layer
• Presentation must
  – read recent output of Logger bolt
  – read relevant output of Hadoop jobs
  – combine semi-aggregated records
• User will see
  – counts that increment within 0-2 s of events
  – seamless meld of short and long-term data
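One way the combining step might look is sketched here. The function name `blend` and the `cutoff_window` parameter are hypothetical; the sketch assumes the Hadoop output is complete for every window before the cutoff, so recent records are used only from the cutoff onward to avoid counting the same events twice.

```python
from collections import Counter


def blend(long_term, recent, cutoff_window):
    """Merge Hadoop aggregates with recent semi-aggregated records.

    Both inputs are iterables of (time_window, label, count).
    Assumption: long_term is complete for windows < cutoff_window.
    """
    totals = Counter()
    for window, label, n in long_term:
        if window < cutoff_window:
            totals[(window, label)] += n
    for window, label, n in recent:
        if window >= cutoff_window:
            # The same key can appear many times; partial counts are summed.
            totals[(window, label)] += n
    return dict(totals)


hadoop_out = [(0, "q", 100), (1, "q", 50)]
logger_out = [(1, "q", 7), (2, "q", 3), (2, "q", 4)]
print(blend(hadoop_out, logger_out, cutoff_window=2))
```

The record for window 1 in the recent output is ignored because the batch result already covers that window.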
![Page 16: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/16.jpg)
Mobile Network Monitor
[Diagram components: Transaction data; Batch aggregation; Map; Real-time dashboard and alerts; Geo-dispersed ingest servers; Retro-analysis interface]
![Page 17: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/17.jpg)
Example 2 – Real-time learning
• My system has to
  – learn a response model
  and
  – select training data
  – in real-time
• Data rate up to 100K queries per second
![Page 18: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/18.jpg)
Door Number 3
• I have 15 versions of my landing page
• Each visitor is assigned to a version
  – Which version?
• A conversion or sale or whatever can happen
  – How long to wait?
• Some versions of the landing page are horrible
  – Don’t want to give them traffic
![Page 19: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/19.jpg)
Real-time Constraints
• Selection must happen in <20 ms almost all the time
• Training events must be handled in <20 ms
• Failover must happen within 5 seconds
• Client should time out and back off
  – no need for an answer after 500ms
• State persistence required
![Page 20: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/20.jpg)
Rough Design
[Diagram components: DRPC Spout; Query Event Spout; Logger Bolt (x2); Counter Bolt; Raw Logs; Model State; Timed Join Model; Conversion Detector; Selector Layer]
![Page 21: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/21.jpg)
A Quick Diversion
• You see a coin
  – What is the probability of heads?
  – Could it be larger or smaller than that?
• I flip the coin and while it is in the air ask again
• I catch the coin and ask again
• I look at the coin (and you don’t) and ask again
• Why does the answer change?
  – And did it ever have a single value?
![Page 22: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/22.jpg)
A First Conclusion
• Probability as expressed by humans is subjective and depends on information and experience
![Page 23: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/23.jpg)
A Second Conclusion
• A single number is a bad way to express uncertain knowledge
• A distribution of values might be better
![Page 24: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/24.jpg)
I Dunno
![Page 25: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/25.jpg)
5 and 5
![Page 26: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/26.jpg)
2 and 10
![Page 27: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/27.jpg)
Bayesian Bandit
• Compute distributions based on data
• Sample p1 and p2 from these distributions
• Put a coin in bandit 1 if p1 > p2
• Else, put the coin in bandit 2
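The sampling rule above can be sketched for any number of bandits (for example, landing-page versions). This is an illustration under stated assumptions: each arm has a binary outcome, the distribution for each arm is a Beta posterior starting from a uniform Beta(1, 1) prior, and the names `BayesianBandit` and `simulate` are hypothetical. The slides do not specify the distribution family.

```python
import random


class BayesianBandit:
    """Picks the arm whose sampled conversion rate is highest."""

    def __init__(self, n_arms, rng=None):
        self.wins = [0] * n_arms
        self.losses = [0] * n_arms
        self.rng = rng or random.Random()

    def choose(self):
        # Sample one plausible rate per arm from its Beta posterior
        # (uniform prior assumed), then play the arm with the largest sample.
        samples = [self.rng.betavariate(w + 1, l + 1)
                   for w, l in zip(self.wins, self.losses)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, converted):
        if converted:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1


def simulate(true_rates, n_visitors, seed=0):
    """Run the bandit against made-up conversion rates; return pulls per arm."""
    rng = random.Random(seed)
    bandit = BayesianBandit(len(true_rates), rng)
    pulls = [0] * len(true_rates)
    for _ in range(n_visitors):
        arm = bandit.choose()
        pulls[arm] += 1
        bandit.update(arm, rng.random() < true_rates[arm])
    return pulls


print(simulate([0.2, 0.8], 2000))
```

Arms with little data have wide distributions and still get sampled occasionally, while arms that are clearly bad quickly stop receiving traffic.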
![Page 28: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/28.jpg)
And it works!
![Page 29: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/29.jpg)
The Basic Idea
• We can encode a distribution by sampling
• Sampling allows unification of exploration and exploitation
• Can be extended to more general response models
![Page 30: Storm 2012 03-29](https://reader036.vdocuments.net/reader036/viewer/2022062300/5581d449d8b42ae06c8b54f2/html5/thumbnails/30.jpg)
• Contact:
  – [email protected]
  – @ted_dunning
• Slides and such:
  – http://info.mapr.com/ted-storm-2012-03