buzzwords 2012 real time & long time
DESCRIPTION
This is the talk that Ted Dunning gave at Buzzwords. it is a heavily condensed version of the real-time learning talk I have given previously. The associated video is available at http://www.youtube.com/watch?v=1nQBnyaI6s0TRANSCRIPT
![Page 1: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/1.jpg)
1©MapR Technologies - Confidential
Real-time and Long-time with Storm and Hadoop
![Page 2: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/2.jpg)
2©MapR Technologies - Confidential
Real-time and Long-time with Storm and Hadoop MapR
![Page 3: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/3.jpg)
3©MapR Technologies - Confidential
The Challenge
Hadoop is great of processing vats of data– But sucks for real-time (by design!)
Storm is great for real-time processing– But lacks any way to deal with batch processing
It sounds like there isn’t a solution– Neither fashionable solution handles everything
![Page 4: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/4.jpg)
4©MapR Technologies - Confidential
This is not a problem.
It’s an opportunity!
![Page 5: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/5.jpg)
5©MapR Technologies - Confidential
t
now
Hadoop is Not Very Real-time
UnprocessedData
Fully processed
Latest full period
Hadoop job takes this long for this data
![Page 6: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/6.jpg)
6©MapR Technologies - Confidential
t
now
Hadoop works great back here
Storm workshere
Real-time and Long-time together
Blended view
Blended view
Blended View
![Page 7: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/7.jpg)
7©MapR Technologies - Confidential
Rough Design – Data Flow
Search Engine
Query Event Spout
Logger Bolt
Counter Bolt
Raw Logs
LoggerBolt
Semi Agg
Hadoop Aggregator
Snap
Long agg
Query Event Spout
Counter Bolt
Logger Bolt
![Page 8: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/8.jpg)
8©MapR Technologies - Confidential
Guarantees
Counter output volume is small-ish– the greater of k tuples per 100K inputs or k tuple/s– 1 tuple/s/label/bolt for this exercise
Persistence layer must provide guarantees– distributed against node failure– must have either readable flush or closed-append
HDFS is distributed, but provides no guarantees and strange semantics
MapRfs is distributed, provides all necessary guarantees
![Page 9: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/9.jpg)
9©MapR Technologies - Confidential
Presentation Layer
Presentation must– read recent output of Logger bolt– read relevant output of Hadoop jobs– combine semi-aggregated records
User will see– counts that increment within 0-2 s of events– seamless meld of short and long-term data
![Page 10: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/10.jpg)
10©MapR Technologies - Confidential
Example 2 – AB testing in real-time
I have 15 versions of my landing page Each visitor is assigned to a version– Which version?
A conversion or sale or whatever can happen– How long to wait?
Some versions of the landing page are horrible– Don’t want to give them traffic
![Page 11: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/11.jpg)
11©MapR Technologies - Confidential
A Quick Diversion
You see a coin– What is the probability of heads?– Could it be larger or smaller than that?
I flip the coin and while it is in the air ask again I catch the coin and ask again I look at the coin (and you don’t) and ask again Why does the answer change?– And did it ever have a single value?
![Page 12: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/12.jpg)
12©MapR Technologies - Confidential
A Philosophical Conclusion
Probability as expressed by humans is subjective and depends on information and experience
![Page 13: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/13.jpg)
13©MapR Technologies - Confidential
I Dunno
![Page 14: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/14.jpg)
14©MapR Technologies - Confidential
5 heads out of 10 throws
![Page 15: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/15.jpg)
15©MapR Technologies - Confidential
2 heads out of 12 throws
![Page 16: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/16.jpg)
16©MapR Technologies - Confidential
Bayesian Bandit
Compute distributions based on data Sample p1 and p2 from these distributions
Put a coin in bandit 1 if p1 > p2
Else, put the coin in bandit 2
![Page 17: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/17.jpg)
17©MapR Technologies - Confidential
And it works!
![Page 18: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/18.jpg)
18©MapR Technologies - Confidential
Video Demo
![Page 19: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/19.jpg)
19©MapR Technologies - Confidential
The Code
Select an alternative
Select and learn
But we already know how to count!
n = dim(k)[1] p0 = rep(0, length.out=n) for (i in 1:n) { p0[i] = rbeta(1, k[i,2]+1, k[i,1]+1) } return (which(p0 == max(p0)))
for (z in 1:steps) { i = select(k) j = test(i) k[i,j] = k[i,j]+1 } return (k)
![Page 20: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/20.jpg)
20©MapR Technologies - Confidential
The Basic Idea
We can encode a distribution by sampling Sampling allows unification of exploration and exploitation
Can be extended to more general response models
![Page 21: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/21.jpg)
21©MapR Technologies - Confidential
Contact:– [email protected]– @ted_dunning
Slides and such (available late tonight):– http://info.mapr.com/ted-bbuzz-2012
Hash tags: #mapr #bbuzz
Collective notes: http://bit.ly/JDCRhc
![Page 22: Buzzwords 2012 Real time & long time](https://reader033.vdocuments.net/reader033/viewer/2022061218/54b757b54a7959380b8b4573/html5/thumbnails/22.jpg)
22©MapR Technologies - Confidential
Thank You