TRANSCRIPT
Improving experimentation velocity via Multi-Armed Bandits
Dr Ilias Flaounas, Senior Data Scientist
Growth Hacking Meetup, Sydney, 20 June 2016
[Chart: conversion rate of each variation over time]
• In a classic A/B test we pick the cohort for the next user at random.
• In a MAB we actively choose the cohort.
Pick black to exploit
Pick green (or red) to explore
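The exploit/explore choice above can be sketched with a simple epsilon-greedy policy — a hypothetical simulation, not the talk's actual setup (the conversion rates and parameters below are invented): with probability epsilon we explore a random variation, otherwise we exploit the one with the best observed conversion rate.

```python
import random

def epsilon_greedy(conversion_rates, steps=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over simulated Bernoulli 'conversion' arms."""
    rng = random.Random(seed)
    n = len(conversion_rates)
    pulls, wins = [0] * n, [0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)          # explore: pick a random arm
        else:                               # exploit: best observed rate so far
            arm = max(range(n),
                      key=lambda a: wins[a] / pulls[a] if pulls[a] else 1.0)
        pulls[arm] += 1
        if rng.random() < conversion_rates[arm]:
            wins[arm] += 1
    return pulls

# Most of the traffic ends up assigned to the highest-converting arm.
pulls = epsilon_greedy([0.03, 0.05, 0.10])
print(pulls)
```

Note the asymmetry this creates: the winning arm accumulates samples quickly, while the losing arms are starved — which is exactly the advantages/disadvantages trade-off discussed later.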
Let’s run it for a bit longer… Again, win for variation “d”.
Classic A/B/C/D/E: ~2.5K samples
Multi-armed bandit: ~1K samples
60% fewer samples
No winner after 1K iterations
Classic A/B/C: ~5K samples
Multi-armed bandit: ~1K samples
80% fewer samples
No winner after 1K iterations
Classic A/B/C: ~2.8K samples
Multi-armed bandit: ~1K samples
64% fewer samples
Win for variation “a”.
Classic A/B/C: ~1.8K samples
Multi-armed bandit: ~1K samples
45% fewer samples
Disadvantages
• Reaching significance for non-winning arms takes longer
• Unclear stopping criteria; application-specific heuristics are needed
• Hard to rank the non-winning arms and to reliably assess their impact
Advantages
• Reaching significance for the winning arm is faster
• Best arm can change over time
• There are no false positives in the long term
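The "best arm can change over time" point can be illustrated with a constant-step-size variant of epsilon-greedy: a fixed learning rate keeps the estimates recency-weighted, so the bandit tracks non-stationary conversion rates. This is a hypothetical sketch; the rates, phases, and parameters are invented.

```python
import random

def adaptive_bandit(phases, epsilon=0.1, alpha=0.05, seed=1):
    """Epsilon-greedy with a constant step size, so the estimates
    track non-stationary (phase-by-phase) conversion rates."""
    rng = random.Random(seed)
    n = len(phases[0][1])
    est = [0.5] * n                  # optimistic initial estimates
    for steps, rates in phases:
        for _ in range(steps):
            if rng.random() < epsilon:
                arm = rng.randrange(n)
            else:
                arm = est.index(max(est))
            reward = 1.0 if rng.random() < rates[arm] else 0.0
            est[arm] += alpha * (reward - est[arm])   # recency-weighted update
    return est

# Arm 0 converts best at first, then arm 1 takes over; the estimates follow,
# so the bandit shifts traffic to the new winner.
est = adaptive_bandit([(3000, [0.10, 0.02]), (3000, [0.02, 0.10])])
print(est)
```

A sample-average estimate would converge to the arms' long-run means and react ever more slowly; the constant step size is what lets the policy follow a drifting best arm.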
• How can we locate the city of Bristol from tweets?
• 10K candidate locations organised in a 100x100 grid
• At every step we get tweets from one location and count the number of mentions of the word “Bristol”
• Challenge: find the target in sub-linear time complexity!
• Contextual bandits can tackle this problem
• We proposed KernelUCB, a non-linear, contextual flavour of MAB.
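The kernelised-UCB idea can be sketched as follows — a toy version under invented assumptions (a 20x20 grid instead of 100x100, a simulated "mentions" signal, hypothetical hyper-parameters), not the paper's exact algorithm. Contexts are grid coordinates, a kernel ridge regression estimates the mean reward of every cell, and at each step we query the cell maximising the mean plus an uncertainty bonus.

```python
import numpy as np

def kernel_ucb_grid(grid=20, steps=150, lam=0.1, beta=0.5, length=2.0, seed=0):
    """Kernelised UCB over a grid: query the cell maximising the
    kernel-ridge mean estimate plus an exploration bonus derived
    from the predictive variance."""
    rng = np.random.default_rng(seed)
    cells = np.array([(i, j) for i in range(grid) for j in range(grid)], float)
    target = np.array([13.0, 6.0])          # hidden location (invented)

    def rbf(A, B):                          # RBF kernel matrix between point sets
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * length ** 2))

    def mentions(x):                        # simulated noisy signal, peaked at the target
        return np.exp(-((x - target) ** 2).sum() / 30.0) + 0.05 * rng.standard_normal()

    X, y = [], []
    arm = cells[rng.integers(len(cells))]
    for _ in range(steps):
        X.append(arm)
        y.append(mentions(arm))
        K_inv = np.linalg.inv(rbf(np.array(X), np.array(X)) + lam * np.eye(len(X)))
        k = rbf(cells, np.array(X))
        mean = k @ K_inv @ np.array(y)                     # kernel-ridge mean per cell
        var = np.maximum(1.0 - (k @ K_inv * k).sum(1), 0.0)  # predictive variance
        arm = cells[np.argmax(mean + beta * np.sqrt(var))]   # UCB rule
    return arm
```

Because the RBF kernel generalises each observation to nearby cells, the search homes in on the peak without scanning every cell — mirroring the sub-linear behaviour the slides describe.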
• The last few steps of the algorithm before it locates Bristol.
Technical description: M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini, “Finite-Time Analysis of Kernelised Contextual Bandits”, UAI, 2013.
Target is the red dot.
KernelUCB Matlab code: http://www.complacs.org/pmwiki.php/CompLACS/KernelUCB
KernelUCB with an RBF kernel converges after ~300 iterations (instead of >>10K).