n=10^9: automated experimentation at scale

N=10^9: Automated Experimentation at Scale. Wojciech Galuba, Decision Tools Lead, Facebook, @wgaluba

Upload: optimizely

Post on 20-Aug-2015


TRANSCRIPT

Page 1: N=10^9: Automated Experimentation at Scale

N=10^9: Automated Experimentation at Scale

Wojciech Galuba, Decision Tools Lead, Facebook, @wgaluba

Page 2: N=10^9: Automated Experimentation at Scale
Page 3: N=10^9: Automated Experimentation at Scale

N=10^9: Automated Experimentation at Scale

Wojtek Galuba (wgaluba@fb), Decision Tools Team Lead, Data Science Infrastructure, Facebook

Page 4: N=10^9: Automated Experimentation at Scale

History of Data Science Infra at FB
• Founded April 2012
• A group of data scientists and software engineers
• Experienced first hand the need for better infrastructure
• Need continues to grow
• Team doubled over the past year
• Expect continued rapid growth this year

Page 5: N=10^9: Automated Experimentation at Scale

Why do we experiment?

Page 6: N=10^9: Automated Experimentation at Scale

Experimentation

Product changes

Experiment to study this

Metrics

Page 7: N=10^9: Automated Experimentation at Scale

Experiment to:

Catch problems before they arise

Page 8: N=10^9: Automated Experimentation at Scale

Experiment to:

Choose between multiple options

Page 9: N=10^9: Automated Experimentation at Scale

Experiment to:

Challenge intuitions about product

Page 10: N=10^9: Automated Experimentation at Scale

Experiment to:

Not only evaluate ideas but generate new ones

Page 11: N=10^9: Automated Experimentation at Scale

Challenges

Page 12: N=10^9: Automated Experimentation at Scale

Many experiments

• Experiments running in parallel
• Modifying many different aspects of the product
• Overlaps are possible and may conflict

Page 13: N=10^9: Automated Experimentation at Scale

Many metric dimensions
• Different contexts of user actions
• Thousands of device types
• Geography
• Demographics
• Time
• Enormous space of possible questions

Page 14: N=10^9: Automated Experimentation at Scale

Many teams
• Many ways to run an experiment
• Diverse audience for results
• Huge set of results from every experiment
• Many ways to interpret results

Page 15: N=10^9: Automated Experimentation at Scale

Experimentation at Facebook

Page 16: N=10^9: Automated Experimentation at Scale

An experiment

Page 17: N=10^9: Automated Experimentation at Scale

QuickExperiment

Divide people randomly

color: blue, size: medium

color: blue, size: big

color: green, size: medium

Page 18: N=10^9: Automated Experimentation at Scale

QuickExperiment
• Centralized experiment management
• Purely config-level: no code pushes to iterate
• Automatic exposure logging
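The slides do not show how QuickExperiment divides people, so the following is only a minimal sketch of one common approach: hash the experiment name and user ID to pick a parameter group deterministically, and log the exposure whenever parameters are read. The names `GROUPS`, `assign`, `get_params`, and `exposure_log` are hypothetical; only the color and size values come from the slide.

```python
import hashlib

# Hypothetical experiment config: three equally weighted parameter groups,
# using the example values from the slide.
GROUPS = [
    {"color": "blue", "size": "medium"},
    {"color": "blue", "size": "big"},
    {"color": "green", "size": "medium"},
]

# Stand-in for a real logging pipeline (assumption, not Facebook's system).
exposure_log = []


def assign(experiment: str, user_id: int) -> dict:
    """Deterministically map a user to one parameter group via a hash."""
    key = f"{experiment}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(GROUPS)
    return GROUPS[bucket]


def get_params(experiment: str, user_id: int) -> dict:
    """Return the user's parameters and record the exposure as a side effect."""
    params = assign(experiment, user_id)
    exposure_log.append({"experiment": experiment, "user_id": user_id, **params})
    return params
```

Because the assignment depends only on the experiment name and user ID, the same person always sees the same variant, and changing the group list is a config change rather than a code push.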

Page 19: N=10^9: Automated Experimentation at Scale

PlanOut

Page 20: N=10^9: Automated Experimentation at Scale

PlanOut
• Open sourced: http://facebook.github.io/planout/
• Flexible experimental design
• Full, programmatic control over param values

Page 21: N=10^9: Automated Experimentation at Scale

Experiment evaluation

Exposures

Metrics

[Figure: % change from control to test for the "posts" metric, x-axis from -3 to 3, with confidence levels 95%, 99%, and 99.9%]
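The slide does not say how the intervals in this chart are computed. As an illustration only, the sketch below computes a percent change from control to test with a normal-approximation interval at the three confidence levels shown. The function name `percent_change_ci` and the sample data are hypothetical, and treating the control mean as a fixed baseline is a simplifying assumption.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def percent_change_ci(control, test, confidence=0.95):
    """Percent change of the test mean relative to control, with a CI.

    Simplification: the control mean is treated as a fixed baseline when the
    difference in means is converted to a percentage.
    """
    m_c, m_t = mean(control), mean(test)
    # Standard error of the difference in means (unequal variances).
    se = sqrt(variance(control) / len(control) + variance(test) / len(test))
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = m_t - m_c
    to_pct = 100.0 / m_c
    return diff * to_pct, (diff - z * se) * to_pct, (diff + z * se) * to_pct


# Made-up "posts per user" samples, purely for illustration.
control = [3, 5, 4, 6, 5, 4, 5, 3]
test = [4, 6, 5, 6, 6, 5, 5, 4]
for level in (0.95, 0.99, 0.999):
    est, lo, hi = percent_change_ci(control, test, level)
    print(f"{level:.1%} confidence: {est:+.1f}% [{lo:+.1f}%, {hi:+.1f}%]")
```

Higher confidence levels give wider intervals around the same point estimate, which is why the chart nests the three bands.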

Page 22: N=10^9: Automated Experimentation at Scale

Assess decision risk

[Figure: confidence levels 95%, 99%, and 99.9%]

Page 23: N=10^9: Automated Experimentation at Scale

Lessons learned

Page 24: N=10^9: Automated Experimentation at Scale

Computing answers to an exponential number of possible questions

Pre-compute
• low specificity
• low dimensionality
• long-term

Compute on-the-fly
• high specificity
• high dimensionality
• short-term

A balancing act
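To see why not everything can be pre-computed, here is a rough back-of-the-envelope count. The cardinalities are hypothetical (only "thousands of device types" appears in the slides), and the model assumes each dimension is either fixed to one value or left unfiltered.

```python
from math import prod

# Hypothetical cardinalities; only "thousands of device types" is from the slides.
dimensions = {
    "device_type": 2000,
    "country": 200,
    "age_bucket": 6,
    "gender": 3,
    "day": 30,
}

# Each dimension is either fixed to one value or left as "all", so the number
# of distinct slices is the product of (cardinality + 1).
total_slices = prod(n + 1 for n in dimensions.values())

# Slices that fix at most one dimension: the overall view plus one per value.
low_dim_slices = 1 + sum(dimensions.values())

print(f"all possible slices per metric: {total_slices:,}")
print(f"slices fixing at most one dimension: {low_dim_slices:,}")
```

Under these assumptions there are about 349 million slices per metric and per experiment, but only 2,240 of them fix at most one dimension. That gap is one way to read the split above: pre-compute the low-dimensional views, and compute the specific, high-dimensional ones on demand.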

Page 25: N=10^9: Automated Experimentation at Scale

Tackling many dimensions

Two sets of tools
• For exploration
• For extraction

Page 26: N=10^9: Automated Experimentation at Scale

Automated exploration

Page 27: N=10^9: Automated Experimentation at Scale

Enforce a lifecycle; In particular:

clear experiment end dates

Page 28: N=10^9: Automated Experimentation at Scale

Why lifecycle policy?
• Unifies methodology across teams
• Prevents tech debt buildup
• Minimizes bad impact on product

Page 29: N=10^9: Automated Experimentation at Scale

Ease of rapid iteration; Safe and scientifically valid iteration

Page 30: N=10^9: Automated Experimentation at Scale

Fast, but not too fast
• Novelty effect vs. top engaged users bump
• Understand if waiting helps

Page 31: N=10^9: Automated Experimentation at Scale

Ensure mutual exclusion; Across platforms, features and infra

Page 32: N=10^9: Automated Experimentation at Scale

Why mutual exclusion?
• Fewer experiment conflicts
• Lower metrics variance
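The deck does not describe the mechanism used to enforce mutual exclusion. One common pattern, sketched here under that caveat, is to hash each user into a fixed number of segments within a "layer" and give every experiment in the layer a disjoint range of segments. `NUM_SEGMENTS`, `LAYER`, the experiment names, and both functions are hypothetical.

```python
import hashlib

NUM_SEGMENTS = 1000

# Hypothetical layer: each experiment owns a disjoint range of segments.
# Segments 300..999 are unallocated, so those users are in no experiment.
LAYER = {
    "new_composer": range(0, 100),
    "feed_ranking_v2": range(100, 300),
}


def segment(layer_name: str, user_id: int) -> int:
    """Hash a user into one of NUM_SEGMENTS segments for the given layer."""
    key = f"{layer_name}:{user_id}".encode()
    return int(hashlib.sha256(key).hexdigest(), 16) % NUM_SEGMENTS


def experiment_for(layer_name: str, user_id: int):
    """Return the single experiment this user belongs to in the layer, or None."""
    seg = segment(layer_name, user_id)
    for name, segs in LAYER.items():
        if seg in segs:
            return name
    return None
```

Since every user lands in exactly one segment and the ranges do not overlap, no user can be in two experiments of the same layer at once.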

Page 33: N=10^9: Automated Experimentation at Scale

Exposure log everything
• Measure effects on the exposed only
• Conditioning analyses on the time since last exposure
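As a small illustration of "measure effects on the exposed only", the sketch below compares group means using only users with a logged exposure. The records and the `exposed_means` helper are made up; for the control group, "exposed" is assumed to mean the user reached the point where the change would have been shown.

```python
from statistics import mean

# Hypothetical records: assigned group, whether an exposure was logged,
# and a metric value.
users = [
    {"id": 1, "group": "test", "exposed": True, "posts": 5},
    {"id": 2, "group": "test", "exposed": False, "posts": 2},
    {"id": 3, "group": "control", "exposed": True, "posts": 4},
    {"id": 4, "group": "control", "exposed": False, "posts": 2},
    {"id": 5, "group": "test", "exposed": True, "posts": 7},
    {"id": 6, "group": "control", "exposed": True, "posts": 4},
]


def exposed_means(rows, metric):
    """Mean of `metric` per group, counting only users with a logged exposure."""
    by_group = {}
    for row in rows:
        if row["exposed"]:
            by_group.setdefault(row["group"], []).append(row[metric])
    return {group: mean(values) for group, values in by_group.items()}


print(exposed_means(users, "posts"))
```

Users who were assigned but never reached the changed feature cannot have been affected by it, so including them only dilutes the measured effect and adds noise.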

Page 34: N=10^9: Automated Experimentation at Scale

The culture

Experimentation gives focus; But watch out for tunnel vision!

Page 35: N=10^9: Automated Experimentation at Scale

The culture

Cultivate sound practices; Safe and low-impact experimentation

Page 36: N=10^9: Automated Experimentation at Scale

The culture

Educate on data interpretation; Uniform decision-making across teams

Page 37: N=10^9: Automated Experimentation at Scale

Understanding uncertainty

“Robust misinterpretation of confidence intervals”, Rink Hoekstra et al., Psychonomic Bulletin & Review

• Only 3% of scientists got all 6 answers right...

• How do we educate the users of the tools?
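One way to make the meaning of a confidence level concrete for tool users is a small simulation. This is an illustrative sketch, not something from the talk: it repeats a synthetic experiment many times and counts how often a 95% normal-approximation interval contains the true mean. The function name `coverage` and all parameter values are assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def coverage(true_mean=10.0, sd=2.0, n=50, trials=2000, confidence=0.95, seed=0):
    """Fraction of repeated experiments whose interval contains the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        half_width = z * stdev(sample) / sqrt(n)
        if abs(mean(sample) - true_mean) <= half_width:
            hits += 1
    return hits / trials


print(f"share of intervals covering the true mean: {coverage():.3f}")
```

The confidence level describes how the procedure behaves over many repetitions: about 95% of the intervals it produces cover the true value. It does not say that any one computed interval has a 95% probability of containing it.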

Page 38: N=10^9: Automated Experimentation at Scale

The three stages of experimentation infrastructure

Page 39: N=10^9: Automated Experimentation at Scale

Stage 1: Artisanal

Photo credit: Abhisek Sarda

Page 40: N=10^9: Automated Experimentation at Scale

Stage 2: Power tools

Page 41: N=10^9: Automated Experimentation at Scale

Stage 2: Power tools

Page 42: N=10^9: Automated Experimentation at Scale

Stage 3: Industrialized

Photo credit: Steve Jurvetson

Page 43: N=10^9: Automated Experimentation at Scale

Conclusions

Empower, but don’t overwhelm

Page 44: N=10^9: Automated Experimentation at Scale

Conclusions

Filter and automate, but maintain broad focus

Page 45: N=10^9: Automated Experimentation at Scale

Conclusions

Clean data and powerful tools are great, but building the right experimentation culture is equally important

Page 46: N=10^9: Automated Experimentation at Scale

N=10^9: Automated Experimentation at Scale

Wojciech Galuba, Decision Tools Lead, Facebook, @wgaluba