netflix_controlled experimentation_panel_the hive
Experimentation at Netflix
Core to our culture
Goal is to maximize our customers’ viewing enjoyment
New and existing global members participate in multiple tests
We experiment in all areas (personalization algorithms, product features, acquisition, streaming optimization, etc.)
Clarity on key metric(s) is critical
Netflix’s goal with our members: Continually improve member enjoyment
Retention
Netflix’s goal with our visitors: Optimize visitor experience to entice people to try Netflix
Free trial conversion
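A minimal sketch of how a key metric such as free trial conversion might be compared between two test cells. The two-proportion z-test is one common choice and is an assumption here, not a method stated in the slides; the function name and the visitor counts are hypothetical.

```python
from math import erf, sqrt

def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Return (rate_a, rate_b, z, two_sided_p) for the difference in two rates."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both cells convert equally
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Standard normal CDF expressed through the error function
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return rate_a, rate_b, z, 2 * (1 - cdf)

# Hypothetical counts: 10,000 visitors per cell
rate_a, rate_b, z, p = two_proportion_ztest(2000, 10000, 2100, 10000)
print(f"control={rate_a:.3f} test={rate_b:.3f} z={z:.2f} p={p:.3f}")
```

The same comparison applies to retention by passing retained-member counts instead of conversions.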
Determining the appropriate use of a metric
Brainstorm potential metrics, collect new data
Predictive modeling (of core metric)
Vet any “winners” with PMs and past experiments
Productize successful metrics
Example ranking of some possible metrics
[Figure: bar chart ranking candidate metrics by Variable Importance Measure; axis from 0 to 0.012]
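As a rough illustration of ranking candidate metrics by how well they predict the core metric, the sketch below scores each candidate by the absolute Pearson correlation with a retained/cancelled flag. This is a simple stand-in: the slides do not say which variable importance measure was used, and the metric names and values are made up.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_metrics(candidates, retained):
    """Sort candidate metrics by |correlation| with retention, strongest first."""
    scores = {name: abs(pearson(values, retained))
              for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-member data: 1 = retained, 0 = cancelled
retained = [1, 1, 1, 0, 0, 1, 0, 1]
candidates = {
    "stream_hours": [30, 25, 40, 2, 5, 20, 1, 35],
    "searches": [3, 1, 4, 2, 3, 2, 1, 5],
    "days_since_signup": [10, 200, 50, 120, 30, 90, 60, 150],
}
ranking = rank_metrics(candidates, retained)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A production ranking would more likely come from the predictive model itself, with the result then vetted against past experiments.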
Streaming hours is a key secondary metric
[Figure: Voluntary Cancel Rate (y-axis) vs. Customers' Stream Hours in the past 28 days (x-axis, 0 to 50)]
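A curve like this one can be sketched by bucketing members on stream hours and computing the voluntary cancel rate in each bucket. The input format, the bucket width, and the sample records below are assumptions for illustration only.

```python
from collections import defaultdict

def cancel_rate_by_hours(members, bucket_width=5):
    """members: iterable of (stream_hours_past_28_days, cancelled) pairs.

    Returns {bucket_start_hours: voluntary cancel rate}, ordered by bucket.
    """
    totals = defaultdict(int)
    cancels = defaultdict(int)
    for hours, cancelled in members:
        # Map e.g. 7.2 hours to the bucket starting at 5 when the width is 5
        bucket = int(hours // bucket_width) * bucket_width
        totals[bucket] += 1
        cancels[bucket] += int(cancelled)
    return {b: cancels[b] / totals[b] for b in sorted(totals)}

# Hypothetical members: (hours streamed, cancelled voluntarily?)
members = [(0.5, True), (2, True), (3, False), (7, False),
           (8, True), (12, False), (14, False), (30, False)]
rates = cancel_rate_by_hours(members)
for bucket, rate in rates.items():
    print(f"{bucket}+ hours: {rate:.2f}")
```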
We predict customer tenure from streaming hours
[Figure: Retention (y-axis) vs. Total hours consumed during 22 days of membership (x-axis). Curves show the probability of retaining at each future billing cycle based on streaming S hours at N days of tenure.]
Leverage the retention-hours curves above to measure the full distribution of hours in each test cell and predict tenure
[Figure: Cume % in Test Cell (y-axis) vs. Streaming Hours (x-axis)]
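The idea of applying a retention-hours curve to the full distribution of hours in a cell can be sketched as follows. The curve values, the step-function lookup, and the constant per-cycle retention used for tenure are all simplifying assumptions; none of the numbers are Netflix data.

```python
import bisect

# Hypothetical retention-hours curve: (minimum hours, probability of retaining
# at the next billing cycle). Illustrative values only.
CURVE = [(0, 0.70), (5, 0.85), (15, 0.92), (40, 0.96)]
THRESHOLDS = [hours for hours, _ in CURVE]

def retention_prob(hours):
    """Step-function lookup of the retention probability for an hours total."""
    idx = max(bisect.bisect_right(THRESHOLDS, hours) - 1, 0)
    return CURVE[idx][1]

def predicted_retention(cell_hours):
    """Average the curve over every member so the whole distribution counts."""
    return sum(retention_prob(h) for h in cell_hours) / len(cell_hours)

def expected_tenure(cell_hours, cycles=12):
    """Mean expected billing cycles retained within a horizon, assuming each
    member's per-cycle retention probability stays constant."""
    total = 0.0
    for h in cell_hours:
        p = retention_prob(h)
        total += sum(p ** k for k in range(1, cycles + 1))
    return total / len(cell_hours)

# Two hypothetical cells with the same mean hours (10) but different spreads
cell_a = [10, 10, 10, 10]
cell_b = [0, 0, 20, 20]
print(predicted_retention(cell_a), predicted_retention(cell_b))
print(expected_tenure(cell_a), expected_tenure(cell_b))
```

Because the curve is not linear, the two cells get different predictions despite identical mean hours, which is why the full distribution is measured rather than the average alone.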
Filtered measurement
Activity filtering: Filter to a subset of activity, e.g. streaming hours from one row
Controversial for decision-making; risk increases as the interaction potential (or cannibalization potential) increases
Allocation filtering: Filter to a subset of members in the test, e.g. streaming hours for the subset of customers who performed a search
Good for decision-making as long as:
1. The segment incorporates the full set of members who were exposed to the experience being tested
2. Segment is large enough to care about (or strategically important)
3. The segment holds up to a controlled experiment (members comprising the segment are not selected in a way that could have been influenced by the test experience)
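A minimal sketch of allocation filtering: restrict each cell to members in a segment, compute the metric on that subset, and also report what share of each cell falls in the segment. Comparing those shares is one rough heuristic for condition 3, since a segment whose size differs across cells may have been shaped by the test experience. The record fields and values are hypothetical.

```python
from collections import defaultdict

def allocation_filtered_means(members, in_segment):
    """members: list of dicts with 'cell' and 'hours'; in_segment: predicate.

    Returns (mean hours per cell within the segment, segment share per cell).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for m in members:
        sizes[m["cell"]] += 1
        if in_segment(m):
            sums[m["cell"]] += m["hours"]
            counts[m["cell"]] += 1
    means = {c: sums[c] / counts[c] for c in sizes if counts[c] > 0}
    shares = {c: counts[c] / sizes[c] for c in sizes}
    return means, shares

# Hypothetical members; the segment is "performed a search"
members = [
    {"cell": "A", "hours": 10, "searched": True},
    {"cell": "A", "hours": 4, "searched": False},
    {"cell": "A", "hours": 6, "searched": True},
    {"cell": "A", "hours": 2, "searched": False},
    {"cell": "B", "hours": 12, "searched": True},
    {"cell": "B", "hours": 3, "searched": False},
    {"cell": "B", "hours": 8, "searched": True},
    {"cell": "B", "hours": 5, "searched": False},
]
means, shares = allocation_filtered_means(members, lambda m: m["searched"])
print(means, shares)
```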
Unintended threats to controlled experiment
Engineering bug (A and B don’t work as intended)
Control cell is not engineered like a true test cell (“fixed”), and instead uses the standard production experience
Unplanned interaction with other experiments, campaigns, etc. that is differential across test cells
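One way to look for a differential interaction is to compare, across test cells, the share of members who were also exposed to another experiment or campaign. The sketch below is illustrative only: the input format, the tolerance, and the sample assignments are assumptions, and a real check would use a proper statistical test rather than a fixed threshold.

```python
def exposure_shares(assignments):
    """assignments: list of (cell, exposed_to_other) pairs.

    Returns the share of each cell also exposed to the other experiment.
    """
    totals, exposed = {}, {}
    for cell, is_exposed in assignments:
        totals[cell] = totals.get(cell, 0) + 1
        exposed[cell] = exposed.get(cell, 0) + int(is_exposed)
    return {cell: exposed[cell] / totals[cell] for cell in totals}

def is_differential(shares, tolerance=0.05):
    """Flag when exposure differs across cells by more than the tolerance."""
    return max(shares.values()) - min(shares.values()) > tolerance

# Hypothetical assignments: cell B overlaps far more with the other experiment
assignments = [("A", True), ("A", False), ("A", False), ("A", False),
               ("B", True), ("B", True), ("B", True), ("B", False)]
shares = exposure_shares(assignments)
print(shares, is_differential(shares))
```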