netflix_controlled experimentation_panel_the hive


Some Insights from Netflix Experimentation

Experimentation Panel 3-20-13

Experimentation at Netflix

Core to our culture

Goal is to maximize our customers’ viewing enjoyment

New and existing global members participate in multiple tests

We experiment in all areas (personalization algorithms, product features, acquisition, streaming optimization, etc.)


Clarity on key metric(s) is critical

Netflix’s goal with our members: continually improve member enjoyment
Key metric: Retention

Netflix’s goal with our visitors: optimize the visitor experience to entice people to try Netflix
Key metric: Free trial conversion
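The deck names retention and free trial conversion as the key metrics but does not say how a test cell is compared with control on them. As a minimal sketch, assuming a standard pooled two-proportion z-test (a textbook choice, not necessarily the method Netflix used), a cell's retention or conversion rate could be compared like this. The function name and the counts are hypothetical.

```python
import math


def two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Compare two rates (e.g. retention in control A vs. test cell B)
    with a pooled two-proportion z-test.

    Returns (rate difference b - a, z statistic, two-sided p-value).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both cells share one rate
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value
```

With hypothetical counts of 9,000 of 10,000 members retained in control and 9,100 of 10,000 in the test cell, `two_proportion_ztest(9000, 10000, 9100, 10000)` returns a one-point lift with a z statistic of roughly 2.4.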


What about other great metrics that you believe to be a positive measure?

Determining the appropriate use of a metric

1. Brainstorm potential metrics, collect new data
2. Predictive modeling (of core metric)
3. Vet any “winners” with PMs and past experiments
4. Productize successful metrics

Example ranking of some possible metrics

[Figure: candidate variables ranked by variable importance measure (axis from 0 to 0.012)]

Streaming hours is a key secondary metric

[Figure: voluntary cancel rate by customers’ stream hours in the past 28 days (0 to 50 hours)]
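A curve like the one above could be tabulated by bucketing members on their past-28-day streaming hours and computing the share who voluntarily cancelled in each bucket. This is a minimal sketch; the 5-hour bucket width, the function name, and the `(hours, cancelled)` input layout are assumptions.

```python
from collections import defaultdict


def cancel_rate_by_hours(members, bucket_width=5):
    """Voluntary cancel rate per streaming-hours bucket.

    members: iterable of (stream_hours_28d, voluntarily_cancelled) pairs.
    Returns {bucket_start_hours: cancel_rate}, ordered by bucket.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [cancels, members]
    for hours, cancelled in members:
        bucket = int(hours // bucket_width) * bucket_width
        counts[bucket][0] += int(cancelled)
        counts[bucket][1] += 1
    return {b: c / n for b, (c, n) in sorted(counts.items())}


# Hypothetical members: (hours streamed in past 28 days, cancelled?)
rates = cancel_rate_by_hours(
    [(1, True), (3, True), (4, False), (12, False), (14, True), (30, False), (33, False)]
)
```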

We predict customer tenure from streaming hours

[Figure: probability of retaining at each future billing cycle based on streaming S hours at N days of tenure; x-axis: total hours consumed during 22 days of membership; y-axis: retention]

“Search-based Rows” Experiment

Leverage the retention-hours curves above to measure the full distribution of hours in each test cell and predict tenure

[Figure: cumulative % of members in each test cell (y-axis) by streaming hours (x-axis)]
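One way to read the two slides together, sketched under assumptions: if each retention-hours curve gives the probability that a member is still retained at each future billing cycle, then the expected number of retained cycles within the curve's horizon is the sum of those probabilities, and a cell's predicted tenure is the average of that quantity over its members' hours. The curve values, the two-bucket split at 10 hours, and all names here are hypothetical.

```python
def hours_bucket(hours):
    """Hypothetical two-bucket split; real curves would be finer-grained."""
    return "low" if hours < 10 else "high"


# Hypothetical curves: P(still a member) at future billing cycles 1, 2, 3
RETENTION_CURVES = {
    "low": [0.80, 0.65, 0.55],
    "high": [0.95, 0.90, 0.86],
}


def predicted_tenure(cell_hours, curves=RETENTION_CURVES, bucket=hours_bucket):
    """Average expected number of retained billing cycles (within the curve
    horizon) per member, given each member's streaming hours at N days."""
    # E[number of cycles retained] = sum over cycles of P(retained at that cycle)
    expected = [sum(curves[bucket(h)]) for h in cell_hours]
    return sum(expected) / len(expected)


control_hours = [2, 5, 12, 20]
test_hours = [4, 11, 15, 25]
```

Because both cells share the same curves, any difference in predicted tenure comes entirely from the distribution of hours in each cell, which is what the cumulative plot compares.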

Percent of streaming hours from search-based rows


Filtered measurement

Activity filtering: Filter to a subset of activity, e.g., streaming hours from one row

Controversial for decision-making; risk increases as the interaction potential (or cannibalization potential) increases

Allocation filtering: Filter to a subset of members in the test, e.g., streaming hours for the subset of customers who performed a search

Good for decision-making as long as:

1. The segment incorporates the full set of members who were exposed to the experience being tested

2. The segment is large enough to care about (or strategically important)

3. The segment holds up to a controlled experiment (members comprising the segment are not selected in a way that could have been influenced by the test experience)
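The difference between the two filters can be sketched on hypothetical per-member records; the `Member` fields, row names, and numbers are all assumptions. In this toy data, cell B shifts hours into the search row without changing total hours, so activity filtering shows a large gain while allocation filtering on members who searched shows none. That gap is the cannibalization risk described above.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    cell: str
    # Whether the member performed a search; for allocation filtering to be
    # valid, this must not have been influenced by the test experience.
    searched: bool
    hours_by_row: dict = field(default_factory=dict)  # row type -> streaming hours


def activity_filtered_hours(members, cell, row):
    """Activity filtering: mean hours from one row only, over all members in the cell."""
    in_cell = [m for m in members if m.cell == cell]
    return sum(m.hours_by_row.get(row, 0.0) for m in in_cell) / len(in_cell)


def allocation_filtered_hours(members, cell):
    """Allocation filtering: mean total hours, restricted to members in the cell who searched."""
    subset = [m for m in members if m.cell == cell and m.searched]
    return sum(sum(m.hours_by_row.values()) for m in subset) / len(subset)


# Hypothetical members: cell B moves viewing into the search row (cannibalization)
members = [
    Member("A", True, {"search": 1.0, "other": 9.0}),
    Member("A", False, {"other": 8.0}),
    Member("B", True, {"search": 4.0, "other": 6.0}),
    Member("B", False, {"other": 8.0}),
]
```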

Unintended threats to a controlled experiment

Engineering bug (A and B don’t work as intended)

Control cell is not engineered like a true test cell (“fixed”) and instead uses the standard production experience

Unplanned interaction with other experiments, campaigns, etc. that is differential across test cells


Experiment on the minimum number of qualifying titles required for a “genre row” to appear
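The rule being tested can be stated as a one-line filter: a genre row appears only when it has at least the minimum number of qualifying titles. This sketch uses invented genre counts. The 15-title threshold comes from the slides, while the 10-title comparison value is hypothetical. The deck does not say whether dropped rows were meant to be replaced by other rows.

```python
def genre_rows_shown(qualifying_titles_per_genre, min_titles):
    """Genres whose row appears: only those with at least `min_titles` qualifying titles."""
    return [genre for genre, count in qualifying_titles_per_genre.items()
            if count >= min_titles]


# Hypothetical counts of qualifying titles for one member's page
counts = {"Comedies": 40, "Thrillers": 18, "Documentaries": 12, "Anime": 6}
```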


Negative results appeared immediately


Discovered that the test cells were not working properly


[Figure: cumulative distribution of page views by test cell; x-axis: number of genre rows on the page]

Customers in the test cell using 15 as the minimum were seeing fewer rows altogether
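A check like the one that exposed the problem can be sketched as an empirical cumulative distribution of genre rows per page view for each cell. The sketch also computes the largest vertical gap between two such curves, which is the two-sample Kolmogorov-Smirnov statistic. The row counts and cell names below are invented for illustration.

```python
from bisect import bisect_right


def empirical_cdf(values):
    """Return F(x): the fraction of observations that are <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def max_cdf_gap(a, b):
    """Largest |F_a(x) - F_b(x)| (two-sample Kolmogorov-Smirnov statistic).

    Both empirical CDFs are step functions that change only at observed
    values, so checking the pooled observations is sufficient.
    """
    f_a, f_b = empirical_cdf(a), empirical_cdf(b)
    return max(abs(f_a(x) - f_b(x)) for x in set(a) | set(b))


# Hypothetical genre rows per page view, one value per page view
control_rows = [18, 20, 20, 22, 19]
min15_rows = [12, 14, 15, 13, 16]
```

A gap this large on a page-structure measure is a sign that the cells differ in more than the intended treatment, which is worth checking before reading any outcome metric.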
