improving the sensitivity of online controlled experiments by utilizing pre-experiment data

Matsuo Lab, The University of Tokyo Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data Deng, Alex, et al. Proceedings of the sixth ACM international conference on Web search and data mining (WSDM). ACM, 2013. Presenter: Shuhei Iitsuka 1

Upload: shuhei-iitsuka

Post on 23-Jan-2017


Matsuo Lab, The University of Tokyo

Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data

Deng, Alex, et al. Proceedings of the sixth ACM international conference on Web search and data mining (WSDM). ACM, 2013.

Presenter: Shuhei Iitsuka


Abstract

Background: Online controlled experiments (a.k.a. A/B tests) play a key role in data-driven decision making at many companies.

Problem: Businesses want to speed up experiments by using smaller samples and shorter durations.

Proposal: Reduce metric variability by utilizing pre-experiment data. → CUPED

The technique is evaluated on Bing, where CUPED reduces variance by about 50%.

It is applicable to a wide range of key business metrics and is easy to implement.


Reason for choosing this paper

● Selecting a good KPI is necessary for effective online experiments.

● Sensitivity is one important property of a good KPI.

The process of online experiments:

1. Generate variations: x ∈ X
2. Set up the KPI to optimize: f(x)
3. Search for the optimal solution: x* = argmax f(x)

Agenda

1. Introduction

2. Background

3. Proposed Method

4. Empirical Results

5. Conclusions


Impact of online controlled experiments

● One of the oldest and most powerful methodologies for decision making. → Rediscovered by the web industry.

● Even a small difference can be critical for the business.


Introduction

Example: MSN Real Estate

● Tested 6 variations
● Key metric: # of transfers to partner website

→ Achieved +10% revenue

[Figure: Widgets tested for MSN Real Estate, with the winner highlighted]


Why variance reduction matters

Companies are not satisfied with the amount of traffic they have (even Google!)

● The effects to detect tend to be very small.
  ○ n ∝ Δ^(-2): to detect a difference 1/10 the size, you need 100 times more users.
● Launch good features early. Take down negative ones early.
● Only a small fraction of users is affected by the treatments.
● Many experiments need to run in parallel to keep up the pace of innovation.
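
The "100 times more users" rule can be checked with a minimal sketch. It assumes a two-sided two-sample z-test with equal group sizes; the function name `users_per_group` and the default significance level and power are illustrative choices, not values taken from the paper.

```python
import math
from statistics import NormalDist


def users_per_group(delta, sigma, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect a mean difference `delta`.

    Assumes a two-sided two-sample z-test, equal group sizes, and a common
    standard deviation `sigma` (all illustrative assumptions).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)


n_large = users_per_group(delta=0.10, sigma=1.0)
n_small = users_per_group(delta=0.01, sigma=1.0)  # effect 1/10 the size
print(n_large, n_small, round(n_small / n_large, 1))
```

Because the required sample size scales with σ²/Δ², halving the metric variance σ² halves the traffic needed, which is why variance reduction is attractive.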


Current solutions

● Use a different metric
● Filter out users not affected
● Page-level randomization

How can we establish the ideal method?

● No assumption of a parametric model
● Applicable to any metric


Objectives

The authors propose CUPED (Controlled-experiment Using Pre-Experiment Data), which adjusts metrics using pre-experiment data to reduce metric variability.


Overall structure of this paper

[Figure: Variance reduction (methods to adjust metrics to reduce variability): stratification and control variates → adjustment for online experiments → CUPED]

Two-sample t-test

Commonly used method to evaluate the difference of means.


Background

The problem: can we find an adjusted estimate that satisfies the following?

● Still unbiased
● Has a smaller variance than the unadjusted difference of sample means, Δ = Ȳ_t − Ȳ_c

→ We can increase the sensitivity of the metric.
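
As a baseline, the unadjusted two-sample test can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `two_sample_test` and the simulated data are hypothetical, and the p-value uses a normal approximation that is only reasonable for large samples.

```python
import math
import random
from statistics import NormalDist, mean, variance


def two_sample_test(y_t, y_c):
    """Return (delta, t statistic, approximate two-sided p-value)."""
    delta = mean(y_t) - mean(y_c)
    # Standard error of the difference of two independent sample means.
    se = math.sqrt(variance(y_t) / len(y_t) + variance(y_c) / len(y_c))
    t_stat = delta / se
    # Normal approximation to the t distribution (assumes large samples).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return delta, t_stat, p_value


# Hypothetical data: a small true effect of 0.05 on a metric with unit variance.
rng = random.Random(0)
control = [rng.gauss(1.00, 1.0) for _ in range(5000)]
treatment = [rng.gauss(1.05, 1.0) for _ in range(5000)]
delta, t_stat, p_value = two_sample_test(treatment, control)
print(f"delta={delta:.4f}  t={t_stat:.2f}  p={p_value:.4f}")
```

The t statistic is Δ divided by its standard error, so any unbiased adjustment that shrinks Var(Δ) makes the same true effect easier to detect.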


Variance Reduction #1: Stratification

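In short, grouping.

We can reduce the variance by using a weighted mean of the metric from each stratum (= group).

Metric: Ȳ = (1/n) Σ_i Y_i

Adjusted metric: Ŷ_strat = Σ_{k=1..K} w_k Ȳ_k

where K is the number of strata, Ȳ_k is the mean over the n_k samples in stratum k, and w_k is the weight (population share) of stratum k.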

Example strata: Age 6-9, Age 10-13, Age 14-17

Principle of stratification

Intuitively, the variance reduction comes from removing between-strata variance.

Var(Ȳ) = (within-strata variance) + (between-strata variance)
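
The effect of removing the between-strata term can be illustrated with a small simulation. This is a sketch under stated assumptions: the stratum weights and means below are made up, the weights are treated as known, and the names `draw_sample`, `simple_mean`, and `stratified_mean` are hypothetical.

```python
import random
from statistics import mean, pvariance

# Hypothetical strata: population shares (assumed known) and stratum means.
WEIGHTS = {"age_6_9": 0.3, "age_10_13": 0.4, "age_14_17": 0.3}
MEANS = {"age_6_9": 1.0, "age_10_13": 3.0, "age_14_17": 6.0}


def draw_sample(rng, n):
    """Draw n (stratum, value) pairs; the within-stratum std is 1.0."""
    strata = rng.choices(list(WEIGHTS), weights=list(WEIGHTS.values()), k=n)
    return [(k, rng.gauss(MEANS[k], 1.0)) for k in strata]


def simple_mean(sample):
    return mean(y for _, y in sample)


def stratified_mean(sample):
    """Weighted mean of per-stratum means (every stratum must be non-empty)."""
    total = 0.0
    for k, w in WEIGHTS.items():
        total += w * mean(y for s, y in sample if s == k)
    return total


rng = random.Random(42)
simple, stratified = [], []
for _ in range(500):  # repeat the experiment to estimate estimator variance
    sample = draw_sample(rng, 200)
    simple.append(simple_mean(sample))
    stratified.append(stratified_mean(sample))

var_simple = pvariance(simple)
var_strat = pvariance(stratified)
print(f"simple: {var_simple:.5f}  stratified: {var_strat:.5f}")
```

The further apart the stratum means are, the larger the between-strata term and the larger the gain from stratifying.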


Stratification for online experimentation

Let X be a covariate that describes the attribute used for grouping (e.g., browser type, cookie ID).


The amount of variance reduction depends on how we group the samples, i.e., which covariate X we choose.

t: treatment (AFTER)
c: control (BEFORE)


Variance Reduction #2: Control Variates

We can reduce variance with the adjusted metric below.

Ŷ_cv = Ȳ − θ X̄ + θ E[X]

where θ is any constant.

Why?


The size of the reduction depends on the correlation coefficient ρ between Y and X.
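
One way to see why is the short derivation below. It is a sketch that assumes i.i.d. samples of size n, a covariate X with known expectation E[X], and ρ denoting the correlation between Y and X; the notation is chosen here for illustration and may differ from the paper's.

```latex
\begin{align*}
\hat{Y}_{cv} &= \bar{Y} - \theta \bar{X} + \theta\,\mathbb{E}[X],
  \qquad \mathbb{E}[\hat{Y}_{cv}] = \mathbb{E}[Y]
  \quad \text{(unbiased for any constant } \theta\text{)} \\
\operatorname{Var}(\hat{Y}_{cv})
  &= \frac{1}{n}\left[\operatorname{Var}(Y) + \theta^{2}\operatorname{Var}(X)
     - 2\theta\operatorname{Cov}(Y, X)\right] \\
\theta^{*} &= \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)}
  \quad\Longrightarrow\quad
  \operatorname{Var}(\hat{Y}_{cv}) = \operatorname{Var}(\bar{Y})\,(1 - \rho^{2})
\end{align*}
```

For example, a covariate with ρ = 0.7 would remove about half of the variance, since 1 − 0.7² = 0.51.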


Control variates for online experimentation

Working through the formulas shows that if you find a covariate X that correlates strongly with Y, you can reduce the variability by learning the optimal θ and using it to adjust the metric.

Since these two approaches are closely related, we proceed with control variates, which is the more general formulation.


CUPED in practice

CUPED adjusts the metric by learning the parameter θ from data gathered in the pre-experiment period.

The simplest and most effective approach is to use the same metric, measured in the pre-experiment period, as the covariate X.

But in practice, there are cases where this approach does not work directly:

● Missing pre-experiment data
● Handling non-user metrics
● Biased metrics
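
The basic adjustment, using the pre-experiment value of the same metric as the covariate, can be sketched as follows. This is a minimal illustration rather than the production system described in the paper: the helper names, the pooled estimate of θ across both groups, and the simulated data are all assumptions, and the special cases listed above are not handled.

```python
import random
from statistics import mean, variance


def covariance(xs, ys):
    """Sample covariance (written out to avoid depending on Python 3.10+)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def cuped_delta(y_t, x_t, y_c, x_c):
    """Treatment-control difference of the adjusted metric Y - theta * (X - mean(X))."""
    x_all, y_all = x_t + x_c, y_t + y_c
    theta = covariance(x_all, y_all) / variance(x_all)  # pooled estimate (assumption)
    x_mean = mean(x_all)
    adj_t = [y - theta * (x - x_mean) for y, x in zip(y_t, x_t)]
    adj_c = [y - theta * (x - x_mean) for y, x in zip(y_c, x_c)]
    return mean(adj_t) - mean(adj_c), adj_t, adj_c


rng = random.Random(7)


def simulate_group(n, effect):
    """Hypothetical users whose in-experiment metric tracks their pre-period metric."""
    x = [rng.gauss(10.0, 2.0) for _ in range(n)]          # pre-experiment metric
    y = [xi + rng.gauss(0.0, 1.0) + effect for xi in x]   # in-experiment metric
    return y, x


y_c, x_c = simulate_group(2000, effect=0.0)
y_t, x_t = simulate_group(2000, effect=0.1)

delta_raw = mean(y_t) - mean(y_c)
delta_cuped, adj_t, adj_c = cuped_delta(y_t, x_t, y_c, x_c)
print(f"raw delta={delta_raw:.3f}  variance={variance(y_c):.2f}")
print(f"CUPED delta={delta_cuped:.3f}  variance={variance(adj_c):.2f}")
```

Both estimates target the same true effect, but the adjusted metric has much lower variance here because the simulated pre-period metric is strongly correlated with the in-experiment metric.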


Proposed Method


Experiment #1: Slowdown experiment in Bing

Page load time affects user engagement, which is reflected in CTR.

Compare the sensitivity of each approach by slowing down the page load time by 250 milliseconds.

• t-test: simple t-test on Y (= CTR)
• CUPED: t-test on adjusted Y


Empirical Results

The plain t-test needed two weeks to cross the p = 0.05 line, while CUPED showed a significant difference from day 1.

CUPED remains highly sensitive even when it uses only half the users, while the t-test uses all users.


Experiment #2: Covariate & CUPED effectiveness

Using an A/A test, the authors evaluated how the choice of covariate affects CUPED's effectiveness (= variance reduction).


Variance reduction for Queries/UU using different covariates.

Green: The variance is reduced when the same variable is used as the covariate for CUPED.

Blue: A less correlated covariate brings little improvement.

Red: Combining the two covariates improves performance slightly.


Experiment #3: Duration & CUPED effectiveness

In addition to experiment #2, the authors varied the length of the pre-experiment period.


Impact of pre-experiment period length

Pre-experiment period: Longer is better, because it increases the coverage of users.

Experiment period: Longer is not necessarily better. It can decrease the coverage of users, because new visitors arrive in the later phase.


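Experiment #4: Warning on using post-triggering data

The covariate X should satisfy the condition E[X_t] = E[X_c], i.e., its expected value is the same in the treatment and control groups.

If the condition is not satisfied, the estimated effect can point in the opposite direction.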


Example where results are directionally incorrect when covariates violate the pre-triggering requirement.

Red: Positive effect
Blue: Negative effect

This example uses distinct queries per user as the covariate, but its expected value was not the same in each group. → This causes the contradictory result.
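
The sign flip can be reproduced with a small simulation. This is a sketch with made-up numbers, not the Bing data: the treatment is assumed to raise the metric Y by 0.2 and, because the covariate is measured after triggering, also to raise the covariate X by 0.5. The helper names are hypothetical.

```python
import random
from statistics import mean, variance


def covariance(xs, ys):
    """Sample covariance (written out to avoid depending on Python 3.10+)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def adjusted_delta(y_t, x_t, y_c, x_c):
    """Control-variates estimate: raw delta minus theta * (difference in covariate means)."""
    x_all, y_all = x_t + x_c, y_t + y_c
    theta = covariance(x_all, y_all) / variance(x_all)  # pooled estimate (assumption)
    return (mean(y_t) - mean(y_c)) - theta * (mean(x_t) - mean(x_c))


rng = random.Random(3)


def simulate_group(n, effect_on_y, effect_on_x):
    """Y and X share a per-user baseline; the treatment may shift either one."""
    y, x = [], []
    for _ in range(n):
        base = rng.gauss(10.0, 2.0)
        y.append(base + rng.gauss(0.0, 1.0) + effect_on_y)
        x.append(base + rng.gauss(0.0, 0.5) + effect_on_x)
    return y, x


y_c, x_c = simulate_group(20000, effect_on_y=0.0, effect_on_x=0.0)
# The covariate is affected by the treatment, so E[X_t] != E[X_c].
y_t, x_t = simulate_group(20000, effect_on_y=0.2, effect_on_x=0.5)

raw = mean(y_t) - mean(y_c)
adjusted = adjusted_delta(y_t, x_t, y_c, x_c)
print(f"raw delta={raw:.3f}  adjusted delta={adjusted:.3f}")
```

The adjustment subtracts θ times the difference in covariate means, which is no longer zero in expectation, so the positive true effect is reported as negative.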


Conclusions

● The authors introduced CUPED, a technique to increase the sensitivity of controlled experiments by utilizing pre-experiment data.

● The system is live in Bing, and the empirical results showed variance reductions of around 50%.

● Guidance for practical use:
  ○ Variance reduction works best when the distribution of the metric varies across user segments.
  ○ Using the metric measured in the pre-period as the covariate is the best way.
  ○ Take 1-2 weeks for the pre-experiment period.
  ○ Never use covariates that could be affected by the treatment.
