Matsuo Lab, The University of Tokyo
Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data
Deng, Alex, et al. Proceedings of the sixth ACM international conference on Web search and data mining (WSDM). ACM, 2013.
Presenter: Shuhei Iitsuka
Abstract
Background: Online controlled experiments (a.k.a. A/B testing) play a key role in data-driven decision making at several companies.
Problem: Businesses want to speed up experiments by running them with smaller samples and shorter durations.
Proposal: Reduce metric variability by utilizing pre-experiment data. → CUPED
The technique was evaluated on Bing, where CUPED reduced variance by about 50%.
It is applicable to a wide range of key business metrics and is easy to implement.
Reason for choosing this paper
● Selecting a good KPI is necessary for effective online experiments.
● Sensitivity is one important property of a good KPI.
The process of online experiments:
1. Generate variations: x ∈ X
2. Set up the KPI to optimize: f(x)
3. Search for the optimal solution: x* = argmax f(x)
Agenda
1. Introduction
2. Background
3. Proposed Method
4. Empirical Results
5. Conclusions
Impact of online controlled experiments
● One of the oldest and most powerful methodologies for decision making. → Rediscovered in the web industry.
● A change can be critical for the business even if the difference is small.
Example: MSN Real Estate
● Tested 6 variations
● Key metric: # of transfers to partner website
→ Achieved +10% revenue
[Figure: Widgets tested for MSN Real Estate, with the winner marked]
Why variance reduction matters
Companies never have as much traffic as they want (even Google!)
● The effects to detect tend to be very small.
○ n ∝ Δ^(-2): If you want to detect a difference 1/10 the size, you need 100 times more users.
● Launch good features early. Take down negative ones early.
● Only a small fraction of users is affected by the treatments.
● Experiments need to run in parallel to keep up the pace of innovation.
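The scaling between effect size and required users can be checked with a rough calculation. This is a minimal sketch that assumes the textbook sample-size approximation for a two-sided, two-sample z-test with equal group sizes; the function name `users_per_group` and the default `alpha` and `power` values are illustrative choices, not taken from the paper.

```python
from statistics import NormalDist


def users_per_group(delta: float, sigma: float,
                    alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate users per group needed to detect a mean difference `delta`.

    Assumes a two-sided, two-sample z-test with a common standard deviation
    `sigma` and equal group sizes (textbook approximation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2


base = users_per_group(delta=0.10, sigma=1.0)
small = users_per_group(delta=0.01, sigma=1.0)
ratio = small / base  # effect is 10x smaller, so about 100x more users

# Halving the variance (sigma^2 = 0.5) halves the required users.
halved = users_per_group(delta=0.10, sigma=0.5 ** 0.5)

print(round(ratio), round(halved / base, 3))
```

Under this approximation the required sample size is proportional to the variance, which is why reducing metric variance acts like adding traffic.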
Current solutions
● Use a different metric
● Filter out users not affected by the treatment
● Page-level randomization
What would the ideal method look like?
● No assumption of a parametric model
● Applicable to any metric
Objectives
The authors propose CUPED (Controlled-experiment Using Pre-Experiment Data), which adjusts metrics using pre-experiment data to reduce metric variability.
Overall structure of this paper
[Diagram: Variance Reduction (methods to adjust metrics to reduce variability): Stratification, Control Variates; Adjustment for Online Experiments; CUPED]
Two-sample t-test
A commonly used method for evaluating the difference between two means.
The problem: Can we find an adjusted estimate of the difference that satisfies the following?
● Still unbiased
● Has a smaller variance than the plain difference of sample means
→ If so, we can increase the sensitivity of the metric.
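To make the link between variance and sensitivity concrete, here is a minimal sketch of the two-sample t statistic in its unequal-variance (Welch) form. The function name `two_sample_t` and all data values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(y_t: list[float], y_c: list[float]) -> tuple[float, float]:
    """Return (delta, t): difference of sample means and its t statistic."""
    delta = mean(y_t) - mean(y_c)
    # Estimated variance of the difference of the two sample means.
    var_delta = variance(y_t) / len(y_t) + variance(y_c) / len(y_c)
    return delta, delta / sqrt(var_delta)


# Hypothetical per-user metric values (same true shift of +1 in both cases).
noisy_c = [1.0, 2.0, 3.0, 4.0, 5.0]
noisy_t = [2.0, 3.0, 4.0, 5.0, 6.0]
quiet_c = [2.5, 3.0, 3.0, 3.0, 3.5]
quiet_t = [3.5, 4.0, 4.0, 4.0, 4.5]

delta_noisy, t_noisy = two_sample_t(noisy_t, noisy_c)
delta_quiet, t_quiet = two_sample_t(quiet_t, quiet_c)
print(delta_noisy, t_noisy)
print(delta_quiet, t_quiet)
```

Both data sets have the same difference of means, but the lower-variance one yields a much larger t statistic. An adjusted estimate that keeps the mean difference unbiased while shrinking its variance has the same effect.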
Variance Reduction #1: Stratification
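In short, grouping.
We can reduce the variance by using a weighted mean of the per-stratum (= per-group) means of the metric.
Metric: Ȳ = (1/n) Σ_i Y_i
Adjusted metric: Ŷ_strat = Σ_{k=1..K} w_k Ȳ_k
where K is the number of strata, n_k is the number of samples in stratum k, Ȳ_k is the mean of the n_k samples in stratum k, and w_k is the weight of stratum k.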
[Figure: samples grouped into strata by age: 6-9, 10-13, 14-17]
Principle of stratification
Intuitively, the variance reduction comes from removing between-strata variance.
Total variance = within-strata variance + between-strata variance
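The decomposition can be verified numerically. The sketch below uses a small hypothetical data set split into three strata and assumes each stratum's weight equals its share of the samples; the stratum labels and values are made up for illustration.

```python
from statistics import mean, pvariance

# Hypothetical metric values grouped by stratum; illustration only.
strata = {
    "age 6-9": [1.0, 2.0, 3.0],
    "age 10-13": [4.0, 5.0, 6.0],
    "age 14-17": [7.0, 8.0, 9.0],
}

values = [y for ys in strata.values() for y in ys]
n = len(values)
overall_mean = mean(values)

# Assumed weights: each stratum's share of all samples.
weights = {k: len(ys) / n for k, ys in strata.items()}

# Weighted average of the variances inside each stratum.
within = sum(weights[k] * pvariance(ys) for k, ys in strata.items())
# Weighted spread of the stratum means around the overall mean.
between = sum(weights[k] * (mean(ys) - overall_mean) ** 2
              for k, ys in strata.items())
total = pvariance(values)

# Stratified estimate: weighted mean of the per-stratum means.
stratified_mean = sum(weights[k] * mean(ys) for k, ys in strata.items())

print(total, within, between, stratified_mean)
```

In this toy data most of the total variance sits between the strata. An estimator that fixes the stratum weights in advance no longer picks up that between-strata part as noise, which is the intuition behind the reduction.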
Stratification for online experimentation
Let X be a covariate that denotes the group attribute (e.g., browser type, cookie ID).
The size of the variance reduction depends on how we group the samples, i.e., which covariate X we choose.
t: treatment (AFTER), c: control (BEFORE)
Variance Reduction #2: Control Variates
We can reduce variance with the adjusted metric below.
Ŷ_cv = Ȳ − θX̄ + θE[X]
where θ is any constant.
Why?
The size of the reduction depends on the correlation coefficient between Y and X.
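A sketch of the standard control-variates argument follows. It assumes i.i.d. pairs (Y_i, X_i), a known E[X], and a sample of size n. The symbols Ŷ_cv, θ* and ρ are notation chosen here for illustration.

```latex
\begin{align*}
\hat{Y}_{cv} &= \bar{Y} - \theta \bar{X} + \theta\,\mathbb{E}[X] \\
\mathbb{E}[\hat{Y}_{cv}] &= \mathbb{E}[\bar{Y}] - \theta\,\mathbb{E}[\bar{X}] + \theta\,\mathbb{E}[X] = \mathbb{E}[Y]
  && \text{(unbiased for any } \theta\text{)} \\
\operatorname{Var}(\hat{Y}_{cv}) &= \frac{1}{n}\left(\operatorname{Var}(Y) + \theta^{2}\operatorname{Var}(X) - 2\theta\operatorname{Cov}(Y, X)\right) \\
\theta^{*} &= \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)}
  && \text{(minimizes the variance)} \\
\operatorname{Var}(\hat{Y}_{cv})\Big|_{\theta=\theta^{*}} &= \operatorname{Var}(\bar{Y})\,(1 - \rho^{2}),
  \qquad \rho = \operatorname{Corr}(Y, X)
\end{align*}
```

With the optimal θ the variance shrinks by a factor of 1 − ρ², so a covariate with correlation 0.7 to Y would remove roughly half of the variance.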
Control variates for online experimentation
Working through the formulas for the treatment-versus-control comparison leads to the following conclusion.
If you find a covariate X that correlates strongly with Y, you can reduce the variability by learning the optimal θ and using it to adjust the metric.
Since the two approaches are closely related, we proceed with control variates, which has the more general formulation.
CUPED in practice
CUPED adjusts the metric with a parameter θ learned using data gathered in the pre-experiment period.
The simplest and most effective approach is to use the same metric, measured in the pre-experiment period, as the covariate X.
In practice, however, there are cases where this approach does not work directly:
● Missing pre-experiment data
● Handling non-user metrics
● Biased metrics
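The basic case, where the same metric from the pre-experiment period serves as the covariate, can be sketched as follows. The helper names (`covariance`, `cuped_adjust`) and all data are hypothetical. Estimating θ from the pooled treatment and control data is an assumption of this sketch, and the cases listed above are not handled.

```python
from statistics import mean, variance


def covariance(xs: list[float], ys: list[float]) -> float:
    """Sample covariance (n - 1 denominator, matching statistics.variance)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def cuped_adjust(y: list[float], x: list[float],
                 theta: float, x_mean: float) -> list[float]:
    """Per-user adjusted metric: y_i - theta * (x_i - x_mean)."""
    return [yi - theta * (xi - x_mean) for yi, xi in zip(y, x)]


# Hypothetical data: x = metric before the experiment, y = metric during it.
x_c, y_c = [1.0, 2.0, 3.0, 4.0, 5.0], [1.5, 2.0, 3.5, 4.0, 5.5]
x_t, y_t = [1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 2.5, 4.0, 4.5, 6.0]

# Assumption: one theta estimated from the pooled data of both groups.
x_all, y_all = x_c + x_t, y_c + y_t
theta = covariance(x_all, y_all) / variance(x_all)
x_mean = mean(x_all)

adj_c = cuped_adjust(y_c, x_c, theta, x_mean)
adj_t = cuped_adjust(y_t, x_t, theta, x_mean)

delta_raw = mean(y_t) - mean(y_c)
delta_cuped = mean(adj_t) - mean(adj_c)
print(theta, delta_raw, delta_cuped)
print(variance(y_c), variance(adj_c))
```

In this toy example the estimated difference is unchanged while the per-user variance of the adjusted metric is far smaller, so a t-test on the adjusted values would be more sensitive.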
Experiment #1: Slowdown experiment in Bing
Page load time affects user engagement, which is reflected in CTR.
The sensitivity of each method is compared by slowing down page loads by 250 milliseconds.
● t-test: Simple t-test on Y (= CTR)
● CUPED: t-test on adjusted Y
The plain t-test needed two weeks to cross the p = 0.05 line, while CUPED showed a significant difference from day 1.
CUPED remains highly sensitive even with only half the users, while the t-test uses all users.
Experiment #2: Covariate & CUPED effectiveness
Using an A/A test, the authors evaluated how the choice of covariate affects CUPED's effectiveness (= variance reduction).
Variance reduction for Queries/UU using different covariates.
Green: The variance is reduced when the same metric is used as the covariate for CUPED.
Blue: Not much improvement is gained with a less correlated covariate.
Red: Combining the two improves performance slightly.
Experiment #3: Duration & CUPED effectiveness
Extending experiment #2, the authors varied the length of the pre-experiment period.
Impact of pre-experiment period length
Pre-experiment period: Longer is better, because it increases the coverage of users.
Experiment period: Longer is not necessarily better. It can decrease the coverage of users because of new visitors arriving in the later phase.
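Experiment #4: Warning on using post-triggering data
The covariate X should satisfy the condition E[X_t] = E[X_c], i.e., its expected value must be the same in the treatment and control groups.
If it is not satisfied, the result can point in the opposite direction.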
Example where results are directionally incorrect when covariates violate the pre-triggering requirement.
Red: Positive effect
Blue: Negative effect
This example uses distinct queries per user as the covariate, but its expected value was not the same in the two groups. → This causes the contradictory result.
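A small numeric sketch shows how a covariate that is itself shifted by the treatment can flip the sign of the adjusted difference. The function name `cuped_delta`, the fixed `theta = 1.0`, and all data values are assumptions made for illustration, not figures from the paper.

```python
from statistics import mean


def cuped_delta(y_t, x_t, y_c, x_c, theta):
    """Adjusted difference: (mean(y_t) - theta*mean(x_t)) - (mean(y_c) - theta*mean(x_c))."""
    return (mean(y_t) - theta * mean(x_t)) - (mean(y_c) - theta * mean(x_c))


# Hypothetical data; the observed effect on y is +0.5 in both scenarios.
y_c = [3.0, 4.0, 5.0]
y_t = [3.5, 4.5, 5.5]
theta = 1.0  # assumed adjustment coefficient, fixed for illustration

# Valid covariate: measured before the experiment, same mean in both groups.
x_c_pre = [2.0, 3.0, 4.0]
x_t_pre = [2.0, 3.0, 4.0]

# Invalid covariate: measured during the experiment and shifted by the treatment.
x_c_post = [2.0, 3.0, 4.0]
x_t_post = [3.0, 4.0, 5.0]

valid = cuped_delta(y_t, x_t_pre, y_c, x_c_pre, theta)
invalid = cuped_delta(y_t, x_t_post, y_c, x_c_post, theta)
print(valid, invalid)
```

The adjustment subtracts θ times the gap between the covariate means of the two groups. When that gap is nonzero because the treatment moved the covariate, the estimate is biased by that amount, here enough to turn +0.5 into −0.5.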
Conclusions
● The authors introduced CUPED, a technique to increase the sensitivity of controlled experiments by utilizing pre-experiment data.
● The system is live in Bing, and the empirical results showed variance reductions of around 50%.
● Guidance for practical use:
○ Variance reduction works best when the metric's distribution varies across user segments.
○ Using the same metric measured in the pre-experiment period as the covariate works best.
○ Use 1-2 weeks for the pre-experiment period.
○ Never use covariates that could be affected by the treatment.