Sample Size – The indispensable A/B test calculation that you’re not making

Upload: zack-notes

Post on 02-Jul-2015

Category: Marketing

DESCRIPTION

If you’re a marketer, it’s very likely that you’ve run an A/B test. It’s also likely that you’ve never calculated the sample size for your tests and instead run them until they reach statistical significance. If this is the case, your strategy is statistically flawed. Committing to a sample size means waiting longer for test results, but ignoring it will produce false positives and lead to bad decisions. This deck was created for an email audience, but there are valuable lessons for anyone who runs A/B tests.

TRANSCRIPT

Page 1: SAMPLE SIZE – The indispensable A/B test calculation that you’re not making

Sample Size

The indispensable A/B test calculation that you’re not making.

Page 2:

As Marketers, many of us run A/B Tests

Page 3:

We test copy

Page 4:

We test design

Page 5:

We test subject lines

Page 6:

We choose winners

Page 7:

Version A is converting better than Version B and statistical significance has breached 95%.

So, Version A won.

Page 8:

Version A is converting better than Version B and statistical significance has breached 95%.

So, Version A won.

OR DID IT?

Page 9:

That math is half-baked

Page 10:

Suppose you check an A/B Test twice: Once after 200 impressions and then after 500.

Then you end the test.

Page 11:

Now, instead, suppose you stop the test once you reach significance:

Page 12:

Now, suppose you stop the experiment as soon as there is a significant result:

FALSE POSITIVE!

Page 13:

Page 14:

How often will you get a false positive?

Page 15:

26.1%

So you just went from 95% confidence to 74%.

This is a worst-case scenario. BUT, some test platforms do this automatically!

Assuming you check results after every impression and stop once you reach significance…
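The inflation from stopping at the first significant peek can be checked with a small simulation. The following is a minimal sketch, not the calculation behind the slide: the 50% baseline, the 150-observation cap, the first peek at 10 observations, and the random seed are all assumptions chosen for illustration, so the output will not necessarily match 26.1%. The variation has no real effect, so every "significant" result is a false positive.

```python
import math
import random


def is_significant(conversions, n, p0=0.5, z_crit=1.96):
    """Two-sided one-sample z-test of an observed rate against a known baseline p0."""
    p_hat = conversions / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return abs(z) > z_crit


def false_positive_rates(n_sims=2000, n_obs=150, min_obs=10, p0=0.5, seed=1):
    """Simulate tests where the variation has NO real effect (true rate == p0).

    Returns (rate when stopping at the first significant peek,
             rate when testing once at the planned sample size).
    """
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        conversions = 0
        stopped_early = False
        for n in range(1, n_obs + 1):
            conversions += rng.random() < p0
            # Peek after every observation once a minimum sample is in.
            if not stopped_early and n >= min_obs and is_significant(conversions, n, p0):
                stopped_early = True
        peeking_hits += stopped_early
        # Fixed-horizon rule: a single test at the planned sample size.
        fixed_hits += is_significant(conversions, n_obs, p0)
    return peeking_hits / n_sims, fixed_hits / n_sims


peeking_rate, fixed_rate = false_positive_rates()
print(f"stop at first significant peek: {peeking_rate:.1%}")
print(f"test once at the planned size:  {fixed_rate:.1%}")
```

The single test at the end should stay near the nominal 5%, while the peek-and-stop rule should come out several times higher.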

Page 16:

OK…well, then when should I stop an A/B test?

Page 17:

SAMPLE SIZE

Dictates how long to run a test

Page 18:

SAMPLE SIZE

• Used religiously in the pharmaceutical industry, economic studies, etc.

Page 19:

https://www.optimizely.com/resources/sample-size-calculator

Page 20:

Agenda

1. How we put this into practice on a website test

2. How we applied these learnings to email testing:

• Open rates

• Click to Open Rates

• Conversion Rates

Page 21:

A/B Testing on your website

Here’s your new test process:

1. Determine your baseline conversion rate (or click rate, or download rate, etc.).

2. Decide how long you are willing to wait for a result. Convert the unique traffic you expect over that period into a sample size.

3. Adjust the MDE (Minimum Detectable Effect) until the required sample size is just under the target you determined in #2 above.

4. Re-adjust the MDE, and with it the test length, until you are content.

5. Start the test, and don’t stop until you hit the sample size.
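The relationship between baseline rate, MDE, and sample size that the steps above rely on can be sketched with the textbook two-proportion formula. This is an illustration under stated assumptions, not the method used by the calculator referenced in this deck: the function name, the 5% two-sided significance level, the 80% power, and the 3% example baseline are all assumptions, so the numbers may differ from what a given calculator reports.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed in EACH variation to detect a relative lift of `relative_mde`."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Hypothetical input: a 3% baseline conversion rate.
for mde in (0.30, 0.20, 0.10):
    n = sample_size_per_variation(0.03, mde)
    print(f"MDE {mde:.0%}: {n:,} visitors per variation")
```

Lowering the MDE in the loop shows why step 3 is a negotiation: each reduction in the detectable lift raises the required sample size sharply.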

Page 22:

Case Study: Item Urgency

Page 23:

Case Study: Item Urgency

TEST (VERSION A): INVENTORY NOTIFICATION

CONTROL (VERSION B): NO INVENTORY NOTIFICATION

Page 24:

STEP 1 – We determined our baseline conversion rate

Page 25:

STEP 2 – Calculate Target Sample Size

We initially decided we wanted a result in 2 weeks. So we took the last 2 weeks of unique product page views:

Page 26:

STEP 2 – Calculate Target Sample Size

We then divided that number by two (since we’ll have two test segments)

Divided by two again to account for desktop traffic only

Then multiplied by 5% (since the message only displays on 5% of product pages)

Sample Size -> 12,351
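The arithmetic on this slide can be written out as a small function. The transcript does not preserve the two-week page view count, so the input below is back-calculated from the 12,351 result and is not a figure from the deck; the function and parameter names are also hypothetical.

```python
def target_sample_size(unique_page_views, variations=2, desktop_share=0.5, eligible_share=0.05):
    """Visitors per variation who can actually see the tested message."""
    per_variation = unique_page_views / variations  # two test segments
    desktop_only = per_variation * desktop_share    # "divided by two again" for desktop traffic
    return round(desktop_only * eligible_share)     # message displays on 5% of product pages


# 988,080 is NOT from the deck: it is back-calculated so that the result equals 12,351.
print(target_sample_size(988_080))
```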

Page 27:

This gave us a 30% MDE (conversion lift), meaning the test could only detect a lift of 30% or more. This is unrealistic.

Page 28:

How about 10%?

Page 29:

107,105 unique visits ~ 17 weeks
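The 17-week figure follows from the traffic numbers already given: two weeks of traffic yielded a sample of 12,351, so 107,105 visits take roughly 17 weeks. A minimal sketch, assuming traffic stays steady from week to week (the function name is hypothetical):

```python
def weeks_to_reach(required_sample, sample_per_period, weeks_per_period=2):
    """Weeks needed to collect `required_sample`, given the sample collected per period."""
    weekly_sample = sample_per_period / weeks_per_period
    return required_sample / weekly_sample


weeks = weeks_to_reach(required_sample=107_105, sample_per_period=12_351)
print(f"{weeks:.1f} weeks")
```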

Page 30:

Wow, that’s a long time…

Page 31:

Yep.

Page 32:

You’re probably not running your tests long enough

Page 33:

WAIT A MINUTE.

MY A/B TEST PLATFORM SAYS NOTHING ABOUT SAMPLE SIZE…

Page 34:

EVERYONE WANTS INSTANT GRATIFICATION

Page 35:

YOUR A/B TEST PLATFORM IS HAPPY TO SELL IT

Page 36:

Quietly assuming you have calculated sample size on your own

Page 37:

Item Urgency - Test Results

We are over 4 weeks in…

*Conv. rate is higher than expected because the test platform uses a 7-day conversion window.

Page 38:

Lift is over 10%

Note the spike in the beginning and the increased stabilization with time

Item Urgency - Test Results

Page 39:

The effect is slowly approaching the MDE

Test Results

Page 40:

Significance is now over 95%, but it’s been up and down.

Many marketers would stop the test on 9/5 and declare a 57% Lift.

Test Results

Page 41:

Email Testing

Page 42:

After learning about Sample Size, we reconsidered our email testing strategy

• Open Rate (Subject line testing)

• Click-to-Open (CTO) Rate

• Conversion Rate

Page 43:

OPEN RATE

We used sample size to gut check the size of our subject line test segments

Page 44:

OPEN RATE

Remember, for the sample size calculator, you need the baseline conversion rate and the sample size. Together, they give you the MDE.

Page 45:

OPEN RATE

First, we needed the baseline open rate

Page 46:

OPEN RATE

Our open rates typically end up ~17%, but when we make the call on our winning subject line, open rates are usually around 7%.

Page 47:

OPEN RATE

Next we need the sample size

Page 48:

OPEN RATE

We always test 4 different subject lines.

We had been sending each subject line to 10,000 customers.

So, sample size ~ 10,000

Page 49:

OPEN RATE

Plugging these numbers in, the test could only detect an open rate lift of 13% or higher
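Going from a baseline and a sample size to an MDE can be approximated in closed form. The sketch below uses a first-order normal approximation with an assumed 5% two-sided significance level and 80% power; it is not the calculator's method, so it should land in the same ballpark as the 13% above rather than match it exactly. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_relative_mde(baseline, n_per_variation, alpha=0.05, power=0.80):
    """First-order estimate of the smallest relative lift a test of this size can detect."""
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    absolute_mde = z_total * math.sqrt(2 * baseline * (1 - baseline) / n_per_variation)
    return absolute_mde / baseline


# 7% baseline open rate and 10,000 recipients per subject line, as on the slides.
mde = approx_relative_mde(baseline=0.07, n_per_variation=10_000)
print(f"approximate MDE: {mde:.1%}")
```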

Page 50:

OPEN RATE

A 13% lift on a 17% open rate is 19.2%.

We rarely see subject lines perform this well.

We needed a lower MDE to make sure we could detect more winners…

Page 51:

OPEN RATE

We ended up doubling our subject line test audience to 80,000 (20,000 per subject line), giving us an MDE of about 9.2%
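The doubling is consistent with a common rule of thumb: for a fixed baseline, the MDE shrinks roughly with the square root of the sample size. A minimal sketch under that assumption (the function name is hypothetical), starting from the 10,000-recipient, 13% MDE pair given earlier:

```python
import math


def segment_size_for_mde(known_size, known_mde, target_mde):
    """Scale a known (size, MDE) pair to a new target, assuming MDE ~ 1 / sqrt(size)."""
    return math.ceil(known_size * (known_mde / target_mde) ** 2)


# Figures from the slides: 10,000 recipients per subject line gave a 13% MDE.
needed = segment_size_for_mde(known_size=10_000, known_mde=0.13, target_mde=0.092)
print(f"{needed:,} recipients per subject line")
```

The result comes out at roughly twice the original segment, which matches the doubling described on the slide.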

Page 52:

CTO

First we needed the baseline

Page 53:

CTO

We averaged the last 10 weeks -> 11% CTO

Page 54:

CTO

Sample size = ½ of the avg opens count

Page 55:

CTO

We averaged the last 10 weeks -> avg opens = 107,000, so sample size = 107,000 / 2 = 53,500

Page 56:

CTO

Page 57:

CTO

A 4.4% CTO lift (our MDE) is a very reasonable goal for a test.

This showed us that we could trust most of the results of our past CTO tests.

Page 58:

GRID vs. FREE FORM

15.7% CTO Lift

Page 59:

PRODUCT NAMES vs. NO PRODUCT NAMES

22.6% CTO Lift

Page 60:

Conversion Rate

We had been making many email decisions after reaching significance on a conversion rate lift

Page 61:

Conversion Rate

Time for a reality check.

Page 62:

Conversion Rate

Baseline Conversion Rate ~ 1.5%

Page 63:

Conversion Rate

Sample Size = ½ Average # Clicks -> 6,000

Page 64:

Conversion Rate

Page 65:

Conversion Rate

An MDE of 38% is ASTRONOMICAL

Page 66:

Conversion Rate

To get meaningful results for conversion rate, consider running an email test many times, so that you can eventually reach the necessary sample size.
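A rough estimate of how many repeated sends "many times" means can be sketched as follows. The 1.5% baseline and 6,000 clicks per variation come from the slides; the 10% target MDE, the 5% significance level, the 80% power, and the function name are assumptions, and pooling sends like this presumes the same test is repeated on comparable audiences.

```python
import math
from statistics import NormalDist


def sends_needed(baseline, relative_mde, per_send_sample, alpha=0.05, power=0.80):
    """Approximate number of repeated sends needed to pool enough clicks per variation."""
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    absolute_mde = baseline * relative_mde
    # First-order normal approximation of the required sample per variation.
    required = 2 * z_total ** 2 * baseline * (1 - baseline) / absolute_mde ** 2
    return math.ceil(required / per_send_sample)


# The 10% target MDE is a hypothetical goal, not a figure from the deck.
sends = sends_needed(baseline=0.015, relative_mde=0.10, per_send_sample=6_000)
print(f"about {sends} sends")
```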

Page 67:

Takeaways

This is the MDE curve again. Remember what this looks like.

The longer you run a test, the lower the MDE will be.

The more traffic volume you have, the faster the MDE will drop.
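The shape of that curve can be projected from a single known point. This sketch assumes steady traffic and that the MDE falls with the square root of the sample size, which is only an approximation, especially for large lifts. It starts from the case study's 30% MDE at two weeks; the function name is hypothetical.

```python
import math


def mde_after_weeks(weeks, known_weeks=2, known_mde=0.30):
    """Project the MDE at `weeks`, assuming steady traffic and MDE ~ 1 / sqrt(sample size)."""
    return known_mde * math.sqrt(known_weeks / weeks)


for weeks in (2, 4, 8, 12, 17):
    print(f"week {weeks:>2}: MDE ~ {mde_after_weeks(weeks):.1%}")
```

Under these assumptions the projection reaches roughly 10% at 17 weeks, in line with the case study, and the curve flattens as it goes: each additional week buys less reduction in MDE than the one before.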

Page 68:

Takeaways

For Web Testing

• If you stop your A/B tests once you reach statistical significance, you are increasing your chances of finding false positives

• Calculating sample size will give you a clear stop date and an MDE

• MDE and sample size are inversely related – The lower the MDE, the larger the sample size

• Most likely, your A/B tests need to run much longer than you realize

For Email Testing

• Use sample size to determine the size of your subject line test segments

• Your CTO tests are probably reaching the necessary sample size

• Your Conversion tests are probably not hitting sample size

Page 69:

Sources

Kyle Rush – Mozcon 2014 Presentation

https://seomoz.box.com/shared/static/2fw6yevkkmmdumz431j4.pdf

Evan Miller – How not to run an AB test

http://www.evanmiller.org/how-not-to-run-an-ab-test.html

Page 70:

Zack Notes

Digital Marketing Manager

@zacknotes

slideshare.net/zacknotes1/presentations

Page 71:

Appendix

Page 72:

GRID vs. FREE FORM

Page 73:

PRODUCT NAMES vs. NO PRODUCT NAMES

Page 74:

What do you do if a test reaches sample size and your lift < MDE?

Page 75:

You can either extend the test and accept a lower MDE, or move on.