digital travel summit a b testing 2014-04-02

Testing and Tracking To Improve Conversion Rates. Jonathan Isernhagen, Digital Marketing Fundamentals Workshop Day 2, Digital Travel Summit, 4/2/2014

Upload: jonathan-isernhagen

Post on 23-Jun-2015

DESCRIPTION

Presented at eTail's Digital Travel Summit in Las Vegas earlier this April. Runs through two actual A/B test examples, then covers A/B testing philosophy generally and two very different methodologies: that of the Eisenbergs and John Quarto-vonTivadar of "Always Be Testing" fame, and that of Adobe consultant, trainer and all-around thought leader Andrew Anderson. It closes with the testing philosophy which Travelocity synthesized from these two methods and how we implemented it in practice. Multi-armed bandit testing is briefly mentioned.

TRANSCRIPT

Page 1: Digital travel summit a b testing 2014-04-02

C O N F I D E N T I A L

Testing and Tracking To Improve Conversion Rates

Jonathan Isernhagen

Digital Marketing Fundamentals Workshop Day 2

Digital Travel Summit

4/2/2014

Page 2: Digital travel summit a b testing 2014-04-02

Home Page search widget modification

Control

Page 3: Digital travel summit a b testing 2014-04-02

Home Page search widget modification

Variant A

Page 4: Digital travel summit a b testing 2014-04-02

Home Page search widget modification

Variant B

Page 5: Digital travel summit a b testing 2014-04-02

Home Page search widget modification

Variant B

Variant C

Page 6: Digital travel summit a b testing 2014-04-02

Home Page search widget modification

Variant B

Page 7: Digital travel summit a b testing 2014-04-02

Hotel Detail Page “Flexible Dates” tab name

Control

Page 8: Digital travel summit a b testing 2014-04-02

Hotel Detail Page “Flexible Dates” tab name

Variant A: Removal

Page 9: Digital travel summit a b testing 2014-04-02

Hotel Detail Page “Flexible Dates” tab name

Variant B: “Best Dates”

Page 10: Digital travel summit a b testing 2014-04-02

Hotel Detail Page “Flexible Dates” tab name

Variant C: “Cheapest Dates”

Page 11: Digital travel summit a b testing 2014-04-02

Hotel Detail Page “Flexible Dates” tab name

Variant C: “Cheapest Dates”

Page 12: Digital travel summit a b testing 2014-04-02

Agenda

1) Examples

2) Program goals

3) Test approach philosophies
   a) Eisenberg
   b) Anderson

4) Process
   a) solicitation
   b) prioritization
   c) pre-test
   d) testing
   e) reportage

5) Alternative method: the multi-armed bandit

[email protected] @jon_isernhagen www.linkedin.com/in/isernhagen

Page 13: Digital travel summit a b testing 2014-04-02

What is A/B testing?

Comparing conversion performance between shoppers on:

• Current site (=“control”);

• Modified version(s) (=“variants”).

The version with the highest success/view ratio (= “conversion”) is the winner.

“Success” can be defined in many ways:

1) Some sites aim for visitor pages viewed or time-on-site

2) Transaction sites count booking completion pages

3) Sophisticated testers incorporate all revenue (even of ads)

[Diagram labels: other site pages; control page; test variant; completion (“success”) page]

Page 14: Digital travel summit a b testing 2014-04-02

Why do we A/B Test? Conversion Improvement

Continuous conversion improvement subject to market competition ≈ building in Venice

Page 15: Digital travel summit a b testing 2014-04-02

Why do we A/B Test? HIPPO* Defense

“If we have data, let’s look at data.

If all we have are opinions, let’s go with mine.”

– Jim Barksdale, Netscape CEO

* HIPPO: Highest Income Person’s Opinion

Page 16: Digital travel summit a b testing 2014-04-02

Why do we A/B test? Causality is Optional

“A British ship’s captain observed the lack of scurvy among sailors serving on the naval ships of Mediterranean countries, where citrus fruit was part of their rations…

He then gave half his crew limes (the Treatment group) while the other half (the Control group) continued with their regular diet...

While the captain did not realize that scurvy is a consequence of vitamin C deficiency, and that limes are rich in vitamin C, the intervention worked.”

– Ron Kohavi, Practical Guide to Controlled Experiments on the Web

Page 17: Digital travel summit a b testing 2014-04-02

Agenda

1) Examples

2) Program goal

3) Test approach philosophies
   a) Eisenberg
   b) Anderson

4) Process
   a) solicitation
   b) prioritization
   c) pre-test
   d) testing
   e) reportage

5) Alternative method: the multi-armed bandit

Page 18: Digital travel summit a b testing 2014-04-02

What should we test?

Travelocity relied heavily on “Always Be Testing”

It contains a laundry list of every combination of testable elements on web pages.

A good starting point for test brainstorming.

Page 19: Digital travel summit a b testing 2014-04-02

Approaches: Eisenbergs (Monetate)

Credentials: authors of “Always Be Testing,” lecturers, consultants

Philosophy:

• understand shopper psychology;

• sustain “scent”;

• note where shoppers bail;

• write formal hypotheses and draw wireframes; and

• use a matrix to prioritize.

Pace: high volume, 30+ tests/month

Reporting: record results for historical memory/learning

Rigor: 80-95% confidence threshold, extensive path analysis

Page 20: Digital travel summit a b testing 2014-04-02

Approaches: Andrew Anderson (Adobe)

Credentials: top consultant at a top A/B test firm; 300+ clients gained an average 10-40% conversion improvement using his methodology.

Philosophy:

• care about the shopper’s actions, not thoughts;

• test with an open mind and no preconceived notions;

• start with page element elimination or MVT, then test permutations;

• prioritize:
  – low-converting, high-traffic pages, or
  – pages highly correlated with conversion.

Pace: limited only by sequential testing, 10-14 days/test

Reporting: five metrics max, revenue per visitor, no path reporting

Rigor: 80% confidence threshold, 5% rise, frequent re-testing.

Page 21: Digital travel summit a b testing 2014-04-02

Comparing the methodologies

Topic area            | Eisenberg / Monetate  | Adobe / Andrew        | Sabre Research / Travelocity
Test prioritization   | 5 x 5 x 5 product     | Page element          | Effort/impact
Customer intent       | Profile/channel       | Ignore                | Understand
Design bias           | Univariate            | Multivariate          | Univariate A/B/C…
Contamination         | Embrace/ignore        | Acknowledge           | Accept sometimes
Negative lift         | Unfortunate           | Interesting           | Unfortunate
“Micro-conversion”    | Exciting              | Worthless             | Evaluating
Prone to find         | Occasional big wins   | Constant small wins   | 1 win per 7-10 tests
Test ends             | Eyeball assessment    | Eyeball assessment    | Precalculated size
Error risk            | High                  | Medium                | Low

Page 22: Digital travel summit a b testing 2014-04-02

Agenda

1) Examples

2) Program goal

3) Test approach philosophies
   a) Eisenberg
   b) Anderson

4) Process
   a) solicitation
   b) prioritization
   c) pre-test
   d) testing
   e) reportage

5) Alternative method: the multi-armed bandit

Page 23: Digital travel summit a b testing 2014-04-02

Process: test suggestion database

• Test name

• Page name

• Page abbreviation

• Site section

• Date submitted

• Description

• Wireframe

• Effort expected

• Uplift expected

• Submitter

• Current owner

• Status

• Status change comments

Page 24: Digital travel summit a b testing 2014-04-02

Process: solicitation

• Request everyone’s ideas (there is no monopoly on good ones)
  – Direct them to the online test database (include the link in the message)

• Ask them to:
  – do a filtered search for similar ideas on the same page
  – complete the form as fully as possible, including:
    • a rating (from 1-5) of the business improvement they expect, and of the expected effort
    • a wireframe picture of the change, even if it’s a scanned crayon drawing
    • a description of the effect they’re expecting

Page 25: Digital travel summit a b testing 2014-04-02

Process: prioritization

1) Schedule periodic prioritization meetings

2) For each page on the site:
   a) Review recent test results and how they affect your thinking
   b) Multiply projected impact numbers * ease of implementation numbers
   c) Rank in descending order
   d) Select the next wave of tests

3) For each site section/pathway:
   a) Sample conversion and traffic volume as they apply to the prioritized test pages
   b) Decide what sensitivity/confidence/variant count you wish to specify
   c) Calculate sample size and minimum test length
   d) Adjust test parameters as appropriate

4) Evaluate selected tests’ potential to interfere with each other

5) Ratify final test set and test parameters

6) Assign the developers and testers their tasks
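Step 2 of the list above (multiply projected impact by ease of implementation, then rank in descending order) might be sketched roughly as follows. The `Suggestion` class, its field names and the three sample ideas are hypothetical; both scores are assumed to be on a 1-5 scale. If the form records effort rather than ease, one possible conversion (also an assumption) is ease = 6 - effort.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """Hypothetical record for one suggested test."""
    name: str
    impact: int  # projected business impact, 1 (low) to 5 (high)
    ease: int    # ease of implementation, 1 (hard) to 5 (easy)

    @property
    def score(self) -> int:
        # Priority score: projected impact * ease of implementation.
        return self.impact * self.ease


def prioritize(ideas: list[Suggestion]) -> list[Suggestion]:
    """Return the ideas ranked by score, highest first."""
    return sorted(ideas, key=lambda idea: idea.score, reverse=True)


# Made-up sample ideas, for illustration only.
ideas = [
    Suggestion("Redesign search widget", impact=5, ease=2),
    Suggestion("Rename tab", impact=3, ease=5),
    Suggestion("Remove banner", impact=2, ease=4),
]
ranked = prioritize(ideas)
for idea in ranked:
    print(f"{idea.score:>2}  {idea.name}")
```

The top of the ranked list is only a candidate set for the next wave; steps 3 to 5 (sample sizing and interference checks) can still change which tests actually run.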

Page 26: Digital travel summit a b testing 2014-04-02

Avoiding errors

Testing doesn’t reveal The Truth; it enables us to make accurate guesses about the truth. Those guesses can be wrong in one of two ways:

                      Decision: reject null    Decision: accept null
Truth: null           Type I error             Right decision
Truth: alternative    Right decision           Type II error

• Type I error: accepting a bad variant.

• Type II error: rejecting a good variant.

Page 27: Digital travel summit a b testing 2014-04-02

Test length for statistical significance

Sample size = [2 * Z^2 * Conversion * (1 - Conversion)] / (Conversion * Change)^2

You have to test longer…

• Change: …the smaller the lift you want to detect

• Confidence: …the greater the confidence you want to have

• Conversion: …the closer the page’s conversion is to 50%

• Contamination: …the purer you want the results to be. If you let experiments re-use each other’s traffic, you can get more data faster.
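The sample size formula on this slide can be tried out with a short sketch like the one below. The function names are hypothetical, and two things are assumptions rather than statements from the deck: that Z is the two-sided normal score for the chosen confidence level, and that Change is a relative lift (0.10 for a 10% lift).

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-sided Z score for a confidence level such as 0.95 (assumed reading of Z)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size(conversion: float, change: float, confidence: float = 0.95) -> int:
    """Sample size from the slide's formula, rounded up.

    conversion: baseline conversion rate, e.g. 0.05 for 5%
    change:     relative lift to detect, e.g. 0.10 for a 10% lift (assumption)
    """
    z = z_for_confidence(confidence)
    numerator = 2 * z ** 2 * conversion * (1 - conversion)
    denominator = (conversion * change) ** 2
    return math.ceil(numerator / denominator)


# Smaller detectable lift or higher confidence both raise the requirement.
for change in (0.10, 0.05):
    for confidence in (0.80, 0.95):
        n = sample_size(0.05, change, confidence)
        print(f"lift {change:.0%}, confidence {confidence:.0%}: {n}")
```

One caveat under these assumptions: with Change read as a relative lift, the formula as written gives smaller samples as conversion rises. The "closer to 50%" effect appears when the difference to detect is held fixed in absolute terms, because Conversion * (1 - Conversion) peaks at 50%.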

Page 28: Digital travel summit a b testing 2014-04-02

Test length: additional considerations

Beyond statistical minimum sample size, there are other test-sizing factors to consider:

• Cyclicality: shoppers often demonstrate different conversion behavior on different days of the week, and to a different degree across variants. Inoculate your test by stopping it after a multiple of 7 days.

• Re-shop: the benefits of a superior variant may not express themselves during the same session. Know the average shop-to-book incubation period for the tested page and run several weeks longer.

• Interface shock: as above, the control has a home court advantage by virtue of being well-known to past visitors. Even superior variants play at a disadvantage until shoppers become accustomed.
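A minimal sketch of how the cyclicality and re-shop points could feed into a planned test length. Everything here is an assumption for illustration: the function name, the even traffic split across variants, and treating the shop-to-book incubation period as extra days added before rounding up to whole weeks. Interface shock is not modeled.

```python
import math


def planned_test_days(sample_per_variant: int, variants: int,
                      daily_visitors: int, incubation_days: int = 0) -> int:
    """Days to run: enough traffic, plus incubation, rounded up to whole weeks."""
    visitors_needed = sample_per_variant * variants
    traffic_days = math.ceil(visitors_needed / daily_visitors)
    total_days = traffic_days + incubation_days
    # Stop on a multiple of 7 so every day of the week is equally represented.
    return math.ceil(total_days / 7) * 7


# Made-up inputs: 4 versions (control + 3 variants), 20,000 visitors/day,
# and a 14-day allowance for shoppers who return later to book.
print(planned_test_days(58391, variants=4, daily_visitors=20000, incubation_days=14))
```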

Page 29: Digital travel summit a b testing 2014-04-02

Process: pre-test

1) Ensure all variants have been fully tested

2) Announce forthcoming tests, with special attention to:
   a) Help Desk
   b) Site Health

3) Re-communicate your reporting procedure:
   a) All results and your interpretations will be immediately posted to the database
   b) Interim test results are not scientific and will not be:
      i. Subject to early speculation
      ii. Cause to prematurely terminate tests or extend them beyond the agreed interval

Page 30: Digital travel summit a b testing 2014-04-02

During the tests

• Keep an eye on site metrics generally and where your tests touch

• Avoid “premature exclamation”
  – Observe, but do not report on, the test variants’ performance
  – Remember that measures of statistical confidence are valid only at the exact moment you pre-determined to end the test.
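The point about confidence being valid only at the pre-determined end can be illustrated with a rough A/A simulation, where both arms share the same true conversion rate and any "significant" result is a false positive. This is a sketch, not anything from the deck: the function names, the pooled two-proportion z-test, the 5% conversion rate and the ten interim looks are all assumptions.

```python
import math
import random
from statistics import NormalDist


def z_score(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic (0.0 while there is no variance)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se


def simulate_aa(rate: float, visitors: int, looks: int, z_crit: float,
                rng: random.Random) -> tuple[bool, bool]:
    """Run one A/A test; return (flagged at any look, flagged at the final look)."""
    step = visitors // looks
    conv_a = conv_b = 0
    flagged_any = False
    significant = False
    for look in range(1, looks + 1):
        # Add one batch of visitors to each arm, then "peek" at the result.
        conv_a += sum(rng.random() < rate for _ in range(step))
        conv_b += sum(rng.random() < rate for _ in range(step))
        n = look * step
        significant = abs(z_score(conv_a, n, conv_b, n)) > z_crit
        flagged_any = flagged_any or significant
    return flagged_any, significant


rng = random.Random(42)
z_crit = NormalDist().inv_cdf(0.975)  # 95% confidence, two-sided
runs = [simulate_aa(0.05, 2000, 10, z_crit, rng) for _ in range(200)]
peek_rate = sum(any_look for any_look, _ in runs) / len(runs)
final_rate = sum(final for _, final in runs) / len(runs)
print(f"false positives when acting on any peek: {peek_rate:.1%}")
print(f"false positives at the planned end:      {final_rate:.1%}")
```

Because the final look is one of the peeks, the first rate can never be lower than the second. How much higher it runs depends on the number of looks and the random seed.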

Page 31: Digital travel summit a b testing 2014-04-02

Process: reportage

Once each test has run to its pre-determined conclusion trigger:

1) Copy the results from test tool into the database

2) Complete the after-test fields, including:
   a) What do we conclude from this test, especially if results are significant?
   b) What (if any) concerns do we have about its accuracy?
   c) What follow-on tests and actions do we recommend?

3) Decide whether the results are interesting/controversial enough to warrant calling a meeting and/or distributing an explanatory .ppt.

4) Periodically review the relative performance of the various test groups in your site metrics tool if it permits selection that way.

Page 32: Digital travel summit a b testing 2014-04-02

Agenda

1) Examples

2) Program goal

3) Test approach philosophies
   a) Eisenberg
   b) Anderson

4) Process
   a) solicitation
   b) prioritization
   c) pre-test
   d) testing
   e) reportage

5) Alternative test method: the multi-armed bandit

Page 33: Digital travel summit a b testing 2014-04-02

Testing alternatives: Multi-armed bandit method

Like a hive of bees that always reconnoiters but sends most bees to known gardens.
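The slide does not name a specific bandit algorithm. As one simple illustration of the bee analogy, here is an epsilon-greedy sketch: a small share of visitors (epsilon) is sent to a random variant to keep scouting, and the rest go to the variant with the best observed conversion so far. The function name, parameters and conversion rates are all hypothetical.

```python
import random


def epsilon_greedy(true_rates: list[float], rounds: int, epsilon: float,
                   rng: random.Random) -> list[int]:
    """Simulate an epsilon-greedy bandit; return the visitors sent to each variant."""
    pulls = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: scout a variant chosen at random.
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: pick the best observed conversion rate so far.
            arm = max(range(len(true_rates)),
                      key=lambda i: wins[i] / pulls[i] if pulls[i] else 0.0)
        pulls[arm] += 1
        # Simulate whether this visitor converts.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return pulls


# Made-up conversion rates for a control and two variants.
pulls = epsilon_greedy([0.03, 0.05, 0.04], rounds=20000, epsilon=0.1,
                       rng=random.Random(7))
print(pulls)
```

Unlike a fixed-split A/B test, traffic allocation shifts while the experiment runs, so the fixed-sample confidence calculations used for A/B tests do not carry over directly.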

Page 34: Digital travel summit a b testing 2014-04-02

Take-aways

1) Commission a company-wide online test idea database

2) Be clear about your testing methodology:
   a) document your hypotheses, be able to tell a narrative
   b) know what error risks you’re exposing yourself to
   c) quick tests with low confidence/sensitivity can be okay if followed up

3) Ignore your test tool vendor rep
   a) “Let’s test longer to see if it goes to confidence.” = disqualifying statement
   b) The test tool readout is accurate only on the pre-chosen test day.

4) Consider multi-armed bandit optimization
