Get Deeper Insights From Your Optimizely Results
DESCRIPTION
Analyzing and interpreting your A/B testing results can be a daunting task. For this webinar, we’ve recruited experts Hudson Arnold (Optimizely’s Optimization Strategist) and Darwish Gani (Optimizely’s Results Associate Product Manager) to demonstrate the power of the Results Page and, ultimately, help you turn your data into action. Whether you’re a newcomer to the Optimizely Results page or well-versed in the intricacies of statistical significance, we’ll cover applicable tips and best practices in the context of real test results. Topics to be discussed include:
● Advanced features and new functionality on the Optimizely Results page
● How to interpret your data and avoid common pitfalls
● An in-depth look at the advantages of experience optimization for your business
● How to inform an effective iteration strategy to make your results actionable
TRANSCRIPT
Get Deeper Insights From Your Optimizely Results
Hudson Arnold, Darwish Gani
#OptimizelyResults
Housekeeping notes
Just a reminder:
● Chat box is available for questions
● There will be time for Q&A at the end
● We will be recording the webinar for future viewing
● All attendees will receive a copy of the slides and recording of today’s webinar
Table of Contents
● Results page feature overview
● How do I know if I have a winner?
● What do I do next?
How do I know if I have a winner?
Statistics in Optimizely
Deeper Analysis
Terminology
Hypothesis Test
Statistical inference methodology used to determine whether an experiment result was likely due to chance alone.
Assume the two variations to be the same, then determine the ‘confidence’ with which we can disprove that assumption.
Terminology
Significance Level
Deals with the risk of encountering a false positive.
How likely am I to declare a significant difference when my variations actually have no difference in conversion rate?
Example: 95% Significance = 5% of A/A tests will report a significant difference when none exists.
By setting a Significance Level of 95%, we are accepting a 5% False Positive Rate.
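To make that concrete, here is a rough simulation sketch in plain Python. It uses a classical fixed-horizon two-proportion z-test, not Optimizely’s actual Stats Engine, and the function name and parameters are illustrative: simulated A/A tests evaluated at the 95% significance level should report a “significant” difference about 5% of the time.

```python
import random

def aa_false_positive_rate(n_tests=2000, visitors=1000, p=0.10, seed=42):
    """Run many simulated A/A tests (both arms share the same true
    conversion rate) and count how often a two-sided two-proportion
    z-test at the 95% significance level reports a difference."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_tests):
        conv_a = sum(rng.random() < p for _ in range(visitors))
        conv_b = sum(rng.random() < p for _ in range(visitors))
        rate_a, rate_b = conv_a / visitors, conv_b / visitors
        pooled = (conv_a + conv_b) / (2 * visitors)
        se = (pooled * (1 - pooled) * 2 / visitors) ** 0.5
        if se > 0 and abs(rate_a - rate_b) / se > 1.96:
            false_positives += 1
    return false_positives / n_tests

print(aa_false_positive_rate())  # close to 0.05
```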
Terminology
Power
Deals with the risk of encountering a false negative.
Example: 80% Power = 80% of A/B tests will accurately detect a real difference; the other 20% will miss it.
By setting a Power of 80%, we are accepting a 20% False Negative Rate.
There is a tradeoff between the power of a test and the size of the difference your test can accurately detect.
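That tradeoff can be sketched with a normal approximation (a hypothetical helper, not Optimizely’s internal calculation): for a fixed baseline rate and relative lift, power climbs with sample size.

```python
from math import sqrt, erf

def normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def approximate_power(p_base, relative_mde, n_per_arm):
    """Approximate power of a two-sided two-proportion z-test at the
    95% significance level, for a relative lift of `relative_mde`."""
    p_var = p_base * (1 + relative_mde)
    effect = p_var - p_base
    se = sqrt(p_base * (1 - p_base) / n_per_arm
              + p_var * (1 - p_var) / n_per_arm)
    z_crit = 1.96  # two-sided 5% alpha
    return (normal_cdf(effect / se - z_crit)
            + normal_cdf(-effect / se - z_crit))

# Detecting a 20% relative lift on a 10% baseline conversion rate:
print(round(approximate_power(0.10, 0.20, 5000), 2))  # ≈ 0.89
```

With only 1,000 visitors per arm, the same lift is detected far less reliably, which is why underpowered tests so often come back inconclusive.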
By random chance, some of your tests will be incorrectly declared as winners or losers
By using Chance to Beat Baseline you can control these errors
Chance to Beat Baseline
Likelihood that the observed conversion rate improvement is not due to chance.
Compare it to a set significance level (95%).
Technical Note: P value = the likelihood of seeing a result this extreme if the original and variation were actually the same.
Chance to Beat Baseline = 1 - P value
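The technical note can be sketched in a few lines of Python. This is the classical fixed-horizon calculation (pooled two-proportion z-test with a one-sided p-value); Optimizely’s own computation may differ, and the function name is illustrative.

```python
from math import sqrt, erf

def normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def chance_to_beat_baseline(conv_orig, n_orig, conv_var, n_var):
    """One-sided p-value for the variation beating the original
    (pooled two-proportion z-test), then CBB = 1 - p."""
    rate_o, rate_v = conv_orig / n_orig, conv_var / n_var
    pooled = (conv_orig + conv_var) / (n_orig + n_var)
    se = sqrt(pooled * (1 - pooled) * (1 / n_orig + 1 / n_var))
    z = (rate_v - rate_o) / se
    p_value = 1.0 - normal_cdf(z)  # chance of a lift this big if they were the same
    return 1.0 - p_value

# Variation converts at 13% vs. a 10% original, 1,000 visitors each:
print(round(chance_to_beat_baseline(100, 1000, 130, 1000), 2))  # ≈ 0.98
```

With identical results in both arms the function returns 0.5, i.e. a coin flip, which matches the intuition behind the inconclusive zone.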
The procedure outlined today will help you maximize the accuracy of your testing program
How to run a test
1. Set sample size
   a. How many visitors? Use the Sample Size Calculator
   b. Which visitors? Ensure you use a representative sample
2. Make a decision on the test
   a. Chance to Beat Baseline > 95% or < 5%
   b. Improvement > MDE
   c. Don’t make decisions before this point
   d. Don’t decide to run the test longer
“Representative sample”: weekend vs. weekday traffic? Promotions? etc.
Sample size that provides adequate power: use the Sample Size Calculator (link)
Setting the appropriate Minimum Detectable Effect (MDE): weigh the opportunity cost of waiting to detect smaller effects against running more tests
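For readers without the calculator handy, a rough version of the standard formula looks like this (normal approximation, assuming 95% significance and 80% power; the function name is illustrative, and a real calculator may differ slightly):

```python
def sample_size_per_variation(p_base, relative_mde):
    """Rough per-variation sample size for a two-proportion test at
    95% significance and 80% power (normal approximation)."""
    z_alpha, z_beta = 1.96, 0.8416  # 5% two-sided alpha, 80% power
    p_var = p_base * (1 + relative_mde)
    effect = p_var - p_base
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return int(variance * (z_alpha + z_beta) ** 2 / effect ** 2) + 1

print(sample_size_per_variation(0.10, 0.20))  # ≈ 3,800 visitors per arm
print(sample_size_per_variation(0.10, 0.10))  # ≈ 15,000 visitors per arm
```

Note how halving the MDE roughly quadruples the required traffic; that is exactly the opportunity-cost tradeoff described above.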
At Optimizely, we only target conversion rate improvements of >12% (MDE) on our home page
Opportunity cost of running more tests vs detecting smaller differences
Using Chance to Beat Baseline
● >95%: Winner. The variation is better than the original.
● <5%: Loser. The original is better than the variation.
● 5–95%: Inconclusive. We don’t know whether the variation is better or worse.
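The decision rule from these slides can be sketched as a small helper (illustrative only; the thresholds are the ones quoted above, with `improvement` and `mde` as relative lifts):

```python
def call_test(cbb, improvement, mde, reached_sample_size):
    """Sketch of the slide's decision rule. `improvement` and `mde`
    are relative lifts (e.g. 0.12 for 12%)."""
    if not reached_sample_size:
        return "keep running"          # don't call the test early
    if cbb > 0.95 and improvement >= mde:
        return "winner"                # variation beats original
    if cbb < 0.05 and improvement <= -mde:
        return "loser"                 # original beats variation
    return "inconclusive"

print(call_test(cbb=0.97, improvement=0.15, mde=0.12, reached_sample_size=True))
print(call_test(cbb=0.97, improvement=0.15, mde=0.12, reached_sample_size=False))
```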
Some questions you may have...
Why do I need to set a sample size?
● Classical hypothesis tests (developed in the early 20th century) assume that there is only one decision point in the test.
● You should make calls on significance only when your test is adequately powered.
Why can’t I look at results before I reach my sample size?
● By making decisions on a test at multiple points, you are biasing your results, giving yourself a higher chance of finding a variation that merely looks like a winner or loser.
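The cost of that kind of “peeking” can be demonstrated with a simulation sketch (illustrative, using the same classical fixed-horizon z-test as before, not Optimizely’s Stats Engine): running A/A tests but checking significance at 20 interim points inflates the false positive rate well beyond the nominal 5%.

```python
import random

def peeking_false_positive_rate(n_tests=400, visitors=2000, peeks=20,
                                p=0.10, seed=7):
    """Simulate A/A tests that get checked at `peeks` interim points.
    A run counts as a false positive if ANY peek crosses z = 1.96,
    even though both arms share the same true conversion rate."""
    rng = random.Random(seed)
    step = visitors // peeks
    false_positives = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        for i in range(1, visitors + 1):
            conv_a += rng.random() < p
            conv_b += rng.random() < p
            if i % step == 0:
                pooled = (conv_a + conv_b) / (2 * i)
                se = (pooled * (1 - pooled) * 2 / i) ** 0.5
                if se > 0 and abs(conv_a - conv_b) / i / se > 1.96:
                    false_positives += 1
                    break
    return false_positives / n_tests

print(peeking_false_positive_rate())  # well above the nominal 5%
```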
Change is on the way!
Other best practices when analyzing results
4 Steps for Analyzing Every Test’s Results
1. Ensure your test has an appropriate sample size
2. Use chart functionality
3. Use segments
4. Be strategic with your goals
Chart Functionality
Toggling charts to different views can help deliver much needed context to results analysis.
● See the volume of conversions rather than the conversion rate.
● Visualize volume of visitors, % improvement, CBB, etc.
● Zoom in on a given time period and view annotations.
Segments
Your total conversion rate can be thought of as the average performance of many different segments.
Segments can show consistent results (a strong corroboration of total results), or can vary (a strong case for a personalization strategy).
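As an illustration with made-up numbers: the total conversion rate is just the visitor-weighted average of the segment rates, which is why widely varying segments can hide behind an unremarkable total.

```python
# Hypothetical segment breakdown: (visitors, conversions) per segment.
segments = {
    "mobile":  (6000, 480),   # 8.0% conversion rate
    "desktop": (3000, 360),   # 12.0%
    "tablet":  (1000,  90),   # 9.0%
}

def segment_rates(segments):
    """Conversion rate within each segment."""
    return {name: conv / vis for name, (vis, conv) in segments.items()}

def overall_rate(segments):
    """Total conversion rate = visitor-weighted average of segments."""
    total_vis = sum(vis for vis, _ in segments.values())
    total_conv = sum(conv for _, conv in segments.values())
    return total_conv / total_vis

print(segment_rates(segments))
print(overall_rate(segments))  # 0.093, sitting between the segment extremes
```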
What do I do next?
Iteration Strategy & Telling a Story
Difficult results
Every result leads to iteration!
Iterating on losers
“Failure is not fatal, but failure to change might be.”
- John Wooden
Iterating on losers: play again?
Try again!
You’ve identified an influential element and generated significant results. Why did the winner (in a losing test, the original) come out ahead? How else could we execute this test concept?
Iterating on losers: move on!
Experimentation is just that: not every new thing you try will be better than the way it is now.
Especially if you’ve tried a given test multiple times using different executions, take a step back, and shift focus to other test ideas.
Iterating on inconclusive results
Even more so than losers, inconclusive results can be confusing.
Inconclusive results: Making small tweaks sounds alluring; it’s low effort, much publicized by the CRO community (Optimizely included), and dangles high ROI in front of you.
The reality is that these kinds of tests do generate wins, but irregularly, and with diminishing returns as you execute more and more of them.
Iterating on inconclusive results: go bigger!
Test more than one element at a time, and make bolder changes to each variation to tackle the problem with a greater chance of avoiding inconclusive results.
Bringing ‘Why’ to the ‘What’: Telling a Story
● Be consistent with your documented test hypotheses when analyzing and communicating results.
● Reference the data and criteria that led you to run the test in the first place, note what was changed, how it was changed, and why.
● Note any external events that occurred that you had and hadn’t planned for, such as traffic spikes from media campaigns and coincidental release cycles.
● Connect this individual result to the history and goals of the larger testing/optimization program, and the larger business goals on the line.
Takeaways:
● Estimating sample size in advance is essential for ensuring that your test is statistically powered.
● Take advantage of advanced features to deliver valuable context.
● Every test should feed iteration. Winning, inconclusive, and losing tests all have different paths forward.