statistical models explored and explained

Speakers

Statistical Models, Explored and Explained

Sara Vafi, Stats Expert, Optimizely
Shana Rusonis, Product Marketing, Optimizely

Today’s Speakers

Sara Vafi
Shana Rusonis

Housekeeping
• We’re recording!
• Slides and recording will be emailed to you tomorrow
• Time for questions at the end

Agenda
• Bayesian & Frequentist Statistics
• Error Control - Average vs. All Error Control
• Bayes Rule
• Benefits & Risks
• Optimizely Stats Engine
• Q&A

Why Do We Experiment?

● Experimentation is essential for learning
● Try new ideas without fear of failure
● Give your business a signal to act on in a sea of noisy data

What’s Most Important to You?

● Running experiments quickly
● But also reporting on results accurately
● When not all statistical solutions are created equal

Types of Statistical Methods

Bayesian OR Frequentist

Bayesian Statistics
● Bayesian statistics take a more bottom-up approach to data analysis
● Our parameters are unknown
● The data is fixed
● There is a prior probability
● “Opinion-based”

“A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.”

Source

Frequentist Statistics
● Frequentist arguments are more counter-factual in nature
● Parameters remain constant during the repeatable sampling process
● Resemble the type of logic that lawyers use in court
● ‘Is this variation different from the control?’ is a basic building block of this approach.

Example: Dan & Pete Rolling a 6-Sided Die
Scenario:
● Pete will roll a die and the outcome can either be 1, 2, 3, 4, 5, or 6
● If Pete rolls a 4, he will give Dan $1 million

If Dan was a Bayesian statistician, how would he react?
If Dan was a Frequentist statistician, how would he react?
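The slides leave both questions open. One possible reading, sketched below in Python: a Frequentist Dan thinks about the long-run frequency of rolling a 4 over many repeated rolls, while a Bayesian Dan starts from a prior belief and updates it as rolls are observed. The function names, the Beta(1, 5) prior (chosen so the prior mean is 1/6), and the roll counts are all assumptions made for illustration, not part of the talk.

```python
import random
from fractions import Fraction


def long_run_frequency(target: int, rolls: int, seed: int = 0) -> float:
    """Frequentist view: fraction of simulated fair-die rolls that show `target`."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(rolls) if rng.randint(1, 6) == target)
    return hits / rolls


def posterior_mean(fours: int, rolls: int, prior_a: int = 1, prior_b: int = 5) -> Fraction:
    """Bayesian view: mean of the Beta posterior for P(roll a 4).

    Assumed prior: Beta(prior_a, prior_b). After `fours` hits in `rolls`
    rolls the posterior is Beta(prior_a + fours, prior_b + rolls - fours).
    """
    return Fraction(prior_a + fours, prior_a + prior_b + rolls)


if __name__ == "__main__":
    # Long-run frequency over many repeated rolls settles near 1/6.
    print(long_run_frequency(target=4, rolls=60_000))
    # Before any rolls, the Bayesian estimate is just the prior mean.
    print(posterior_mean(fours=0, rolls=0))
    # A few surprising rolls pull the estimate away from the prior.
    print(posterior_mean(fours=3, rolls=6))
```

With little data the Bayesian estimate leans on the prior; with a lot of data both views land close to the same number.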

Example: Probability of the sun exploding

Source
● Frequentist, relies on probability
● Bayesian, relies on prior knowledge

Error Control

Error Control Explained
● The likelihood that the observed result of an experiment happened by chance, rather than because of a change that you introduced
● When we set the statistical significance on an experiment to 90%, that means accepting a 10% chance of a statistical error: when a change has no real effect, there is still a 1 in 10 chance that the experiment reports a significant result
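A rough way to see the 1-in-10 figure is to simulate many experiments in which the variation is identical to the control and count how often a test at 90% significance still declares a difference. The sketch below uses a fixed-horizon two-proportion z-test as a stand-in; the test choice, function names, conversion rate, and sample sizes are assumptions for illustration and are not taken from the talk.

```python
import math
import random
from statistics import NormalDist


def two_sided_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-proportion z-test p-value using the pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to detect
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(experiments: int = 1000, visitors: int = 500,
                        rate: float = 0.10, significance: float = 0.90,
                        seed: int = 1) -> float:
    """Share of no-difference (A/A) experiments that are called significant."""
    rng = random.Random(seed)
    alpha = 1 - significance
    false_positives = 0
    for _ in range(experiments):
        # Both arms share the same true conversion rate, so any "win" is an error.
        conv_a = sum(rng.random() < rate for _ in range(visitors))
        conv_b = sum(rng.random() < rate for _ in range(visitors))
        if two_sided_p_value(conv_a, visitors, conv_b, visitors) < alpha:
            false_positives += 1
    return false_positives / experiments


if __name__ == "__main__":
    print(false_positive_rate())  # should land near 0.10
```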

Average Error Control

● Corresponds to Bayesian A/B Testing

● Less useful for iterating on test results

● Harder to learn from individual experiments with confidence

All Error Control

● Corresponds to Frequentist A/B Testing

● Any experiment will have less than a 10% chance of a mistake

● Rate of errors is at most 1 in 10

Average Error Control vs. All Error Control

● Average error control leads to lower accuracy for small improvements

● All error control is accurate for all users

● There are certain cases where average error control is an appropriate alternative
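The difference between the two guarantees comes down to a mean versus a maximum. The sketch below checks both conditions against a set of per-experiment error rates; the rates and the function names are made up for illustration, and only the 10% threshold comes from the slides.

```python
from statistics import mean


def average_error_controlled(error_rates, alpha=0.10):
    """Average error control: the mean error rate across experiments is at most alpha."""
    return mean(error_rates) <= alpha


def all_error_controlled(error_rates, alpha=0.10):
    """All error control: every single experiment has an error rate of at most alpha."""
    return max(error_rates) <= alpha


# Hypothetical error rates for experiments with large, medium, and small
# true improvements (illustrative numbers, not measurements from the talk).
rates = [0.01, 0.04, 0.22]

if __name__ == "__main__":
    print(average_error_controlled(rates))  # True: the mean is 0.09
    print(all_error_controlled(rates))      # False: the small-improvement case is at 0.22
```

In this made-up case the average looks fine while the small-improvement experiment is wrong more than twice as often as the 10% target suggests.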

Error Rates for Experiments

Bayes Rule

Average Error Control & Bayesian A/B Testing

● Requires two sources of randomness
  • Randomness or “noise” in the data
  • The makeup of the “typical” experiment group

● Distribution over experiment improvements
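A minimal sketch of how a prior and noisy data combine in a Bayesian A/B test. It uses a common textbook setup, a Beta prior on each variation's conversion rate with Monte Carlo draws from the posteriors, which is not necessarily the model the speakers had in mind (the slides describe a distribution over improvements). The function name, the uniform Beta(1, 1) prior, and the visitor counts are assumptions.

```python
import random


def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   prior_a: float = 1.0, prior_b: float = 1.0,
                   draws: int = 20_000, seed: int = 2) -> float:
    """Estimate P(rate_B > rate_A | data) under independent Beta priors.

    Source of randomness 1: the prior belief, Beta(prior_a, prior_b).
    Source of randomness 2: the noisy data, conv / n for each variation.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each arm is Beta(prior_a + conversions, prior_b + non-conversions).
        rate_a = rng.betavariate(prior_a + conv_a, prior_b + n_a - conv_a)
        rate_b = rng.betavariate(prior_a + conv_b, prior_b + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


if __name__ == "__main__":
    # Hypothetical counts: 100/1000 conversions on A, 150/1000 on B.
    print(prob_b_beats_a(100, 1000, 150, 1000))
    # Identical data on both arms should give a probability near 0.5.
    print(prob_b_beats_a(100, 1000, 100, 1000))
```

Changing `prior_a` and `prior_b` changes the answer for the same data, which is the sense in which the result depends on what a "typical" experiment is believed to look like.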

Different Beliefs in Composition of ‘Typical’ Experiments

Bayes Rule
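For reference, Bayes' rule in its standard form. The symbols are notational assumptions rather than the speakers' own: \theta stands for the unknown quantity (for example, the true improvement of a variation) and D for the observed data.

```latex
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
```

Here P(\theta) is the prior, P(D \mid \theta) is the likelihood of the data, and P(\theta \mid D) is the posterior belief after seeing the data.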

Bayes Rule & Bayesian A/B Testing

Bayes Rule & Average Error Value

Recap Average Error Control

Bayesian A/B Testing

Prior Distributions

Bayes Rule

All Error Control is Frequentist A/B Testing

● All error control corresponds to Frequentist A/B testing
● We want to aim to control the false positive rate
● The false positive rate is the chance that an experiment with no real difference is called either a winner or a loser

Benefits & Risks

Benefits of Bayesian A/B Testing

● Average error control can be very attractive

● Helps solve the “peeking” problem

● Average error control is fast
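To make the “peeking” problem concrete, the sketch below simulates no-difference experiments and compares two habits: checking a fixed-horizon z-test once at the end, versus checking after every batch of visitors and stopping at the first significant result. It only illustrates the problem under a classical test; it does not implement a Bayesian remedy or Stats Engine. The test choice, names, batch sizes, and rates are assumptions.

```python
import math
import random
from statistics import NormalDist


def z_test_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided two-proportion z-test p-value (pooled standard error)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate(experiments: int = 400, looks: int = 10, visitors_per_look: int = 100,
             rate: float = 0.10, alpha: float = 0.10, seed: int = 3):
    """Return (error rate when peeking at every look, error rate with one final look)."""
    rng = random.Random(seed)
    significant_at_any_look = 0
    significant_at_end = 0
    for _ in range(experiments):
        conv_a = conv_b = n = 0
        ever_significant = False
        significant = False
        for _ in range(looks):
            # Both arms share the same true rate, so every "significant" call is an error.
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            significant = z_test_p_value(conv_a, n, conv_b, n) < alpha
            ever_significant = ever_significant or significant
        significant_at_any_look += ever_significant
        significant_at_end += significant  # result of the final look only
    return significant_at_any_look / experiments, significant_at_end / experiments


if __name__ == "__main__":
    peeking, single_look = simulate()
    print(f"error rate when peeking: {peeking:.2f}")
    print(f"error rate with one look: {single_look:.2f}")
```

The peeking error rate can never be lower than the single-look rate, since the final look is one of the peeks, and in practice it comes out well above the nominal 10%.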

Risks of Bayesian A/B Testing

● It’s more appealing but it’s risky in practice

● Smaller improvement experiments with fast results = high risk

● Higher error rate than the method actually suggests

Benefits of Frequentist A/B Testing
● This type of test will make fewer mistakes on experiments with non-zero improvements
● The rate of errors will be less than 1 in 10
● Option to speed up experimentation by using a prior

Learning from A/B Tests

Risk Involved with Typical Realistic Experiments

Realistic Bayesian A/B Tests vs. Stats Engine

● The hardest experiments to call correctly are those with small improvements

● A/B testing in the wild is not easy
● We need more and more data in order to...
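One way to see why small improvements are the hardest to call: under the standard fixed-horizon sample-size formula for comparing two conversion rates, the required traffic grows roughly with the inverse square of the difference. The sketch below applies that textbook formula; it is not how Stats Engine computes anything, and the function name, baseline rate, and 80% power are assumptions.

```python
import math
from statistics import NormalDist


def visitors_per_variation(baseline: float, relative_lift: float,
                           alpha: float = 0.10, power: float = 0.80) -> int:
    """Approximate visitors needed per variation for a two-sided two-proportion test.

    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    # Hypothetical 10% baseline conversion rate; lifts are relative improvements.
    for lift in (0.20, 0.10, 0.05):
        print(f"{lift:.0%} lift: {visitors_per_variation(0.10, lift)} visitors per variation")
```

Halving the lift roughly quadruples the traffic needed, which is why fast answers on small improvements carry so much risk.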

So what does this mean?

Stats Engine

Stats Engine™

Results are valid whenever you check

Avoid costly statistics errors

Measure real-time results with confidence

Key Takeaways

● Bayesian vs. Frequentist methods
● All error control vs. average error control
● Blended approach leads to greater confidence

QUESTIONS?

THANK YOU!

Appendix

Attic and button example

Attic and button example cont. In relation to all error control

Attic and button example cont. In relation to average error control

How to define a Bayesian AB test *FIX THIS SLIDE*

Trade-offs with Bayesian A/B testing
High improvement > low improvement

Bayesian A/B testing is average error control

Introduction slide about what topics will be covered

SARA’S SLIDES

