statistical models explored and explained
Post on 19-Mar-2017
TRANSCRIPT
Speakers
Statistical Models, Explored and Explained
Sara Vafi, Stats Expert, Optimizely
Shana Rusonis, Product Marketing, Optimizely
Today’s Speakers
Sara Vafi Shana Rusonis
Housekeeping
• We’re recording!
• Slides and recording will be emailed to you tomorrow
• Time for questions at the end
Agenda
• Bayesian & Frequentist Statistics
• Error Control - Average vs. All Error Control
• Bayes Rule
• Benefits & Risks
• Optimizely Stats Engine
• Q&A
Why Do We Experiment?
● Experimentation is essential for learning
● Try new ideas without fear of failure
● Give your business a signal to act on in a sea of noisy data
What’s Most Important to You?
● Running experiments quickly
● But also reporting on results accurately
● When not all statistical solutions are created equal
Types of Statistical Methods
Bayesian OR Frequentist
Bayesian Statistics
● Bayesian statistics take a more bottom-up approach to data analysis
● Our parameters are unknown
● The data is fixed
● There is a prior probability
● “Opinion-based”
“A Bayesian is one who, vaguely
expecting a horse, and catching a
glimpse of a donkey, strongly
believes he has seen a mule.”
Source
Frequentist Statistics
● Frequentist arguments are more counter-factual in nature
● Parameters remain constant during the repeatable sampling process
● Resemble the type of logic that lawyers use in court
● ‘Is this variation different from the control?’ is a basic building block of this approach.
Example: Dan & Pete Rolling a 6-Sided Die
Scenario:
● Pete will roll a die and the outcome can be 1, 2, 3, 4, 5, or 6
● If Pete rolls a 4, he will give Dan $1 million
If Dan were a Bayesian statistician, how would he react?
If Dan were a Frequentist statistician, how would he react?
Example: Probability of the sun exploding
Source
● Frequentist, relies on probability
● Bayesian, relies on prior knowledge
Error Control
Error Control Explained
● The likelihood that the observed result of an experiment happened by chance, rather than because of a change that you introduced
● When we set the statistical significance on an experiment to 90%, that means there's a 10% chance of a statistical error, or a 1 in 10 chance that the result happened by chance
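One way to see the "1 in 10" figure is to simulate many experiments in which the variation is truly identical to the control and count how often a test at 90% significance still declares a difference. The sketch below is only an illustration: the two-proportion z-test, the traffic numbers, and the function names are assumptions, not something taken from the slides.

```python
import random
from statistics import NormalDist


def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test p-value using a pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:
        return 1.0  # no conversions (or all conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_experiments=2000, visitors=500, rate=0.10,
                        alpha=0.10, seed=0):
    """Share of A/A experiments (no real difference) flagged as significant.

    All numbers here are hypothetical defaults chosen for illustration.
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_experiments):
        # Both arms share the same true conversion rate.
        conv_a = sum(rng.random() < rate for _ in range(visitors))
        conv_b = sum(rng.random() < rate for _ in range(visitors))
        if two_sided_p_value(conv_a, visitors, conv_b, visitors) < alpha:
            false_positives += 1
    return false_positives / n_experiments


if __name__ == "__main__":
    print(f"False positive rate at 90% significance: {false_positive_rate():.3f}")
```

With no real difference between the arms, the printed rate should land near 0.10, which matches the 10% error chance described above.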
Average Error Control
● Corresponds to Bayesian A/B Testing
● Less useful for iterating on test results
● Harder to learn from individual experiments with confidence
All Error Control
● Corresponds to Frequentist A/B Testing
● Any experiment will have less than a 10% chance of a mistake
● Rate of errors is 1 in 10
Average Error Control vs. All Error Control
● Average error control leads to lower accuracy for small improvements
● All error control is accurate for all users
● There are certain cases where average error control is an appropriate alternative
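The difference between the two guarantees can be shown with simple arithmetic. The sketch below assumes a hypothetical mix of experiments (the shares and error rates are invented for illustration): the average error rate stays under 10% even though one group of experiments errs far more often, so average error control holds while all error control does not.

```python
def average_error(groups):
    """Error rate averaged over the mix of experiments.

    `groups` is a list of (share_of_experiments, error_rate) pairs.
    """
    total_share = sum(share for share, _ in groups)
    return sum(share * err for share, err in groups) / total_share


def worst_case_error(groups):
    """Highest error rate of any single group of experiments."""
    return max(err for _, err in groups)


# Hypothetical portfolio: 80% large-improvement experiments that are easy
# to call (2% error), 20% small-improvement experiments that are hard to
# call (30% error).
portfolio = [(0.8, 0.02), (0.2, 0.30)]
threshold = 0.10

avg = average_error(portfolio)
worst = worst_case_error(portfolio)

print(f"average error: {avg:.3f} (controlled: {avg <= threshold})")
print(f"worst-case error: {worst:.3f} (controlled: {worst <= threshold})")
```

In this made-up portfolio the small-improvement experiments carry almost all of the risk, which is the pattern the first bullet above describes.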
Error Rates for Experiments
Bayes Rule
Average Error Control & Bayesian A/B Testing
● Requires two sources of randomness
  • Randomness or “noise” in the data
  • The makeup of the “typical” experiment group
● Distribution over experiment improvements
Different Beliefs in Composition of ‘Typical’ Experiments
Bayes Rule
Bayes Rule & Bayesian A/B Testing
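Bayes' rule says the posterior is proportional to the likelihood times the prior. As a rough sketch of how that can drive an A/B test, the code below assumes a Beta prior on each conversion rate (Beta(1, 1), a flat prior, chosen here purely for illustration) and estimates the probability that the variation beats the control by sampling from the two posteriors. The function names and numbers are hypothetical and are not taken from the slides.

```python
import random


def posterior_params(conversions, visitors, prior_alpha=1.0, prior_beta=1.0):
    """Beta posterior after observing binomial data (conjugate update)."""
    return prior_alpha + conversions, prior_beta + visitors - conversions


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A | data) under Beta priors."""
    rng = random.Random(seed)
    a_alpha, a_beta = posterior_params(conv_a, n_a)
    b_alpha, b_beta = posterior_params(conv_b, n_b)
    wins = sum(
        rng.betavariate(b_alpha, b_beta) > rng.betavariate(a_alpha, a_beta)
        for _ in range(draws)
    )
    return wins / draws


if __name__ == "__main__":
    # Hypothetical data: control converts 100/1000, variation 150/1000.
    print(f"P(B beats A) = {prob_b_beats_a(100, 1000, 150, 1000):.3f}")
```

Changing `prior_alpha` and `prior_beta` changes the answer for the same data, which is one way to see why the assumed makeup of the "typical" experiment matters in this approach.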
Bayes Rule & Average Error Value
Recap Average Error Control
Bayesian A/B Testing
Prior Distributions
Bayes Rule
All Error Control is Frequentist A/B Testing
● All error control corresponds to Frequentist A/B testing
● We want to aim to control the false positive rate
● Chance an experiment is either called a winner or loser
Benefits & Risks
Benefits of Bayesian A/B Testing
● Average error control can be very attractive
● Helps solve the “peeking” problem
● Average error control is fast
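The "peeking" problem refers to checking a fixed-horizon significance test repeatedly and stopping at the first significant result, which inflates the false positive rate. The simulation below is a rough sketch of that effect, assuming a two-proportion z-test and hypothetical traffic numbers; it illustrates the problem itself, not any particular vendor's solution.

```python
import random
from statistics import NormalDist


def is_significant(conv_a, conv_b, n, alpha):
    """Two-proportion z-test with n visitors in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = (pooled * (1 - pooled) * 2 / n) ** 0.5
    if se == 0:
        return False
    z = abs(conv_b - conv_a) / n / se
    return 2 * (1 - NormalDist().cdf(z)) < alpha


def peeking_false_positive_rate(peeks, n_experiments=1000,
                                visitors_per_peek=100, rate=0.10,
                                alpha=0.10, seed=0):
    """Share of A/A experiments called significant at ANY of `peeks` looks."""
    rng = random.Random(seed)
    stopped_early = 0
    for _ in range(n_experiments):
        conv_a = conv_b = n = 0
        for _ in range(peeks):
            # New traffic arrives; both arms share the same true rate.
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_peek))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_peek))
            n += visitors_per_peek
            if is_significant(conv_a, conv_b, n, alpha):
                stopped_early += 1  # the experimenter stops and calls a result
                break
    return stopped_early / n_experiments


if __name__ == "__main__":
    print("one look: ", peeking_false_positive_rate(peeks=1))
    print("ten looks:", peeking_false_positive_rate(peeks=10))
```

With a single look the rate should sit near the nominal 10%; with ten looks it climbs well above that, even though no experiment has a real effect.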
Risks of Bayesian A/B Testing
● It’s more appealing but it’s risky in practice
● Smaller improvement experiments with fast results = high risk
● Higher error rate than the method actually suggests
Benefits of Frequentist A/B Testing
● This type of test will make fewer mistakes on experiments with non-zero improvements
● The rate of errors will be less than 1 in 10
● Option to speed up experimentation by using a prior
Learning from A/B Tests
Learning from A/B Tests
Risk Involved with Typical Realistic Experiments
Realistic Bayesian A/B Tests vs. Stats Engine
● The hardest experiments to call correctly are those with small improvements
● A/B testing in the wild is not easy
● We need more and more data in order to...
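To get a feel for why small improvements are the hardest to call, the sketch below uses a common textbook approximation for the sample size of a two-proportion test. The baseline rate, lifts, 90% significance, and 80% power are assumed values for illustration, and the formula is a generic fixed-horizon approximation rather than anything specific to the methods compared in these slides.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variation(baseline, absolute_lift, alpha=0.10, power=0.80):
    """Approximate visitors needed per arm to detect `absolute_lift`.

    Uses n = (z_{alpha/2} + z_{power})^2 * 2 * p * (1 - p) / lift^2,
    with p taken as the midpoint of the two conversion rates.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = baseline + absolute_lift / 2
    variance = 2 * p_bar * (1 - p_bar)
    return ceil((z_alpha + z_beta) ** 2 * variance / absolute_lift ** 2)


if __name__ == "__main__":
    # Hypothetical 10% baseline conversion rate.
    for lift in (0.05, 0.02, 0.01):
        needed = visitors_per_variation(0.10, lift)
        print(f"absolute lift {lift:.2f}: about {needed} visitors per variation")
```

Because the lift is squared in the denominator, halving the improvement roughly quadruples the traffic required.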
So what does this mean?
Stats Engine
Stats Engine™
Results are valid whenever you check
Avoid costly statistics errors
Measure real-time results with confidence
Key Takeaways
● Bayesian vs. Frequentist methods
● All error control vs. average error control
● Blended approach leads to greater confidence
QUESTIONS?
THANK YOU!
Appendix
Attic and button example
Attic and button example cont. In relation to all error control
Attic and button example cont. In relation to average error control
How to define a Bayesian AB test *FIX THIS SLIDE*
Trade-offs with Bayesian A/B testing
High improvement > low improvement
Bayesian A/B testing is average error control
Introduction slide about what topics will be covered
SARA’S SLIDES