Advanced A/B Testing at Wix - Aviran Mordo and Sagy Rozman, Wix.com
DESCRIPTION
While A/B testing is a well-known methodology for conducting experiments in production, doing it at a large scale raises many challenges at both the organizational and the operational level. At Wix we have been practicing continuous delivery for over 4 years. Conducting A/B tests and writing feature toggles is at the core of our development process. However, doing so at a large scale, with over 1000 experiments every month, holds many challenges and affects everyone in the company: developers, product managers, QA, marketing and management. In this talk we will explain the lifecycle of an experiment, some of the challenges we faced and the effect on our development process:

* How an experiment begins its life
* How an experiment is defined
* How you let non-technical people control an experiment while preventing mistakes
* How an experiment goes live, and what its lifecycle looks like from beginning to end
* What the difference is between client and server experiments
* How you keep the user experience consistent and avoid confusing users
* How it affects the development process
* How QA can test an environment that changes every 9 minutes
* How support can help users when every user may be part of a different experiment
* How we can find whether an experiment is causing errors when there are millions of permutations [at least 2^(number of active experiments)]
* What the effects of always running multiple experiments are on system architecture
* What the development patterns are when working with A/B tests

At Wix we have developed our 3rd-generation experiment system called PETRI, which is (will be) open sourced, and which helps us maintain some order in a chaotic system that keeps changing. We will also explain how PETRI works, and what the patterns are for conducting experiments with minimal effect on performance and user experience.

TRANSCRIPT
Experimenting on Humans
Aviran Mordo Head of Back-end Engineering
@aviranm
www.linkedin.com/in/aviran
www.aviransplace.com
Sagy Rozman Back-end Guild master
www.linkedin.com/in/sagyrozman
@sagyrozman
Wix In Numbers
• Over 55M users + 1M new users/month
• Static storage is >1.5PB of data
• 3 data centers + 3 clouds (Google, Amazon, Azure)
• 1.5B HTTP requests/day
• 900 people work at Wix, of which ~300 in R&D
1542 (A/B Tests in 3 months)
• Basic A/B testing
• Experiment driven development
• PETRI – Wix’s 3rd generation open source experiment system
• Challenges and best practices
• How to (code samples)
Agenda
A/B Test
To B or NOT to B?
A
B
Home page results (How many registered)
Experiment Driven Development
This is the Wix editor
Our gallery manager. What can we improve?
Is this better?
Don’t be a loser
Product Experiments Toggles & Reporting
Infrastructure
How do you know what is running?
If I “know” it is better, do I really need to test it?
Why so many?
Sign-up Choose Template Edit site Publish Premium
The theory
Result = Fail
Intent matters
• EVERY new feature is A/B tested
• We open the new feature to a % of users
  ○ Measure success
  ○ If it is better, we keep it
  ○ If worse, we check why and improve
• If flawed, the impact is limited to just a % of our users
Conclusion
Start with 50% / 50% ?
• New code can have bugs
• Conversion can drop
• Usage can drop
• Unexpected cross test dependencies
Sh*t happens (Test could fail)
• Language
• GEO
• Browser
• User-agent
• OS
Minimize affected users (in case of failure)
Gradual exposure (percentage of…)
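The gradual-exposure idea above can be sketched as deterministic bucketing. This is a minimal illustration, not PETRI's actual algorithm, and `bucket`/`isExposed` are hypothetical names: hash the user ID into one of 100 buckets and expose the NEW feature only below the current percentage, so widening exposure never flips an already-exposed user back to the old experience.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class GradualExposure {
    // Deterministically map a user ID to a bucket in [0, 100).
    static int bucket(String userId) {
        CRC32 crc = new CRC32();
        crc.update(userId.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    // Expose the NEW feature only to users whose bucket falls below the
    // current exposure percentage; raising the percentage only adds users,
    // it never removes anyone who was already exposed.
    static boolean isExposed(String userId, int exposurePercent) {
        return bucket(userId) < exposurePercent;
    }
}
```

Because the bucket depends only on the user ID, a user's rollout decision is stable across requests without any server-side state.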
• Company employees
• User roles
• Any other criteria you have (extendable)
• All users
• First time visitors = Never visited wix.com
• New registered users = Untainted users
Not all users are equal
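The criteria listed above combine into an eligibility check that runs before any group toss happens. A minimal sketch with illustrative names (not PETRI's API):

```java
import java.util.Set;

public class EligibilityFilter {
    // Illustrative filter: restrict an experiment by language, country,
    // and (optionally) to company employees only.
    private final Set<String> allowedLanguages;
    private final Set<String> allowedCountries;
    private final boolean employeesOnly;

    public EligibilityFilter(Set<String> allowedLanguages,
                             Set<String> allowedCountries,
                             boolean employeesOnly) {
        this.allowedLanguages = allowedLanguages;
        this.allowedCountries = allowedCountries;
        this.employeesOnly = employeesOnly;
    }

    // A user outside the filter never enters the experiment and always
    // sees the OLD behavior.
    public boolean isEligible(String language, String country, boolean isEmployee) {
        if (employeesOnly && !isEmployee) return false;
        return allowedLanguages.contains(language)
            && allowedCountries.contains(country);
    }
}
```

Starting with `employeesOnly = true` matches the slide's idea of trying a risky feature on company employees before any real users see it.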
We need that feature
…and failure is not an option
Defensive Testing
Adding a mobile view
First trial failed
Performance had to be improved
Halting the test results in loss of data. What can we do about it?
Solution – Pause the experiment!
• Maintain NEW experience for already exposed users
• No additional users will be exposed to the NEW feature
PETRI’s pause implementation
• Use cookies to persist assignment
  ○ If the user changes browser, the assignment is unknown
• Server-side persistence solves this
  ○ You pay in performance & scalability
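The pause behavior described above can be sketched as follows. This is a simplified illustration of the semantics, not PETRI's implementation: users who already carry a persisted assignment keep their experience, while unassigned users fall back to the old experience and no new assignments are made.

```java
import java.util.Optional;
import java.util.Random;

public class PausedExperiment {
    private static final Random random = new Random();

    // existingAssignment: the test group persisted for this user
    // (e.g. in a cookie), or empty if the user was never exposed.
    static String conduct(Optional<String> existingAssignment, boolean paused) {
        if (existingAssignment.isPresent()) {
            return existingAssignment.get(); // keep the experience users already saw
        }
        if (paused) {
            return "old"; // while paused, no additional users are exposed to NEW
        }
        return toss(); // experiment running normally: assign (and persist) a group
    }

    static String toss() {
        return random.nextBoolean() ? "new" : "old";
    }
}
```

This is what makes "pause" different from simply halting the test: already-exposed users are not yanked back to the old experience, so their data keeps accumulating while the bug is fixed.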
Decision
Keep feature:
• Improve code & resume experiment
Drop feature:
• Keep backwards compatibility for exposed users forever?
• Migrate users to another equivalent feature
• Drop it altogether (users lose data/work)
The road to success
• Numbers look good but sample size is small
• We need more data!
• Expand
Reaching statistical significance
Test Group (B): 25% → 50% → 75% → 100%
Control Group (A): 75% → 50% → 25% → 0%
Keep user experience consistent
Control Group
(A)
Test Group
(B)
• Signed-in user (Editor)
  ○ Test group assignment is determined by the user ID
  ○ Guarantees toss persistency across browsers
• Anonymous user (Home page)
  ○ Test group assignment is randomly determined
  ○ Cannot guarantee a persistent experience if the user changes browsers
• 11% of Wix users use more than one desktop browser
Keeping persistent UX
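A minimal sketch of the two strategies above (assumed names, not PETRI's code): a signed-in user's group is computed deterministically from the user ID, so any browser yields the same toss; an anonymous user's group is random and must be persisted in a cookie, which is why switching browsers can break persistency.

```java
import java.util.Random;

public class GroupToss {
    // Signed-in user: the user ID alone determines the group, so the toss
    // is reproducible on every browser and device.
    static String tossForSignedIn(long userId, String[] groups) {
        return groups[(int) Math.floorMod(userId, (long) groups.length)];
    }

    // Anonymous user: a random toss; the result must be stored in a cookie
    // and is lost if the user switches browsers.
    static String tossForAnonymous(String[] groups, Random rnd) {
        return groups[rnd.nextInt(groups.length)];
    }
}
```

In a real system the signed-in toss would hash the ID together with the experiment key so that different experiments split users independently; a plain modulo is used here only to keep the sketch short.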
There is MORE than one
# of active experiments    Possible # of states
10                         1,024
20                         1,048,576
30                         1,073,741,824
Possible states >= 2^(# experiments)
Wix has ~200 active experiments → at least 2^200 ≈ 1.606938e+60 possible states
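The table and formula above follow from each on/off experiment doubling the number of possible system states. A quick check of the numbers (the helper name is hypothetical):

```java
import java.math.BigInteger;

public class StateCount {
    // Each independent experiment with two states doubles the number of
    // possible system states, so n active experiments give at least 2^n.
    static BigInteger possibleStates(int activeExperiments) {
        return BigInteger.ONE.shiftLeft(activeExperiments); // 2^n
    }
}
```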
A/B testing introduces complexity
• Override options (URL parameters, cookies, headers…)
• Near real-time user BI tools
• Integrated developer tools in the product
Support tools
Define
Code
Experiment Expand
Merge code
Close
• Spec = Experiment template (in the code)
  ○ Define test groups
  ○ Mandatory limitations (filters, user types)
  ○ Scope = Group of related experiments (usually by product)
• Why is it needed?
  ○ Type safety
  ○ Preventing human errors (typos, user types)
  ○ Controlled by the developer (the developer knows the context)
  ○ Conducting experiments in batch
Define spec
public class ExampleSpecDefinition extends SpecDefinition {
  @Override
  protected ExperimentSpecBuilder customize(ExperimentSpecBuilder builder) {
    return builder
        .withOwner("OWNERS_EMAIL_ADDRESS")
        .withScopes(aScopeDefinitionForAllUserTypes("SOME_SCOPE"))
        .withTestGroups(asList("Group A", "Group B"));
  }
}
Spec code snippet
• Experiment = “If” statement in the code
Conducting experiment
final String result = laboratory.conductExperiment(key, fallback, new StringConverter());
if (result.equals("group a")) {
  // execute group a's logic
} else if (result.equals("group b")) {
  // execute group b's logic
}
// In case conducting the experiment failed, the fallback value is returned;
// in this case you would usually execute the 'old' logic.
• Upload the specs to the Petri server
  ○ Enables defining an experiment instance
Upload spec
{
  "creationDate" : "2014-01-09T13:11:26.846Z",
  "updateDate" : "2014-01-09T13:11:26.846Z",
  "scopes" : [
    { "name" : "html-editor", "onlyForLoggedInUsers" : true },
    { "name" : "html-viewer", "onlyForLoggedInUsers" : false }
  ],
  "testGroups" : [ "old", "new" ],
  "persistent" : true,
  "key" : "clientExperimentFullFlow1",
  "owner" : ""
}
Start new experiment (limited population)
Manage experiment states
1. Convert A/B Test to Feature Toggle (100% ON)
2. Merge the code
3. Close the experiment
4. Remove experiment instance
Ending successful experiment
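Step 1 above can be sketched like this (illustrative, not PETRI's API): reconfigured to 100% ON, the experiment's if-statement behaves as a feature toggle that always takes the winning branch, which is what makes it safe to merge the code and delete the losing branch afterwards.

```java
public class WinningToggle {
    // Experiment reconfigured to 100% ON: every user is assigned the
    // winning group, so conduct() effectively acts as a feature toggle.
    static String conduct() {
        return "new";
    }

    static String render() {
        if (conduct().equals("new")) {
            return "new gallery"; // the winner: this code is kept after the merge
        }
        return "old gallery"; // dead branch: deleted when the code is merged
    }
}
```

Once the merge lands, the experiment instance can be closed and removed without any behavior change, since no user was taking the old branch anyway.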
• Define spec
• Use Petri client to conduct experiment in the code (defaults to old)
• Sync spec
• Open experiment
• Manage experiment state
• End experiment
Experiment lifecycle
Petri is more than just an A/B test framework
Feature toggle
A/B Test
Personalization
Internal testing
Continuous deployment
Jira integration
Experiments
Dynamic configuration
QA
Automated testing
• Expose features internally to company employees
• Enable continuous deployment with feature toggles
• Select assignment by sites (not only by users)
• Automatic selection of winning group*
• Exposing feature to #n of users*
• Integration with Jira
* Planned feature
Other things we (will) do with Petri
Petri is now an open source project https://github.com/wix/petri
Q&A
Aviran Mordo Head of Back-end Engineering
@aviranm
www.linkedin.com/in/aviran
www.aviransplace.com
https://github.com/wix/petri http://goo.gl/L7pHnd
Sagy Rozman Back-end Guild master
www.linkedin.com/in/sagyrozman
@sagyrozman
Credits http://upload.wikimedia.org/wikipedia/commons/b/b2/Fiber_optics_testing.jpg http://goo.gl/nEiepT https://www.flickr.com/photos/ilo_oli/2421536836 https://www.flickr.com/photos/dexxus/5791228117 http://goo.gl/SdeJ0o https://www.flickr.com/photos/112923805@N05/15005456062 https://www.flickr.com/photos/wiertz/8537791164 https://www.flickr.com/photos/laenulfean/5943132296 https://www.flickr.com/photos/torek/3470257377 https://www.flickr.com/photos/i5design/5393934753 https://www.flickr.com/photos/argonavigo/5320119828
• Modeled experiment lifecycle
• Open source (developed using TDD from day 1)
• Running at scale on production
• No deployment necessary
• Both back-end and front-end experiments
• Flexible architecture
Why Petri
PETRI Server
Your app
Laboratory
DB Logs