Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee

Posted 05-Dec-2014

TRANSCRIPT

Page 1

Experimentation Platform

Ashok Banerjee

Page 2

Motivation

• Innovation iteration -> correct evaluation
  – Blindingly obvious
  – Clear but deductive reasoning (involved)
  – A/B Testing

• Segment-based optimization
• Multi-dimensional and stochastic impact

• Incremental Radicalism

• Disclaimer: Some parts of this platform already exist, but more will come to life, and we will solicit more inputs and involvement

Page 3

Experimentation Platform Components

• Bucketing (A or B) – see the sketch after this list
  – Web Bucketing on User Cohorts
  – Supply Chain Bucketing on Order Basket or Warehouse (e.g. Packing)
• Control variables – what is being tested
  – Price
  – Gift Wrap
  – Position on Web Page
  – Recommendation Positioning
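The slides don't spell out the assignment mechanism, but a common way to implement deterministic bucketing is to hash a stable id (user id for web, order id for supply chain) salted with the experiment name. A minimal Java sketch; the names and the choice of CRC32 are illustrative assumptions, not the platform's actual code:

    import java.nio.charset.StandardCharsets;
    import java.util.zip.CRC32;

    // Illustrative deterministic bucketing: the same entity always lands in the
    // same bucket for a given experiment, and different experiments are
    // decorrelated by the experiment-name salt. (Assumed mechanism.)
    final class Bucketer {
        static String bucket(String experimentName, String stableId, double fractionInB) {
            CRC32 crc = new CRC32();
            crc.update((experimentName + ":" + stableId).getBytes(StandardCharsets.UTF_8));
            double u = (crc.getValue() % 10_000) / 10_000.0; // roughly uniform in [0, 1)
            return u < fractionInB ? "B" : "A";
        }
    }

E.g. Bucketer.bucket("gift-wrap-exp", "user-123", 0.5) splits a cohort 50/50 and is stable across requests.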

Page 4

Experimentation Platform

• Result variables (often studied for a week to a month)
  – Repeat Visit
  – Repeat Buy
  – Repeat Engagement
  – Spend

• Result interpretation
  – Z-test
  – T-test
  – Chi-squared
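For a binary result variable such as repeat buy, the Z-test in this list is typically a two-proportion z-test comparing the conversion rates of the two buckets. A minimal sketch (illustrative, not the platform's code):

    // Two-proportion z-test for a binary result variable (e.g. repeat buy):
    // compares conversion rates of bucket A (control) and bucket B (treatment).
    final class TwoProportionZTest {
        static double zStatistic(long successesA, long trialsA, long successesB, long trialsB) {
            double pA = (double) successesA / trialsA;
            double pB = (double) successesB / trialsB;
            double pooled = (double) (successesA + successesB) / (trialsA + trialsB);
            double se = Math.sqrt(pooled * (1 - pooled) * (1.0 / trialsA + 1.0 / trialsB));
            return (pB - pA) / se; // |z| > 1.96 => significant at the 5% level
        }
    }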

Page 5

Bucketing (Web)

• Bucketing: Declarative Common Cohorts
  – User (sync): Cohorts are complex queries, often run async if sufficiently complex, e.g.
    • Users who bought Books with increasing spend but did not buy Electronics
    • User Activity Store: searches, clicks, views etc.
    • Cached and hit at web scale
• Cohorts can be selected declaratively, e.g.
  – Category Purchased
  – Search Ranking
  – Email Marketing
  – Spend slope
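One way to read "declarative" here: cohorts are named, composable predicates over the User Activity Store rather than hand-written queries per experiment. A hypothetical sketch; UserProfile and the cohort names are assumptions for illustration:

    import java.util.Set;
    import java.util.function.Predicate;

    // Hypothetical view of a user as served by the User Activity Store.
    record UserProfile(Set<String> categoriesPurchased, double spendSlope) {}

    // Cohorts as composable, declaratively named predicates.
    final class Cohorts {
        static Predicate<UserProfile> purchased(String category) {
            return u -> u.categoriesPurchased().contains(category);
        }
        static Predicate<UserProfile> spendIncreasing() {
            return u -> u.spendSlope() > 0;
        }
    }

    // "Users who bought Books with increasing spend but did not buy Electronics":
    // Cohorts.purchased("Books").and(Cohorts.spendIncreasing())
    //        .and(Cohorts.purchased("Electronics").negate())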

Page 6

Bucketing (Fulfilment)

– Order Fulfilment (async): Rules
  • RETE evaluation of rules: predicates are evaluated a minimal number of times across the ~1000 rules
  • Async process => on-the-fly evaluation
– Interaction plots need to be examined when running multiple experiments
– Exclusive buckets on control variables
  • e.g. 2 experiments cannot both decide on gift wrap
  • Price cannot be influenced by 2 different experiments
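Exclusivity on control variables can be enforced with an ownership registry: the first experiment to claim a variable wins, so price or gift wrap is never decided by two experiments at once. A sketch, assuming a single coordinating service (names are illustrative):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Each control variable (price, gift wrap, ...) may be owned by at most one
    // running experiment at a time. (Assumed design, not the platform's code.)
    final class ControlVariableRegistry {
        private final Map<String, String> ownerByVariable = new ConcurrentHashMap<>();

        boolean tryClaim(String controlVariable, String experimentId) {
            // putIfAbsent is atomic: only the first claimant gets ownership.
            String prev = ownerByVariable.putIfAbsent(controlVariable, experimentId);
            return prev == null || prev.equals(experimentId);
        }

        void release(String controlVariable, String experimentId) {
            ownerByVariable.remove(controlVariable, experimentId);
        }
    }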

Page 7

Control Variables

• Control Variables: configuration-based delta
  – Price elasticity
  – Position on page
  – Recommendation
  – Gift Wrap
  – Business Flow (e.g. in Mumbai a new Packing technique) => BPM
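"Configuration-based delta" can be pictured as a typed control variable plus a delta record served from configuration, so a treatment ships without a code change. A purely illustrative model, not the platform's schema:

    // Illustrative: each treatment is a config-served delta on a typed control variable.
    enum ControlVariable { PRICE, GIFT_WRAP, PAGE_POSITION, RECOMMENDATION_SLOT, BUSINESS_FLOW }

    record ExperimentTreatment(String experimentId, ControlVariable variable, double delta) {}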

Page 8

Execution

• Execution
  – Client Library to evaluate: if (experiment45) { ….. }
  – Configuration-based deviators
    • Better still, evaluate an experiment deviator, e.g. SLA = SLA - experimentDelta (experimenting with early delivery)
    • experimentDelta comes from the config service
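The SLA example reads as: fetch the delta for this experiment from the config service and default to zero when absent. A hedged sketch; the ConfigService interface here is hypothetical:

    // Hypothetical config-service lookup: the deviator lives in configuration,
    // so changing the delta needs no code change or redeploy.
    interface ConfigService {
        double getDouble(String key, double defaultValue);
    }

    final class SlaDeviator {
        static int deviatedSlaHours(ConfigService config, int baselineSlaHours, boolean inTreatment) {
            if (!inTreatment) return baselineSlaHours;
            double delta = config.getDouble("experiment.earlyDelivery.slaDeltaHours", 0.0);
            return (int) Math.round(baselineSlaHours - delta); // SLA = SLA - experimentDelta
        }
    }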

• Multi-armed bandit to apply the changes? 90% greedy and 10% random
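"90% greedy and 10% random" is an epsilon-greedy multi-armed bandit with ε = 0.1: exploit the best-performing variant most of the time, explore a random one otherwise. A minimal sketch:

    import java.util.Random;

    // Epsilon-greedy bandit (ε = 0.1): with probability 0.9 exploit the arm with
    // the best observed mean reward, with probability 0.1 explore a random arm.
    final class EpsilonGreedy {
        private final double[] rewardSum;
        private final long[] pulls;
        private final Random rng = new Random();

        EpsilonGreedy(int arms) {
            rewardSum = new double[arms];
            pulls = new long[arms];
        }

        int chooseArm() {
            if (rng.nextDouble() < 0.1) return rng.nextInt(rewardSum.length); // explore
            int best = 0;
            for (int a = 1; a < rewardSum.length; a++) {
                if (mean(a) > mean(best)) best = a;
            }
            return best; // exploit
        }

        void recordReward(int arm, double reward) {
            rewardSum[arm] += reward;
            pulls[arm]++;
        }

        private double mean(int arm) {
            return pulls[arm] == 0 ? 0.0 : rewardSum[arm] / pulls[arm];
        }
    }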

Page 9

Binomial at Large # -> Normal

• Binomial (most human decisions) -> Normal

  $(p + q)^n = \sum_r \binom{n}{r} p^r q^{n-r}$

  $Y_r = \binom{n}{r} p^r q^{n-r}$

  $(Y_{r+1} - Y_r)/Y_r$ [large $n$]

  $\dfrac{dy}{y} = -\dfrac{x\,dx}{\sigma^2}$
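Filling in the step the slide compresses, with $x = r - np$ and $\sigma^2 = npq$:

  $\dfrac{Y_{r+1}}{Y_r} = \dfrac{(n-r)\,p}{(r+1)\,q}
  \quad\Rightarrow\quad
  \dfrac{Y_{r+1}-Y_r}{Y_r} = \dfrac{(n-r)p-(r+1)q}{(r+1)q}
  \approx -\dfrac{x}{npq}$

  $\text{Treating the unit step in } r \text{ as } dx:\quad
  \dfrac{dy}{dx} = -\dfrac{x\,y}{\sigma^2}
  \;\Rightarrow\;
  y = y_0\, e^{-x^2/(2\sigma^2)}$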

Page 10

Interaction Plot

– From Peltier Stats on OKCupid data
– Smile: no interaction with eye contact
– Flirty face: significant interaction

Beware of interaction between experiments
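For two concurrent experiments on a shared metric, the quickest interaction check is the difference-of-differences across the 2x2 cell means; near zero means the interaction-plot lines are parallel. A sketch:

    // Interaction contrast for two concurrent experiments A and B on one metric:
    // B's lift when A is on, minus B's lift when A is off. Near zero => parallel
    // lines in the interaction plot => no interaction.
    final class InteractionCheck {
        static double interaction(double meanA0B0, double meanA0B1,
                                  double meanA1B0, double meanA1B1) {
            double liftOfBWithoutA = meanA0B1 - meanA0B0;
            double liftOfBWithA = meanA1B1 - meanA1B0;
            return liftOfBWithA - liftOfBWithoutA;
        }
    }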

Page 11

Result Interpretation

• Result Interpretation
  – T-test: samples less than 30 [fatter tail]
  – Z-test: z = (x - m)/(std dev), compared against 1.96 [Normal]
  – Paired t-test: Return/Refund -> Gift -> Repeat Buys
  – Chi-squared
  – F-test

• Do we lose anything by repeated testing until test convergence?
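The known answer is yes: stopping as soon as a peek shows |z| > 1.96 inflates the false-positive rate well above 5%. A simulation sketch under the null (both buckets convert identically):

    import java.util.Random;

    // Simulates "peek after every batch, stop at |z| > 1.96" when the two
    // buckets are truly identical; the printed false-positive rate lands far
    // above the nominal 5%.
    final class PeekingSimulation {
        public static void main(String[] args) {
            Random rng = new Random(42);
            int falsePositives = 0, runs = 10_000;
            for (int run = 0; run < runs; run++) {
                int succA = 0, succB = 0, n = 0;
                for (int peek = 0; peek < 20; peek++) {       // 20 interim looks
                    for (int i = 0; i < 500; i++) {           // 500 users per bucket per batch
                        if (rng.nextDouble() < 0.10) succA++; // both buckets convert at 10%
                        if (rng.nextDouble() < 0.10) succB++;
                    }
                    n += 500;
                    double pooled = (double) (succA + succB) / (2 * n);
                    double se = Math.sqrt(pooled * (1 - pooled) * 2.0 / n);
                    double z = ((double) succB / n - (double) succA / n) / se;
                    if (Math.abs(z) > 1.96) { falsePositives++; break; }
                }
            }
            System.out.printf("false-positive rate with peeking: %.3f%n",
                              (double) falsePositives / runs);
        }
    }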

Page 12

Development Paradigm

– Simplify during the experiment
– Scalability: build the experiment to work out of memory
– Availability: Fail-Open (see the sketch below)
– Sharding and Database: not big scale
– Performance: in memory for a few nodes
– Figure out control variables

Upper bound of expected results -> 90% of experiments may not need to be scaled out
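Fail-Open means the experimentation layer must never take the page down: if evaluation throws, serve the control experience and lose one data point. A sketch:

    import java.util.function.Supplier;

    // Fail-open guard: any runtime failure in the experiment path falls back to
    // the control (baseline) experience instead of failing the user request.
    final class FailOpen {
        static <T> T evaluate(Supplier<T> experimentPath, T controlDefault) {
            try {
                return experimentPath.get();
            } catch (RuntimeException e) {
                // In a real system: log and emit a metric; the experiment loses
                // a data point, the user still gets the baseline experience.
                return controlDefault;
            }
        }
    }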

Page 13

Decision Paradigm

– No code needed to test an idea
– Experiments run in parallel
– Need to test for interaction and main effects


Page 15

Summary

• An A/B testing platform becomes key once decisions go beyond the trivially obvious

• Configuration-based A/B tests (trivial to check an idea out of curiosity)

• Result interpretation is non-trivial and varies