Machine Learning Experimentation at Sift Science


TRANSCRIPT

Page 1: Machine Learning Experimentation at Sift Science

ML Experimentation at Sift
Alex Paino ([email protected])

Follow along at: http://go.siftscience.com/ml-experimentation


Page 2: Machine Learning Experimentation at Sift Science

Agenda

Background

Motivation

Running experiments correctly

Comparing experiments correctly

Building tools to ensure correctness


Page 3: Machine Learning Experimentation at Sift Science

About Sift Science

- Abuse prevention platform powered by machine learning

- Learns in real-time

- Several abuse prevention products and counting:

Payment Fraud · Content Abuse · Promo Abuse · Account Abuse

Page 4: Machine Learning Experimentation at Sift Science

About Sift Science


Page 5: Machine Learning Experimentation at Sift Science

Motivation - Why is this important?

1. Experiments must happen to improve an ML system


Page 6: Machine Learning Experimentation at Sift Science

Motivation - Why is this important?

1. Experiments must happen to improve an ML system

2. Evaluation needs to correctly identify positive changes

Evaluation as a loss function for your stack


Page 7: Machine Learning Experimentation at Sift Science

Motivation - Why is this important?

1. Experiments must happen to improve an ML system

2. Evaluation needs to correctly identify positive changes

Evaluation as a loss function for your stack

3. Getting this right is a subtle and tricky problem


Page 8: Machine Learning Experimentation at Sift Science

How do we run experiments?


Page 9: Machine Learning Experimentation at Sift Science

Running experiments correctly - Background

- Large delay in feedback for Sift - up to 90 days

- → offline experiments over historical data

[Timeline: Created account → Updated credit card info → Updated settings → Purchased item → Chargeback; up to 90 days elapse before the chargeback arrives]

Page 10: Machine Learning Experimentation at Sift Science

Running experiments correctly - Background

- Large delay in feedback for Sift - up to 90 days

- → offline experiments over historical data

- Need to simulate the online case as closely as possible


Page 11: Machine Learning Experimentation at Sift Science

Running experiments correctly - Lessons

Lesson: train & test set creation

- Can’t pick random splits


Page 12: Machine Learning Experimentation at Sift Science

Running experiments correctly - Lessons

Lesson: train & test set creation

- Can’t pick random splits

- Disjoint in time and set of users

[Diagram: train and test sets are disjoint along both time (t) and users]

Page 13: Machine Learning Experimentation at Sift Science

Running experiments correctly - Lessons

Lesson: train & test set creation

- Can’t pick random splits

- Disjoint in time and set of users

- Watch for class skew - ours is over 50:1 → need to downsample

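A minimal sketch of how such a split plus downsampling might look, assuming labeled events live in a pandas DataFrame; the column names (user_id, timestamp, label) and the 10:1 target ratio are illustrative, not Sift's actual schema:

```python
import pandas as pd

def time_user_disjoint_split(events, cutoff, test_user_frac=0.2, seed=42):
    """Train on pre-cutoff events from train users; test on post-cutoff
    events from a disjoint set of test users."""
    users = pd.Series(events["user_id"].unique()).sample(frac=1.0, random_state=seed)
    test_users = set(users.iloc[: int(len(users) * test_user_frac)])
    train = events[(events["timestamp"] < cutoff) & ~events["user_id"].isin(test_users)]
    test = events[(events["timestamp"] >= cutoff) & events["user_id"].isin(test_users)]
    return train, test

def downsample_negatives(train, ratio=10.0, seed=42):
    """Cap the negative:positive class ratio (e.g. from >50:1 down to 10:1)."""
    pos, neg = train[train["label"] == 1], train[train["label"] == 0]
    keep = min(len(neg), int(len(pos) * ratio))
    return pd.concat([pos, neg.sample(n=keep, random_state=seed)])
```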

Page 14: Machine Learning Experimentation at Sift Science

Running experiments correctly - Lessons

Lesson: preventing cheating

- External data sources need to be versioned

[Timeline: Created account → Updated credit card info → Login from IP Address A → Login from IP Address B → Login from IP Address B → Transaction; IP Address B is a known Tor exit node according to the Tor Exit Node DB]

Page 15: Machine Learning Experimentation at Sift Science

Running experiments correctly - Lessons

Lesson: preventing cheating

- External data sources need to be versioned

- Can’t leak ground truth into feature vectors

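One way to satisfy both points is to make every external lookup point-in-time: a feature computed for an event at time t may only consult the snapshot of the source that existed at t. A minimal sketch, assuming snapshots are recorded in chronological order; the class and the Tor example below are illustrative:

```python
import bisect
from datetime import datetime

class VersionedSet:
    """Membership lookups against historical snapshots of an external source."""
    def __init__(self):
        self._times = []      # snapshot timestamps, appended in chronological order
        self._snapshots = []  # frozenset of members per snapshot

    def add_snapshot(self, as_of, members):
        self._times.append(as_of)
        self._snapshots.append(frozenset(members))

    def contains(self, item, as_of):
        """Answer with the latest snapshot taken at or before `as_of`."""
        i = bisect.bisect_right(self._times, as_of) - 1
        return i >= 0 and item in self._snapshots[i]

# A June login is checked against the June view of the Tor exit node DB,
# even if the IP was added in July -- otherwise the experiment "cheats".
tor_exit_nodes = VersionedSet()
tor_exit_nodes.add_snapshot(datetime(2017, 7, 1), {"203.0.113.7"})
assert not tor_exit_nodes.contains("203.0.113.7", datetime(2017, 6, 15))
assert tor_exit_nodes.contains("203.0.113.7", datetime(2017, 7, 2))
```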

Page 16: Machine Learning Experimentation at Sift Science

Running experiments correctly - Lessons

Lesson: considering scores at key decision points

- Scores given for any event (e.g. user login)


Page 17: Machine Learning Experimentation at Sift Science

Running experiments correctly - Lessons

Lesson: considering scores at key decision points

- Scores given for any event (e.g. user login)

- Need to evaluate the scores our customers use to make decisions

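A minimal sketch of restricting evaluation to decision-point scores; the event field names and the set of decision event types are illustrative assumptions:

```python
# Event types at which customers typically act on the score (assumed here).
DECISION_EVENTS = {"$create_order", "$transaction"}

def decision_point_scores(scored_events):
    """Keep (score, label) pairs only for events customers act on,
    rather than every scored event (logins, settings changes, ...)."""
    return [(e["score"], e["label"])
            for e in scored_events
            if e["type"] in DECISION_EVENTS]
```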

Page 18: Machine Learning Experimentation at Sift Science

Running experiments correctly - Lessons

Lesson: parity with the online system

- Our system does online learning → so should the offline experiments


Page 19: Machine Learning Experimentation at Sift Science

Running experiments correctly - Lessons

Lesson: parity with the online system

- Our system does online learning → so should the offline experiments

- Reusing the same code paths

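A sketch of what an offline replay with online learning could look like when both paths share one interface; `Model` here is a stand-in, not Sift's actual code, and for simplicity the label is applied immediately rather than at its real arrival time (up to 90 days later):

```python
class Model:
    """Stand-in for the interface shared by the online and offline paths."""
    def score(self, features): ...
    def partial_fit(self, features, label): ...

def replay(model, events):
    """Replay history in time order: score first, then learn from the label."""
    results = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        results.append((model.score(event["features"]), event.get("label")))
        if event.get("label") is not None:
            model.partial_fit(event["features"], event["label"])
    return results
```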

Page 20: Machine Learning Experimentation at Sift Science

How do we compare experiments?


Page 21: Machine Learning Experimentation at Sift Science

Comparing Experiments Correctly - Background

[Diagram: customer-specific and global models combine into the Sift Score]

Page 22: Machine Learning Experimentation at Sift Science

Comparing Experiments Correctly - Background

[Diagram: one model stack per abuse type. For each of Payment Abuse, Account Abuse, Promotion Abuse, and Content Abuse, customer-specific and global models combine into that abuse type's score]
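To make the diagram concrete, a hypothetical sketch of assembling one abuse type's score from its customer-specific and global models; the simple weighted blend is an illustrative assumption, not Sift's actual ensembling method:

```python
def abuse_score(features, customer_model, global_models, w_customer=0.5):
    """Blend a customer-specific model with the average of the global models."""
    global_avg = sum(m.score(features) for m in global_models) / len(global_models)
    return w_customer * customer_model.score(features) + (1 - w_customer) * global_avg
```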

Page 23: Machine Learning Experimentation at Sift Science

Comparing Experiments Correctly - Background

Thousands of configurations to evaluate!

Page 24: Machine Learning Experimentation at Sift Science

Comparing Experiments Correctly - Background

Thousands of (customer, abuse type) combinations to evaluate

Page 25: Machine Learning Experimentation at Sift Science

Comparing Experiments Correctly - Background

Thousands of (customer, abuse type) combinations to evaluate

Each with different features, models, class skew, and noise levels

Page 26: Machine Learning Experimentation at Sift Science

Comparing Experiments Correctly - Background

Thousands of (customer, abuse type) combinations to evaluate

Each with different features, models, class skew, and noise levels

→ Need some way to consolidate these evaluations

Page 27: Machine Learning Experimentation at Sift Science

Comparing Experiments Correctly - Lessons

Lesson: pitfalls with consolidating results

- Can’t throw all samples together → different score distributions

[Diagram: Customer 1 (perfect) + Customer 2 (perfect) = Combined (imperfect)]

Page 28: Machine Learning Experimentation at Sift Science

Comparing Experiments Correctly - Lessons

Lesson: pitfalls with consolidating results

- Can’t throw all samples together → different score distributions

- Weighted averages are tricky

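A minimal sketch of the per-customer alternative: compute the metric within each customer and only then compare, using scikit-learn's roc_auc_score:

```python
from collections import defaultdict
from sklearn.metrics import roc_auc_score

def per_customer_auc(samples):
    """samples: iterable of (customer_id, label, score) tuples."""
    by_customer = defaultdict(lambda: ([], []))
    for customer_id, label, score in samples:
        labels, scores = by_customer[customer_id]
        labels.append(label)
        scores.append(score)
    # AUC is undefined unless both classes are present for the customer.
    return {c: roc_auc_score(labels, scores)
            for c, (labels, scores) in by_customer.items()
            if len(set(labels)) == 2}
```

Experiments can then be compared via paired per-customer differences rather than one pooled number.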

Page 29: Machine Learning Experimentation at Sift Science

Comparing Experiments Correctly - Lessons

Lesson: require statistical significance everywhere

- Examine significant differences in per-customer summary stats


Page 30: Machine Learning Experimentation at Sift Science

Comparing Experiments Correctly - Lessons

Lesson: require statistical significance everywhere

- Examine significant differences in per-customer summary stats

- Use confidence intervals where possible, e.g. for AUC ROC

http://www.med.mcgill.ca/epidemiology/hanley/software/hanley_mcneil_radiology_82.pdf
http://www.cs.nyu.edu/~mohri/pub/area.pdf
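From the Hanley & McNeil reference above, a sketch of the AUC standard error; a rough 95% interval is then auc ± 1.96 * SE:

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of AUC per Hanley & McNeil (1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)

# e.g. AUC 0.90 with 200 positives and 10,000 negatives:
se = hanley_mcneil_se(0.90, 200, 10_000)
lo, hi = 0.90 - 1.96 * se, 0.90 + 1.96 * se
```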

Page 31: Machine Learning Experimentation at Sift Science

How do we ensure correctness?


Page 32: Machine Learning Experimentation at Sift Science

Building tools to ensure correctness


Page 33: Machine Learning Experimentation at Sift Science

Building tools to ensure correctness

- Big productivity win


Page 34: Machine Learning Experimentation at Sift Science

Building tools to ensure correctness

- Big productivity win

- Allows non-data scientists to conduct experiments safely


Page 35: Machine Learning Experimentation at Sift Science

Building tools to ensure correctness

- Big productivity win

- Allows non-data scientists to conduct experiments safely

- Saves the team from drawing incorrect conclusions


Page 36: Machine Learning Experimentation at Sift Science

Building tools to ensure correctness

- Big productivity win

- Allows non-data scientists to conduct experiments safely

- Saves the team from drawing incorrect conclusions


Page 37: Machine Learning Experimentation at Sift Science

Building tools to ensure correctness - Examples

Example: Sift’s experiment evaluation page for high-level analysis


Page 38: Machine Learning Experimentation at Sift Science

Building tools to ensure correctness - Examples

Example: Sift’s experiment evaluation page for high-level analysis


Page 39: Machine Learning Experimentation at Sift Science

Building tools to ensure correctness - Examples

Example: Sift’s experiment evaluation page for high-level analysis


Page 40: Machine Learning Experimentation at Sift Science

Building tools to ensure correctness - Examples

Example: Sift’s experiment evaluation page for high-level analysis

[Screenshot: ROC curve]

Page 41: Machine Learning Experimentation at Sift Science

Building tools to ensure correctness - Examples

Example: Sift’s experiment evaluation page for high-level analysis

[Screenshot: ROC curve and score distribution]

Page 42: Machine Learning Experimentation at Sift Science

Building tools to ensure correctness - Examples

Example: Jupyter notebooks for deep-dives

Page 43: Machine Learning Experimentation at Sift Science

Key Takeaways


Page 44: Machine Learning Experimentation at Sift Science

Key Takeaways

1. Need to carefully design experiments to remove biases


Page 45: Machine Learning Experimentation at Sift Science

Key Takeaways

1. Need to carefully design experiments to remove biases

2. Require statistical significance when comparing results to filter out noise


Page 46: Machine Learning Experimentation at Sift Science

Key Takeaways

1. Need to carefully design experiments to remove biases

2. Require statistical significance when comparing results to filter out noise

3. The right tools can help ensure all of your analyses are correct while improving productivity

Page 47: Machine Learning Experimentation at Sift Science

Questions?
