Is Machine Learning useful for Fraud prevention?




Outline: Introduction, Expert Driven approach, Data Driven approach, Tools, Conclusion

Is Machine Learning useful for Fraud prevention?

Andrea Dal Pozzolo

22/07/2015


INTRODUCTION

- Frauds are as old as the human race.
- They follow the money, e.g. credit cards are well known for being targeted by fraudulent activities.
- We witness a growing presence of fraud in online transactions.
- There is a need for automatic systems able to detect and fight fraudsters.


THE PROBLEM

Fraud detection is a notably challenging problem because:
- Fraud strategies change over time, and customers' spending habits evolve as well.
- Few examples of fraud are available, so it is hard to model fraudulent behaviour.
- Not all frauds are reported, and some are reported only after a long delay.
- Only a few transactions can be investigated in a timely manner.


THE PROBLEM II

Given the large number of transactions we witness every day:
- We cannot ask human analysts to check every transaction one by one.
- We wish to automate the detection of fraudulent transactions.
- We want accurate predictions, i.e. to minimise missed frauds and false alarms.
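To make "missed frauds" and "false alarms" concrete, here is a minimal sketch that counts both from true labels and predictions and derives recall and precision. The function name `alert_quality`, the 1 = fraud / 0 = genuine convention and the toy labels are assumptions made for illustration; they do not come from the talk.

```python
def alert_quality(y_true, y_pred):
    """Count missed frauds (false negatives) and false alarms (false positives).

    Labels are 1 for fraud and 0 for genuine (a convention assumed here).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # frauds caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed frauds
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "missed_frauds": fn,
        "false_alarms": fp,
        "recall": recall,
        "precision": precision,
    }


# Toy data, invented for illustration.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
quality = alert_quality(y_true, y_pred)
```

On this toy data one fraud is missed and one alert is a false alarm, so recall and precision are both 2/3. The two error types usually trade off against each other, which is why both are tracked.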

Two standard approaches for fraud detection (FD):
- Expert Driven
- Data Driven


EXPERT DRIVEN APPROACH

A straightforward approach to automating detection is to define rules that exploit fraud experts' knowledge.

- E.g. IF transaction amount > €10,000 AND betting website THEN class = FRAUD
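An expert rule of this kind translates directly into code. The sketch below is only an illustration: the field names `amount_eur` and `merchant_category`, the "GENUINE" label and the sample transactions are hypothetical.

```python
def expert_rule(transaction):
    """Return "FRAUD" if the hand-written rule fires, else "GENUINE".

    Rule: amount above 10,000 EUR AND merchant is a betting website.
    """
    is_large = transaction["amount_eur"] > 10_000
    is_betting = transaction["merchant_category"] == "betting"
    return "FRAUD" if is_large and is_betting else "GENUINE"


# Hypothetical transactions, invented for illustration.
tx_a = {"amount_eur": 12_500, "merchant_category": "betting"}
tx_b = {"amount_eur": 12_500, "merchant_category": "grocery"}
tx_c = {"amount_eur": 10_000, "merchant_category": "betting"}
labels = [expert_rule(tx) for tx in (tx_a, tx_b, tx_c)]
```

Only `tx_a` is flagged. `tx_c`, at exactly 10,000, is not, which shows how sharply a fixed threshold separates otherwise similar transactions.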


CASE STUDY

Rule: IF N Trans > 80 AND Tot Amt > 2000 THEN fraud

Rule: ?? We can learn this by means of Machine Learning
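As a rough illustration of what "learning the rule" can mean, the sketch below searches for the pair of thresholds that makes a rule of the form "N Trans > a AND Tot Amt > b" agree best with labelled examples. This is a brute-force search written for clarity, not the method used in the talk; the function name and the toy data are assumptions.

```python
from itertools import product


def learn_conjunctive_rule(rows, labels):
    """Search thresholds (a, b) so that 'n_trans > a AND tot_amt > b'
    best matches the labels.

    rows: list of (n_trans, tot_amt) pairs; labels: 1 = fraud, 0 = genuine.
    Candidate thresholds are the values observed in the data.
    Returns (training_errors, a, b).
    """
    cand_a = sorted({r[0] for r in rows})
    cand_b = sorted({r[1] for r in rows})
    best = None
    for a, b in product(cand_a, cand_b):
        errors = sum(
            int((n > a and amt > b) != bool(y))
            for (n, amt), y in zip(rows, labels)
        )
        if best is None or errors < best[0]:
            best = (errors, a, b)
    return best


# Toy data, invented for illustration: (number of transactions, total amount).
rows = [(10, 300), (95, 2500), (85, 1500), (120, 4000), (40, 2600), (70, 900)]
labels = [0, 1, 0, 1, 0, 0]
errors, a, b = learn_conjunctive_rule(rows, labels)
predictions = [int(n > a and amt > b) for n, amt in rows]
```

Practical learners such as decision trees find such splits greedily rather than exhaustively, and scale to many more features than a person can reason about at once.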


EXPERT DRIVEN PROS & CONS

Pros
- Easy to develop.
- Easy to understand.
- Explains why an alert was generated.
- Exploits domain expert knowledge.

Cons
- Subjective (ask 7 experts, get 7 opinions).
- Hard boundaries.
- Difficulty thinking in more than 3 dimensions.
- Detects only easy correlations between variables and frauds.
- Able to detect only known fraudulent strategies.
- Becomes obsolete quickly (fraud evolution).


DATA DRIVEN APPROACH

Use Machine Learning to automatically learn rules able to find fraudulent patterns.

- E.g. IF COUNTRY=USA & LANGUAGE=EN & HAD TEST=TRUE & NB TX>10 & GENDER=MALE & AGE>50 & ONLINE=TRUE & AMOUNT>1000 & BANK=XXX THEN fraud


WHAT’S MACHINE LEARNING?

- The design of algorithms that discover patterns in a collection of data instances in an automated manner.
- The goal is to use the discovered patterns to make predictions on new data instances not seen before.

Figure: Training

Figure: Testing

Figure source [Hinton et al. 2006].

Instead of manually encoding patterns in computer programs, we make computers learn these patterns without explicitly programming them.


MACHINE LEARNING PROS & CONS

Pros
- Learns complex fraudulent patterns (uses all features).
- Can ingest large volumes of data.
- Optimally models complex shapes.
- Predicts new types of fraud.
- Adapts to changing distributions (fraud evolution).

Cons
- Needs enough samples.
- Some models are black boxes (not interpretable by investigators).


IMPLEMENTATION

Implementation steps:
1. Feature engineering (i.e. enriching the data using in-house information and external sources).
2. Transaction aggregation (creating new features to model customer behaviour).
3. Train an ML model on the data and use it to predict new transactions.
4. Integrate feedback from investigators to improve the detection.
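Step 2, transaction aggregation, can be sketched as follows: for each transaction, compute how many transactions the same card made in the preceding time window and how much was spent. The field names (`card_id`, `time`, `amount`), the 24-hour window and the sample data are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_features(transactions, window=timedelta(hours=24)):
    """For each transaction, add the count and total amount of the same
    card's earlier transactions within the window (current one excluded)."""
    history = defaultdict(list)  # card_id -> [(timestamp, amount), ...]
    enriched = []
    for tx in sorted(transactions, key=lambda t: t["time"]):
        recent = [
            (ts, amt)
            for ts, amt in history[tx["card_id"]]
            if tx["time"] - ts <= window
        ]
        enriched.append({
            **tx,
            "n_tx_window": len(recent),
            "amount_window": sum(amt for _, amt in recent),
        })
        history[tx["card_id"]].append((tx["time"], tx["amount"]))
    return enriched


# Hypothetical transactions, invented for illustration.
transactions = [
    {"card_id": "A", "time": datetime(2015, 7, 20, 9, 0), "amount": 20.0},
    {"card_id": "A", "time": datetime(2015, 7, 20, 18, 0), "amount": 35.0},
    {"card_id": "B", "time": datetime(2015, 7, 21, 8, 0), "amount": 500.0},
    {"card_id": "A", "time": datetime(2015, 7, 21, 8, 30), "amount": 900.0},
    {"card_id": "A", "time": datetime(2015, 7, 23, 12, 0), "amount": 15.0},
]
features = aggregate_features(transactions)
```

The 900.0 transaction on card A is preceded by two transactions totalling 55.0 within 24 hours, so it stands out against that card's recent behaviour. Aggregated columns like these are then fed to the model alongside the raw transaction fields.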


CHOOSING THE ALGORITHM

- Thousands of ML algorithms are available.
- No single best one exists (no-free-lunch theorem).
- However, some perform better under certain conditions.
- Several studies have reported that Random Forest [3] is the most accurate for fraud detection [8, 2, 5, 7, 4, 1].


RANDOM FOREST

- Ensemble of decision trees (combination of >100 models).
- Robust to irrelevant features.
- Easy to scale with Big Data architectures (e.g. Hadoop).
- Returns feature relevance.
- Rule extraction is possible.

Figure: Decision tree: predict play/not play based on weather conditions.
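The ensemble idea can be sketched in a few lines of plain Python. This is a heavily simplified illustration, not Breiman's full algorithm: each "tree" here is a one-split stump trained on a bootstrap sample with one randomly chosen feature, whereas a real Random Forest grows full trees and samples candidate features at every split. All names and the toy data are assumptions.

```python
import random


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that minimises training errors.
    The stump predicts fraud (1) when the value exceeds the threshold."""
    best_thr, best_err = None, None
    for thr in sorted({r[feature] for r in rows}):
        err = sum(
            int((r[feature] > thr) != bool(y)) for r, y in zip(rows, labels)
        )
        if best_err is None or err < best_err:
            best_thr, best_err = thr, err
    return feature, best_thr


def fit_forest(rows, labels, n_trees=101, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(n_features)         # random feature choice
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest


def predict(forest, row):
    """Majority vote over all stumps: 1 = fraud, 0 = genuine."""
    votes = sum(1 for feature, thr in forest if row[feature] > thr)
    return 1 if votes > len(forest) / 2 else 0


# Toy data, invented for illustration: (amount, number of transactions).
rows = [(20, 1), (35, 2), (50, 1), (15, 3),
        (900, 9), (1200, 12), (750, 8), (1500, 15)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
high_risk = predict(forest, (2000, 20))
low_risk = predict(forest, (10, 1))
```

Because every stump sees a different resample of the data, individual errors tend to average out in the vote. An odd number of trees (101 here) avoids ties.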


MACHINE LEARNING TOOLS

Which software should I use? R [6] appears to be the standard among data scientists.

Figure: (a) KDnuggets survey 2015; (b) Rexer Analytics survey 2013; (c) software used in Kaggle data analysis competitions in 2011.


WHY R?

- Open source (free) and developed by academics.
- Almost all ML algorithms are implemented (see http://cran.r-project.org/web/views/MachineLearning.html).
- Microsoft, Amazon, IBM, SAP and many others have Machine Learning solutions based on R.


WORRIED ABOUT R SUPPORT?

- Huge community of R users.
- Most books/manuals available are free.
- Several R consulting companies.

Figure: Software popularity on statistically-oriented forums.


CONCLUSION

- Machine Learning can efficiently support fraud detection.
- ML makes it possible to automate the detection of, and reaction to, frauds.
- Expert Driven and Data Driven approaches both have pros and cons.
- ML is not going to replace Expert Driven rules, but it helps reduce false positives.
- Random Forest is often the most accurate model for FD.
- I recommend using the R software.


Web: www.ulb.ac.be/di/map/adalpozz
Email: [email protected]

Thank you for your attention


BIBLIOGRAPHY

[1] A. C. Bahnsen, A. Stojanovic, D. Aouada, and B. Ottersten. Cost sensitive credit card fraud detection using Bayes minimum risk. In Machine Learning and Applications (ICMLA), 2013 12th International Conference on, volume 1, pages 333–338. IEEE, 2013.

[2] S. Bhattacharyya, S. Jha, K. Tharakunnel, and J. C. Westland. Data mining for credit card fraud: A comparative study. Decision Support Systems, 50(3):602–613, 2011.

[3] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

[4] A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi. Credit card fraud detection and concept-drift adaptation with delayed supervised information. In Neural Networks (IJCNN), 2015 International Joint Conference on. IEEE, 2015.

[5] A. Dal Pozzolo, O. Caelen, Y.-A. Le Borgne, S. Waterschoot, and G. Bontempi. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications, 41(10):4915–4928, 2014.

[6] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2015.

[7] V. Van Vlasselaer, C. Bravo, O. Caelen, T. Eliassi-Rad, L. Akoglu, M. Snoeck, and B. Baesens. Apate: A novel approach for automated credit card transaction fraud detection using network-based extensions. Decision Support Systems, 2015.

[8] C. Whitrow, D. J. Hand, P. Juszczak, D. Weston, and N. M. Adams. Transaction aggregation as a strategy for credit card fraud detection. Data Mining and Knowledge Discovery, 18(1):30–55, 2009.
