![Page 1: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/1.jpg)
Seshika Fernando
Technical Lead
Catch them in the Act
Fraud Detection in Real-time
![Page 2: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/2.jpg)
Fraud: A Trillion Dollar Problem
Survey results
๏ $3.5 – 4 Trillion in Global Losses per year (5% of Global GDP)
Payment Fraud Only
๏ Merchants are losing around $250B globally
๏ Cost of Fraud is around 0.68% of Revenue for Retailers (2014)
๏ Steep rise in Fraud in eCommerce (0.85% of Revenue) and mCommerce (1.36% of Revenue) with a movement of payments to newer channels
![Page 3: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/3.jpg)
Why WSO2 Analytics Platform?
Domain Knowledge
Batch Analytics
Interactive Analytics
Real-time Analytics
Predictive Analytics
Fraud Detection Toolkit
![Page 4: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/4.jpg)
Solution: Many Ways
Fraud = Anomaly
We provide many methods of Anomaly Detection in order to capture known and unknown types of fraudulent behavior
๏ Generic Rules
๏ Fraud Scoring
๏ Advanced Techniques
Capturing anomalous behavior using mathematical modelling
![Page 5: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/5.jpg)
Capturing Domain Expertise
An example from Payment Fraud Domain
Fraudsters…
๏ Use stolen cards
๏ Buy Expensive stuff
๏ In Large Quantities
๏ Very quickly
๏ At odd hours
๏ Ship to many places
๏ Provide weird email addresses
CEP Queries
![Page 6: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/6.jpg)
Generic Rules
Convert all pre-existing knowledge about Fraudulent Behavior within a domain to Generic Rules
๏ Blacklists/Whitelists
๏ Moving Averages
๏ Known Patterns
๏ Outliers
![Page 7: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/7.jpg)
Queries for Expensive Purchases
define table PremiumProducts (itemNo string);
from TransactionStream[(itemNo == PremiumProducts.itemNo) in PremiumProducts]
select *
insert into FraudStream;
![Page 8: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/8.jpg)
Queries for Large Quantities
define table QuantityAverages
(itemNo string, avgQty int, stdevQty int);
from TransactionStream
[(itemNo == av.itemNo and qty > (av.avgQty + 3 * av.stdevQty)) in QuantityAverages as av]
select *
insert into FraudStream;
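For readers who want to try the three-sigma rule outside a CEP engine, the check can be sketched in Python. This is an illustrative sketch only: the `quantity_averages` dictionary is a hypothetical stand-in for the `QuantityAverages` table, and the item numbers and statistics are made up.

```python
# Hypothetical stand-in for the QuantityAverages table:
# itemNo -> (average quantity, standard deviation of quantity).
quantity_averages = {
    "ITEM-1": (2.0, 1.0),
    "ITEM-2": (10.0, 4.0),
}


def is_large_quantity(item_no, qty, averages=quantity_averages, k=3.0):
    """Return True when qty exceeds avg + k * stdev for a known item."""
    stats = averages.get(item_no)
    if stats is None:
        # Unknown items match no table row, so the rule does not fire.
        return False
    avg_qty, stdev_qty = stats
    return qty > avg_qty + k * stdev_qty


# (itemNo, qty) pairs standing in for events on TransactionStream.
transactions = [
    ("ITEM-1", 3),
    ("ITEM-1", 6),
    ("ITEM-2", 22),
    ("ITEM-2", 23),
    ("ITEM-9", 500),
]
flagged = [t for t in transactions if is_large_quantity(*t)]
print(flagged)
```

Only quantities strictly above the threshold are flagged, mirroring the `>` comparison in the query.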
![Page 9: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/9.jpg)
Queries for Large Quantities (Learning)
define table QuantityAverages(itemNo string, avgQty int, stdevQty int);
from TransactionStream#window.time(8 hours)
select itemNo, avg(qty) as avgQty, stdev(qty) as stdevQty
group by itemNo
update QuantityAverages as av
on itemNo == av.itemNo;
from TransactionStream
[(itemNo == av.itemNo and qty > (av.avgQty + 3 * av.stdevQty)) in QuantityAverages as av]
select *
insert into FraudStream;
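The learning step can be approximated in Python as a per-item mean and standard deviation over whatever transactions fall in the current window. This is a rough sketch under assumptions: the caller is expected to pass in only the events from the last 8 hours, the function name is hypothetical, and the population standard deviation is used as one reasonable choice.

```python
from collections import defaultdict
from statistics import mean, pstdev


def learn_quantity_averages(transactions):
    """Compute per-item (mean, population stdev) of qty.

    transactions: iterable of (itemNo, qty) pairs from the current window.
    """
    by_item = defaultdict(list)
    for item_no, qty in transactions:
        by_item[item_no].append(qty)
    return {item: (mean(qtys), pstdev(qtys)) for item, qtys in by_item.items()}


# Made-up window contents for illustration.
window = [("ITEM-1", 2), ("ITEM-1", 4), ("ITEM-1", 6), ("ITEM-2", 10)]
averages = learn_quantity_averages(window)
print(averages)
```

An item seen only once gets a standard deviation of zero, so any larger quantity would exceed its threshold; a real deployment would probably want a minimum sample size before trusting the learned values.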
![Page 10: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/10.jpg)
Queries for Transaction Velocity
from e1 = TransactionStream ->
e2 = TransactionStream[e1.cardNo == e2.cardNo] <3:>
within 5 min
select e1.cardNo, e1.txnID, e2[0].txnID, e2[1].txnID, e2[2].txnID
insert into FraudStream;
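The velocity pattern (one transaction followed by at least three more on the same card within 5 minutes) can be approximated in Python with a sliding window per card. This sketch is not equivalent to the CEP pattern semantics: it simply alerts whenever a card has four or more transactions inside the window. The tuple layout and all names are assumptions.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 5 * 60
MIN_TRANSACTIONS = 4  # the first event plus at least three followers


def find_velocity_alerts(transactions, window=WINDOW_SECONDS,
                         min_count=MIN_TRANSACTIONS):
    """Return (cardNo, [txnIDs]) for each point where a card reaches
    min_count transactions within `window` seconds.

    transactions: iterable of (timestamp_seconds, cardNo, txnID),
    assumed to be sorted by timestamp.
    """
    recent = defaultdict(deque)
    alerts = []
    for ts, card_no, txn_id in transactions:
        q = recent[card_no]
        q.append((ts, txn_id))
        # Drop events that are older than the window.
        while q and ts - q[0][0] > window:
            q.popleft()
        if len(q) >= min_count:
            alerts.append((card_no, [t for _, t in q]))
    return alerts


stream = [
    (0, "C1", "t1"),
    (60, "C1", "t2"),
    (90, "C2", "u1"),
    (120, "C1", "t3"),
    (200, "C1", "t4"),
    (1000, "C1", "t5"),
]
alerts = find_velocity_alerts(stream)
print(alerts)
```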
![Page 11: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/11.jpg)
The False Positive Trap
๏ So what if I buy Expensive stuff? (Rich guy)
๏ And why can’t I buy a lot? (Gift giver)
๏ Very Quickly (Busy man)
๏ At odd hours (Night owl)
๏ Ship to many places (Many girlfriends?)
Blocking genuine customers could be counterproductive and costly
![Page 12: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/12.jpg)
Fraud Scoring
๏ Use combinations of rules
๏ Give weights to each rule
๏ Derive a single number that reflects many fraud indicators
๏ Use a threshold to reject transactions
๏ You just bought a Diamond Ring?
๏ You bought 20 Diamond Rings, in 15 minutes at 3am from a blacklisted IP address?
![Page 13: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/13.jpg)
Fraud Scoring
Score =
0.001 * itemPrice
+ 0.1 * itemQuantity
+ 2.5 * isFreeEmail
+ 5 * riskyCountry
+ 8 * suspiciousIPRange
+ 5 * suspiciousUsername
+ 3 * highTransactionVelocity
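As a rough illustration, the weighted score can be computed in Python as follows. The weights are the ones on the slide; the rejection threshold of 10 and the two sample transactions (loosely modelled on the diamond ring examples) are assumptions made up for this sketch.

```python
# Weights taken from the scoring formula; indicator flags are 0 or 1.
WEIGHTS = {
    "itemPrice": 0.001,
    "itemQuantity": 0.1,
    "isFreeEmail": 2.5,
    "riskyCountry": 5,
    "suspiciousIPRange": 8,
    "suspiciousUsername": 5,
    "highTransactionVelocity": 3,
}

REJECT_THRESHOLD = 10.0  # hypothetical cut-off, not from the slides


def fraud_score(txn):
    """Weighted sum of the indicators present in txn (missing ones count as 0)."""
    return sum(weight * float(txn.get(name, 0)) for name, weight in WEIGHTS.items())


def should_reject(txn, threshold=REJECT_THRESHOLD):
    return fraud_score(txn) > threshold


one_ring = {"itemPrice": 2000, "itemQuantity": 1}
twenty_rings = {
    "itemPrice": 2000,
    "itemQuantity": 20,
    "suspiciousIPRange": 1,
    "highTransactionVelocity": 1,
}
print(fraud_score(one_ring), should_reject(one_ring))
print(fraud_score(twenty_rings), should_reject(twenty_rings))
```

A single expensive purchase stays well below the threshold, while the combination of quantity, velocity and a suspicious IP range pushes the second transaction over it.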
![Page 14: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/14.jpg)
Learn from Data
Utilize Machine Learning Techniques to identify ‘unknown’ point anomalies
K-means Clustering
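The slides name K-means but do not show how it is applied. One common approach, sketched below as an assumption rather than as the toolkit's actual method, is to cluster historical transactions and treat the distance from a new transaction to its nearest centroid as an anomaly score. The features (amount, quantity), the data and the threshold are made up, and real features would normally be scaled first.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids
    (a simplification that keeps the example deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


def anomaly_score(point, centroids):
    """Distance from point to the nearest centroid."""
    return min(math.dist(point, c) for c in centroids)


# Hypothetical (amount, quantity) history with two natural groups.
history = [(10, 1), (200, 5), (12, 1), (11, 2), (210, 6), (205, 5)]
centroids = kmeans(history, k=2)

ANOMALY_THRESHOLD = 50.0  # hypothetical
for txn in [(11, 1), (1000, 50)]:
    score = anomaly_score(txn, centroids)
    print(txn, round(score, 2), score > ANOMALY_THRESHOLD)
```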
![Page 15: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/15.jpg)
Markov Models for Fraud Detection
Use Markov Models to discover fraudulent behavior through rare activity sequences
Markov Models are stochastic models used to model randomly changing systems
![Page 16: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/16.jpg)
Markov Modelling: Process
Process flow: Events → Classify Events → Update Probability Matrix → Probability Matrix → Compare Incoming Sequences → Alerts
![Page 17: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/17.jpg)
Markov Model: Classification
Example:
Each transaction is classified by the following three qualities and expressed as a three-letter token, e.g., HNN
๏ Amount spent: Low, Normal and High
๏ Whether the transaction includes a high-priced ticket item: Normal and High
๏ Time elapsed since the last transaction: Large, Normal and Small
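A minimal Python sketch of this classification step is shown below. The slides do not give cut-off values, so every threshold here (amounts of 50 and 500, gaps of one hour and one week) is a hypothetical placeholder, as is the function signature.

```python
HOUR = 3600
WEEK = 7 * 24 * HOUR


def classify(amount, has_high_price_item, seconds_since_last):
    """Return a three-letter token such as 'HNN'.

    Letter 1: amount spent (L, N, H)
    Letter 2: high-priced ticket item included (N, H)
    Letter 3: time elapsed since the last transaction (L, N, S)
    """
    if amount < 50:
        amount_code = "L"
    elif amount <= 500:
        amount_code = "N"
    else:
        amount_code = "H"

    ticket_code = "H" if has_high_price_item else "N"

    if seconds_since_last > WEEK:
        elapsed_code = "L"
    elif seconds_since_last >= HOUR:
        elapsed_code = "N"
    else:
        elapsed_code = "S"

    return amount_code + ticket_code + elapsed_code


print(classify(1200, False, 24 * HOUR))
print(classify(20, True, 60))
```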
![Page 18: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/18.jpg)
Markov Models: Probability Matrix
๏ Create a State Transition Probability Matrix

|     | LNL | LNH | LNS | LHL | HHL | HHS | HNS |
|-----|-----|-----|-----|-----|-----|-----|-----|
| LNL | 0.976788 | 0.542152 | 0.20706 | 0.095459 | 0.007166 | 0.569172 | 0.335481 |
| LNH | 0.806876 | 0.609425 | 0.188628 | 0.651126 | 0.113801 | 0.630711 | 0.099825 |
| LNS | 0.07419 | 0.83973 | 0.951471 | 0.156532 | 0.12045 | 0.201713 | 0.970792 |
| LHL | 0.452885 | 0.634071 | 0.328956 | 0.786087 | 0.676753 | 0.063064 | 0.225353 |
| HHL | 0.386206 | 0.255719 | 0.451524 | 0.469597 | 0.810013 | 0.444638 | 0.612242 |
| HHS | 0.204606 | 0.832722 | 0.043194 | 0.459342 | 0.960486 | 0.796382 | 0.34544 |
| HNS | 0.757737 | 0.371359 | 0.326846 | 0.970243 | 0.771326 | 0.015835 | 0.574333 |
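One straightforward way to build such a matrix, sketched here in Python under assumptions, is to count how often each token follows another in historical sequences and normalize each row, so that every row of the estimate sums to 1. The function name and the sample history are made up for illustration.

```python
from collections import defaultdict


def transition_matrix(sequences):
    """Estimate P(next token | current token) from token sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Count every consecutive (current, next) pair.
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    matrix = {}
    for cur, row in counts.items():
        total = sum(row.values())
        matrix[cur] = {nxt: n / total for nxt, n in row.items()}
    return matrix


# Hypothetical per-customer token histories.
history = [
    ["LNL", "LNL", "LNS", "LNL"],
    ["LNL", "LNL", "HHS", "HHS"],
]
matrix = transition_matrix(history)
for state, row in matrix.items():
    print(state, row)
```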
![Page 19: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/19.jpg)
Markov Models: Probability Comparison
๏ Compare the probabilities of incoming transaction sequences with thresholds and flag fraud as appropriate
๏ Can use direct probabilities or more complex metrics
  ๏ Miss Rate Metric
  ๏ Miss Probability Metric
  ๏ Entropy Reduction Metric
๏ Update Markov Probability table with incoming transactions
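The direct-probability comparison can be sketched in Python as below. This covers only the simplest option, not the miss rate, miss probability or entropy reduction metrics. The matrix values, the floor used for unseen transitions and the threshold are all hypothetical, and because the product shrinks as sequences get longer, the sketch assumes fixed-length windows are being compared.

```python
# Hypothetical transition probabilities: matrix[current][next].
matrix = {
    "LNL": {"LNL": 0.9, "HHS": 0.1},
    "HHS": {"LNL": 0.95, "HHS": 0.05},
}

UNSEEN_PROBABILITY = 1e-6  # floor for transitions never observed (assumption)
FRAUD_THRESHOLD = 0.01     # hypothetical cut-off for a 3-token window


def sequence_probability(seq, matrix, unseen=UNSEEN_PROBABILITY):
    """Product of the transition probabilities along seq."""
    p = 1.0
    for cur, nxt in zip(seq, seq[1:]):
        p *= matrix.get(cur, {}).get(nxt, unseen)
    return p


def is_suspicious(seq, matrix, threshold=FRAUD_THRESHOLD):
    """Flag sequences whose probability falls below the threshold."""
    return sequence_probability(seq, matrix) < threshold


usual = ["LNL", "LNL", "LNL"]
rare = ["LNL", "HHS", "HHS"]
print(sequence_probability(usual, matrix), is_suspicious(usual, matrix))
print(sequence_probability(rare, matrix), is_suspicious(rare, matrix))
```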
![Page 20: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/20.jpg)
Dig Deeper
Access historical data using
๏ expressive querying
๏ easy filtering
๏ useful visualizations
to isolate incidents and unearth connections
![Page 21: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/21.jpg)
Usecase: Payment Fraud
Architecture diagram: Transactions from the Payment System feed Batch, Interactive, Real-time and Predictive Analytics, which drive the Dashboard and Alerts.
![Page 22: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/22.jpg)
Usecase: Anti Money Laundering
Architecture diagram: Bank Txns from the Core Banking System feed Batch, Interactive, Real-time and Predictive Analytics, which drive the Dashboard and Alerts.
![Page 23: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/23.jpg)
Usecase: Identity Fraud
Architecture diagram: Events feed Batch, Interactive, Real-time and Predictive Analytics, which drive the Dashboard and Alerts.
![Page 24: Fraud Detection in Real-time @ Apache Big Data Con](https://reader035.vdocuments.net/reader035/viewer/2022070603/586fa0091a28abcc238b661f/html5/thumbnails/24.jpg)
References
๏ WSO2 Whitepaper on Fraud Detection: http://wso2.com/whitepapers/fraud-detection-and-prevention-a-data-analytics-approach/
๏ True Cost of Fraud 2014: http://www.lexisnexis.com/risk/downloads/assets/true-cost-fraud-2014.pdf
๏ Stop Billions in Fraud Losses using Machine Learning: https://www.forrester.com/Stop+Billions+In+Fraud+Losses+With+Machine+Learning/fulltext/-/E-res120912
๏ Big Data In Fraud Management: Variety Leads To Value And Improved Customer Experience: https://www.forrester.com/Big+Data+In+Fraud+Management+Variety+Leads+To+Value+And+Improved+Customer+Experience/fulltext/-/E-RES103841
๏ Predictions 2015: Identity Management, Fraud Management, And Cybersecurity Converge: https://www.forrester.com/Predictions+2015+Identity+Management+Fraud+Management+And+Cybersecurity+Converge/fulltext/-/E-RES120014
๏ Markov Modelling for Fraud Detection: https://pkghosh.wordpress.com/2013/10/21/real-time-fraud-detection-with-sequence-mining/