slash n: technical session 7 - fraudsters are smart, frank is smarter - vivek mehta, fareed jawad
28/01/13
Fraudsters are smart, “Frank” is Smarter
- Fareed and Vivek
Outline
Why detect fraud – Is there a problem?
Why an intelligent system?
How we built one
Show me some numbers
What was the value of all electronic transactions globally for the year 2012?
$17 trillion (with a T)
This includes all credit, debit and pre-paid cards used in both online and offline (card present) scenarios for purchases and cash withdrawals
More Numbers
How much of the $17T was lost to fraud?
$8 billion in 2012, projected to exceed $10 billion by 2015
Fraud rate of 0.05% – not too bad, right?
Wrong!!
Getting specific
Reminder: the 0.05% ratio covers all transactions, including face-to-face ones
For online transactions, aka card not present (CNP), the fraud rate is a much scarier 3.5%
Global e-Commerce is expected to exceed $1T in 2013 –> $3.5B will be lost due to fraud
Add to this the erosion from lost future business with affected customers
Big customer impact! Big deal for us!!
The Big Fight...
The fraud-to-transaction ratio has been constant over the past 10 years
This ratio should not lull us into a false sense of security – bigger numbers are at stake and increasing as volumes grow
The crooks LOVE e-Commerce (think 3.5%)
How do we then figure out whether a transaction is genuine or fraudulent?
Intelligently, of course! ENTER FRANK!!
Why Frank?
Fraud Detection System
Two parts
Signals/Features
Algorithm
Rule based system
Rules on various signals
Number of transactions from a card in the last day
Transaction amount
and many more
Thresholds are hand-crafted
Fraud score = sum of individual scores
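A minimal sketch of the rule-based scoring described on this slide: each rule checks one signal against a hand-crafted threshold, and the fraud score is the sum of the scores of the rules that fire. The signal names, thresholds and per-rule scores below are hypothetical and chosen only for illustration; the slides do not give the real values.

```python
# Hypothetical rules: thresholds and per-rule scores are illustrative only.
RULES = [
    # (signal name, hand-crafted threshold, score added when exceeded)
    ("txn_count_last_day", 10, 40),
    ("amount", 1000.0, 30),
]


def fraud_score(signals):
    """Sum the scores of every rule whose threshold is exceeded."""
    return sum(
        score
        for name, threshold, score in RULES
        if signals.get(name, 0) > threshold
    )


normal = {"txn_count_last_day": 2, "amount": 49.99}
suspicious = {"txn_count_last_day": 25, "amount": 4200.0}
print(fraud_score(normal), fraud_score(suspicious))
```

Every new signal needs its own threshold and score, which is why this approach becomes hard to maintain as the number of signals grows.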
Need for a smarter system
Too much data for manual analysis
Businesses are evolving
Fraudsters are evolving
Extending to really high dimensions – pushing beyond the limits of a rule-based system
Designing Frank
Labeled data missing
Observation
Very few fraud records
When you see one, you can identify one
Social behavior
Visualization
[Plot axes: number of transactions in a day by a user; total amount]
Visualization
Clustering – Centroid based
Clustering - Distance based
Clustering - Distribution based
Clustering – Density based
Density based clustering
[Figure: points p, q and p1]
Epsilon
Min-pts
Statistical distance
Scale invariant
Correlation taken into account
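The slide does not name the statistical distance. The Mahalanobis distance is one well-known distance that is both scale invariant and correlation aware, so the sketch below assumes it. It also assumes two features per point (transactions per day and total amount, as on the visualization slide), and the sample data is made up.

```python
import math


def mean_and_cov(points):
    """Return the mean vector and 2x2 sample covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, mean, cov):
    """Distance of `point` from `mean`, scaled by the covariance `cov`."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # must be non-zero (non-degenerate data)
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, written out for the 2x2 case
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Made-up cluster of (transactions per day, total amount in dollars).
cluster = [(1, 20.0), (2, 45.0), (3, 50.0), (2, 30.0), (4, 90.0)]
mean, cov = mean_and_cov(cluster)
d_dollars = mahalanobis((3, 400.0), mean, cov)

# Same data with the amount in cents: the distance does not change.
cluster_cents = [(n, a * 100) for n, a in cluster]
mean_c, cov_c = mean_and_cov(cluster_cents)
d_cents = mahalanobis((3, 40000.0), mean_c, cov_c)
print(d_dollars, d_cents)
```

A plain Euclidean distance would change by a factor of roughly 100 after the unit change, which is why a scale-invariant distance is attractive when features such as counts and amounts are mixed.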
Clustering for detecting fraud
Cluster the data using density-based clustering
For a new point, find the distance to all the existing clusters
If a cluster has at least min-pts points within epsilon distance of the new point, the new point belongs to that cluster
If it doesn't belong to any cluster -> fraud
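The assignment rule on this slide can be sketched as follows. This is an illustration under stated assumptions, not Frank's implementation: the clusters are taken as already computed, the distance defaults to plain Euclidean for brevity (a statistical distance can be passed in instead), and the cluster names, data, `eps` and `min_pts` values are hypothetical.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_cluster(point, clusters, eps, min_pts, dist=euclidean):
    """Return the label of the first cluster with at least `min_pts`
    members within `eps` of `point`, or None if no cluster qualifies."""
    for label, members in clusters.items():
        neighbours = sum(1 for m in members if dist(point, m) <= eps)
        if neighbours >= min_pts:
            return label
    return None  # no cluster accepts the point: flag as possible fraud


# Hypothetical clusters of (transactions per day, total amount).
clusters = {
    "low_spend": [(1, 20.0), (2, 25.0), (1, 22.0), (2, 21.0)],
    "high_spend": [(3, 200.0), (4, 205.0), (3, 198.0), (4, 202.0)],
}

print(assign_cluster((2, 23.0), clusters, eps=5.0, min_pts=3))
print(assign_cluster((30, 900.0), clusters, eps=5.0, min_pts=3))
```

The second point has no neighbours within `eps` in either cluster, so it is the kind of transaction the slide labels as fraud.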
Computing fraud probability
Find the nearest cluster
Convert the distance to a probability using the chi-square distribution
The probability of fraud is between 0 and 1
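One way to read this conversion, offered as an assumption since the slides give no formula: if the distance to the nearest cluster is a Mahalanobis distance over roughly Gaussian features, its square follows a chi-square distribution with as many degrees of freedom as there are features, and the chi-square CDF at the squared distance can serve as the fraud probability. With two features the CDF has the closed form 1 - exp(-d^2 / 2), which the sketch below uses.

```python
import math


def fraud_probability(distance):
    """Chi-square CDF with 2 degrees of freedom at distance**2.

    Assumes two features, so the CDF reduces to 1 - exp(-d^2 / 2).
    """
    return 1.0 - math.exp(-(distance ** 2) / 2.0)


for d in (0.0, 1.0, 3.0):
    print(d, fraud_probability(d))
```

A point at the cluster centre gets probability 0, and the value approaches 1 as the point moves away. For more than two features the general chi-square CDF would be needed, for example from a statistics library.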
Execution
Distributed clustering
Real-time model updating
< 20 ms to compute the fraud probability
Suspend the payment authorization in real time
We Frank, You Shop.