2012.09 A Million Mousetraps: Using Big Data and Little Loops to Build Better Defenses

Post on 30-May-2015


DESCRIPTION

An examination of how behavioral analytics can be leveraged to design better defenses in complex user-facing platforms.

TRANSCRIPT

A Million Mousetraps: Using Big Data and Little Loops to Build Better Defenses

Allison Miller

Overview

Protecting customers on an open platform

Big data + Little loops enable automation via analytics

Decisions as defenses

Putting your data to work

the interdependent system

the porous attack surface

so, about that perimeter...

Spam

Credential Theft

Malware

Bots

Account Takeover Fraud

DoS

Phishing

Griefers

Scammers

The Better Mousetrap

Automates defensive action cross-platform, and is:

- Fast: in real time; in time to minimize loss

- Accurate: reasonable false positives; as good as a human specialist

- Cheap: reduces more loss than the cost it creates; cheaper than manual intervention

BIG DATA & LITTLE LOOPS


123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" 200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:47 -0400] "GET /asctortf/ HTTP/1.0" 200 8130 "http://search.netscape.com/Computers/Data_Formats/Document/Text/RTF" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/5star2000.gif HTTP/1.0" 200 4005 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
[Tue Mar 9 22:02:41 2004] [info] created shared memory segment #10813446
[Tue Mar 9 22:02:41 2004] [notice] Apache/1.3.29 (Unix) mod_ssl/2.8.16 OpenSSL/0.9.7c configured -- resuming normal operations
[Tue Mar 9 22:02:41 2004] [info] Server built: Mar 7 2004 13:38:59
pausing [http://xmlrevenue.com/s.php?username=jenneypan&keywords=Online+Gambling] for 50000 ms
[Tue Mar 9 22:04:16 2004] [error] [client 218.93.92.137] mod_security: Access denied with code 200. Pattern match "Basic" at HEADER.
[Tue Mar 9 22:07:16 2004] [error] [client 203.121.182.190] mod_security: Invalid character detected [4]
123.123.123.123 - - [26/Apr/2000:00:23:50 -0400] "GET /pics/5star.gif HTTP/1.0" 200 1031 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /pics/a2hlogo.jpg HTTP/1.0" 200 4282 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
123.123.123.123 - - [26/Apr/2000:00:23:51 -0400] "GET /cgi-bin/newcount?jafsof3&width=4&font=digital&noshow HTTP/1.0" 200 36 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"
[Tue Mar 9 22:02:41 2004] [notice] Accept mutex: sysvsem (Default: sysvsem)
[Tue Mar 9 22:03:26 2004] [error] [client 218.93.92.137] mod_security:


* Loop Disposition: Logic, Human, or Other?
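Raw log lines like the access-log entries shown earlier only become usable data once they are split into fields. A minimal sketch in Python, assuming the Apache "combined" log format those lines follow; the regex and the `parse_combined` name are illustrative, not something from the talk.

```python
import re
from datetime import datetime

# Illustrative regex for the Apache "combined" log format:
# host ident user [time] "method path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of typed fields, or None if the line is not a combined-format entry."""
    m = COMBINED.match(line)
    if m is None:
        return None  # e.g. error-log lines such as "[Tue Mar 9 ...] [error] ..."
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec

line = ('123.123.123.123 - - [26/Apr/2000:00:23:48 -0400] "GET /pics/wpaper.gif HTTP/1.0" '
        '200 6248 "http://www.jafsoft.com/asctortf/" "Mozilla/4.05 (Macintosh; I; PPC)"')
rec = parse_combined(line)
print(rec["host"], rec["path"], rec["status"], rec["size"])
```

Error-log lines use a different layout, so the sketch simply skips them; a real pipeline would route them to their own parser.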

APPLIED RISK ANALYTICS

Use of technology, data, research & statistics to solve problems associated with losses or costs due to security vulnerabilities or gaps in a system, resulting in the deployment of optimized detection, prevention, or response capabilities.

BRIEF TANGENT

WHAT IS THE DIFFERENCE BETWEEN RISK ANALYTICS AND RISK METRICS?

Such as...

Metrics | Analytics
$ loss (transactions) | Purchase trends of high-loss users
# of compromised accounts | IP sources of bad login attempts
% of spam messages delivered | Spam subject lines generating the most clicks
Minutes of downtime | Most process-intensive applications
# of customer contacts generated | Highest-contact exception flows
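To make the metric/analytic distinction concrete: a metric is one number, while an analytic breaks the same data down to show where that number comes from. A toy sketch in Python using made-up login records; treating a failed login as a "bad" attempt is an assumption made purely for illustration.

```python
from collections import Counter

# Hypothetical login-attempt records: (source_ip, account, succeeded)
attempts = [
    ("203.0.113.7", "alice", False),
    ("203.0.113.7", "bob", False),
    ("203.0.113.7", "carol", True),
    ("198.51.100.2", "dave", False),
    ("192.0.2.10", "erin", True),
]

# Metric: a single number (how many bad login attempts were there?)
bad_attempts = sum(1 for _, _, ok in attempts if not ok)

# Analytic: where do the bad attempts come from? (IP sources of bad login attempts)
bad_by_ip = Counter(ip for ip, _, ok in attempts if not ok)

print(bad_attempts)              # 3
print(bad_by_ip.most_common(1))  # [('203.0.113.7', 2)]
```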

YMMV

END TANGENT

Applied where?

Where risks manifest in observable behavior

Where system owners make decisions

Where controls can be optimized by better recognizing identity, intent, or change

Decisions, Decisions

Population \ Response | Authorize | Block
Good | Correct | False positive: good action gets blocked
Bad | False negative: bad action gets through | Correct

Incorrect decisions have a cost; correct decisions are free (usually).

Both kinds of error have downstream impacts.
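Because correct decisions are (usually) free, the cost of a decision policy comes only from its two error cells. A minimal sketch in Python; the per-error costs and counts are made-up numbers, and in practice each cost would also have to capture downstream impacts such as customer contacts and churn.

```python
def decision_cost(fp, fn, cost_fp, cost_fn):
    """Total cost of a decision policy.

    fp: good actions that got blocked (false positives)
    fn: bad actions that got through (false negatives)
    Correct authorizations and correct blocks are treated as free.
    """
    return fp * cost_fp + fn * cost_fn

# Hypothetical numbers: a blocked good customer costs $5 (contact, churn risk),
# a missed fraud costs $100 in losses.
strict = decision_cost(fp=200, fn=5, cost_fp=5.0, cost_fn=100.0)   # blocks more
lenient = decision_cost(fp=20, fn=30, cost_fp=5.0, cost_fn=100.0)  # blocks less
print(strict, lenient)  # 1500.0 3100.0
```

With these numbers the stricter policy is cheaper despite ten times as many false positives; change the cost ratio and the answer can flip.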


“Why are you picking on me?”

“Boo-yah! Still getting away with it.”

“<Sigh> Nobody understands me.”

Such as...

- Populations: Users, transactions, messages, packets, API calls, files

- Actions: Allow, block, challenge, review, retry, quarantine, add privileges, upgrade privileges, make offer

- Costs: Fraud, data leakage, customer churn, customer contacts, downstream liability

Applying Decisions

Risk management is decision management

Actor attempts action -> Submit

- What is the request?
- How to honor the request?
- Should we honor it?

Result: action occurs

For example: actor attempts payment -> Submit

p(actor attempting payment is accountholder) = f(variable A + variable B + ...)

Decision:
- Authorize
- Review
- Refer
- Request authentication
- Decline
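One common way to turn a score such as p(actor attempting payment is accountholder) into one of these decisions is a set of score bands. A sketch in Python; the thresholds and the ordering of the middle decisions are hypothetical, and real cut-offs would be tuned against the costs of false positives and false negatives.

```python
# Hypothetical score bands, checked from most to least confident.
# p is the model's estimate that the actor is the accountholder.
BANDS = [
    (0.95, "Authorize"),
    (0.80, "Request Authentication"),
    (0.60, "Review"),
    (0.40, "Refer"),
]

def decide(p):
    """Map a probability to a decision; anything below the lowest band is declined."""
    for threshold, decision in BANDS:
        if p >= threshold:
            return decision
    return "Decline"

for p in (0.99, 0.85, 0.5, 0.1):
    print(p, decide(p))
```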

Flavors of Risk Models

- “I deviate significantly from a normal (good) pattern”: fa(x), fb(x), fc(x)

- “I summarize a known bad pattern”: fq(x), fr(x), fs(x)

What is normal?

http://en.wikipedia.org/wiki/Normal_distribution
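For the "deviates from normal" flavor, "normal" can be as simple as a customer's own history: score a new observation by how many standard deviations it sits from that history's mean. A minimal sketch in Python using the standard `statistics` module; the amounts and the 3-standard-deviation cutoff are illustrative assumptions.

```python
import statistics

def z_score(history, value):
    """How many standard deviations `value` is from the mean of `history`."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)  # sample standard deviation; needs >= 2 points
    return (value - mean) / sd

# Hypothetical past transaction amounts for one customer
history = [20.0, 25.0, 22.0, 18.0, 30.0, 24.0, 21.0]

for amount in (27.0, 400.0):
    z = z_score(history, amount)
    flag = abs(z) > 3  # illustrative cutoff: more than 3 sd from "normal"
    print(amount, round(z, 1), flag)
```

Real behavior is often far from normally distributed, so the cutoff is a tuning choice rather than a statistical guarantee.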

WHAT IS BAD? WHAT IS GOOD?

Study history...

Who

What

Where

When

Why

And then?

Study history...

- User IP country <> billing country
- Buying prepaid mobile phones
- Add new shipping address in cart

However: buyer = phone reseller, static machine ID

How much $$ is at risk? What is “normal” for this customer? What “bad” profiles does this match?
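The "summarize a known bad pattern" flavor can start life as a rule. A sketch in Python of the history above, including the reseller exception; the `Order` fields (`ip_country`, `billing_country`, `cart_categories`, `new_ship_address`, `is_known_reseller`, `static_device`) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    # Hypothetical fields for illustration
    ip_country: str
    billing_country: str
    cart_categories: set = field(default_factory=set)
    new_ship_address: bool = False
    is_known_reseller: bool = False
    static_device: bool = False  # same machine ID seen on past orders

def matches_known_bad(o: Order) -> bool:
    """True if the order fits the known-bad pattern and not the known-good exception."""
    pattern = (
        o.ip_country != o.billing_country
        and "prepaid_mobile" in o.cart_categories
        and o.new_ship_address
    )
    # However: a phone reseller on a static machine ID looks like this legitimately.
    exception = o.is_known_reseller and o.static_device
    return pattern and not exception

suspect = Order("FR", "US", {"prepaid_mobile"}, new_ship_address=True)
reseller = Order("FR", "US", {"prepaid_mobile"}, new_ship_address=True,
                 is_known_reseller=True, static_device=True)
print(matches_known_bad(suspect), matches_known_bad(reseller))  # True False
```

The exception is exactly why hand-written rules get expensive: every legitimate look-alike population needs its own carve-out, which is part of the motivation for the risk models that follow.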

SHALL WE PLAY A GAME? (SINCE WE CAN’T PLAY “CLUE” FOR EVERY LOGIN, TRANSACTION, NEW USER, MESSAGE, FRIEND REQUEST, ATTACHMENT, PACKET, WINK, POKE, CLICK, BIT... WE BUILD RISK MODELS)

Model Development Process

Target -> Yes/No questions best

Find Data, Variable Creation -> Best part

Data Prep -> Worst part

Model Training -> Pick an algorithm

Assessment -> Catch vs FP rate

Deployment -> Decisioning vs Detection
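Assessment trades the catch rate (share of bads flagged) against the false-positive rate (share of goods flagged) as the score threshold moves. A sketch in Python with made-up scores and labels (1 = bad, 0 = good).

```python
def catch_and_fp_rate(scores, labels, threshold):
    """Flag everything scoring >= threshold; return (catch rate, false-positive rate)."""
    flagged = [s >= threshold for s in scores]
    bads = sum(labels)
    goods = len(labels) - bads
    caught = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    false_pos = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    return caught / bads, false_pos / goods

# Hypothetical model scores (higher = riskier) and true outcomes
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1,    1,    0,    1,    0,    0,    1,    0,    0,    0]

for t in (0.85, 0.5, 0.25):
    catch, fp = catch_and_fp_rate(scores, labels, t)
    print(f"threshold {t}: catch {catch:.0%}, FP rate {fp:.0%}")
```

Lowering the threshold from 0.85 to 0.25 raises the catch rate from 50% to 100% here, at the price of flagging half of the good population.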

- User IP country <> billing country -> Geolocate IP, convert geo to country code, flag on mismatch
- Buying prepaid mobile phones -> Cart category, merch risk level
- Add new shipping address in cart -> Date added, address type, string matching
- Buyer = phone reseller -> Customer profile
- Static machine ID -> Device ID, device history
- How much $$ is at risk? -> Txn $ amount
- What is “normal” for this customer? -> Churn risk, CLV, ...; txns, logins, ...
- What “bad” profiles does this match? -> Stolen CC, collusion
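Variable creation turns raw records into the flags and values a model can use. A minimal sketch in Python of a few of the variables listed above; `geolocate_country`, the lookup tables, and all field names are hypothetical stand-ins (a real system would query a geo-IP database and its own merchandise data).

```python
from datetime import datetime, timedelta

# Hypothetical stand-ins for real data sources
GEO_TABLE = {"198.51.100.2": "FR", "192.0.2.10": "US"}
MERCH_RISK = {"prepaid_mobile": "high", "books": "low"}

def geolocate_country(ip):
    """Stand-in for a geo-IP lookup: IP -> country code (or None if unknown)."""
    return GEO_TABLE.get(ip)

def make_variables(txn, now):
    ip_country = geolocate_country(txn["ip"])
    return {
        # Geolocate IP -> convert to country code -> flag on mismatch
        "ip_billing_mismatch": ip_country is not None and ip_country != txn["billing_country"],
        # Cart category -> merchandise risk level (riskiest item in the cart wins)
        "merch_risk": max((MERCH_RISK.get(c, "low") for c in txn["cart"]),
                          key=["low", "high"].index, default="low"),
        # Date added -> was the shipping address added within the last day?
        "new_ship_address": now - txn["ship_address_added"] < timedelta(days=1),
        # Transaction $ amount: how much is at risk
        "txn_amt": txn["amount"],
    }

txn = {"ip": "198.51.100.2", "billing_country": "US",
       "cart": ["prepaid_mobile", "books"],
       "ship_address_added": datetime(2012, 9, 1, 11, 50), "amount": 450.0}
print(make_variables(txn, now=datetime(2012, 9, 1, 12, 0)))
```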

Model Training

Some algorithms:

- Regression: determines the best equation to describe the relationship between the dependent (target) variable and the independent variables
  - Linear regression: the best equation is a line
  - Logistic regression: the best equation is an S-shaped (logistic) curve, suited to yes/no targets

- Bayesian: used to estimate regression models; useful when working with small data sets

- Neural nets: can approximate almost any non-linear function and are often highly predictive, but don’t explain the relationship between the dependent and independent variables
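To make the logistic case concrete, here is a minimal sketch in Python that fits p(y = 1) = sigmoid(b0 + b1·x) by batch gradient descent on a tiny made-up dataset with one independent variable. It only illustrates what "pick an algorithm" means in code; in practice a statistics package would do the fitting.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit p(y=1) = sigmoid(b0 + b1*x) by batch gradient descent on log-loss."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # log-loss gradient w.r.t. the linear term
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Hypothetical data: x = number of risk flags on a transaction, y = 1 if it was fraud
xs = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(round(b1, 2), round(sigmoid(b0), 3), round(sigmoid(b0 + 3 * b1), 3))
```

The data deliberately overlaps at x = 1 so that the fitted coefficients stay finite; perfectly separable data would push them towards infinity.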

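LOGISTIC <DEPVAR> <VAR1> <VAR2>...

Annotations on the sample output:

- Dependent variable: <DEPVAR>
- Independent variables: <VAR1> <VAR2> ...
- P-value of significance: throw out if > .05
- Variance in the dependent variable explained by the independent variables
- Odds ratio: the factor by which the odds of the dependent variable go up when an independent variable is incremented by one
- P-value of each independent variable: should be < the significance level (.05)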
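The "factor" by which the odds go up is the odds ratio. Under the standard logistic model, written here with generic coefficients β rather than anything from the talk's output, incrementing one independent variable by one unit multiplies the odds by e raised to that variable's coefficient:

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
\qquad\Longrightarrow\qquad
\frac{\operatorname{odds}(x_1 + 1)}{\operatorname{odds}(x_1)}
  = \frac{e^{\beta_0 + \beta_1 (x_1 + 1) + \dots + \beta_k x_k}}{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}}
  = e^{\beta_1}
```

For example, a coefficient of 0.7 corresponds to an odds ratio of about 2.0: each one-unit increase roughly doubles the odds of the dependent outcome.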

GAIN

More gain/lift = more efficient predictions

Catch as much of the “bads” as possible

Minimize the overall population affected
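Gain measures what share of all the bads is caught when action is taken only on the top-scored fraction of the population; lift is that share divided by the fraction acted on, so a random model has lift 1. A sketch in Python with made-up scores and labels.

```python
def gain_and_lift(scores, labels, fraction):
    """Share of all bads found in the top `fraction` of the population by score,
    and the lift of that share over random selection."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = round(len(ranked) * fraction)
    bads_in_top = sum(y for _, y in ranked[:k])
    gain = bads_in_top / sum(labels)
    return gain, gain / fraction

# Hypothetical model scores (higher = riskier) and outcomes (1 = bad)
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for frac in (0.2, 0.5):
    gain, lift = gain_and_lift(scores, labels, frac)
    print(f"top {frac:.0%}: gain {gain:.0%}, lift {lift:.1f}")
```

Acting on the top 20% here catches half of the bads (lift 2.5), which is the "catch as much as possible while minimizing the population affected" trade-off in numbers.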

Target

In the end, we only hit what we aim at

And now an example

Everyone loves a good 419 scam

419 example: the 411

Trigger
- A contact receives a 419 from the victim’s (free) business email account and contacts the victim OOB (out of band)

Backtrack
- Password was changed (user had to go through the reset process)
- Contacts, inbox, outbox deleted
- Nigerian IP login

Elaboration
- “Reply-to”: changed an “i” to an “l” (same ISP)
- Only takes Western Union
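The “i” to “l” swap in the Reply-to address is a lookalike (confusable) address. A rough sketch in Python that flags a Reply-to which differs from the account's own address but becomes identical once a few visually confusable characters are folded together. The `CONFUSABLES` table is a small hypothetical sample, not a complete list; Unicode publishes a much fuller confusables data set.

```python
# Small illustrative sample of characters that look alike in many fonts
CONFUSABLES = str.maketrans({"l": "i", "1": "i", "0": "o"})

def skeleton(address):
    """Lowercase and fold confusable characters to a common form."""
    return address.lower().translate(CONFUSABLES)

def is_lookalike_reply_to(account_addr, reply_to):
    """True if reply_to is not the account's address but looks like it."""
    same = account_addr.lower() == reply_to.lower()
    return not same and skeleton(account_addr) == skeleton(reply_to)

print(is_lookalike_reply_to("smith.office@example.com", "smith.offlce@example.com"))  # True
print(is_lookalike_reply_to("smith.office@example.com", "smith.office@example.com"))  # False
print(is_lookalike_reply_to("smith.office@example.com", "jones@example.com"))         # False
```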

419 example: with love, from Abuja

What is the question?
- p(ATO)
- p(Spam:scam)
- p(Fake acct creation)

What are our available answer/action sets?

What else can we do to detect/mitigate?

419 example: Reducing 911s

Variables
- “New” session variables: new login IP, new login IP country, new cookie/machine ID
- “Change” account variables: change password, change secondary email, change name, change public profile
- “New” activity variables: send to all contacts, # of accounts in “cc” or “bcc”, edit/delete contacts en masse
- Association variables: new recipients, new “reply-to” fields, “similar” accounts created/associated (fuzzy = more difficult)

User empowerment
- Stronger password reset options (SMS)
- Transparency: other current sessions, past session history (IPs, logins)
- Auto-logout all other sessions upon password reset
- Reporting: details of elaboration as well as cut-and-paste messages
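The variable families above can be chained into a simple account-takeover signal: a new session, then account changes, then unusual bulk activity. A sketch in Python; the event names, the 24-hour window, and the rule itself are hypothetical, and a production system would more likely feed these variables into a trained model than hard-code a sequence.

```python
from datetime import datetime, timedelta

# Hypothetical event names grouped by the variable families above
NEW_SESSION = {"new_login_ip_country", "new_machine_id"}
ACCOUNT_CHANGE = {"change_password", "change_secondary_email"}
BULK_ACTIVITY = {"send_to_all_contacts", "mass_delete_contacts"}

def looks_like_ato(events, window=timedelta(hours=24)):
    """events: list of (timestamp, name). True if a new-session event is followed,
    within `window`, by an account change and then by bulk activity."""
    events = sorted(events)
    for i, (t0, name0) in enumerate(events):
        if name0 not in NEW_SESSION:
            continue
        changed_at = None
        for t, name in events[i + 1:]:
            if t - t0 > window:
                break  # too long after the new session to count
            if changed_at is None and name in ACCOUNT_CHANGE:
                changed_at = t
            elif changed_at is not None and name in BULK_ACTIVITY:
                return True
    return False

t = datetime(2012, 9, 1, 3, 0)
hijack = [(t, "new_login_ip_country"),
          (t + timedelta(minutes=5), "change_password"),
          (t + timedelta(minutes=20), "send_to_all_contacts")]
routine = [(t, "change_password"), (t + timedelta(minutes=5), "send_to_all_contacts")]
print(looks_like_ato(hijack), looks_like_ato(routine))  # True False
```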

Recap

Protecting customers requires understanding not just technology but also behavior. This requires:
- Activity data
- Clear definitions of “good” vs “bad” results
- Constant feedback
- Analysis

Designing data-driven defenses
- Decisions that can be automated w/data
- Where/what data sets to use
- Business drivers to keep in mind

An example


p(bad) = f(variable A + variable B + ...)

Prediction is very difficult, especially about the future

Niels Bohr

Allison Miller @selenakyle
