On the Hardness of Evading Combinations of Linear Classifiers Daniel Lowd University of Oregon Joint work with David Stevens

Posted on 17-Jan-2016


TRANSCRIPT

Page 1: On the Hardness of Evading Combinations of Linear Classifiers

On the Hardness of Evading Combinations of Linear Classifiers

Daniel Lowd, University of Oregon

Joint work with David Stevens

Page 2: On the Hardness of Evading Combinations of Linear Classifiers

Machine learning is used more and more in adversarial domains…

• Intrusion detection
• Malware detection
• Phishing detection
• Detecting malicious advertisements
• Detecting fake reviews
• Credit card fraud detection
• Online auction fraud detection
• Email spam filtering
• Blog spam filtering
• OSN spam filtering
• Political censorship
• …and more every year!

Page 3: On the Hardness of Evading Combinations of Linear Classifiers

Evasion Attack
1. System designers deploy a classifier.
2. An attacker learns about the model through interaction (and possibly other information sources).
3. The attacker uses this knowledge to evade detection by changing its behavior as little as possible.

Example: A spammer sends test emails to learn how to modify a spam so that it gets past a spam filter.

Question: How easily can the attacker learn enough to mount an effective attack?

Page 4: On the Hardness of Evading Combinations of Linear Classifiers

Adversarial Classifier Reverse Engineering (ACRE) [Lowd & Meek, '05]

Task: Find the negative instance "closest" to xa. (We will also refer to this distance as a "cost" to be minimized.)

Problem: the adversary doesn't know the classifier!

[Figure: feature space (X1, X2) showing the positive and negative regions and the attack instance xa.]

Page 5: On the Hardness of Evading Combinations of Linear Classifiers

Adversarial Classifier Reverse Engineering (ACRE) [Lowd & Meek, '05]

Task: Find the negative instance "closest" to xa, within a factor of k.

Given:
– One positive and one negative instance, x+ and x−
– A polynomial number of membership queries

[Figure: feature space (X1, X2) with unknown decision boundary ("?") and the attack instance xa.]

xa
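The membership-query setting above can be sketched in a few lines. This is a minimal illustration; the `MembershipOracle` class, the feature encoding, and the example weight vector are assumptions for the sketch, not from the talk:

```python
# Minimal sketch of the ACRE setting: the adversary only sees the hidden
# classifier through membership queries ("is this instance positive?").
# All names here are illustrative assumptions, not from the talk.

class MembershipOracle:
    """Wraps a hidden classifier; the adversary may only call query()."""
    def __init__(self, classify):
        self._classify = classify   # hidden decision function
        self.num_queries = 0        # attack effort is measured in queries

    def query(self, x):
        self.num_queries += 1
        return self._classify(x)    # True = positive (e.g. "spam")

# Hidden linear classifier: w.x >= threshold means positive.
w, threshold = [2.0, -1.0, 3.0], 1.0
oracle = MembershipOracle(
    lambda x: sum(wi * xi for wi, xi in zip(w, x)) >= threshold)

x_pos = [1.0, 0.0, 1.0]   # known positive instance (the attacker's xa)
x_neg = [0.0, 1.0, 0.0]   # known negative instance
assert oracle.query(x_pos) and not oracle.query(x_neg)
```

The attacker's goal is then to drive `x_pos` toward a negative instance while spending as few `query` calls, and as little change, as possible.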

Page 6: On the Hardness of Evading Combinations of Linear Classifiers

Example: Linear Classifiers

With continuous features and L1 distance, find the optimal point by doing a line search in each dimension.*

However, with binary features, we can't do line searches.

[Figure: line searches along X1 and X2 from the attack instance xa.]

* Somewhat more efficient methods exist for the continuous case. [Nelson et al., 2012]
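The line-search idea can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: with L1 cost and a linear classifier, the cheapest evasion changes a single feature, so a per-dimension binary search toward the decision boundary suffices. The oracle interface, search range, and example weights are assumptions:

```python
# Sketch of per-dimension line search against a continuous linear
# classifier with L1 cost. The oracle only answers membership queries.

def line_search_evasion(query, x_a, lo=-100.0, hi=100.0, tol=1e-6):
    """query(x) -> True if positive. Returns cheapest single-feature evasion."""
    best_x, best_cost = None, float("inf")
    for i in range(len(x_a)):
        for direction in (lo, hi):          # try pushing feature i both ways
            x = list(x_a)
            x[i] = direction
            if query(x):                    # this direction never goes negative
                continue
            a, b = x_a[i], direction        # invariant: a positive, b negative
            while abs(b - a) > tol:
                mid = (a + b) / 2.0
                x[i] = mid
                if query(x):
                    a = mid                 # mid still positive: move outward
                else:
                    b = mid                 # mid negative: tighten
            x[i] = b                        # b is on the negative side
            cost = abs(b - x_a[i])
            if cost < best_cost:
                best_x, best_cost = x, cost
    return best_x, best_cost

# Hidden classifier: x[0] + 2*x[1] >= 1 is positive.
query = lambda x: x[0] + 2 * x[1] >= 1.0
x_a = [2.0, 2.0]                            # the attacker's positive instance
x_evade, cost = line_search_evasion(query, x_a)
assert not query(x_evade)                   # found a negative instance
```

Here the cheapest evasion moves along the dimension with the largest weight magnitude (feature 1), which is exactly why line searches fail for binary features: there is no continuum to search.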

Page 7: On the Hardness of Evading Combinations of Linear Classifiers

Attacking Linear Classifiers with Boolean Features [Lowd & Meek, '05]

Can efficiently find an evasion with at most twice the optimal cost, assuming unit cost for each "change".

METHOD: Iteratively reduce cost in two ways:
1. Remove any unnecessary change: O(n)
2. Replace any two changes with one: O(n³)

[Figure: feature weights wi, wj, wk, wl, wm along the path from xa through intermediate instances y and y′ toward x−, with cost c(x) decreasing at each step.]

Also known: Any convex-inducing classifier with continuous features is ACRE-learnable. [Nelson et al., 2012]
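The two cost-reduction moves can be sketched roughly as follows. This is an illustrative reimplementation in the spirit of the Lowd-Meek attack, not the paper's exact algorithm; the oracle interface and the example classifier are assumptions:

```python
# Sketch of the Boolean-feature attack: start from a known negative
# instance and greedily shrink the set of changes away from x_a, using
# only membership queries. Unit cost per changed feature is assumed.

def boolean_evasion(query, x_a, x_neg):
    """Greedily reduce the change-set from x_a to a cheap negative instance."""
    n = len(x_a)
    x = list(x_neg)
    improved = True
    while improved:
        improved = False
        changed = [i for i in range(n) if x[i] != x_a[i]]
        # 1. Remove any unnecessary change: O(n) candidates.
        for i in changed:
            y = list(x); y[i] = x_a[i]
            if not query(y):                 # still evades without change i
                x, improved = y, True
                break
        if improved:
            continue
        # 2. Replace any two changes with one: O(n^3) candidates.
        for i in changed:
            for j in changed:
                if j <= i: continue
                for k in range(n):
                    if x[k] != x_a[k]: continue
                    y = list(x)
                    y[i], y[j] = x_a[i], x_a[j]   # undo two changes...
                    y[k] = 1 - y[k]               # ...and make one new one
                    if not query(y):
                        x, improved = y, True
                        break
                if improved: break
            if improved: break
    return x

# Hidden classifier: positive iff 3*x0 + x1 + x2 >= 2 (Boolean features).
query = lambda x: 3*x[0] + x[1] + x[2] >= 2
x_a   = [1, 1, 1]          # attacker's instance (positive)
x_neg = [0, 0, 0]          # known negative instance
x = boolean_evasion(query, x_a, x_neg)
assert not query(x)        # evades with two changes, which is optimal here
```

Each accepted move removes at least one change, so the loop terminates after at most n rounds.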

Page 8: On the Hardness of Evading Combinations of Linear Classifiers

What about non-linear classifiers?

This work: We consider when the positive or negative class is an intersection of half-spaces, or polytope, representable by combinations of linear classifiers:

• Positive class is a conjunction of linear classifiers. Example: one classifier to identify each legitimate user.
• Positive class is a disjunction of linear classifiers. Example: one classifier for each type of attack.

We show that the attack problem is hard in general, but easy when the half-spaces are defined over disjoint features.

Page 9: On the Hardness of Evading Combinations of Linear Classifiers

Hardness Results

• With continuous features and L1 costs, near-optimal evasion of a polytope requires polynomially many queries. [Nelson et al., 2012]

• With discrete features, we show that exponentially many queries are required in the worst case.

• Proofs work for any fixed approximation ratio k.

Key Idea: Construct a set of component classifiers so there is no clear path from "distant" to "close" negative instances.

Page 10: On the Hardness of Evading Combinations of Linear Classifiers

Hardness of Evading Disjunctions

(Instance is negative only if all component classifiers mark it as negative.)

• Two ways to evade:
– Include all light-green features (cost: n/2+1)
– Include all dark-green features (cost: n/2k)
• Challenge:
– If you don't guess all dark-green features, some classifier remains positive.
– If you include extra red features, all classifiers become positive.
• Guessing the low-cost instance requires exponentially many queries!

[Figure: n/2k component classifiers over feature groups of sizes n/2+1 and n/2k.]

Page 11: On the Hardness of Evading Combinations of Linear Classifiers

Hardness of Evading Conjunctions

(Instance is negative only if any component classifier marks it as negative.)

• To evade c2: Include > ½ the light-green features (cost: n/4+1)
• To evade c1: Include all dark-green features (cost: n/4k), or all light-green features (cost: n/2), or a combo.
• Two cases:
– When > ½ the light-green features are included, c2 is negative, so dark-greens have no effect on the class label.
– When < ½ the light-green features are included, we need > ½ the dark-green features to evade c1.
• The adversary must guess n/8k features!

[Figure: classifiers c1 and c2 over n/2 light-green and n/4k dark-green features.]

Page 12: On the Hardness of Evading Combinations of Linear Classifiers

Restriction: Disjoint Features

• In practice, classifiers do not always represent the worst case.
• In some applications, each classifier in the set works on a different set of features:
– Image or fingerprint biometric classifiers
– Separate image spam and HTML spam classifiers
• This simple restriction makes attacks easy!

Page 13: On the Hardness of Evading Combinations of Linear Classifiers

Evading Disjoint Disjunctions

(Instance is negative only if all component classifiers mark it as negative.)

Theorem: The linear attack from [Lowd & Meek, 2005] is at most twice optimal on disjoint disjunctions.

Proof Sketch: When features are disjoint, the optimal evasion is to evade each component classifier optimally. When the algorithm terminates, there is no way to reduce the cost with individual changes or pairs of changes, so each separate evasion is at most twice optimal.

Example: [Figure: component classifiers c1(x) over weights wi, wj, wk, wl and c2(x) over weights wm, wn, wo, each evaded separately along the path from xa to x−.]
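The decomposition in the proof sketch can be illustrated with toy code. For clarity the component weights are given explicitly here, whereas the talk's attacker would only have query access; with unit costs, evading one component by dropping its heaviest active feature first minimizes that component's changes:

```python
# Sketch: with disjoint feature sets, evading a disjunction of linear
# classifiers decomposes into evading each component on its own features.
# The classifier layout and weights are illustrative assumptions.

def evade_disjoint_disjunction(components, x_a):
    """components: list of (feature_indices, weights, threshold).
    The instance is negative only if every component says negative.
    Drop each component's heaviest active features until it is negative."""
    x = list(x_a)
    for idxs, w, t in components:
        active = sorted((i for i in idxs if x[i] == 1),
                        key=lambda i: -w[i])    # heaviest weight first
        for i in active:
            if sum(w[j] * x[j] for j in idxs) < t:
                break                           # component already negative
            x[i] = 0                            # drop the heaviest remaining word
    return x

# Two components on disjoint feature sets {0,1} and {2,3}.
w = [3.0, 1.0, 2.0, 2.0]
components = [((0, 1), w, 2.0), ((2, 3), w, 3.0)]
x_a = [1, 1, 1, 1]                              # positive under both components
x = evade_disjoint_disjunction(components, x_a)
assert all(sum(wc[j] * x[j] for j in idxs) < t  # every component now negative
           for idxs, wc, t in components)
```

Because the feature sets are disjoint, the per-component changes never interact, which is the heart of the theorem.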

Page 14: On the Hardness of Evading Combinations of Linear Classifiers

Evading Disjoint Conjunctions

(Instance is negative if any component classifier marks it as negative.)

Theorem: By repeating the linear attack with different constraints, we can efficiently find an attack that is at most twice optimal.

Proof Sketch:
• Each component classifier has some optimal evasion.
• The optimal overall attack is the cheapest of these attacks.
• Running the linear attack once finds a good evasion against some classifier.
• Since it's an evasion, one classifier must be negative.
• All feature changes for other classifiers can be removed.
• Since no individual change or pair of changes reduces the cost, this evasion is at most twice optimal.
• By rerunning the linear attack restricted to features we haven't used before, we eventually find good evasions against all component classifiers.
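The "cheapest of the per-component evasions" idea can be sketched similarly (again with weights given explicitly for illustration, unlike the query-only setting of the talk):

```python
# Sketch: for a conjunction of classifiers on disjoint features, the
# instance is negative as soon as ANY component is negative, so the
# attacker only needs the single cheapest component evasion.
# Classifier layout and weights are illustrative assumptions.

def evade_disjoint_conjunction(components, x_a):
    """Return the cheapest single-component evasion (unit cost per change)."""
    best, best_cost = None, float("inf")
    for idxs, w, t in components:
        x = list(x_a)
        # Drop this component's features, heaviest weight first.
        for i in sorted(idxs, key=lambda i: -w[i]):
            if sum(w[j] * x[j] for j in idxs) < t:
                break                       # component is now negative
            x[i] = 0
        cost = sum(a != b for a, b in zip(x, x_a))
        if cost < best_cost:
            best, best_cost = x, cost
    return best, best_cost

w = [3.0, 1.0, 2.0, 2.0]
components = [((0, 1), w, 2.0), ((2, 3), w, 3.0)]
x_a = [1, 1, 1, 1]
x, cost = evade_disjoint_conjunction(components, x_a)
assert cost == 1                            # one change flips a component
```

Restricting each rerun to unused features, as in the proof sketch, plays the role of the loop over components here.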

Page 15: On the Hardness of Evading Combinations of Linear Classifiers

Experiments

• Data: 2005 TREC spam corpus
• Component classifiers: LR (SVM, NB in paper)
• Features partitioned into 3 or 5 sets:
– Randomly
– Spammy / Neutral / Hammy [Jorgensen et al., 2008]
• Fixed overall false negative rate at 10%.
• We attempted to disguise 100 different spams. To make this more challenging, we first added 100 random "spammy" features to each spam.

Page 16: On the Hardness of Evading Combinations of Linear Classifiers

Results: Attack Cost

Page 17: On the Hardness of Evading Combinations of Linear Classifiers

Results: Attack Optimality

Page 18: On the Hardness of Evading Combinations of Linear Classifiers

Results: Attack Efficiency

Number of queries before algorithms terminate:
• Conjunction: ~1,000,000 (Restricted: ~50,000)
• Disjunction: ~10,000,000 (Restricted: ~700,000)

Page 19: On the Hardness of Evading Combinations of Linear Classifiers

1 million queries is not very efficient!

• The purpose of this experiment is to understand how performance depends on different factors, not the exact number of queries.
• In practice, the adversary's job is much easier:
– We added 100 spammy features to make it harder.
– Additional background knowledge could make this much easier.
– A restricted vocabulary reduces queries 10x with minimal increase in attack cost (90% of the time, still within 2x of optimal).
– Attackers don't need guarantees of optimality.

Page 20: On the Hardness of Evading Combinations of Linear Classifiers

Results: Attack Efficiency

Number of queries before our attack is within twice optimal:
• Conjunction: ~3,000 / ~100,000
• Disjunction: ~10,000 / ~300,000

Attacks are even easier with background knowledge and without 100 spammy words.

Page 21: On the Hardness of Evading Combinations of Linear Classifiers

Discussion and Conclusion

• Evading discrete classifiers is provably harder than evading continuous classifiers.
– Linear: k-approximation vs. (1+ε)-approximation
– Polytope: exponential vs. polynomial queries
• Interesting sub-classes of discrete non-linear classifiers are still vulnerable.
– Disjoint features are a sufficient condition.
– Open question: What other sub-classes are vulnerable?
• Conjunction (convex spam) is theoretically harder but practically easier.
– In addition to worst-case bounds, we need realistic simulations that can be applied to specific classifiers.