Evade Hard Multiple Classifier Systems
SUEMA 2008
Battista Biggio, Giorgio Fumera, Fabio Roli
Pattern Recognition and Applications Group
Department of Electrical and Electronic Engineering, University of Cagliari, Italy
ECAI / SUEMA 2008, Patras, Greece, July 21st - 25th
21-07-2008 · Evade Hard MCSs · SUEMA 2008
About me
• Pattern Recognition and Applications Group
– http://prag.diee.unica.it
– DIEE, University of Cagliari, Italy
• Contact
– Battista Biggio, Ph.D. student
Pattern Recognition and Applications Group
• Research interests
– Methodological issues
• Multiple classifier systems
• Classification reliability
– Main applications
• Intrusion detection in computer networks
• Multimedia document categorization, spam filtering
• Biometric authentication (fingerprint, face)
• Content-based image retrieval
Why are we working on this topic?
• MCSs are widely used in security applications, but…
– Theoretical motivation is lacking
• Only a few theoretical works address machine learning for adversarial classification
• Goal of this (ongoing) work
– To give some theoretical background to the use of MCSs in security applications
Outline
• Introducing the problem
– Adversarial classification
• A study on MCSs for adversarial classification
– MCS hardening strategy: adding classifiers trained on different features
– A case study in spam filtering: SpamAssassin
Adversarial Classification
• Adversarial classification
– An intelligent, adaptive adversary modifies patterns to defeat the classifier
• e.g., spam filtering, intrusion detection systems (IDSs)
• Goals
– How to design adversary-aware classifiers?
– How to improve classifier hardness of evasion?

Dalvi et al., Adversarial Classification, 10th ACM SIGKDD Int. Conf., 2004
Definitions
• Instance space X = {X1, …, XN}, where each Xi is a feature
– Instances x ∈ X (e.g., emails)
• Classifier: a concept c ∈ C from a concept class (e.g., linear classifiers), C : X → {+, −}
• Adversarial cost function W : X × X → ℝ (e.g., more legible spam is better)
• Two-class problem:
– Positive/malicious patterns (+)
– Negative/innocent patterns (−)

[Figure: instances x plotted in the feature space (X1, X2), the classifier's decision boundary separating + from −, and the adversarial cost defined over pairs of points in that space.]

Dalvi et al., 2004
Adversarial cost function
• Cost is related to
– The adversary's effort
• e.g., using a different server to send spam
– Attack effectiveness
• more legible spam is better!

Example
• Original spam message: BUY VIAGRA!
– Easily detected by the classifier
• Slightly modified spam message: BU-Y V1@GR4!
– It can evade the classifier and remain effective
• No longer legible spam (ineffective message): B--Y V…!
– It can evade several systems, but who will still buy viagra?
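A minimal sketch of such a cost function, assuming (purely for illustration) that substitutions which preserve legibility are cheap while ones that destroy it are expensive; the per-character costs and the `camouflage_cost` helper are invented here, not part of the framework:

```python
# Toy adversarial cost: 1 per character for legible substitutions,
# 3 per character for legibility-destroying ones. All values are
# illustrative assumptions, not taken from Dalvi et al.

def camouflage_cost(original: str, modified: str) -> int:
    """Compare two equal-length messages character by character."""
    assert len(original) == len(modified)
    cost = 0
    for o, m in zip(original, modified):
        if o != m:
            # Alphanumeric look-alikes and symbols such as @ keep the
            # message readable; anything else hurts legibility badly.
            cost += 1 if (m.isalnum() or m in "@$!") else 3
    return cost

print(camouflage_cost("BUY VIAGRA!", "BUY V1@GR4!"))  # 3  (still legible)
print(camouflage_cost("BUY VIAGRA!", "BUY V-----!"))  # 15 (no longer legible)
```

Under such a cost, the heavier camouflage may evade more filters, but it is a much worse trade for the spammer.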
A framework for adversarial classification
• Problem formulation
– Two-player game: Classifier vs. Adversary
• Utility and cost functions for each player
• The Classifier chooses a decision function C(x) at each ply
• The Adversary chooses a modification function A(x) to evade the classifier
• Assumptions in Dalvi et al., 2004
– Perfect information
• The Adversary knows the classifier's discriminant function C(x)
• The Classifier knows the adversary's strategy A(x) for modifying patterns
– Actions
• The Adversary can only modify malicious patterns at the operation phase (the training process is untainted)

Dalvi et al., 2004
In a nutshell
• Classifier's task: choose a new decision function to minimise the expected risk
• Adversary's task: choose minimum-cost modifications to evade the classifier

Lowd & Meek, Adversarial Learning, 11th ACM SIGKDD Int. Conf., 2005
Adversary's strategy

[Figure: feature space (x1, x2) split into regions C(x) = + and C(x) = −. The original spam x (BUY VIAGRA!) is modified into candidate camouflages x′, x″, x‴: a minimum-cost camouflage such as BUY VI@GRA! crosses the decision boundary cheaply, while a too-high-cost camouflage such as B--Y V…! also evades the classifier, but at the price of an ineffective message.]
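Under perfect information, this strategy can be sketched as a search over candidate camouflages: keep the cheapest one that the known classifier labels negative, subject to a maximum cost. The keyword-based toy classifier and the candidate/cost list below are illustrative assumptions, not the classifiers from the talk:

```python
# Sketch of the adversary's minimum-cost camouflage search under
# perfect information. Classifier, candidates and costs are toys.

SPAMMY_WORDS = {"viagra"}  # hypothetical keyword blacklist

def classify(msg: str) -> str:
    """The known classifier C(x): '+' if any spammy word appears."""
    words = msg.lower().split()
    return "+" if any(w.strip("!") in SPAMMY_WORDS for w in words) else "-"

def best_camouflage(candidates, max_cost):
    """candidates: (message, cost) pairs. Return the cheapest message
    that evades the classifier within budget, or None (give up)."""
    evading = [(cost, msg) for msg, cost in candidates
               if cost <= max_cost and classify(msg) == "-"]
    return min(evading)[1] if evading else None

candidates = [("BUY VIAGRA!", 0), ("BUY V1@GR4!", 3), ("B--Y V...!", 9)]
print(best_camouflage(candidates, max_cost=5))  # BUY V1@GR4!
print(best_camouflage(candidates, max_cost=1))  # None
```

With a budget of 5 the cheap substitution wins; with a budget of 1 no camouflage is affordable and the adversary gives up.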
Classifier's strategy
• The Classifier knows A(x) [perfect information]
– Adversary-aware classifier

Dalvi et al. showed that an adversary-aware classifier can perform significantly better

[Figure: the adversary-aware classifier shifts its decision boundary between C(x) = + and C(x) = − so that the camouflage x′ is now detected, while some other modification may still evade it.]
Goals of this work
• Analysis of a widely used strategy for hardening MCSs
– Using different sets of heterogeneous and redundant features [Giacinto et al. (2003), Perdisci et al. (2006)]
• Only heuristic and qualitative motivations have been given so far
• Using the described framework, we give a more formal explanation of the effectiveness of this strategy
An example of the considered strategy
• Biometric verification system
– Modules for different biometric traits (face, voice, fingerprint, …) process the input for the claimed identity
– A decision rule combines their outputs: genuine / impostor
Another example of the considered strategy
• Spam filtering (e.g., SpamAssassin, http://spamassassin.apache.org)
– Modules: black/white list, URL filter, signature filter, header analysis, content analysis, …
– The module outputs are summed (Σ) to assign the class: legitimate / spam
Applying the framework to the spam filtering case
• Cost for the Adversary
– Each module assigns a score; the filter sums them (Σ) and compares the total with the threshold th = 5: if s < 5 the e-mail is labelled legitimate, otherwise spam
– Modules: black/white list, signature filter, text classifier, header analysis, keyword filters, …
– Original message BUY VIAGRA!: s1 = 0.2, s2 = 0, s3 = 0, s4 = 2.5, sN = 3 ⇒ s = 5.7 ≥ 5 ⇒ spam
– Camouflaged message BUY VI@GR4!: the text classifier is evaded (sN = 0) ⇒ s = 2.7 < 5 ⇒ legitimate
• Working assumption: changing "VIAGRA" to "VI@GR4" costs 3!
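The score-summation scheme above can be sketched in a few lines; the threshold of 5 matches the slide, but the individual scores are the toy values from the example, not real SpamAssassin test weights:

```python
# Sketch of a SpamAssassin-style filter: sum the per-test scores and
# compare with a threshold. Scores below are the slide's toy values.

THRESHOLD = 5.0

def is_spam(scores) -> bool:
    """Label as spam when the summed score reaches the threshold."""
    return sum(scores) >= THRESHOLD

# "BUY VIAGRA!": list 0.2, header 2.5, text classifier 3.0, ...
print(is_spam([0.2, 0.0, 0.0, 2.5, 3.0]))  # True  (s = 5.7)
# "BUY VI@GR4!": the text classifier is evaded (3.0 -> 0.0)
print(is_spam([0.2, 0.0, 0.0, 2.5, 0.0]))  # False (s = 2.7)
```

A single evaded module is enough here: knocking out the text classifier's 3.0 drops the sum below the threshold.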
Applying the framework to the spam filtering case
• The spammer reacts by embedding the text into an image (e.g., a stock-spam message: "AFM Continues to Climb. Big News On Horizon | UP 50% This Week. Aerofoam Metals Inc. Symbol: AFML, Price: $0.10, UP AGAIN, Status: Strong Buy")
– The text-based modules are evaded, and the total score drops below the threshold (e.g., from s = 5.7 to s = 3.2 < 5)
• Adding an image analysis module restores detection: sN+1 = 3 ⇒ s = 6.2 ≥ 5 ⇒ spam
• Now both the text and the image classifiers must be evaded to evade the filter!
– The two evasions cost 2.5 and 3.0, respectively
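Assuming, as in the example, that each module trained on a different feature set must be evaded separately, the evasion costs add up. The per-module costs below are the illustrative values from the example (2.5 and 3.0); the module names and the assignment of each cost to a module are assumptions:

```python
# Sketch: per-module evasion costs add up when modules use different
# features, so each added module makes total evasion more expensive.

evasion_cost = {"text_classifier": 3.0, "image_analysis": 2.5}
MAX_ADVERSARY_COST = 5.0  # assumed budget for the adversary

def total_evasion_cost(modules) -> float:
    """Minimum cost to evade a filter combining the given modules."""
    return sum(evasion_cost[m] for m in modules)

print(total_evasion_cost(["text_classifier"]) <= MAX_ADVERSARY_COST)
# True: with the text classifier alone, evasion is affordable (3.0)
print(total_evasion_cost(["text_classifier", "image_analysis"])
      <= MAX_ADVERSARY_COST)
# False: with both modules, evasion costs 5.5 and exceeds the budget
```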
Forcing the adversary to surrender
• Hardening the system by adding modules can make evasion too costly for the adversary
– In the end, the adversary's optimal strategy becomes not to fight at all!
“The ultimate warrior is one who wins the war by forcing the enemy to surrender without fighting any battles”
The Art of War, Sun Tzu, 500 BC
Experimental Setup
• SpamAssassin
– 619 tests
– includes a text classifier (naive Bayes)
• Data set: TREC 2007 spam track
– 75,419 e-mails (25,220 ham, 50,199 spam)
– We used the first 10,000 e-mails (taken in chronological order) to train the SpamAssassin naive Bayes classifier
Experimental Setup
• Adversary
– Cost simulated at the score level
• Manhattan distance between test scores
– Maximum cost fixed
• Rationale: higher-cost modifications would make the spam message no longer effective/legible
• Classifier
– We did not take into account the computational cost of adding tests
• Performance measure
– Expected utility
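The simulated cost above can be sketched directly: the cost of turning one message into another is the Manhattan (L1) distance between the score vectors that the filter's tests assign to the two messages. The score vectors below are invented for illustration:

```python
# Sketch of the score-level adversary cost: Manhattan (L1) distance
# between the test-score vectors of the original and modified message.

def manhattan_cost(scores_before, scores_after) -> float:
    """L1 distance between two equal-length score vectors."""
    assert len(scores_before) == len(scores_after)
    return sum(abs(a - b) for a, b in zip(scores_before, scores_after))

before = [0.2, 0.0, 2.5, 3.0]  # scores fired by the original spam
after  = [0.2, 0.0, 2.5, 0.0]  # scores after camouflaging the text
print(manhattan_cost(before, after))  # 3.0
```

With a fixed maximum cost, modifications whose score-level distance exceeds the cap are ruled out as no longer effective or legible.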
Experimental Results: maximum cost = 1
Experimental Results: maximum cost = 5
Will spammers give up?
• Spammer economics
– Goal: beat enough of the filters temporarily to get some mail through and generate a quick profit
– As filter accuracy increases, spammers simply send larger quantities of spam so that the same amount of mail still passes through
• the cost of sending spam is negligible with respect to the achievable profit!
• Is it feasible to push the accuracy of spam filters up to the point where only ineffective spam messages can pass through?
– Otherwise, spammers won't give up!
Future work
• Theory of adversarial classification
– Extend the model to more realistic situations
• Investigating other defence strategies
– We are extending the framework to model information-hiding strategies [Barreno et al. (2006)]
• Possible implementation: randomising the placement of the decision boundary
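The randomisation idea can be sketched as perturbing the decision threshold at each query, so the adversary cannot pin the boundary down by probing. The noise distribution and magnitude are illustrative assumptions:

```python
# Sketch of a randomised decision boundary: jitter the score threshold
# per query. Noise range is an illustrative assumption.

import random

BASE_THRESHOLD = 5.0

def randomized_is_spam(score: float, noise: float = 0.5) -> bool:
    """Compare the score with a randomly perturbed threshold."""
    threshold = BASE_THRESHOLD + random.uniform(-noise, noise)
    return score >= threshold

# A score just under the nominal threshold is no longer a guaranteed
# evasion: the borderline message is sometimes caught, sometimes not.
hits = sum(randomized_is_spam(4.8) for _ in range(10_000))
print(0 < hits < 10_000)  # True
```

Probing such a classifier yields inconsistent answers near the boundary, which is exactly what makes reverse-engineering it harder.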
“Keep the adversary guessing. If your strategy is a mystery, it cannot be counteracted. This gives you a significant advantage”
The Art of War, Sun Tzu, 500 BC