Learning to Detect Phishing Emails
Report: 鄭志欣. Advisor: Hsing-Kuo Pao. I. Fette, N. Sadeh, and A. Tomasic. Learning to detect phishing emails. In Proceedings of the International World Wide Web Conference (WWW), pages 649–656, 2007.
![Page 1: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/1.jpg)
Report: 鄭志欣
Advisor: Hsing-Kuo Pao
Learning to Detect Phishing Emails
I. Fette, N. Sadeh, and A. Tomasic. Learning to detect phishing emails. In Proceedings of the International World Wide Web Conference (WWW), pages 649–656, 2007.
![Page 2: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/2.jpg)
Outline
Introduction
Method
Empirical evaluation
Conclusion
![Page 3: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/3.jpg)
Introduction
Phishing (spoofed websites)
Stealing account information
Logon credentials
Identity information
Phishing Problem – Hard
![Page 4: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/4.jpg)
Method
PILFER: a machine-learning-based approach to classification
Classifies emails as phishing emails or ham (good) emails
Feature set
Features as used in email classification
![Page 5: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/5.jpg)
Features as used in email classification
IP-based URLs
Example: http://192.168.0.1/paypal.cgi?fix_account
Some phishing attacks are hosted on compromised PCs, which may have no DNS entry, so the link refers to them by IP address.
This is a binary feature.
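The IP-based URL check can be sketched as follows. This is a minimal illustration, not the scripts used by PILFER; the function names `is_ip_based_url` and `has_ip_based_url` are hypothetical, and the input is assumed to be a list of link targets already extracted from the email.

```python
import ipaddress
from urllib.parse import urlparse


def is_ip_based_url(url: str) -> bool:
    """Return True if the host part of the URL is a literal IP address."""
    try:
        host = urlparse(url).hostname
        if host is None:
            return False
        ipaddress.ip_address(host)  # raises ValueError for names like paypal.com
        return True
    except ValueError:
        return False


def has_ip_based_url(urls) -> bool:
    """Binary feature: does any link in the email use an IP address as host?"""
    return any(is_ip_based_url(u) for u in urls)
```

For the example on this slide, `is_ip_based_url("http://192.168.0.1/paypal.cgi?fix_account")` is true, while a name-based URL such as `http://www.paypal.com/` is not flagged.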
![Page 6: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/6.jpg)
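Age of linked-to domain names
Phishers register legitimate-sounding domain names, such as playpal.com or paypal-update.com.
These domains often have a limited life.
A WHOIS query gives the date on which each linked-to domain was registered.
If this date is within 60 days of the date the email was sent, the email is flagged as linking to a “fresh” domain.
This is a binary feature.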
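The 60-day freshness rule can be sketched as below. The WHOIS lookup itself is left out: the registration date is assumed to have been obtained already. The names `FRESH_WINDOW` and `is_fresh_domain` are hypothetical, and treating a registration date later than the send date as "not fresh" is an assumption of this sketch.

```python
from datetime import date, timedelta

# The 60-day window comes from the slide; the constant name is illustrative.
FRESH_WINDOW = timedelta(days=60)


def is_fresh_domain(registered_on: date, email_sent_on: date) -> bool:
    """Binary feature: was the domain registered at most 60 days before sending?"""
    age = email_sent_on - registered_on
    # Assumption: a domain registered after the email was sent is not "fresh".
    return timedelta(0) <= age <= FRESH_WINDOW
```

An email would be flagged if `is_fresh_domain` holds for at least one of its linked-to domains.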
![Page 7: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/7.jpg)
Nonmatching URLs
A link whose text says paypal.com but that actually links to badsite.com.
Such a link looks like <a href="badsite.com">paypal.com</a>.
This is a binary feature.
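A rough sketch of the nonmatching-URL check, using only the Python standard library. The names `LinkCollector`, `host_of`, and `has_nonmatching_url` are hypothetical, and the rule for deciding that link text "looks like" a URL (no spaces, contains a dot) is an assumption made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(value: str):
    """Extract a host name from a URL or from bare text such as 'paypal.com'."""
    value = value.strip()
    if "://" not in value:
        value = "//" + value  # lets urlparse treat bare text as a host
    return urlparse(value).hostname


def has_nonmatching_url(html: str) -> bool:
    """Binary feature: some link text names one host but the href names another."""
    parser = LinkCollector()
    parser.feed(html)
    for href, text in parser.links:
        # Assumption: only text without spaces and with a dot is URL-like.
        if " " in text or "." not in text:
            continue
        shown, actual = host_of(text), host_of(href)
        if shown and actual and shown != actual:
            return True
    return False
```

This simple comparison treats `www.paypal.com` and `paypal.com` as different hosts; a fuller version would compare registrable domains instead.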
![Page 8: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/8.jpg)
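“Here” links to non-modal domain
Example: “Click here to restore your account access”
The “modal domain” is the domain most frequently linked to in the email.
The email is flagged if a link with the text “link”, “click”, or “here” links to a domain other than this modal domain.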
This is a binary feature.
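The modal-domain rule can be sketched as follows, assuming the links have already been extracted as (href, text) pairs. The function name and the word-level matching of the trigger words are assumptions of this sketch; ties for the most frequent host are broken by first occurrence.

```python
import re
from collections import Counter
from urllib.parse import urlparse

TRIGGER_WORDS = {"link", "click", "here"}


def has_here_link_to_nonmodal_domain(links) -> bool:
    """links: iterable of (href, link_text) pairs taken from one email."""
    hosts = [(urlparse(href).hostname, text) for href, text in links]
    hosts = [(h, t) for h, t in hosts if h]
    if not hosts:
        return False
    # The modal domain is the host that is linked to most often.
    modal, _count = Counter(h for h, _ in hosts).most_common(1)[0]
    for host, text in hosts:
        words = set(re.findall(r"[a-z]+", text.lower()))
        if host != modal and words & TRIGGER_WORDS:
            return True
    return False
```

A typical phishing layout, with several links to the real site and one "Click here" link to another host, is flagged by this check.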
![Page 9: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/9.jpg)
HTML emails
Emails are sent as plain text, HTML, or a combination of the two (the multipart/alternative format).
Launching an attack without using HTML is difficult.
This is a binary feature.
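One way to compute this feature is to look for a text/html part in the MIME structure, sketched here with Python's standard `email` package. The function name `is_html_email` is hypothetical, and the raw message is assumed to be available as a string.

```python
from email import message_from_string


def is_html_email(raw_email: str) -> bool:
    """Binary feature: does any MIME part of the message have type text/html?"""
    msg = message_from_string(raw_email)
    # walk() visits the message itself and, for multipart mail, every sub-part.
    return any(part.get_content_type() == "text/html" for part in msg.walk())
```

Because `walk()` covers nested parts, a multipart/alternative message with both a plain-text and an HTML body is flagged as well.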
![Page 10: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/10.jpg)
Number of links
The number of links present in an email.
A link is an <a> tag in the HTML.
This is a continuous feature.
![Page 11: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/11.jpg)
Number of domains
Take the domain names extracted from all of the links and count the number of distinct domains.
Only the “main” part of a domain is considered: university.edu for https://www.cs.university.edu/ and company.co.jp for http://www.company.co.jp/
This is a continuous feature.
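The two counting features (number of links and number of distinct domains) can be sketched together. The names `main_domain` and `count_links_and_domains` are hypothetical, and `TWO_LABEL_SUFFIXES` is a deliberately tiny, illustrative stand-in for a real public-suffix list; it is an assumption, not part of the original method.

```python
from urllib.parse import urlparse

# Assumption: a small illustrative set of suffixes that span two labels.
TWO_LABEL_SUFFIXES = {"co.jp", "co.uk", "com.au"}


def main_domain(url: str):
    """Return the 'main' part of the host, e.g. university.edu or company.co.jp."""
    host = urlparse(url).hostname
    if not host:
        return None
    labels = host.split(".")
    keep = 3 if ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def count_links_and_domains(hrefs):
    """Return (number of links, number of distinct main domains)."""
    domains = {d for d in map(main_domain, hrefs) if d}
    return len(hrefs), len(domains)
```

With this sketch, `https://www.cs.university.edu/` maps to `university.edu` and `http://www.company.co.jp/` maps to `company.co.jp`, matching the examples on the slide. IP-based hosts are not handled specially here.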
![Page 12: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/12.jpg)
Number of dots
Attackers use subdomains, like http://www.my-bank.update.data.com
Attackers use redirection scripts, such as http://www.google.com/url?q=http://www.badsite.com
This feature is the maximum number of dots ('.') contained in any of the links present in the email.
This is a continuous feature.
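As a small sketch, the dot count is a maximum over the link targets of an email. The function name `max_dots` is hypothetical, and returning 0 for an email without links is an assumption.

```python
def max_dots(hrefs) -> int:
    """Maximum number of '.' characters found in any single link of the email."""
    # Assumption: an email with no links gets the value 0.
    return max((href.count(".") for href in hrefs), default=0)
```

Both example URLs on this slide contain four dots, compared with one dot for a plain link such as `http://paypal.com/`.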
![Page 13: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/13.jpg)
Contains JavaScript
Attackers can use JavaScript to hide information from the user and potentially launch sophisticated attacks.
An email is flagged with the “contains javascript” feature if the string “javascript” appears in the email, regardless of whether it is actually in a <script> or <a> tag.
This is a binary feature.
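Since the rule is a plain string search, a sketch is very short. The function name is hypothetical, and matching case-insensitively is an assumption; the slide only says that the string “javascript” must appear.

```python
def contains_javascript(raw_email: str) -> bool:
    """Binary feature: the string 'javascript' appears anywhere in the email."""
    # Assumption: the match ignores case, so 'JavaScript' also counts.
    return "javascript" in raw_email.lower()
```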
![Page 14: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/14.jpg)
Spam-filter output
The class assigned to the email by SpamAssassin, either “ham” or “spam”, using the trained version of SpamAssassin with the default rule weights and threshold.
This is a binary feature.
![Page 15: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/15.jpg)
Empirical Evaluation
Machine-Learning Implementation
Testing SpamAssassin
Datasets
Additional Challenges
False Positives vs. False Negatives
![Page 16: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/16.jpg)
Machine-Learning Implementation (PILFER)
First, run a set of scripts to extract all the features listed.
Second, train and test a classifier using 10-fold cross validation.
The classifier is a random forest.
Random forests create a number of decision trees and each decision tree is made by randomly choosing an attribute to split on at each level, and then pruning the tree.
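The 10-fold cross-validation procedure mentioned on this slide can be sketched with the standard library alone. This shows only how the data is split; the random forest itself is not implemented here, and the function name, the shuffling, and the fixed seed are assumptions of the sketch.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: shuffle before splitting
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each email lands in the test set of exactly one fold, so a classifier trained on the other nine folds is always evaluated on emails it has not seen.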
![Page 17: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/17.jpg)
• We use a random forest as a classifier.
![Page 18: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/18.jpg)
Testing SpamAssassin
SpamAssassin is a widely deployed, freely available spam filter that is highly accurate in classifying spam emails.
We classify the exact same dataset using SpamAssassin version 3.1.0, using the default thresholds and rules.
Results are reported for both an “untrained” SpamAssassin and a SpamAssassin “trained” using 10-fold cross validation.
![Page 19: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/19.jpg)
Datasets
Two publicly available datasets are used.
Ham corpora from the SpamAssassin project: 6950 non-phishing, non-spam emails
phishingcorpus: approximately 860 email messages
![Page 20: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/20.jpg)
Additional Challenges
The age of the dataset is a challenge.
Phishing websites are short-lived.
Some of our features therefore cannot be extracted from older emails, which makes testing difficult.
Example: features based on the linked-to domain.
![Page 21: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/21.jpg)
Result
![Page 22: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/22.jpg)
![Page 23: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/23.jpg)
Conclusion
It is possible to detect phishing emails with high accuracy by using a specialized filter whose features are more directly applicable to phishing emails than those employed by general-purpose spam filters.
![Page 24: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/24.jpg)
Reference
I. Fette, N. Sadeh, and A. Tomasic. Learning to detect phishing emails. In Proceedings of the International World Wide Web Conference (WWW), pages 649–656, 2007.
www.ics.uci.edu/.../Learning%20to%20Detect%20Phishing%20Emails.pptx
http://armorize-cht.blogspot.com/2010/01/phishing-mail.html
![Page 25: Learning to Detect Phishing Emails](https://reader035.vdocuments.net/reader035/viewer/2022062518/56813f63550346895daa341e/html5/thumbnails/25.jpg)