Twitter Content-based Spam Filtering - CISIS 2013
Presentation at the CISIS 2013 international conference of the paper "Twitter Content-based Spam Filtering".
![Page 1: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/1.jpg)
Igor Santos, Igor Miñambres-Marcos, Carlos Laorden, Patxi Galán-García, Aitor Santamaría-Ibirika, Pablo G. Bringas
![Page 2: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/2.jpg)
![Page 3: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/3.jpg)
![Page 4: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/4.jpg)
![Page 5: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/5.jpg)
![Page 6: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/6.jpg)
![Page 7: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/7.jpg)
![Page 8: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/8.jpg)
![Page 9: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/9.jpg)
![Page 10: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/10.jpg)
![Page 11: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/11.jpg)
![Page 12: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/12.jpg)
![Page 13: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/13.jpg)
![Page 14: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/14.jpg)
![Page 15: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/15.jpg)
![Page 16: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/16.jpg)
![Page 17: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/17.jpg)
![Page 18: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/18.jpg)
![Page 19: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/19.jpg)
![Page 20: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/20.jpg)
![Page 21: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/21.jpg)
![Page 22: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/22.jpg)
![Page 23: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/23.jpg)
![Page 24: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/24.jpg)
![Page 25: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/25.jpg)
![Page 26: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/26.jpg)
![Page 27: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/27.jpg)
Detecting spammer accounts
Content-based analysis
![Page 28: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/28.jpg)
![Page 29: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/29.jpg)
![Page 30: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/30.jpg)
Dataset classes: spam (TweetSpike) and ham (Legitimate)
![Page 31: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/31.jpg)
![Page 32: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/32.jpg)
[Figure: diagram with labels t1, t2, t3 and m1 through m11]
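The labels t1 to t3 and m1 to m11 suggest a vector space representation, where each message m is a point whose coordinates are term weights t. The slide itself does not say this, so the following is only a minimal sketch under that assumption. The helper names, the whitespace tokenizer, the raw term counts, and the sample messages are all hypothetical and are not taken from the presentation.

```python
from collections import Counter


def build_vocabulary(messages):
    """Return the sorted list of distinct lowercase tokens across all messages."""
    return sorted({token for msg in messages for token in msg.lower().split()})


def to_vector(message, vocabulary):
    """Map one message to its term-frequency vector over the vocabulary."""
    counts = Counter(message.lower().split())
    return [counts[term] for term in vocabulary]


# Hypothetical sample messages, invented for illustration only.
messages = [
    "free followers click here",
    "click here for free free prizes",
    "meeting moved to friday",
]

vocabulary = build_vocabulary(messages)                   # the "t" axes
vectors = [to_vector(m, vocabulary) for m in messages]    # the "m" points

for message, vector in zip(messages, vectors):
    print(vector, message)
```

Each position in a vector corresponds to one term of the vocabulary, so a classifier can treat the messages as numeric feature rows. A weighting scheme such as TF-IDF could replace the raw counts.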
![Page 33: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/33.jpg)
![Page 34: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/34.jpg)
![Page 35: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/35.jpg)
[Figure labels: legitimate, spam]
![Page 36: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/36.jpg)
[Figure labels: legitimate, spam, testing, probability]
![Page 37: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/37.jpg)
Dynamic Markov Chain (DMC)
Prediction by Partial Match (PPM)
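DMC and PPM are compression models, and a common way to classify with a compressor is to ask which class corpus lets a new message compress best. The sketch below illustrates that general idea only. It substitutes `zlib` (DEFLATE) for DMC and PPM, which are not in the Python standard library, and it is not the authors' WEKA implementation. The function names and the sample corpora are hypothetical.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, message: str) -> int:
    """Bytes added to the compressed corpus when the message is appended."""
    return compressed_size(corpus + " " + message) - compressed_size(corpus)


def classify(message: str, spam_corpus: str, ham_corpus: str) -> str:
    """Pick the class whose corpus compresses the message more cheaply."""
    spam_cost = extra_bytes(spam_corpus, message)
    ham_cost = extra_bytes(ham_corpus, message)
    return "spam" if spam_cost < ham_cost else "legitimate"


# Hypothetical training text, invented for illustration only.
spam_corpus = "win free followers now click this link to get free followers fast"
ham_corpus = "lunch with the team was great see you all at the conference tomorrow"

print(classify("click this link to get free followers", spam_corpus, ham_corpus))
print(classify("see you all at the conference tomorrow", spam_corpus, ham_corpus))
```

A message that shares long substrings with one corpus adds few bytes to that corpus's compressed size, which is what drives the decision. A real filter would need far larger corpora and a stronger model than DEFLATE.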
![Page 38: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/38.jpg)
![Page 39: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/39.jpg)
| Classifier | Acc. (%) | Sp | Sr | F-Measure | AUC |
|---|---|---|---|---|---|
| Random Forest N=50 | 96.42 | 0.98 | 0.94 | 0.96 | 0.99 |
| DMC without Adaptation | 95.99 | 0.96 | 0.95 | 0.96 | 0.99 |
| Random Forest N=10 | 95.96 | 0.97 | 0.94 | 0.95 | 0.99 |
| PPM without Adaptation | 94.80 | 0.97 | 0.91 | 0.94 | 0.99 |
| Naive Bayes Multinomial Word Frequency | 94.94 | 0.95 | 0.93 | 0.94 | 0.98 |
| Bayes K2 | 94.12 | 0.99 | 0.88 | 0.93 | 0.98 |
| DMC with Adaptation | 93.11 | 0.94 | 0.90 | 0.92 | 0.98 |
| C4.5 | 95.79 | 0.98 | 0.92 | 0.95 | 0.97 |
| KNN K=3 | 93.71 | 0.97 | 0.89 | 0.93 | 0.97 |
| SVM PVK | 95.81 | 0.97 | 0.93 | 0.95 | 0.96 |
| PPM with Adaptation | 76.50 | 0.78 | 0.69 | 0.72 | 0.86 |
| Naive Bayes | 72.72 | 0.64 | 0.89 | 0.75 | 0.76 |
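To relate the F-Measure column to the Sp and Sr columns in the results listed above, the sketch below recomputes it as the harmonic mean of the two. It assumes that Sp is spam precision, that Sr is spam recall, and that F-Measure is the balanced F1 score. The slides do not spell any of this out, so treat it as an interpretation rather than the authors' definition.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the balanced F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Sp, Sr, reported F-Measure) copied from three rows of the results.
rows = {
    "Random Forest N=50": (0.98, 0.94, 0.96),
    "C4.5": (0.98, 0.92, 0.95),
    "PPM with Adaptation": (0.78, 0.69, 0.72),
}

for name, (sp, sr, reported) in rows.items():
    print(f"{name}: computed {f_measure(sp, sr):.3f}, reported {reported}")
```

Because the reported Sp and Sr values are already rounded to two decimals, the recomputed score can differ from the reported one by about a hundredth.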
![Page 40: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/40.jpg)
![Page 41: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/41.jpg)
A new, public dataset of Twitter spam for evaluation
Adaptation of content-based spam filtering to Twitter
A new compression-based text filtering library for the ML tool WEKA
![Page 42: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/42.jpg)
enhance this approach using social network features
semantic capabilities by studying the linguistic relationships
![Page 43: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/43.jpg)
![Page 44: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/44.jpg)
![Page 45: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/45.jpg)
![Page 46: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/46.jpg)
![Page 47: Twitter Content-based Spam Filtering - CISIS 2013](https://reader034.vdocuments.net/reader034/viewer/2022051514/5497a35eac7959132e8b543e/html5/thumbnails/47.jpg)