Fighting Spam: Enterprise Spam Filtering Using Open Source Tools

Upload: kristian-holmes

Post on 23-Dec-2015

TRANSCRIPT

Page 1: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

Fighting Spam
Enterprise Spam Filtering Using Open Source Tools

Page 2: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

Introduction

Newsflash: SPAM is a problem
SRJC: 60-80% of mail received is spam!
Commercial solutions exist, but are expensive
Open source tools are a powerful alternative

Page 3: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

Tonight’s Agenda

SpamAssassin Overview
Additional Spam Rules (S.A.R.E.)
Integrating with Multiple Mail Servers
Bayesian Filtering

Page 4: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

SpamAssassin – How It Works

Uses the combined score from multiple types of checks to determine if a given message is spam:
  Header tests
  Body phrase tests
  Bayesian filtering
  Automatic address whitelist/blacklist
  Manual address whitelist/blacklist
  Collaborative spam identification databases (DCC, Pyzor, Razor2)
  DNS blocklists ("RBLs")
  Character sets and locales
Even though any one of these tests might, by itself, misidentify a ham or spam message, their combined score is terribly difficult to fool.
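
The combined-score idea can be sketched in a few lines of shell. This is only an illustration, not SpamAssassin's actual engine: the rule names and scores below are made up, and the 5.0 threshold is assumed to be the stock required_score setting.

```shell
# Hypothetical rule hits for one message: "RULE_NAME score" per line.
hits='HEADER_TEST 1.2
BODY_PHRASE 2.1
BAYES_HIGH 3.5
DNSBL_HIT 1.0'

# Assumed threshold (SpamAssassin's default required_score).
required=5.0

# Sum the scores and compare the total against the threshold.
verdict=$(printf '%s\n' "$hits" | awk -v req="$required" '
    { total += $2 }
    END { printf "%.1f %s\n", total, (total >= req ? "spam" : "ham") }')

echo "$verdict"
```

No single hit here reaches 5.0 on its own; the message is only flagged because several independent tests agree.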

Page 5: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

SpamAssassin - Advantages

Wide spectrum of different tests
Open source and free!
Flexible – works with many platforms and servers
Easy configuration

Page 6: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

SpamAssassin Rules Emporium

http://rulesemporium.com/
Popular repository for third-party SpamAssassin rules
“Actively” updated between SpamAssassin releases

Page 7: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

SARE Usage Guidelines

Just download rules into the SpamAssassin directory (e.g.: /etc/spamassassin)
Restart daemon if necessary
Most popular rules have “levels” (e.g.: 0 = conservative, 3 = aggressive)
Choose the rules you use carefully!
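
A minimal sketch of the "download into the SpamAssassin directory" step, assuming the ruleset has already been fetched as a single .cf file. The helper name install_ruleset is hypothetical, not part of SpamAssassin or SARE.

```shell
# install_ruleset SRC DEST_DIR (hypothetical helper)
# Copy one downloaded .cf ruleset into the SpamAssassin configuration
# directory, keeping a backup of any older copy. The backup ends in
# ".bak" rather than ".cf", so SpamAssassin will not load it.
install_ruleset() {
    src=$1
    dest_dir=$2
    dest="$dest_dir/$(basename "$src")"
    if [ -f "$dest" ]; then
        cp "$dest" "$dest.bak" || return 1
    fi
    cp "$src" "$dest"
}
```

Usage would look like `install_ruleset ./70_sare_adult.cf /etc/spamassassin` (file name given as an example). Running `spamassassin --lint` afterwards checks the configuration for errors before the daemon is restarted.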

Page 8: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

Rules Du Jour

http://www.exit0.us/index.php?pagename=RulesDuJour
Automates updating, downloading and installation of most popular SARE rules

Page 9: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

Rules Du Jour

Install script in $PATH (e.g.: /usr/local/sbin) and make executable
Create a blank configuration file at /etc/rulesdujour/config
Add a TRUSTED_RULESETS line to your config file that contains the names of the rulesets you chose, e.g.:
  TRUSTED_RULESETS="SARE_ADULT SARE_OBFU0 SARE_OBFU1 SARE_URI0 SARE_URI1"
Configure any local settings. Examples below:
  SA_DIR="/etc/mail/spamassassin"
  MAIL_ADDRESS="[email protected]"
  SA_RESTART="killall -HUP spamd"
Run this script periodically (manually or via crontab)
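
For the crontab option, an entry along these lines would run the update once a night. The script name rules_du_jour and the 03:30 schedule are assumptions; use whatever name the script was installed under.

```
# m  h  dom mon dow  command
30   3  *   *   *    /usr/local/sbin/rules_du_jour
```

A nightly run is usually enough, since the rulesets change far less often than the mail they filter.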

Page 10: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

SpamAssassin Serving Multiple Servers

Problem:
  How do you keep multiple mail servers synchronized?
  Spam checking adds load to the mail server

Page 11: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

SpamAssassin Serving Multiple Servers

Solution: Use a single machine to manage spam sitewide!
Logs, configuration unified on a single machine

Page 12: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

SA/multi-server – set up server

Server must be running SpamAssassin as a daemon (spamd -d)
Server must accept outside connections (e.g.: spamd -A 127.0.0.1,192.168.1.10,192.168.1.11)
Make sure server can listen to port 783 (spamd’s default port)
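
Putting the three requirements together, the daemon invocation might look like the sketch below. One detail worth noting: spamd normally binds only to the loopback address, so a listen address (-i) is needed in addition to the allowed-clients list (-A). The 0.0.0.0 listen address is an assumption; the client addresses are the ones from the slide.

```shell
# Mail servers allowed to connect (from the slide); adjust to your own.
ALLOWED_IPS="127.0.0.1,192.168.1.10,192.168.1.11"
LISTEN_IP="0.0.0.0"   # assumption: listen on all interfaces
PORT=783              # spamd's default port

# Print the command line instead of running it, so it can be reviewed
# or pasted into an init script.
spamd_command() {
    echo "spamd -d -i $LISTEN_IP -p $PORT -A $ALLOWED_IPS"
}

spamd_command
```

Port 783 is below 1024, so the daemon has to be started with root privileges, and any firewall between the mail servers and this machine must allow that port.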

Page 13: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

SA/multi-server – set up client

Use “spamc” command instead of “spamassassin”
Use switch for remote server: spamc -d 192.168.1.10, and so forth …
Test:
  spamc -d my.server.net < /path/to/sample/email
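
For a quick yes/no test from a client machine, spamc's check-only mode (-c) is handy: it prints "score/threshold" and sets the exit code to 1 for spam and 0 otherwise. The wrapper below is a hypothetical helper built on that behaviour.

```shell
# classify HOST FILE (hypothetical helper)
# Ask the remote spamd on HOST whether the message in FILE is spam.
classify() {
    host=$1
    msg=$2
    # -c: check only; exit code 1 means spam, 0 means ham or an error
    # (errors are reported as 0/0).
    if result=$(spamc -d "$host" -p 783 -c < "$msg"); then
        echo "ham $result"
    else
        echo "spam $result"
    fi
}
```

Calling `classify my.server.net /path/to/sample/email` prints the verdict followed by the score and threshold. A result of 0/0 suggests the client could not reach the server, so check the -A list and firewall on the spamd machine.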

Page 14: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

Bayesian Filtering - Introduction

“Bayesian Filtering uses statistics from previously-classified messages to estimate the likelihood that a particular message is spam.”*

“This likelihood estimate is converted to a (possibly negative) weight which is added to the ad hoc spamminess score.”*

*GORDON V. CORMACK and THOMAS R. LYNAM, University of Waterloo
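
As a simplified illustration of the statistics involved, Bayes' rule for a single word w looks like this. The numbers are invented: the word is assumed to appear in 30% of previously seen spam and 1% of ham, with equal prior odds. SpamAssassin combines evidence from many tokens with its own method, so this shows only the underlying idea.

```latex
\[
P(\text{spam} \mid w)
  = \frac{P(w \mid \text{spam})\,P(\text{spam})}
         {P(w \mid \text{spam})\,P(\text{spam}) + P(w \mid \text{ham})\,P(\text{ham})}
  = \frac{0.30 \times 0.5}{0.30 \times 0.5 + 0.01 \times 0.5}
  \approx 0.968
\]
```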

Page 15: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

Bayes – Getting Started

Enable Bayes in config: use_bayes 1
Put aside space for Bayes DB (either file-based or SQL)
  bayes_path /var/local/spamassassin/bayes
  or
  bayes_store_module Mail::SpamAssassin::BayesStore::SQL
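
If the SQL store is chosen, the module line needs connection settings alongside it. A sketch of what that fragment might look like, assuming a MySQL database; the database name, host, user and password are placeholders.

```
use_bayes           1
bayes_store_module  Mail::SpamAssassin::BayesStore::SQL
# Placeholder connection details; replace with your own.
bayes_sql_dsn       DBI:mysql:spamassassin:localhost
bayes_sql_username  sa_user
bayes_sql_password  change-me
```

The database tables have to be created before first use. The file-based option needs none of this and only requires that the bayes_path location is writable by the user running spamd.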

Page 16: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

Bayes – Getting Started

Feed Bayes “ham” and “spam”
You MUST feed it samples of good and bad messages to start!
At least 200 samples of each, but use as much as possible
  sa-learn --spam --dir /path/to/directory/full/of/spam/msgs
  sa-learn --ham --dir /path/to/directory/full/of/ham/msgs
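
To see whether the 200-message minimum has been reached, the counters printed by `sa-learn --dump magic` can be checked. The helper below is a hypothetical sketch that reads that output on standard input and assumes the usual layout, where the count is the third column of the "nspam" and "nham" lines.

```shell
# bayes_ready [MIN] (hypothetical helper)
# Reads "sa-learn --dump magic" output on stdin and succeeds only if
# both the spam and ham message counts are at least MIN (default 200).
bayes_ready() {
    awk -v min="${1:-200}" '
        /non-token data: nspam$/ { nspam = $3 }
        /non-token data: nham$/  { nham  = $3 }
        END { exit !(nspam >= min && nham >= min) }'
}

# Usage: sa-learn --dump magic | bayes_ready 200 && echo "Bayes is trained"
```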

Page 17: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

Bayes – Enhancing

Enable automated learning:
  bayes_auto_learn 1
  bayes_auto_learn_threshold_nonspam 0.1
  bayes_auto_learn_threshold_spam 6.0
“Teach” Bayes
  Create mailbox for “ham” and “spam” and scan periodically
  Note: “Resend” email, don’t forward!
  You can’t overtrain the Bayes database!
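
The periodic scan of the two teaching mailboxes could be scripted roughly as follows, assuming they are stored in mbox format. The function name and mailbox paths are hypothetical; SA_LEARN can be overridden if sa-learn is not on the PATH.

```shell
# Command used for training; override if sa-learn lives elsewhere.
SA_LEARN=${SA_LEARN:-sa-learn}

# train_from_mailboxes SPAM_MBOX HAM_MBOX (hypothetical helper)
# Feed the "spam" mailbox, then the "ham" mailbox, to the Bayes database.
train_from_mailboxes() {
    spam_mbox=$1
    ham_mbox=$2
    "$SA_LEARN" --spam --mbox "$spam_mbox" &&
    "$SA_LEARN" --ham --mbox "$ham_mbox"
}

# Usage (paths are examples):
#   train_from_mailboxes /var/mail/spam /var/mail/ham
```

sa-learn keeps track of messages it has already learned, so rescanning the same mailbox does not count them twice.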

Page 18: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

Bayes – Enhancing

Give more “weight” to Bayesian results
  score BAYES_00 -4
  score BAYES_05 -2
  score BAYES_95 6
  score BAYES_99 9
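
One consequence of these weights is worth checking before adopting them: any rule whose score reaches the spam threshold can flag a message with no help from other tests. The sketch below lists which of the four scores do so, assuming the stock required_score of 5.0.

```shell
REQUIRED_SCORE=5.0   # assumed: SpamAssassin's default required_score

# Print the Bayes rules whose score alone reaches the threshold.
decisive_rules() {
    for entry in BAYES_00:-4 BAYES_05:-2 BAYES_95:6 BAYES_99:9; do
        awk -v rule="${entry%%:*}" -v score="${entry#*:}" \
            -v req="$REQUIRED_SCORE" \
            'BEGIN { if (score + 0 >= req + 0) print rule }'
    done
}

decisive_rules
```

With these values, a BAYES_95 or BAYES_99 hit is enough on its own, so the scores only make sense once the Bayes database is well trained.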

Page 19: Fighting Spam Enterprise Spam Filtering Using Open Source Tools

Conclusion

World-class spam prevention is possible with freely available tools!
SRJC stats:
  Process 30,000 – 60,000 messages per day with one dual-processor server
  Most messages scanned in < 10 seconds (< 1 without network tests)
  < 0.007% false positives/negatives