Fighting Spam
Enterprise Spam Filtering Using Open Source Tools

Introduction
- Newsflash: SPAM is a problem
- SRJC: 60-80% of mail received is Spam!
- Commercial Solutions exist, but are expensive
- Open Source tools are a powerful alternative

Tonight’s Agenda
- SpamAssassin Overview
- Additional Spam Rules (S.A.R.E.)
- Integrating with Multiple Mail Servers
- Bayesian Filtering

SpamAssassin - How It Works
Uses the combined score from multiple types of checks to determine if a given message is spam:
- Header tests
- Body phrase tests
- Bayesian filtering
- Automatic address whitelist/blacklist
- Manual address whitelist/blacklist
- Collaborative spam identification databases (DCC, Pyzor, Razor2)
- DNS Blocklists ("RBLs")
- Character sets and locales
Even though any one of these tests might, by itself, misidentify a Ham or Spam message, their combined score is terribly difficult to fool.
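
The combined-score idea can be sketched in a few lines of Python. This is only an illustration: the rule names and per-rule scores below are made up, and the threshold of 5.0 is assumed to be SpamAssassin's default required_score.

```python
# Hypothetical rule names and scores, for illustration only.
REQUIRED_SCORE = 5.0  # assumed default required_score


def classify(hits, required_score=REQUIRED_SCORE):
    """Sum the scores of every test that fired and compare to the threshold."""
    total = sum(hits.values())
    return total, total >= required_score


# Three weak signals that together cross the threshold.
spam_hits = {"HEADER_TEST": 1.5, "BODY_PHRASE_TEST": 2.5, "DNSBL_TEST": 2.0}

# One body-phrase hit, offset by a (negative) whitelist score.
ham_hits = {"BODY_PHRASE_TEST": 2.5, "WHITELIST_TEST": -3.0}
```

No single test in spam_hits reaches 5.0 on its own, yet classify(spam_hits) flags the message; in ham_hits the whitelist score pulls a suspicious phrase back below the threshold.
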
SpamAssassin - Advantages
- Wide spectrum of different tests
- Open Source and Free!
- Flexible - works with many platforms and servers
- Easy Configuration

SpamAssassin Rules Emporium
- http://rulesemporium.com/
- Popular Repository for Third Party SpamAssassin Rules
- “Actively” Updated between SpamAssassin releases

SARE Usage Guidelines
- Just download rules into the SpamAssassin directory (e.g.: /etc/spamassassin)
- Restart the daemon if necessary
- Most Popular Rules have “levels” (e.g.: 0 = conservative, 3 = aggressive)
- Choose the Rules you use carefully!

Rules Du Jour
- http://www.exit0.us/index.php?pagename=RulesDuJour
- Automates updating, downloading and installation of most popular SARE rules

Rules Du Jour
- Install the script in $PATH (e.g.: /usr/local/sbin) and make it executable
- Create a blank configuration file at /etc/rulesdujour/config
- Add a TRUSTED_RULESETS line to your config file that contains the names of the rulesets you chose, e.g.:
    TRUSTED_RULESETS="SARE_ADULT SARE_OBFU0 SARE_OBFU1 SARE_URI0 SARE_URI1"
- Configure any local settings. Examples below:
    SA_DIR="/etc/mail/spamassassin"
    MAIL_ADDRESS="[email protected]"
    SA_RESTART="killall -HUP spamd"
- Run this script periodically (manually or via crontab)

SpamAssassin Serving Multiple Servers
Problem:
- How do you keep multiple mail servers synchronized?
- Spam checking adds load to the mail server

SpamAssassin Serving Multiple Servers
Solution:
- Use a single machine to manage spam sitewide!
- Logs, Configuration unified on a single machine

SA/multi-server - set up server
- Server must be running SpamAssassin as a daemon (spamd -d)
- Server must accept outside connections (e.g.: spamd -A 127.0.0.1,192.168.1.10,192.168.1.11)
  (spamd listens only on 127.0.0.1 by default; use -i to listen on another address)
- Make sure the server can listen on port 783 (spamd’s default port)
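
A quick way to confirm from a mail server that the spamd port is reachable is a plain TCP connection attempt. The sketch below assumes nothing about spamd beyond its default port of 783; the function name and the timeout are illustrative choices.

```python
import socket

SPAMD_PORT = 783  # spamd's default port


def port_is_open(host, port=SPAMD_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, port_is_open("192.168.1.10") run on a client machine tells you whether a firewall or the listen address is blocking access. It does not verify that the -A allow list accepts that client.
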
SA/multi-server - set up client
- Use the “spamc” command instead of “spamassassin”
- Use the -d switch for the remote server: spamc -d 192.168.1.10, and so forth …
- Test:
    spamc -d my.server.net < /path/to/sample/email
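
If a script needs the verdict rather than the rewritten message, one option is to wrap spamc. The sketch below is an assumption-laden illustration: it relies on spamc's -c (check only) mode printing "score/threshold" on standard output, and the helper names are made up for this example.

```python
import subprocess


def spamc_command(host, port=783):
    """Build a spamc command line that only checks a message on a remote spamd."""
    return ["spamc", "-c", "-d", host, "-p", str(port)]


def parse_check_output(text):
    """Parse the 'score/threshold' line that 'spamc -c' is assumed to print."""
    score, threshold = text.strip().split("/")
    return float(score), float(threshold)


def check_message(raw_message, host, port=783):
    """Pipe one raw message (bytes) through spamc; return (score, threshold)."""
    result = subprocess.run(
        spamc_command(host, port), input=raw_message, capture_output=True
    )
    return parse_check_output(result.stdout.decode())
```

check_message() needs spamc installed and a reachable spamd, so try it against the test server above before relying on it.
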
Bayesian Filtering - Introduction
“Bayesian Filtering uses statistics from previously-classified messages to estimate the likelihood that a particular message is spam.”*
“This likelihood estimate is converted to a (possibly negative) weight which is added to the ad hoc spamminess score.”*
*GORDON V. CORMACK and THOMAS R. LYNAM, University of Waterloo
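
To make "statistics from previously-classified messages" concrete, here is a toy naive Bayes estimate in Python. It is a simplified sketch, not SpamAssassin's implementation: the whitespace tokenizer, the add-one smoothing and the tiny corpora are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Count how many messages in a corpus contain each token."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | tokens) with naive Bayes and add-one smoothing."""
    log_odds = math.log(n_spam / n_ham)  # prior odds from corpus sizes
    for token in set(text.lower().split()):
        p_token_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_token_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_token_spam / p_token_ham)
    return 1 / (1 + math.exp(-log_odds))


# Tiny made-up corpora of previously classified messages.
spam_corpus = ["cheap pills online", "cheap loans now", "win money now"]
ham_corpus = ["meeting agenda attached", "lunch meeting tomorrow", "project agenda update"]
spam_counts = train(spam_corpus)
ham_counts = train(ham_corpus)
```

With these corpora, spam_probability("cheap money now", spam_counts, ham_counts, 3, 3) comes out well above 0.5, and "meeting agenda" well below. SpamAssassin then maps such an estimate onto a BAYES_nn rule whose score is added to the total.
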
Bayes - Getting Started
- Enable Bayes in Config: use_bayes 1
- Put aside space for Bayes DB (either file-based or SQL):
    bayes_path /var/local/spamassassin/bayes
  or
    bayes_store_module Mail::SpamAssassin::BayesStore::SQL

Bayes - Getting Started
- Feed Bayes “ham” and “spam”
- You MUST feed it samples of good and bad messages to start!
- At least 200 samples of each, but use as much as possible
    sa-learn --spam --dir /path/to/directory/full/of/spam/msgs
    sa-learn --ham --dir /path/to/directory/full/of/ham/msgs
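
Before the first training run it can help to check that both sample directories meet the 200-message minimum. The helper below is a hypothetical sketch that assumes one message per file; it only builds the sa-learn command lines shown above and does not run them.

```python
import os

MIN_SAMPLES = 200  # minimum per class suggested on the slide


def count_messages(directory):
    """Count regular files in a directory (assumes one message per file)."""
    return sum(1 for entry in os.scandir(directory) if entry.is_file())


def enough_samples(spam_dir, ham_dir, minimum=MIN_SAMPLES):
    """True when both directories hold at least `minimum` messages."""
    return (count_messages(spam_dir) >= minimum
            and count_messages(ham_dir) >= minimum)


def sa_learn_command(kind, directory):
    """Build the sa-learn command line for a directory of 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", f"--{kind}", "--dir", directory]
```
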
Bayes - Enhancing
- Enable automated learning:
    bayes_auto_learn 1
    bayes_auto_learn_threshold_nonspam 0.1
    bayes_auto_learn_threshold_spam 6.0
- “Teach” Bayes
    - Create mailbox for “ham” and “spam” and scan periodically
    - Note: “Resend” email, don’t forward!
    - You can’t overtrain the Bayes database!
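
The two auto-learn thresholds split the score range into three bands. The Python sketch below shows only that banding, using the values from this slide; SpamAssassin applies further conditions before it actually learns from a message, and those are not modeled here.

```python
NONSPAM_THRESHOLD = 0.1  # bayes_auto_learn_threshold_nonspam from the slide
SPAM_THRESHOLD = 6.0     # bayes_auto_learn_threshold_spam from the slide


def auto_learn_as(score,
                  nonspam_threshold=NONSPAM_THRESHOLD,
                  spam_threshold=SPAM_THRESHOLD):
    """Return 'ham', 'spam', or None (no learning) for a message score."""
    if score < nonspam_threshold:
        return "ham"
    if score > spam_threshold:
        return "spam"
    return None  # too ambiguous to learn from automatically
```

Messages scoring between 0.1 and 6.0 are left alone, which is why manually teaching Bayes from the “ham” and “spam” mailboxes still matters.
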
Bayes - Enhancing
- Give more “weight” to Bayesian Results
    score BAYES_00 -4
    score BAYES_05 -2
    score BAYES_95 6
    score BAYES_99 9
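
The effect of these overrides is easiest to see with numbers. In the sketch below the BAYES scores are the ones from this slide, while the scores from the other tests are invented and the 5.0 threshold is assumed to be the default required_score.

```python
REQUIRED_SCORE = 5.0  # assumed default required_score

# Score overrides from the slide.
BAYES_SCORES = {"BAYES_00": -4.0, "BAYES_05": -2.0,
                "BAYES_95": 6.0, "BAYES_99": 9.0}


def final_score(other_tests_score, bayes_rule):
    """Add the weight of the Bayes rule that fired to the other tests' total."""
    return other_tests_score + BAYES_SCORES.get(bayes_rule, 0.0)


def is_spam(other_tests_score, bayes_rule, required_score=REQUIRED_SCORE):
    """Compare the combined score with the spam threshold."""
    return final_score(other_tests_score, bayes_rule) >= required_score
```

With these weights, a message that scores only 3.0 on the other tests is still tagged when BAYES_95 fires (3.0 + 6 = 9.0), and one that scores 6.5 is rescued when BAYES_00 fires (6.5 - 4 = 2.5).
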
Conclusion
- World-class Spam Prevention is Possible with Freely Available Tools!
- SRJC Stats:
    - Process 30,000 - 60,000 messages per day with one dual-processor server
    - Most messages scanned in < 10 seconds (< 1 without network tests)
    - < 0.007% false positives/negatives