I am about to write a few articles about some not-so-bad techniques for
fighting spam efficiently. Over the past years I have developed several of
them, and the latest ones seem to be highly effective: a large quantity of
spam caught and almost no false positives. I started developing this for my
own personal domain and, because of my current job, expanded and enhanced it
for the company I work for.
At the beginning it was quite simple because, for my personal use, I work
with Thunderbird, which has long included a very good spam filter that
requires little training before reaching very good filtering quality. So I
didn't worry much about the quality of the filtering done on the server by
the spam filter.
But, alas, Thunderbird (like many other open source projects, btw) is not
corporate enough, and we are stuck with Outlook ... The junk filter of the
latter is rather complicated and rather useless. So if you want to reduce
the users' complaints about spam, you have to find a good solution on the
server.
The techniques that I'll present are built around SpamAssassin and Bayesian
filtering. These are not revolutionary technologies, but with a fairly quick
(and not complicated) tuning you can achieve very good results.
It might seem illogical (and it is, a little bit), but I'll start this series
with an article on how to automatically train an already running spam filter
based on Bayesian filtering; an article about how to set it up will follow a
bit later. My reason is that there are tons of guides on the Internet on how
to set up Bayes in SpamAssassin, whereas articles on how to train it (without
relying on feedback from ordinary users) are rare.
Part 1: Setting up a spamtrap