Fighting spam part 0: Introduction
I am about to write a few articles about some not-so-bad techniques for fighting spam efficiently. Over the past years I have developed several of them, and the latest ones seem to be highly effective: a large amount of spam caught and almost no false positives. I started developing this for my own personal domain and, because of my current job, expanded and enhanced it for the company I work for.
At the beginning it was quite simple because, for my personal use, I work with Thunderbird, which has long included a very good spam filter that requires little training before reaching very good filtering quality. So I didn't worry much about the quality of the filtering done on the server itself.
But, alas, Thunderbird (like many other open source projects, by the way) is not corporate enough and we are stuck with Outlook... The junk filter of the latter is rather complicated and rather useless. So if you want to reduce the users' complaints about spam, you have to find a good solution on the server.
The techniques that I'll present are built around SpamAssassin and Bayesian filtering. These are not revolutionary technologies, but with some fairly good, quick and uncomplicated tuning you can achieve very good results.
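For readers who have never looked at Bayesian filtering, here is a toy sketch of the general idea in Python. It is not SpamAssassin's implementation (which is more elaborate); the function names, the tiny sample corpus and the add-one smoothing are all my own assumptions, chosen only to show how word counts learned from known spam and known ham turn into a spam probability for a new message.

```python
import math
from collections import Counter


def train(messages):
    """Count, for one class (spam or ham), how many messages contain each word."""
    counts = Counter()
    for text in messages:
        # Use a set so a word repeated inside one message counts only once.
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Naive Bayes estimate of P(spam | words), with add-one smoothing (assumption)."""
    # Start from the prior odds given by the size of each training set.
    log_odds = math.log(n_spam / n_ham)
    for word in set(text.lower().split()):
        p_word_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_ham = (ham_counts[word] + 1) / (n_ham + 2)
        log_odds += math.log(p_word_spam / p_word_ham)
    # Convert log-odds back to a probability between 0 and 1.
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical, hand-made training data, for illustration only.
SPAM = ["cheap pills buy now", "buy cheap watches now"]
HAM = ["meeting notes for monday", "lunch on monday"]

spam_counts = train(SPAM)
ham_counts = train(HAM)

if __name__ == "__main__":
    for msg in ("buy cheap pills", "monday meeting"):
        score = spam_probability(msg, spam_counts, ham_counts, len(SPAM), len(HAM))
        print(f"{msg!r}: {score:.2f}")
```

The point to take away is that the filter is only as good as the mail it has been trained on, which is why the tuning and training matter so much.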
It might seem illogical (and it is, a little) but I'll start this series with an article on how to automatically train an already running spam filter based on Bayesian filtering; an article on how to set it up will follow a bit later. My reason is that there are tons of guides on the Internet on how to set up Bayes in SpamAssassin, whereas articles on how to train it (without relying on feedback from ordinary users) are rare.