Comparison of Bayesian spam filters

By | May 16, 2006

Spam e-mail has become an ever-increasing problem, and these days it is next to impossible to use e-mail without receiving it in large amounts. Various techniques exist to combat it: keyword-based filters, source blacklists, signature blacklists, source verification, and combinations of these, to name a few. All of them have drawbacks. Keyword filters must be constantly updated by hand and are not very accurate; blacklists also need constant updating and will always lag behind the spammers.

Fortunately, just as we seemed to be losing the war on spam, a new technique appeared on the scene following a paper by Paul Graham: Bayesian filters, our last, best hope for spam-free inboxes. Without going into detail on how they work, they rely on statistical methods that estimate the probability that an e-mail belongs to a given class. Usually just two classes are used, spam and not-spam, but this is not a limitation of the technique; indeed, POPFile supports an arbitrary number of classes.
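To make the statistical idea concrete, here is a minimal sketch of a multinomial naive Bayes classifier in Python. It is a textbook formulation, assumed for illustration only: it is not the scoring scheme from Paul Graham's paper, nor POPFile's implementation. The names `NaiveBayesFilter`, `tokenize` and the smoothing parameter `alpha` are hypothetical. It learns word frequencies per class from labelled messages and turns them into a probability for each class, and it works with any number of classes, not just spam and not-spam.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split a message into lowercase word tokens (a deliberately simple, hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical multinomial naive Bayes filter supporting any number of classes."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # add-alpha (Laplace) smoothing constant
        self.word_counts = defaultdict(Counter)  # class -> token -> count
        self.doc_counts = Counter()              # class -> number of training messages
        self.vocab = set()                       # every token seen during training

    def train(self, text, label):
        """Record the tokens of one labelled training message."""
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def probabilities(self, text):
        """Return the estimated P(class | message) for every trained class."""
        tokens = [t for t in tokenize(text) if t in self.vocab]  # ignore never-seen tokens
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        log_scores = {}
        for label, counts in self.word_counts.items():
            # Log prior: share of training messages that belong to this class.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + self.alpha * vocab_size
            for tok in tokens:
                # Smoothed log likelihood of the token given the class.
                score += math.log((counts[tok] + self.alpha) / denom)
            log_scores[label] = score
        # Normalise via log-sum-exp so the posteriors sum to 1 without underflow.
        top = max(log_scores.values())
        exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp_scores.values())
        return {k: v / z for k, v in exp_scores.items()}

    def classify(self, text):
        """Return the class with the highest posterior probability."""
        probs = self.probabilities(text)
        return max(probs, key=probs.get)


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("cheap pills buy now", "spam")
    f.train("win money now", "spam")
    f.train("meeting agenda for monday", "ham")
    f.train("lunch on monday?", "ham")
    f.train("your order has shipped", "newsletter")  # a third class
    print(f.classify("buy cheap pills"))  # → spam
```

Working in log space and normalising at the end is a common design choice: multiplying many small per-word probabilities directly would quickly underflow to zero on long messages.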
