Bayesian spam filtering for MT

This entry was published at least two years ago (originally posted on October 17, 2003). Since that time the information may have become outdated or my beliefs may have changed (in general, assume a more open and liberal current viewpoint). A fuller disclaimer is available.

There’s another option in the fight against weblog comment spam now that looks at first blush to be an incredibly effective solution: implementing Bayesian spam filtering on MovableType.

The advantage of this plugin over other solutions (e.g. blacklists) is that after a certain amount of training, it requires little or no maintenance. Training is also simpler than importing and exporting blacklists. In addition, it takes a whitelist into consideration rather than blindly banning a host or subnet. (This is useful for those who have the misfortune of being near a spammer.) It also takes the whole content into consideration, including URLs, IPs, common words, etc.
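To make the idea concrete, here is a minimal sketch of how a Bayesian filter of this general kind can score a comment. It is not the plugin's code (MovableType plugins are written in Perl, and I haven't looked at its internals); the `NaiveBayesFilter` class, the `tokenize` helper, and the sample comments are all made up for illustration, and whitelisting is left out entirely. It is a plain naive Bayes classifier with add-one smoothing.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits, dots, and
    # hyphens, so hostnames and IPs survive as single tokens.
    raw = re.findall(r"[a-z0-9.\-]+", text.lower())
    return [t.strip(".-") for t in raw if t.strip(".-")]


class NaiveBayesFilter:
    """Illustrative naive Bayes comment filter (not the plugin's actual code)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        # Record every token of a comment the user has marked as spam or ham.
        self.token_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        log_scores = {}
        for label in self.LABELS:
            # Smoothed prior: how often this label has been seen in training.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_tokens = sum(self.token_counts[label].values())
            # One extra slot in the denominator stands in for unseen tokens.
            denom = total_tokens + len(vocab) + 1
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            log_scores[label] = score
        # Convert the two log scores into a probability, avoiding underflow.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now http://pills.example.com", "spam")
spam_filter.train("buy cheap watches now http://watches.example.com", "spam")
spam_filter.train("great post, thanks for the write-up on MovableType", "ham")
spam_filter.train("I agree with your point about blacklists", "ham")

print(spam_filter.spam_probability("buy cheap pills"))
print(spam_filter.spam_probability("thanks for the great post"))
```

With this toy training set the first comment scores above 0.5 and the second below it, while a filter with no training at all returns 0.5 for everything, so the scores are only as meaningful as the comments it has been shown.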

The disadvantage of this plugin is that the AI engine is only as good as the training you give it. If you don’t put in some initial effort to train it, it won’t work well. Secondly, if you train it wrongly, you get wrong results.

Looks quite impressive, and I’m planning on implementing it for my hosted sites once I get MT set back up again.