Anyone who knows me well can tell you that I rather enjoy fighting spam; I have a threshold of about 1-2 messages per week before I start looking to upgrade my spam defenses. Part of the reason I see so little spam is a few throwaway addresses I hand out when a request for my address looks suspicious, but by far the bulk of the spam-free inbox I enjoy is due to hard work, ingenuity, and the Open Source Community. Codefix servers employ many clever techniques, but spammers are clever as well; not surprisingly, the Spam War exhibits the same sort of technological escalation one sees in any protracted conflict.
Last year, image spam was a big problem. Spammers had realized that by putting their spam in an image file and embedding that image in an otherwise ham (non-spam) message, they could slip through many spam checkpoints. In a move that probably made them feel extra clever, the spammers also employed some effective methods to thwart attempts to use OCR (optical character recognition) to “read” the text within the image. Like many of us, the S.A.R.E. Ninjas knew that scanning every e-mail image with OCR tools would be impractical and unwieldy; fortunately they had a better plan.
Late last year, the S.A.R.E. Ninjas released the ImageInfo plugin, which uses readily obtained data about embedded images to help block spam. ImageInfo examines factors such as the number and type of images, the image dimensions, and the ratio of image to text to help SpamAssassin decide if a message is spam. The net result is an effective tool for blocking image spam, such as a single GIF image urging the reader to invest in suspicious stocks, even if that image is mixed with unrelated random text.
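To give a feel for why this kind of check is so much cheaper than OCR, here is a minimal Python sketch of metadata-only scoring. It is not the ImageInfo plugin's code, and it does not use SpamAssassin's API; the `ImageMeta` class, the `image_spam_score` function, and every threshold and weight in it are hypothetical values I picked purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageMeta:
    """Cheap-to-read facts about one embedded image (hypothetical structure)."""
    kind: str    # file type, e.g. "gif", "png", "jpeg"
    width: int   # pixels
    height: int  # pixels


def image_spam_score(images: List[ImageMeta], text_chars: int) -> float:
    """Return a toy spam score from image metadata alone (no OCR).

    All thresholds and weights below are illustrative assumptions,
    not values taken from ImageInfo or SpamAssassin.
    """
    if not images:
        return 0.0

    score = 0.0
    total_area = sum(img.width * img.height for img in images)

    # Number and type of images: a lone GIF is a common image-spam shape.
    if len(images) == 1 and images[0].kind == "gif":
        score += 1.0

    # Image dimensions: at least one image big enough to carry a readable pitch.
    if any(img.width * img.height >= 300 * 200 for img in images):
        score += 1.0

    # Ratio of image to text: many pixels per character of body text.
    pixels_per_char = total_area / max(text_chars, 1)
    if pixels_per_char > 500:
        score += 1.5

    return score
```

With these made-up weights, a message carrying one 600x400 GIF and 200 characters of filler text trips all three checks, while a message with a 16x16 PNG icon and a couple of thousand characters of text trips none. A real filter would feed a score like this into the overall verdict alongside its other rules, not decide on it alone.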
Now that image spam is being caught, spammers have begun slapping their spam in PDF attachments. I have been toying with some ideas to block this, but tonight I see that the Ninjas are planning an update to the ImageInfo plugin that will catch PDF spam attachments. I can’t wait to start testing it.