PDF module for SpamAssassin
July 19, 2007
With the recent torrent of PDF spam, we have written a module for SpamAssassin that scans PDF attachments. The module, linked below, works in the following way:
- Email bodies are scanned upon connection, and checked for PDF attachments.
- Text is extracted from the PDF via pdftotext, and scanned by SpamAssassin.
- If the PDF contains images, the gocr binary is called to extract any text from them via OCR.
- The total spam score of the PDF is compared against the global required_score setting; if the PDF score is higher, the score specified in pdf.cf (default of 10) is added to the overall score of the email message.
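The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the module's actual code: the function names (extract_pdf_text, ocr_image, pdf_penalty, score_message) are hypothetical, the score_text callback stands in for the SpamAssassin engine, and the sketch assumes the pdftotext and gocr binaries are on the PATH and that images have already been pulled out of the PDF by some other tool.

```python
import subprocess
from typing import Callable


def extract_pdf_text(pdf_path: str) -> str:
    """Run pdftotext on a PDF; the "-" argument sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def ocr_image(image_path: str) -> str:
    """Run gocr on an image file (assumed already extracted from the PDF)."""
    result = subprocess.run(
        ["gocr", "-i", image_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def pdf_penalty(pdf_score: float, required_score: float,
                penalty: float = 10.0) -> float:
    """Return the pdf.cf penalty if the PDF alone scores above required_score."""
    # "Higher" is read here as strictly greater than; that is an assumption.
    return penalty if pdf_score > required_score else 0.0


def score_message(base_score: float, pdf_text: str,
                  score_text: Callable[[str], float],
                  required_score: float, penalty: float = 10.0) -> float:
    """Add the PDF penalty (if any) to the message's overall score."""
    # score_text is a hypothetical stand-in for scoring text with SpamAssassin.
    pdf_score = score_text(pdf_text)
    return base_score + pdf_penalty(pdf_score, required_score, penalty)
```

In this sketch the PDF's own score is never added to the message directly. It only decides whether the fixed penalty applies, which matches the description above.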
This approach departs from the usual method: the extracted content is scanned by the SpamAssassin engine itself, rather than being matched against a word list filter.
To install the module, download it from: http://atmail.com/kb/attach/PDFassassin.tgz
Installation directions can be found in the README file inside the archive.
PDFassassin forum: http://forum.atmail.com/viewforum.php?id=10
