Content-based filtering

Content filtering is often hard to explain to people, and I’m not sure I’ve come up with a good way to explain it yet.
A lot of people think content reputation is about specific words in the message. The traditional explanation is that words like “Free” or too many exclamation points in the subject line are bad and will be filtered. But the words themselves aren’t the issue; the issue is that those words are often found in spam. These days filters are a lot smarter than that. Rather than just looking at individual words, they look at the overall context of the message.
Even when we’re talking content filters, the content is just a way to identify mail that might cause problems. Those problems are evaluated the same way IP reputation is measured: complaints, engagement, bad addresses. But there’s a lot more to content filtering than just the engagement piece. What else is part of content evaluation?

  1. Does the mail have hashbusters? Hashbusters are blocks of text, sometimes invisible to the recipient, that are put in an email in order to break some types of filtering. Ways to hide text include putting it in HTML comments and making the text the same color as the background.
  2. Does the mail have valid HTML? Spammers have frequently used invalid HTML tags as a way to avoid filters by breaking up content or as hashbusters.
  3. Does this mail contain malicious content? These filters look for virus signatures or code that may compromise a recipient’s computer. Very few legitimate mailers have mail caught in virus filters, but every incoming message is still scanned for viruses or malicious code.
  4. Does this mail look like a phish? These filters look at the domains and authentication, but also look for common words and tricks phishers use. This is the filter most likely to catch legitimate mail, typically mail that uses tracking links where the visible link text is itself a URL that differs from the real destination. An example of this kind of trigger is <a href="http://tracking.example.com/login.html">http://paypal.com</a>. Making sure there aren’t URLs, email addresses or hostnames in the visible text of a link generally avoids this kind of filter.
  5. Is this an industry with a bad reputation? The most obvious examples here are payday loans. There are so many horrible players in the online payday loan industry that it doesn’t really matter how good or clean individual mailers are. Payday loans are filtered heavily. Stock and financial messages also have challenges because there are so many pump-n-dump spammers out there.
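
For senders who want to check their own templates for the link mismatch described in item 4, here is a minimal sketch in Python. It is not how any particular mailbox provider's filter works. The names `LinkMismatchFinder`, `find_link_mismatches` and the `HOSTLIKE` pattern are made up for illustration, and the hostname comparison is deliberately naive: it only handles URLs and bare hostnames in link text, not email addresses, and it treats `www.example.com` and `example.com` as different hosts.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical pattern: visible link text that starts with a URL or bare hostname.
HOSTLIKE = re.compile(r"^(?:https?://)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)", re.IGNORECASE)


class LinkMismatchFinder(HTMLParser):
    """Collects links whose visible text names a different host than the href."""

    def __init__(self):
        super().__init__()
        self.mismatches = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # visible text collected inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            match = HOSTLIKE.match(text)
            href_host = (urlparse(self._href).hostname or "").lower()
            # Flag only when the text looks like a host and the hosts differ.
            if match and match.group(1).lower() != href_host:
                self.mismatches.append((self._href, text))
            self._href = None


def find_link_mismatches(html):
    """Return (href, visible_text) pairs that might trip a phishing filter."""
    finder = LinkMismatchFinder()
    finder.feed(html)
    finder.close()
    return finder.mismatches
```

Running `find_link_mismatches` over the HTML part of a message before sending gives a list of links worth rewording. Links with plain descriptive text, such as "Log in to your account", are not flagged.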

Changing content can improve delivery, at least for a while. But if that content was flagged because of user complaints or bad recipient profiles, the content filters will catch up. Continuing to attempt to evade filters by changing content can result in IP-based filtering.
These are just a few of the things companies look at when evaluating content.
 
