Content-based filtering

Content filtering is often hard to explain to people, and I’m not sure I’ve yet come up with a good way to explain it.
A lot of people think content reputation is about specific words in the message. The traditional explanation is that words like "Free," or too many exclamation points in the subject line, are bad and will get mail filtered. But the words themselves aren't the issue; the issue is that those words are often found in spam. These days filters are much smarter than that: rather than looking at individual words, they look at the overall context of the message.
Even when we’re talking about content filters, the content is just a way to identify mail that might cause problems. Those problems are evaluated the same way IP reputation is measured: complaints, engagement, bad addresses. But there’s a lot more to content filtering than just the engagement piece. What else is part of content evaluation?

  1. Does the mail have hashbusters? Hashbusters are blocks of text, sometimes invisible to the recipient, that are put in an email to break some types of filtering, typically by making every copy of a message slightly different so it no longer matches a known fingerprint. Common ways to hide the text include placing it in HTML comments and making the foreground and background text the same color.
  2. Does the mail have valid HTML? Spammers have frequently used invalid HTML tags as a way to avoid filters by breaking up content or as hashbusters.
  3. Does this mail contain malicious content? These filters look for virus signatures or code that may compromise a recipient’s computer. Very few legitimate mailers have mail caught in virus filters, but every incoming message is still scanned for viruses and malicious code.
  4. Does this mail look like a phish? These filters look at the domains and authentication, but also look for common words and tricks phishers use. This filter is most likely to catch legitimate mail that uses tracking links whose visible link text shows a different URL than the link actually points to. An example of this kind of trigger is <a href="http://tracking.example.com/login.html">http://paypal.com</a>. Making sure there aren’t URLs, email addresses or hostnames in the visible text of a link generally avoids this kind of filter.
  5. Is this an industry with a bad reputation? The most obvious example is payday loans. There are so many horrible players in the online payday loan industry that it doesn’t really matter how good or clean individual mailers are; payday loan mail is filtered heavily. Stock and financial messages also face challenges because there are so many pump-and-dump spammers out there.
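
To make items 1 and 4 more concrete, here is a minimal, hypothetical sketch in Python using only the standard library's html.parser. It is not how any ISP or filtering company actually implements these checks; the class name ContentChecker, the check() helper and the hostname regex are illustrative assumptions. It flags text hidden in HTML comments, elements whose inline style sets the same foreground and background color, and links whose visible text shows a hostname different from the host in the href.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import re

# Hypothetical heuristic: something in link text that looks like a URL or
# bare hostname, e.g. "http://paypal.com" or "www.example.com".
HOST_IN_TEXT = re.compile(r"(?:https?://)?((?:[a-z0-9-]+\.)+[a-z]{2,})", re.IGNORECASE)


class ContentChecker(HTMLParser):
    """Collects a few hashbuster and phish-style signals (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.comments = []          # non-empty text hidden in HTML comments
        self.same_color_tags = []   # tags whose inline color == background-color
        self.mismatched_links = []  # (host in href, different host shown in text)
        self._href = None           # href of the <a> we are currently inside
        self._link_text = []

    def handle_comment(self, data):
        if data.strip():
            self.comments.append(data.strip())

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Only inline styles on the same element are checked; CSS classes
        # and inherited colors are ignored in this sketch.
        decls = {}
        for part in (attrs.get("style") or "").split(";"):
            if ":" in part:
                name, value = part.split(":", 1)
                decls[name.strip().lower()] = value.strip().lower()
        if decls.get("color") and decls.get("color") == decls.get("background-color"):
            self.same_color_tags.append(tag)
        if tag == "a":
            self._href = attrs.get("href") or ""
            self._link_text = []

    def handle_data(self, data):
        if self._href is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._link_text)
            href_host = (urlparse(self._href).hostname or "").lower()
            # Strict comparison: "www.paypal.com" vs "paypal.com" is flagged too.
            for shown in HOST_IN_TEXT.findall(text):
                if shown.lower() != href_host:
                    self.mismatched_links.append((href_host, shown.lower()))
            self._href = None


def check(html):
    """Run the hypothetical checker over one HTML body and return it."""
    checker = ContentChecker()
    checker.feed(html)
    checker.close()
    return checker
```

Running check() on the example link from item 4 reports a mismatch between tracking.example.com and paypal.com, while a link whose text is plain words like "Log in to your account" is not flagged. Real filters weigh many signals together, so a single hit here would not by itself mean mail gets blocked.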

Changing content can improve delivery. But if that content was flagged because of user complaints or bad recipient profiles, the content filters will catch up. Continuing to try to evade filters by changing content can result in IP-based filtering.
These are just a few of the things companies look at when evaluating content.
 

Related Posts

Abuse it and lose it

Last week I blogged about the changes at ISPs that make “ISP Relations” harder for many senders. But it’s not just ISPs that are making it a little more difficult to get answers to questions, some spam filtering companies are pulling back on offering support to senders.
For instance, Cloudmark sent out an email to some ESPs late last week informing them that Cloudmark was changing their sender support policies. It’s not that they’re overwhelmed with delisting requests, but rather that many ESPs are asking for specific data about why the mail was blocked. In December, Spamcop informed some ESPs that they would stop providing data to those ESPs about specific blocks and spam trap hits.
These decisions make it harder for ESPs to identify specific customers and lists causing them to get blocked. But I understand why the filtering companies have had to take such a radical step.
Support for senders by filtering companies is a side issue. Their customers are the users of the filtering service and support teams are there to help paying customers. Many of the folks at the filtering companies are good people, though, and they’re willing to help blocked senders and ESPs to figure out the problem.
For them, providing information that helps a company clean up is a win. If an ESP has a spamming customer and the information from the filtering company helps the ESP force that customer to stop spamming, that’s a win, and that’s why the filtering companies started providing that data to ESPs.
Unfortunately, there are people who take advantage of the filtering companies. I have dozens of stories about how people are taking advantage of the filtering companies. I won’t share specifics, but the summary is that some people and ESPs ask for the same data over and over and over again. The filtering company rep, in an effort to be helpful and improve the overall email ecosystem, answers their questions and sends the data. In some cases, the ESP acts on the data, the mail stream improves and everyone is happy (except maybe the spammer). In other cases, though, the filtering company sees no change in the mail stream. All the filtering company person gets is yet another request for the same data they sent yesterday.
Repetition is tedious. Repetition is frustrating. Repetition is disheartening. Repetition is annoying.
What we’re seeing from both Spamcop and Cloudmark is the logical result of their reps being tired of dealing with ESPs that aren’t visibly fixing their customer spam problems. Both companies are sending some ESPs to the back of the line when it comes to handling information requests, whether or not those ESPs have actually been part of the problem previously.
The Cloudmark letter makes it clear what they’re frustrated about.

Ever changing filtering

One of the ongoing challenges of sending email and managing a high-volume outbound mail server is keeping up with constant changes in filtering. Filters are not static, nor can they be. As ISPs and filtering companies identify new ways to separate wanted email from unwanted email, spammers find new ways to make their mail look more like wanted mail.
This is one reason traps are useful to filtering companies. With traps there is no discussion about whether or not the mail was requested. No one with any connection to the email address opted in to receive mail. The mail was never requested. While trap addresses can end up on any list, monitoring mail sent to spam traps is a way to identify which senders don’t have good practices.
New filtering techniques are always evolving. I mentioned yesterday that Gmail was making filtering changes, and that this was causing a lot of delivery issues for senders. The other major challenge for Gmail is the personalized delivery they are doing. It’s harder and harder for senders to monitor their inbox delivery because almost every inbox is different at Gmail. I’ve seen different delivery in some of my own mailboxes at Gmail.
All of this makes email delivery an ongoing challenge.

Filtering is not just about spam

A lot of filters started out just as filters against spam. But over the years they’ve morphed into more general blocks against dangerous or problematic email. There’s a lot of crime and bad behavior on the internet, much of it using email as a conduit or vector. Filtering is so much more than stopping spam now. It’s as much, or more, about stopping crime.
Email filters are essential to protect us from scammers. Sometimes I forget this, and then I read about a grandmother getting swindled by a Nigerian scammer and ending up dead.
There are real consequences to poor filtering and there is real crime facilitated by email. It’s easy to forget this as we deal with the email that gets caught in filters when they shouldn’t.
Filters are one of the first lines of defense against online crime.
Not only do filters stop crime, they also keep email working. An unfiltered mail stream is an ugly, unreadable, unworkable mess.