Mentally modelling filters

When we talk about filters, we often think there is one filter. But, in many cases there are multiple stages of filters, each examining mail in a different way.

Simple model of an email filter that takes mail and puts it in the inbox or spam folder

In deliverability terms the easiest filters to ignore are the individual user filters. Mostly because there’s nothing we can do about those. These are the baysean style filters built into a lot of email clients as well as specific filters users create to handle their own mail. As bulk senders, there’s not much we can do here. Senders have to accept users will do whatever they want with mail. Sometimes it benefits senders like when a user writes a rule to mark a particular message as important. Other times it doesn’t benefit senders, like when a user decides to trash a message without reading it. In both cases, senders don’t get a say.

It’s these user filters, and individual user actions on messages, that feed back into what we generally describe as “machine learning” filters. These are the black box style filters that measure thousands of different things about an email and make decisions about the whole mailstream. Many email delivery folks understand how SpamAssassin works. I think of SA as the precursor to a lot of the machine learning filters. While ML is much more complicated, the filters basically look at everything about an email and work out a score. That score determines where an email is delivered to the “average” user that doesn’t have any specific filters for that sender.

Machine learning filters are extremely conditional and will deliver mail to different places for different recipients. They’re adaptive and they learn. They’re under constant development and refinement to catch types of bad mail they missed and to let through types of good mail that they caught.

There’s another level of filter here, the SMTP level filters. These are very non-conditional filters. They’re basically hard and fast rules that are pushed out to the MX by the machine learning filter. The questions this filter asks are almost all yes or no questions. Examples of these kinds of questions

  • Is this IP or domain is on a blocklist? If yes, reject. If no, pass it on.
  • Does this email mentions a URL we’ve seen in phishing mail? If yes, reject the message.
  • Is this email is part of a stream we like? If yes, let it in and let it in fast.

There are other parts to these filters as well, but again the MX filters really ask simple yes or no questions.

  • Does this email address exist?
  • Is this message authenticated?
  • Is there a DMARC record and does the message pass DMARC?”

This isn’t a model that encompasses all the complexity of email filters. But it does help drive what we can and should do to troubleshoot delivery problems.

Related Posts

Spamhaus rising?

Ken has a good article talking about how many ESPs have tightened their standards recently and are really hounding their customers to stop sending mail recipients don’t want and don’t like. Ken credits much of this change to Spamhaus and their new tools.

Read More

Legitimate mail in spamfilters

It can be difficult and frustrating for a sender to understand they whys and wherefores of spam filtering. Clearly the sender is not spamming, so why is their mail getting caught in spam filters?
I have a client that goes through this frustration on rare occasions. They send well crafted, fun, engaging content that their users really want. They have a solid reputation at the ISPs and their inbox stats are always above 98%. Very, very occasionally, though, they will see some filtering difficulties at Postini. It’s sad for all of us because Postini doesn’t tell us enough about what they’re doing to understand what my client is doing to trigger the filters. They get frustrated because they don’t know what’s going wrong; I get frustrated because I can’t really help them, and I’m sure their recipients are frustrated because they don’t get their wanted mail.
Why do a lot of filter vendors not communicate back to listees? Because not all senders are like my clients. Some senders send mail that recipients can take or leave. If the newsletter shows up in their inbox they may read it. If the ad gets in front of their face, they may click through. But, if the mail doesn’t show up, they don’t care. They certainly aren’t going to look for the mail in their bulk folder. Other senders send mail that users really don’t want. It is, flat out, spam.
The thing is, all these senders describe themselves as legitimate email marketers. They harvest addresses, they purchase lists, they send mail to spamtraps, and they still don’t describe themselves as spammers. Some of them have even ended up in court for violating various anti-spam laws and they still claim they’re not spammers.
Senders are competing with spammers for bandwidth and resources at the ISPs, they’re competing for postmaster attention at the ISPs and they’re competing for eyeballs in crowded inboxes.
It’s the sheer volume of spam and the crafty evilness of spammers that drives the constant change and improvement in spamfilters. It’s tough to keep up with the spamfilters because they’re trying to keep up with the spammers. And the spammers are continually looking for new ways to exploit recipients.
It can be a challenge to send relevant, engaging email while dealing with spamfilters and ISPs. But that’s what makes this job so much fun.

Read More

Content, trigger words and subject lines

There’s been quite a bit of traffic on twitter this afternoon about a recent blog post by Hubspot identifying trigger words senders should avoid in an email subject line. A number of email experts are assuring the world that content doesn’t matter and are arguing on twitter and in the post comments that no one will block an email because those words are in the subject line.
As usually, I think everyone else is a little bit right and a little bit wrong.
The words and phrases posted by Hubspot are pulled out of the Spamassassin rule set. Using those words or exact phrases will cause a spam score to go up, sometimes by a little (0.5 points) and sometimes by a lot (3+ points). Most spamassassin installations consider anything with more than 5 points to be spam so a 3 point score for a subject line may cause mail to be filtered.
The folks who are outraged at the blog post, though, don’t seem to have read the article very closely. Hubspot doesn’t actually say that using trigger words will get mail blocked. What they say is a lot more reasonable than that.

Read More