Content Filters

Can we put the FREE!!! Myth to bed?

Really. Single words in the subject line don’t hurt your delivery, despite many, many, many blog posts out there saying they do. Filters just don’t work that way. They maybe, sorta, kinda used to, but we’ve gotten way past that now.
In fact, I can prove it. Recently I received an email from Blizzard. The subject line:
Laura — Last Chance to Claim Your FREE Copy of Warlords of Draenor — Including Level 90 Boost! Offer Expires Monday! Last Chance to Claim Your FREE Copy of Warlords of Draenor — Including Level 90 Boost! Offer Ends Monday!
We have an email with

Read More

Are you blocking yourself?

One thing that catches me up with clients sometimes is their own spam filters block their own content. It happens. In some cases the client is using an appliance. The client’s reputation is bad enough that the appliance actually blocks mail. Often these clients have no idea they are blocking their own mail, until we try and send them something and the mail is rejected.
stop_at
Typically, the issue is their domains are the problem. We mention the domains in email, and the filters do what filters do. We work around this by abbreviating the domains or calling, it’s not a big deal.
It’s a great demonstration of content filters, though. The content (the client’s domain) is blocked even when it comes from an IP with a good reputation. In fact, with Gmail I can often tell “how bad” a domain reputation is. Most mail I send from WttW to my gmail address goes to the inbox, even when the client is reporting bulk foldering at Gmail. But every once in a while a domain has such a bad reputation that any mail mentioning that domain goes to bulk.
Most folks in the deliverability space know the big players in the filtering market: Barracuda, Cloudmark, ProofPoint, etc. Those same people have no idea what filters their company uses and have never even really thought about it.
Do you know what filter your company is using to protect employees from spam?
 
 
 

Read More

Pattern matching primates

Why do we see faces where there are none? Paradolia
Why do we look at random noise and see patterns? Patternicity
Why do we think we have discovered what’s causing filtering if we change one thing and email gets through?
It’s all because we’re pattern matching primates, or as Michael Shermer puts it “people believe weird things because of our evolved need to believe nonweird things.”
Our brains are amazing and complex and filter a lot of information so we don’t have to think of it. Our brains also fill in a lot of holes. We’re primed at seeing patterns, even when there’s no real pattern. Our brains can, and do, lie to us all the time. For me, some of the important part of my Ph.D. work was learning to NOT trust what I thought I saw, and rather to effectively observe and test. Testing means setting up experiments in different ways to make it easier to not draw false conclusions.
Humans are also prone to confirmation bias: where we assign more weight to things that agree with our preconceived notions.
Take the email marketer who makes a number of changes to a campaign. They change some of the recipient targeting, they add in a couple URLs, they restructure the mail to change the text to image ratio and they add the word free to the subject line. The mail gets filtered to the bulk folder and they immediately jump to the word free as the proximate cause of the filtering. They changed a lot of things but they focus on the word free. 
Then they remove the word free from the subject line and all of a sudden the emails are delivering. Clearly the filter in question is blocking mail with free in the subject line.
Well, no. Not really. Filters are bigger and more complex than any of us can really understand. I remember a couple years ago, when a few of my close friends were working at AOL on their filter team. A couple times they related stories where the filters were doing things that not even the developers really understood.
That was a good 5 or 6 years ago, and filters have only gotten more complex and more autonomous. Google uses an artificial neural network as their spam filter.  I don’t really believe that anything this complex just looks at free in the subject line and filters based on that.
It may be that one thing used to be responsible for filtering, but those days are long gone. Modern email filters evaluate dozens or hundreds of factors. There’s rarely one thing that causes mail to go to the bulk folder. So many variables are evaluated by filters that there’s really no way to pinpoint the EXACT thing that caused a filter to trigger. In fact, it’s usually not one thing. It could be any number of things all adding up to mean this may not be mail that should go to the inbox.
There are, of course, some filters that are one factor. Filters that listen to p=reject requests can and do discard mail that fails authentication. Virus filters will often discard mail if they detect a virus in the mail. Filters that use blocklists will discard mail simply due to a listing on the blocklist.
Those filters address the easy mail. They leave the hard decisions to the more complex filters. Most of those filters are a lot more accurate than we are at matching patterns. Us pattern matching primates want to see patterns and so we find them.
 

Read More

Dodging filters makes for effective spamming

Spam is still 80 – 90% of global email volume, depending on which study look at. Most of that spam doesn’t make it to the inbox; ISPs reject a lot of it during the SMTP transaction and put much of rest of it in the bulk folder. But as the volumes of spam have grown, ISPs and filters are relying more and more on automation. Gone are the days when a team of people could manually review spam and tune filters. There’s just too much of it out there for it to be cost effective to manually review filters.
In some ways, though, automatic filters are easier to avoid than manual filters. Take a spam that I received at multiple addresses today. It’s an advertisement for lists to “meet my marketing needs.” I started out looking at this mail to walk readers through all of the reasons I distrusted this mail. But some testing, the same sorts of testing I do for client mails, told me that this mail was making it to the inbox at major ISPs.
What told me this mail was spam? Let’s look at the evidence.
listsellingspam_thumb

Read More

Content based filtering

Content filtering is often hard to explain to people, and I’m not sure I’ve yet come up with a good way to explain it.
A lot of people think content reputation is about specific words in the message. The traditional content explanation is that words like “Free” or too many exclamation points in the subject line are bad and will be filtered. But it’s not the words that are the issue it’s that the words are often found in spam. These days filters are a lot smarter than to just look at individual words, they look at the overall context of the message.
ISP_tolerances
Even when we’re talking content filters, the content is just a way to identify mail that might cause problems. Those problems are evaluated the same way IP reputation is measured: complaints, engagement, bad addresses. But there’s a lot more to content filtering than just the engagement piece. What else is part of content evaluation?

Read More

Horses, not zebras

I was first introduced to the maxim “When you hear hoofbeats, think horses not zebras” when I worked in my first molecular biology lab 20-some-odd years ago. I’m no longer a gene jockey, but I still find myself applying this to troubleshooting delivery problems for clients.
It’s not that I think all delivery problems are caused by “horses”, or that “zebras” never cause problems for email delivery. It’s more that there are some very common causes of delivery problems and it’s a more effective use of time to address those common problems before getting into the less common cases.
This was actually something that one of the mailbox provider reps said at M3AAWG in SF last month. They have no problem with personal escalations when there’s something unusual going on. But, the majority of issues can be handled through the standard channels.
What are the horses I look for with delivery problems.

Read More

The death of IP based reputation

Back in the dark ages of email delivery the only thing that really mattered to get your email into the inbox was having a good IP reputation. If your IP sent good mail most of the time, then that mail got into the inbox and all was well with the world. All that mattered was that good IP reputation. Even better for the people who wanted to game the system and get their spam into the inbox, there were many ways to get around IP reputation.
Every time the ISPs and spam filtering companies would work out a way to block spam using IP addresses, spammers would figure out a way around the problem. ISPs started blocking IPs so spammers moved to open relays. Filters started blocking open relays, so spammers moved to open proxies. Filters started blocking mail open proxies so spammers created botnets. Filters started blocking botnets, so spammers started stealing IP reputation by compromising ESP and ISP user accounts.  Filters were constantly playing catchup with the next new method of getting a good IP reputation, while still sending spam.
While spammers were adapting and subverting IP based filtering a number of other things were happening. Many smart people in the email space were looking at improving authentication technology. SPF was the beginning, but problems with SPF led to Domains Keys and DKIM. Now we’re even seeing protocols (DMARC) layered on top of DKIM. Additionally, the price of data storage and processing got cheaper and data mining software got better.
The improvement in processing power, data mining and data storage made it actually feasible for ISPs and filtering companies to analyze content at standard email delivery speeds. Since all IPv4 addresses are now allocated, most companies are planning for mail services to migrate to IPv6. There are too many IPv6 IPss to rely on IP reputation for delivery decisions.
What this means is that in the modern email filtering system, IPs are only a portion of the information filters look at when making delivery decisions. Now, filters look at the overall content of the email, including images and URLs. Many filters are even following URLs to confirm the landing pages aren’t hosting malicious software, or isn’t content that’s been blocked before. Some filters are looking at DNS entries like nameservers and seeing if those nameservers are associated with bad mail. That’s even before we get to the user feedback, in the form of “this is spam” or “this is not spam” clicks, which now seem to affect both content, domain and IP reputation.
I don’t expect IP reputation to become a complete non-issue. I think it’s still valuable data for ISPs and filters to evaluate as part of the delivery decision process. That being said, IP reputation is so much less a guiding factor in good email delivery than it was 3 or 4 years ago. Just having an IP with a great reputation is not sufficient for inbox delivery. You have to have a good IP reputation and good content and good URLs.
Anyone who wants good email delivery should consider their IP reputation, but only as one piece of the delivery strategy. Focusing on a great IP reputation will not guarantee good inbox delivery. Look at the whole program, not just a small part of it.

Read More

Return Path on Content Filtering

Return Path have an interesting post up about content filtering. I like the model of 3 different kinds of filters, in fact it’s one I’ve been using with clients for over 18 months. Spamfiltering isn’t really about one number or one filter result, it’s a complex interaction of lots of different heuristics designed to answer the question: do recipients want this kind of mail?

Read More