Why does everyone tell you to avoid .biz in your emails?

… or Why do spam filters sometimes have some very strange ideas?
It’s been dogma for a long time that if you’re doing email marketing you should avoid using a .biz domain in your mails. Even if your main website was in .biz, you should use something different in your messages, perhaps a website you buy solely for use in email that redirects to your real .biz website. Last year I looked at why that was, and what could be done about it.
One main reason for avoiding it has since been resolved, so if you’ve been avoiding .biz URLs in your mail, now might be a good time to re-test that decision. And enough time has gone by that I can share, without upsetting everyone, the ugly reasons why .biz was for so long considered a sure sign of spam without good reason.
The simple reason was SpamAssassin. SpamAssassin is very widely used to filter mail, both in its open source version and buried anonymously deep inside countless commercial spam filters and filtering appliances. Not only that, but SpamAssassin is readily available, so most people looking to do pre-mailing content checks or looking at why content-based filters are objecting to a particular email will use SpamAssassin as their model. It’s very widely deployed, and influential far beyond the size of its deployed base.
SpamAssassin is a score-based spam filter – it checks an email against hundreds of rules, adds up the scores of the rules that match and, in typical setups, decides the mail is spam if the total score is five or more. Pretty reasonable, but here are a few of the rules and scores (from the 2006 version of SpamAssassin):

  • 1.392 Advance Fee Fraud (Nigerian 419)
  • 0.493 Refers to an erectile drug
  • 1.995 Subject contains G a p p y T e x t
  • 0.496 Message is 40-50% HTML
  • 2.100 From: domain has a series of 7 consonants
  • 1.635 Possible porn – Hardcore Porn
  • 2.013 Contains a URL in the BIZ top level domain
  • 1.273 Contains a URL in the INFO top level domain

You can’t quite treat the scores as SpamAssassin’s measure of the “spamminess” of a message (“a .biz URL is 23% spammier than hardcore porn” … “The URL microsoft.biz is about as spammy as From: Ignatious T. Aardvark <success@sdfghjkl.com>”) but it’s pretty clear that using a .biz domain in your mail had a huge effect on your SpamAssassin score, and was a bad risk to take if you could easily avoid it.
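To make the arithmetic concrete, here is a minimal sketch of score-based filtering using the eight scores listed above. The rule keys, the function names and the example newsletter are made up for illustration; this is not SpamAssassin’s code or its real rule names.

```python
# Scores copied from the 2006 list above; the dictionary keys are made-up labels.
RULE_SCORES = {
    "advance_fee_fraud": 1.392,
    "erectile_drug": 0.493,
    "gappy_subject": 1.995,
    "html_40_50_percent": 0.496,
    "from_7_consonants": 2.100,
    "hardcore_porn": 1.635,
    "url_in_biz_tld": 2.013,
    "url_in_info_tld": 1.273,
}

SPAM_THRESHOLD = 5.0  # the typical setup described above


def total_score(matched_rules):
    """Add up the scores of every rule that matched the message."""
    return sum(RULE_SCORES[name] for name in matched_rules)


def is_spam(matched_rules, threshold=SPAM_THRESHOLD):
    """Treat the message as spam once its total reaches the threshold."""
    return total_score(matched_rules) >= threshold


# A hypothetical newsletter that happens to trip three rules.
newsletter = ["html_40_50_percent", "gappy_subject", "from_7_consonants"]
with_biz = newsletter + ["url_in_biz_tld"]

print(round(total_score(newsletter), 3), is_spam(newsletter))
print(round(total_score(with_biz), 3), is_spam(with_biz))
```

In this made-up case the newsletter totals 4.591 and squeaks through, while the same message with a single .biz URL totals 6.604 and is classified as spam.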
So, was .biz really that spam-ridden? I recall it being pretty bad when it first launched, so it’s reasonable that SpamAssassin had that rule, but was it still bad by 2006? Bad enough to merit a score quite that high? That’s hard to measure, but a reasonable metric is the percentage of domains in each top level domain (.com, .net, .biz etc) that had been spotted as a definite spam sign by the folks at SURBL.
(graph of the percentage of domains in each top level domain listed in SURBL)
So .biz looks just fine – comparable with .com or .net, and certainly a lot better than .info. Why was SpamAssassin still treating it as so spammy?
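If you want to reproduce that kind of comparison yourself, the metric is simple to compute once you have a list of registered domains and a list of blocklisted ones. The sketch below assumes both are available as plain lists of domain names; the function names are hypothetical and the domains are placeholders, not real SURBL or registry data.

```python
from collections import Counter


def tld_of(domain):
    """Return the last dot-separated label, e.g. 'biz' for 'one.biz'."""
    return domain.lower().rstrip(".").rsplit(".", 1)[-1]


def percent_listed_by_tld(registered, listed):
    """Percentage of registered domains in each TLD that appear in `listed`."""
    listed = {d.lower().rstrip(".") for d in listed}
    totals = Counter()
    hits = Counter()
    for domain in registered:
        name = domain.lower().rstrip(".")
        totals[tld_of(name)] += 1
        if name in listed:
            hits[tld_of(name)] += 1
    return {tld: 100.0 * hits[tld] / totals[tld] for tld in totals}


# Placeholder data only, chosen to keep the arithmetic obvious.
registered = [
    "toy-a.com", "toy-b.com", "toy-c.com", "toy-d.com",
    "toy-a.biz", "toy-b.biz",
    "toy-a.info", "toy-b.info",
]
listed = ["toy-b.com", "toy-a.info"]

for tld, pct in sorted(percent_listed_by_tld(registered, listed).items()):
    print(f".{tld}: {pct:.1f}% listed")
```

Dividing by the number of registered domains in each TLD is what makes the figures comparable; raw counts would always make .com look worst simply because it is the largest.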
SpamAssassin developers measure and develop their scores based on several corpuses of recently received email, hand categorised into spam mail and non-spam (“ham”) mail. Like many other spam filters, they stay fairly vague about where exactly these corpuses come from (to avoid people gaming the system) but they seem to be based mostly on the personal mailboxes of developers. Of the five corpuses SpamAssassin were using in 2006, four saw almost no .biz spam, but one saw quite a lot (graph of .biz URLs in spam). More importantly, though, none of them saw more than a tiny number of .biz URLs in non-spam (graph of .biz URLs in non-spam).
The algorithm that SpamAssassin uses to assign scores to the rules is complex, but loosely speaking if a rule helps to correctly classify one of the mails in the spam corpus as spam, then the score of that rule will tend to be increased, while if a rule helps to wrongly classify non-spam as spam then the score for that rule will tend to be decreased. In the test corpuses used, .biz URLs hardly ever appear in non-spam, so there’s no pressure to reduce the score assigned to that rule.
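The effect of a corpus with no .biz URLs in non-spam can be seen in a toy model. The sketch below is a deliberately naive, literal reading of the loose description above: each spam hit nudges a rule’s score up and each non-spam hit pulls it down harder. It is not SpamAssassin’s actual score-assignment algorithm, and the step size, penalty, cap, rule labels and corpora are all assumptions invented for illustration.

```python
def fit_scores(corpus, step=0.25, ham_penalty=4.0, max_score=5.0):
    """Naive tally: corpus is a list of (matched_rules, is_spam) pairs."""
    scores = {}
    for matched_rules, is_spam in corpus:
        for rule in matched_rules:
            current = scores.get(rule, 0.0)
            if is_spam:
                # The rule helped flag real spam: push its score up.
                current += step
            else:
                # The rule fired on legitimate mail: pull its score down harder.
                current -= step * ham_penalty
            # Keep the score within [0, max_score].
            scores[rule] = min(max_score, max(0.0, current))
    return scores


BIZ = "url_in_biz_tld"  # hypothetical rule label
HTML = "html_heavy"     # hypothetical rule label

spam_mail = [({BIZ}, True)] * 6 + [({BIZ, HTML}, True)] * 2
ham_without_biz = [({HTML}, False)] * 2
ham_with_biz = [({BIZ}, False)] * 2

# Corpus where no legitimate mail contains a .biz URL.
print(fit_scores(spam_mail + ham_without_biz)[BIZ])
# Same corpus plus two legitimate mails that do contain one.
print(fit_scores(spam_mail + ham_without_biz + ham_with_biz)[BIZ])
```

With these made-up numbers the .biz rule ends at 2.0 when no legitimate mail in the corpus uses .biz, and drops to 0.0 once just two legitimate .biz mails are added. The spam is identical in both runs; only the presence of non-spam hits changes the outcome.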
So the final answer to the question in the title is:

  1. Long, long ago when .biz was new it was used by a lot of spammers (because it was new, so a lot of good domains were easily available).
  2. SpamAssassin added a rule to recognize .biz URLs, and increase the spam score of mails containing them.
  3. SpamAssassin is very influential, even more so than its wide deployment would suggest.
  4. Legitimate mailers saw that SpamAssassin would punish them for using a .biz URL, so they pretty much all avoided using .biz URLs in their email.
  5. With effectively no legitimate bulk mail using .biz URLs, there’s nothing to keep the SpamAssassin score for the “contains a .biz URL” rule from creeping up, and being even more punitive to use of .biz URLs.
  6. Go to step 4

This leads to a vicious circle where legitimate mailers don’t use .biz as SpamAssassin would punish them for doing so, and SpamAssassin continues to punish anyone using .biz URLs because they’re not used by legitimate mailers. SpamAssassin eventually broke this particular circle by removing the rule from their latest release, but not until it had had a major effect on use of .biz URLs that still persists.
The .biz issue has since been resolved, but there’s a broader deliverability conclusion to draw from this story. While on a branding and image level you want your messages to stand out from all your competitors’ messages, on a technical level you want your mails to be similar to those of other legitimate mailers. That way, if there’s an oddity in a content filter that makes it classify your mail as spam it’ll likely be classifying lots of other legitimate mail as spam too, and be fixed fairly quickly (probably before it’s deployed into production).
That includes things like the way you use HTML and MIME, the way you register the domain names you use, the way you use them as URLs in messages, and a bunch of other things. Being aware of the sort of things that content filters like SpamAssassin look at is a good place to start.
