About that Junk Folder

I use a pretty standard mail filtering setup: a fairly vanilla SpamAssassin install on the front end, combined with naive Bayesian content filters in my mail client. I don’t reject any mail; it just ends up in one of my inboxes or a junk folder. And I have a mix of normal consumer mail: Facebook, Twitter, lots of commercial newsletters, mail from friends and colleagues, and spam. (As well as that I have a lot of high traffic industry mailing lists, but overall it’s a fairly normal mix.)
My Bayesian filter gets trained mostly by me hitting “this is spam” when spam makes it to my inbox. If I’m expecting an email “immediately”, something like a mailing list COI confirmation or email as part of buying something online, I’ll check my junk folder and move the mail to my inbox in the rare case it ended up there. Other than that I let it and SpamAssassin chug along with no tweaking.
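For readers who haven't looked inside a recipient-trained filter, here is a minimal sketch of the idea in Python. It is a toy, not the code any particular mail client uses: the class name `TinyBayesFilter`, the tokenizer, the add-one smoothing and the sample messages are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes spam filter (hypothetical; for illustration only)."""

    def __init__(self):
        # How many messages of each class contained each word.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        # How many messages of each class have been seen in training.
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Lowercased set of words: presence matters, repetition does not.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, label):
        # This is what hitting "this is spam" (or moving mail back to the
        # inbox) amounts to: one more labelled example.
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        n_spam = self.message_counts["spam"]
        n_ham = self.message_counts["ham"]
        # Smoothed prior odds of spam versus ham.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for word in self.tokenize(text):
            # Add-one smoothing so unseen words don't zero out the product.
            p_spam = (self.word_counts["spam"][word] + 1) / (n_spam + 2)
            p_ham = (self.word_counts["ham"][word] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp to keep math.exp from overflowing on extreme scores.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


mail_filter = TinyBayesFilter()
mail_filter.train("cheap pills buy now", "spam")
mail_filter.train("buy cheap watches now", "spam")
mail_filter.train("meeting notes for monday", "ham")
mail_filter.train("lunch on monday?", "ham")
```

A message scoring above some threshold (0.5, say) would be filed as junk. Each correction is just another call to `train`, which is why a filter like this drifts toward the recipient's own idea of what is and isn't wanted.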
I’m starting a data analysis project, based on my own inboxes, and as part of that I’m using some tools to look for false positives in my junk folders, and manually fixing anything that’s misclassified. I’ve been doing this for a couple of hours now, and I’ve found some interesting things.

  1. Simple content filters work remarkably well out of the box, at least for my mail stream. Spectacularly well. There’s very little in the way of false positives. Very, very little.
  2. Of those false positives there’s nothing I’d have been bothered about. It’s generic, unexciting junk mail.
  3. Most of the systemic false positives seem to be correlated with the senders doing something bad. Heathrow Express, for instance, sent me mail every two weeks or so since I’d signed up. Then for no obvious reason they stopped sending for three months, then started sending again. Every mail they sent after that pause ended up in the junk folder, and I never missed them.
  4. I get regular newsletters from ThinkGeek. Every one of those goes to the inbox. I occasionally get mails from them about my account (“you’ve got 420 geek points left”) that are kinda transactional, but not something I expect to see – and they all end up in the junk folder. Several other senders do the same thing, and get the same result.
  5. Several companies have used tagged addresses to send me newsletters for a while, and also used them to send unsolicited Facebook invites (to the tagged address, from Facebook servers). The regular newsletters all go to the inbox, while the Facebook invites all go to the junk folder. “Legitimate” Facebook mail, meanwhile, keeps going to the inbox.
  6. Apple send me a lot of newsletters – I’m a Mac and iPhone developer, I get their consumer newsletters, transactional stuff from our local store – lots and lots of newsletters. They all made it to the inbox except for one. The one that ended up in the junk folder was a one-off about recycling, and it wasn’t up to their usual design standards – it had ugly big green “call to action” headlines in it, very different to their usual clean design.
  7. Just one sender hit the junk folder every time. The distinctive thing about their messages (apart from them not being something I missed) was that the plain text part of them was dreadful, just a bad Lynx dump of the HTML part. Even a “No plain text for you! Go to this link!” would have been better.
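Patterns like items 4 and 5, where one sender's mail is split between the inbox and the junk folder, are easy to surface mechanically. The post doesn't say which tools were used for the review, so the following is only a sketch of one possible approach: the function name `senders_in_both_folders` and the example addresses are hypothetical, and it assumes you have already pulled the `From:` headers out of each folder.

```python
from collections import Counter
from email.utils import parseaddr


def senders_in_both_folders(inbox_from_headers, junk_from_headers):
    """Map each sender seen in both folders to (inbox_count, junk_count)."""

    def count_addresses(headers):
        # parseaddr splits 'Name <user@host>' into (name, address);
        # headers it cannot parse yield an empty address and are skipped.
        addresses = (parseaddr(header)[1].lower() for header in headers)
        return Counter(address for address in addresses if address)

    inbox = count_addresses(inbox_from_headers)
    junk = count_addresses(junk_from_headers)
    # Senders appearing in both places are the ones worth a manual look.
    return {
        sender: (inbox[sender], junk[sender])
        for sender in inbox.keys() & junk.keys()
    }


# Hypothetical headers, using reserved example domains.
inbox_headers = [
    "Shop News <news@shop.example>",
    "Shop News <news@shop.example>",
    "A Friend <friend@mail.example>",
]
junk_headers = [
    "Shop Rewards <NEWS@shop.example>",
    "Pills <deals@spam.example>",
]
split_senders = senders_in_both_folders(inbox_headers, junk_headers)
```

Matching on the bare address is a deliberate simplification. Senders that use a different address for each stream would need matching on the domain or on an authenticated identity instead.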

I was surprised at how well this simple content-based filtering setup has worked with little tuning other than hitting the this-is-spam button, both in its accuracy at removing spam while keeping a very low false positive rate and in how well the false positives matched my judgement of “Meh. This mail isn’t interesting.”
We spend a lot of time talking about the things you should do to make the mail relevant to the recipient – compelling content, consistent, predictable delivery schedules, clear consistent branding, use of a single consistent mail stream to communicate with a recipient rather than several different streams. Until I went through this exercise it wasn’t clear to me how much of an effect those things also have on fairly simple recipient-trained filters.
 

Related Posts

Hotmail fights greymail

I’ve heard a lot of marketers complaining about people like me who advocate actually purging addresses from marketing lists if those addresses are non-responsive over a long period of time. They have any number of reasons this advice is poor. Some of them can even demonstrate that they get significant revenue from mailing folks who haven’t opened an email in years.
They also point out that there isn’t a clear delivery hit to leaving those abandoned addresses on their list. It’s not like bounces or complaints. There isn’t a clear way to measure the dead addresses and even if you could there aren’t clear threshold guidelines published by the ISPs.
Nevertheless, I am seeing more and more data that convinces me the ISPs do care about companies sending mail that users never open or never read or never do anything with.
The most recent confirmation was the announcement that Hotmail was deploying more tools to help users manage “greymail.” I briefly mentioned the announcement last week. Hotmail has their own blog post up about the changes.
It seems my initial claim that these changes won’t affect delivery may have been premature. In fact, these changes are all about making it easier for Hotmail users to deal with the onslaught of legitimate but unwanted mail.


Cyber Monday inundation

The Cyber Monday inundation of mail has hit my mailbox. There’s been a clear increase in marketing mail over the last week. Unfortunately for those marketers, it’s too much, and I am just scanning subject lines and marking messages as read. I don’t have the time to read all this mail.


When the inbox isn't the inbox

There was a discussion today on the OI list about email filtering that brought up something I usually don’t mention in delivery discussions. Most email marketers treat the inbox as the holy grail of delivery. Everything about delivery is focused on getting to the magical inbox.
I think, though, that inbox is often just shorthand for “not landing in the bulk or spam folders.”
For some recipients, particularly those of us who get lots of mail, sometimes it’s better to land in a folder rather than the inbox. I have a folder set up, where most of my commercial mail goes. It’s labeled “commercial.” I check it once or twice a day.
This is beneficial to me and to the senders. Why? Because when I check that folder I’m ready to actually look at my commercial mail. I’m looking for those offers.
For someone like me, who does most of their work in their inbox, commercial interruptions are a problem. Commercial mail that ends up in my inbox, which can happen if I’ve been lazy about filters, interrupts me and usually doesn’t get read. But when it’s in my commercial folder? Well, then I can look at it, visit websites and make purchases.
So just remember, it’s not that you want mail in the inbox as much as you want mail somewhere that the recipient will notice it.
