Trawling through the junk folder

As a break from writing unit tests this morning I took a few minutes to go through my Mail.app junk folder, looking for false positives for mail delivered over the past six weeks.
We don’t do any connection-level rejection here, so any mail sent to me gets delivered somewhere. Anything that looks like malware gets dumped in one folder and never read, anything with a ridiculously high SpamAssassin score gets dumped in another folder and never read, mailing lists get handled specially, and everything else gets delivered to Mail.app to deal with. That means Mail.app sees less of the ridiculously obvious spam and is mostly left to do Bayesian filtering, plus whatever other magic Apple implemented.
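As a rough illustration of that delivery order, here is a minimal Python sketch. It is not the actual setup described in this post: the folder names, the 15.0 threshold and the X-Malware-Flag header are all assumptions made up for the example. The only real conventions it relies on are the "score=" field SpamAssassin writes into X-Spam-Status and the List-Id header that mailing lists add.

```python
import re
from email import message_from_string
from email.message import Message

# Assumed cutoff for a "ridiculously high" score; the post gives no number.
HIGH_SCORE = 15.0

_SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def spam_score(msg: Message) -> float:
    """Pull the numeric score out of a SpamAssassin-style X-Spam-Status header."""
    match = _SCORE_RE.search(msg.get("X-Spam-Status", ""))
    return float(match.group(1)) if match else 0.0


def route(msg: Message) -> str:
    """Return the (hypothetical) folder name a message would be filed into."""
    # X-Malware-Flag is a made-up header standing in for an upstream scanner.
    if msg.get("X-Malware-Flag", "").strip().lower() == "yes":
        return "malware"
    if spam_score(msg) >= HIGH_SCORE:
        return "obvious-spam"
    if msg.get("List-Id"):
        return "lists"
    # Everything else is left for the mail client's own junk filter.
    return "inbox"


if __name__ == "__main__":
    raw = "From: a@example.com\nX-Spam-Status: Yes, score=22.4 required=5.0\n\nhi\n"
    print(route(message_from_string(raw)))
```

The order of the checks matters: malware and very high scores are dealt with first, so only the ambiguous remainder ever reaches the learning filter in the mail client.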
There were about thirty false positives, and they were all B2C bulk advertising mail. I receive a lot of 1:1 mail, transactional mail and B2B marketing mail and there were no false positives at all for any of those.
All the false positives were authenticated with both SPF and DKIM. All of them were for marketing lists I’d signed up for while making a purchase. All of them were “greymail” – mail that I’d agreed to receive, and that was inoffensive but not compelling. While I easily spotted all of them as false positives via the from address and subject, none of them were content I’d particularly missed.
Almost all of the false positives were sent through ESPs I recognized the name of, and about 80% of them were sent through just two ESPs (though that wasn’t immediately obvious, as one of them not only uses random four character domain names, it uses several different ones – stop doing that).
If you’d asked me to name two large, legitimate ESPs from whom I recalled receiving blatant, blatant spam recently, it would be those same two ESPs. Is Mail.app picking up on my opinions of the mail those ESPs are sending? It’s possible – details specific to a particular ESP’s mail composition and delivery pipelines are details that a Bayesian learning filter may well recognize as efficient tokens.
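To make the token idea concrete, here is a small Python sketch of naive Bayes scoring. It is not how Mail.app works (Apple doesn't document that here), and the token names are invented for the example. It only shows how a token that turns up mostly in mail I've marked as junk, such as a header quirk peculiar to one ESP's pipeline, can push a message's score toward junk regardless of its content.

```python
import math
from collections import Counter


def train(messages):
    """Count, per token, how many messages it appears in.

    `messages` is a list of token lists, one list per message.
    """
    counts = Counter()
    for tokens in messages:
        counts.update(set(tokens))
    return counts


def junk_log_odds(tokens, junk_counts, ham_counts, n_junk, n_ham):
    """Sum per-token log-odds with add-one smoothing; above 0 leans junk."""
    score = 0.0
    for token in set(tokens):
        p_junk = (junk_counts[token] + 1) / (n_junk + 2)
        p_ham = (ham_counts[token] + 1) / (n_ham + 2)
        score += math.log(p_junk / p_ham)
    return score


if __name__ == "__main__":
    # "hdr:esp-pipeline-quirk" is a hypothetical token for an ESP-specific detail.
    junk = [["hdr:esp-pipeline-quirk", "sale"], ["hdr:esp-pipeline-quirk", "offer"]]
    ham = [["meeting", "sale"], ["invoice"]]
    jc, hc = train(junk), train(ham)
    print(junk_log_odds(["hdr:esp-pipeline-quirk", "sale"], jc, hc, len(junk), len(ham)))
```

In this toy data "sale" appears equally often in both piles and contributes nothing, while the pipeline-specific token carries the whole score. That is the sense in which an ESP's fingerprint can matter more than the words in the message.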

Related Posts

Filtering more than spam

The obvious application of machine learning for email is to send spam to the junk/bulk folder. Most services use some level of machine learning for filters. Places like Gmail have extensive machine learning filters to filter spam and unwanted mail away from their users.
Some organizations are taking the filtering process a step further. Almost every mail client more advanced than PINE has the ability for users to create rules to sort mail into folders. Late last year, Office 365 rolled out a feature, Clutter, that tracks how a user interacts with mail and filters unimportant mail. This allows each user to have their own filters, but without the overhead of having to create the filters.
The Clutter engine looks at both how the user interacts with mail and things it knows about the organization. For example, if Exchange is tied into Active Directory, then mail from a manager will be prioritized while mail from a co-worker may end up in the clutter folder.
Email is a critical business tool. A significant number of companies rely on email for internal and external communication. Many users treat their inbox as a to-do list, prioritizing what they work on based on what’s in their mailbox. Despite the needs of users, the mail client hasn’t really changed.
Over the last few years, we’ve seen different online services attempt to build a more effective email client. Gmail added features like tabs and priority inbox. Microsoft created the “sweep” feature for Outlook/Hotmail users to manage inbox clutter. Third parties have created services to try and improve the mailbox experience for their users.
Many of the email filters, up to this point, have really been focused on protecting users from spam and malicious emails. Applying that filtering knowledge not just to spam but to the different kinds of email makes sense to me. I’ve always had a fairly extensive set of filters, initially procmail but now sieve, to process and organize incoming mail. But I kinda like the idea that my mail client learns how I filter messages and does the right thing on its own.
I’d love to see some improvements in the mail client that make it easier to manage and organize incoming email. It remains to be seen if this is a feature that takes off and makes its way to other clients or not.

Email filtering: not going away.

I don’t do a whole lot of filtering of comments here. There are a couple people who are moderated, but generally if the comments contribute to a discussion they get to be posted. I do get the occasional angry or incoherent comment. And sometimes I get a comment that triggers me to write an entire blog post pointing out the problems with the comment.
Today a comment from Joe King showed up for The Myth of the Low Complaint Rate.


January 2015 – The Month in Email

It’s February already! January went fast, right? At WttW, we are gearing up for MAAWG SF later this month — will we see you there?
We started the year with a set of predictions about email. Mostly we think email will continue to be great at some things and not-so-great at other things, and we’ll keep fighting the good fight to make it better.
As always, I’m interested in filters and how spammers continue to work around them to reach the inbox. I also wrote about how the language of an email impacts delivery, and wrote an expanded response to a comment suggesting email filters should be illegal. You can guess where I stand on that (and if you can’t, perhaps you might read more about how email is an inherently malicious traffic stream…)
I also took a moment to point out a trend I’m really enjoying, which is the rise of content marketing (a.k.a. giving customers useful and interesting information they can’t find elsewhere). As I said in the post, I’ll be curious to see how ROI plays out with this strategy.
We also talked about some of the less exciting content we see in email, notably the infamous Murkowski Statement, by which a spammer declares “Nope! Nothing to see over here!”
Steve also pointed out some content shenanigans in the form of hidden preview text, with some additional clarification from the original marketer in the comments.
In industry news, the big story was that Microsoft has partially implemented DMARC for Office365, and was the first to make a public statement about the specific ways they’ve chosen to implement it. In my post, I did a walkthrough of a message to illustrate a bit about how this works, which might be useful if you’re trying to wrap your head around DMARC implementations.
We also talked about consolidation in the ESP space, and got a number of comments from readers about who they think might be next. Shortly thereafter, Listcast was acquired by MailerMailer.
Josh noted a few major shutdowns: Yahoo China email services and the AHBL list. The latter explores the challenges inherent in decommissioning a blacklist, and there’s a good discussion in the comments, so you might check it out if you missed that earlier this month.
Josh also pointed to the Salesforce State of Marketing report, which is always a useful set of metrics about how marketers are using email and other channels. It’s definitely worth a read.
