Trawling through the junk folder

As a break from writing unit tests this morning I took a few minutes to go through my Mail.app junk folder, looking for false positives for mail delivered over the past six weeks.

We don’t do any connection level rejection here, so any mail sent to me gets delivered somewhere. Anything that looks like malware gets dumped in one folder and never read, anything that scores a ridiculously high spamassassin score gets dumped in another folder and never read, mailing lists get handled specially and everything else gets delivered to Mail.app to deal with. That means that Mail.app sees less of the ridiculously obvious spam and is mostly left to do bayesian filtering, and whatever other magic Apple implemented.
There were about thirty false positives, and they were all B2C bulk advertising mail. I receive a lot of 1:1 mail, transactional mail and B2B marketing mail and there were no false positives at all for any of those.
All the false positives were authenticated with both SPF and DKIM. All of them were for marketing lists I’d signed up for while making a purchase. All of them were “greymail” – mail that I’d agreed to receive, and that was inoffensive but not compelling. While I easily spotted all of them as false positives via the from address and subject, none of them were content I’d particularly missed.
Almost all of the false positives were sent through ESPs I recognized the name of, and about 80% of them were sent through just two ESPs (though that wasn’t immediately obvious, as one of them not only uses random four character domain names, it uses several different ones – stop doing that).
If you’d asked me to name two large, legitimate ESPs from whom I recalled receiving blatant, blatant spam recently, it would be those same two ESPs. Is Mail.app is picking up on my opinions of the mail those ESPs are sending? It’s possible – details specific to a particular ESPs mail composition and delivery pipelines are details that a bayesian learning filter may well recognize as efficient tokens.

Related Posts

Email filtering: not going away.

VirusBlockI don’t do a whole lot of filtering of comments here. There are a couple people who are moderated, but generally if the comments contribute to a discussion they get to be posted. I do get the occasional angry or incoherent comment. And sometimes I get a comment that is triggers me to write an entire blog post pointing out the problems with the comment.
Today a comment from Joe King showed up for The Myth of the Low Complaint Rate.

Read More

Politics and Delivery

Last week I posted some deliverability advice for the DNC based on their acquisition of President Obama’s 2012 campaign database. Paul asked a question on that post that I think is worth some attention.

Read More

Filtering more than spam

The obvious application of machine learning for email is to send spam to the junk/bulk folder. Most services use some level of machine learning for filters. Places like Gmail have extensive machine learning filters to filter spam and unwanted mail away from their users.
Some organizations are taking the filtering process a step further. Almost every mail client more advanced than PINE has the ability for users to create rules to sort mail into folders.  Late last year, Office 365 rolled out a feature, Clutter that tracks how a user interacts with mail and filters unimportant mail. This allows each user to have their own filters, but without the overhead of having to create the filters.
The Clutter engine looks at both how the user interacts with mail and things it knows about the organization. For example, if Exchange is tied into Active Directory, then mail from a manager will be prioritized while mail from a co-worker may end up in the clutter folder.
Email is a critical business tool. A significant number of companies rely on email for internal and external communication. Many users treat their inbox as a todo list, prioritizing what they work on based on what’s in their mail box. Despite the needs of users, the mail client hasn’t really changed.
Over the last few years, we’ve seen different online services attempt to build a more effective email client. Some of these features were things like tabs and priority inbox at Gmail. Microsoft created the “sweep” feature for Outlook/Hotmail users to manage inbox clutter. Third parties have created services to try and improve the mailbox experience for their users. 
Many of the email filters, up to this point, have really been focused on protecting users from spam and malicious emails. Applying that filtering knowledge to more than just spam, but to the different kinds of emails makes sense to me. I’ve always had a fairly extensive set of filters, initially procmail but now sieve, to process and organize incoming mail. But I kinda like the idea that my mail client learns how I filter messages and do the right thing on its own.
I’d love to see some improvements in the mail client, that make it easier to manage and organize incoming email. It remains to be seen if this is a feature that takes off and makes its way to other clients or not.
 
 

Read More