Trawling through the junk folder

As a break from writing unit tests this morning I took a few minutes to go through my Mail.app junk folder, looking for false positives for mail delivered over the past six weeks.

We don’t do any connection level rejection here, so any mail sent to me gets delivered somewhere. Anything that looks like malware gets dumped in one folder and never read, anything that scores a ridiculously high spamassassin score gets dumped in another folder and never read, mailing lists get handled specially and everything else gets delivered to Mail.app to deal with. That means that Mail.app sees less of the ridiculously obvious spam and is mostly left to do bayesian filtering, and whatever other magic Apple implemented.
There were about thirty false positives, and they were all B2C bulk advertising mail. I receive a lot of 1:1 mail, transactional mail and B2B marketing mail and there were no false positives at all for any of those.
All the false positives were authenticated with both SPF and DKIM. All of them were for marketing lists I’d signed up for while making a purchase. All of them were “greymail” – mail that I’d agreed to receive, and that was inoffensive but not compelling. While I easily spotted all of them as false positives via the from address and subject, none of them were content I’d particularly missed.
Almost all of the false positives were sent through ESPs I recognized the name of, and about 80% of them were sent through just two ESPs (though that wasn’t immediately obvious, as one of them not only uses random four character domain names, it uses several different ones – stop doing that).
If you’d asked me to name two large, legitimate ESPs from whom I recalled receiving blatant, blatant spam recently, it would be those same two ESPs. Is Mail.app is picking up on my opinions of the mail those ESPs are sending? It’s possible – details specific to a particular ESPs mail composition and delivery pipelines are details that a bayesian learning filter may well recognize as efficient tokens.

Related Posts

AHBL Wildcards the Internet

AHBL (Abusive Host Blocking List) is a DNSBL (Domain Name Service Blacklist) that has been available since 2003 and is used by administrators to crowd-source spam sources, open proxies, and open relays.  By collecting the data into a single list, an email system can check this blacklist to determine if a message should be accepted or rejected. AHBL is managed by The Summit Open Source Development Group and they have decided after 11 years they no longer wish to maintain the blacklist.
A DNSBL works like this, a mail server checks the sender’s IP address of every inbound email against a blacklist and the blacklist responses with either, yes that IP address is on the blacklist or no I did not find that IP address on the list.  If an IP address is found on the list, the email administrator, based on the policies setup on their server, can take a number of actions such as rejecting the message, quarantining the message, or increasing the spam score of the email.
The administrators of AHBL have chosen to list the world as their shutdown strategy. The DNSBL now answers ‘yes’ to every query. The theory behind this strategy is that users of the list will discover that their mail is all being blocked and stop querying the list causing this. In principle, this should work. But in practice it really does not because many people querying lists are not doing it as part of a pass/fail delivery system. Many lists are queried as part of a scoring system.
Maintaining a DNSBL is a lot of work and after years of providing a valuable service, you are thanked with the difficulties with decommissioning the list.  Popular DNSBLs like the AHBL list are used by thousands of administrators and it is a tough task to get them to all stop using the list.  RFC6471 has a number of recommendations such as increasing the delay in how long it takes to respond to a query but this does not stop people from using the list.  You could change the page responding to the site to advise people the list is no longer valid, but unlike when you surf the web and come across a 404 page, a computer does not mind checking the same 404 page over and over.
Many mailservers, particularly those only serving a small number of users, are running spam filters in fire-and-forget mode, unmaintained, unmonitored, and seldom upgraded until the hardware they are running on dies and is replaced. Unless they do proper liveness detection on the blacklists they are using (and they basically never do) they will keep querying a list forever, unless it breaks something so spectacularly that the admin notices it.
So spread the word,

Read More

Confusing the engineers

We went camping last weekend with a bunch of friends. Had a great time relaxing on the banks of the Tuolumne River, eating way too much and visiting.
On Saturday I was wearing a somewhat geeky t-shirt. It said 554: abort mission. (Thank you MessageSystems). At some point on Saturday every engineer came up to me, read my shirt and then looked at me and said “That’s not HTTP.”
That lead to various discussions about how their junior engineers don’t actually know SMTP at all. Why? Because the SMTP libraries just work. Apparently the HTTP libraries aren’t that great, so folks have to learn more about HTTP to troubleshoot and use them.
I’m sure there’s a joke in there somewhere: A Kindle engineer, an Android engineer and a robot engineer walk into a campsite…
EmailFilters_boxes_forblogIt did leave me thinking, though, about how it’s not that easy to run your own mail server these days. Gone are the days when running your own server was cost effective and easy. These days, there is just too much spam coming in. Crafting filters is a skilled job. It’s not that hard to run good filters. But to run good filters takes time to do well.
There are also a lot of challenges to sending mail. One of the discussions I had at the campsite was how hard it was to configure outbound mail. The engineer was helping a friend set up a website and trying to get the website to send notifications to the friend. But without setting up authentication the mail kept silently failing.
Of course, we do run our own mail server. But it’s our job and, in many ways, it keeps us honest. We don’t run many filters meaning we see what spammers are doing and can use our own experiences to better understand what commercial filters are dealing with.
For most people, though, I really think using a service is the right solution. Find one with filters that meet your needs and just pay them to deal with the headache.
 

Read More

URL reputation and shorteners

A bit of  a throwback post from Steve a few years ago. The problem has gotten a little better as some shortening companies are actually disabling spammed URLs, and blocking URLs with problematic content. I still don’t recommend using a public URL shortener in email messages, though.
Any time you put a URL in mail you send out, you’re sharing the reputation of everyone who uses URLs with that hostname. So if other people send unwanted email that has the same URL in it that can cause your mail to be blocked or sent to the bulk folder.
That has a bunch of implications. If you run an affiliate programme where your affiliates use your URLs then spam sent by your affiliates can cause your (clean, opt-in, transactional) email to be treated as spam. If you send a newsletter with advertisers URLs in it then bad behaviour by other senders with the same advertisers can cause your email to be spam foldered. And, as we discussed yesterday, if spammers use the same URL shortener you do, that can cause your mail to be marked as spam.
Even if the hostname you use for your URLs is unique to you, if it resolves to the same IP address as a URL that’s being used in spam, that can cause delivery problems for you.
What does this mean when it comes to using URL shorteners (such as bit.ly, tinyurl.com, etc.) in email you send out? That depends on why you’re using those URL shorteners.
The URLs in the text/html parts of my message are big and ugly
Unless the URL you’re using is, itself, part of your brand identity then you really don’t need to make the URL in the HTML part of the message visible at all. Instead of using ‘<a href=”long_ugly_url”> long_ugly_url </a>’ or ‘<a href=”shortened_url”> shortened_url </a>’ use ‘<a href=”long_ugly_url”> friendly phrase </a>’.
(Whatever you do, don’t use ‘<a href=”long_ugly_url”> different_url </a>’, though – that leads to you falling foul of phishing filters).
The URLs in the text/plain parts of my message are big and ugly
The best solution is to fix your web application so that the URLs are smaller and prettier. That will make you seem less dated and clunky both when you send email, and when your users copy and paste links to your site via email or IM or twitter or whatever. “Cool” or “friendly” URLs are great for a lot of reasons, and this is just one. Tim Berners-Lee has some good thoughts on this, and AListApart has two good articles on how to implement them.
If you can’t do that, then using your own, branded URL shortener is the next best thing. Your domain is part of your brand – you don’t want to hide it.
I want to use a catchy URL shortener to enhance my brand
That’s quite a good reason. But if you’re doing that, you’re probably planning to use your own domain for your URL shortener (Google uses goo.gl, Word to the Wise use wttw.me, etc). That will avoid many of the problems with using a generic URL shortener, whether you implement it yourself or use a third party service to run it.
I want to hide the destination URL from recipients and spam filters
Then you’re probably spamming. Stop doing that.
I want to be able to track clicks on the link, using bit.ly’s neat click track reporting
Bit.ly does have pretty slick reporting. But it’s very weak compared to even the most basic clickthrough reporting an ESP offers. An ESP can tell you not just how many clicks you got on a link, but also which recipients clicked and how many clicks there were for all the links in a particular email or email campaign, and how that correlates with “opens” (however you define that).
So bit.ly’s tracking is great if you’re doing ad-hoc posts to twitter, but if you’re sending bulk email you (or your ESP) can do so much better.
I want people to have a short URL to share on twitter
Almost all twitter clients will abbreviate a URL using some URL shortener automatically if it’s long. Unless you’re planning on using your own branded URL shortener, using someone else’s will just hide your brand. It’s all probably going to get rewritten as t.co/UgLy in the tweet itself anyway.
If your ESP offers their own URL shortener, integrating into their reporting system for URLs in email or on twitter that’s great – they’ll be policing users of that just the same as users of their email service, so you’re unlikely to be sharing it with bad spammers for long enough to matter.
All the cool kids are using bit.ly, so I need to to look cool
This one I can’t help with. You’ll need to decide whether bit.ly links really look cool to your recipient demographic (Spoiler: probably not) and, if so, whether it’s worth the delivery problems they risk causing.
And, remember, your domain is part of your brand. If you’re hiding your domain, you’re hiding your branding.
So… I really do need a URL shortener. Now what?
It’s cheap and easy to register a domain for just your own use as a URL shortener. Simply by having your own domain, you avoid most of the problems. You can run a URL shortener yourself – there are a bunch of freely available packages to do it, or it’s only a few hours work for a developer to create from scratch.
Or you can use a third-party provider to run it for you. (Using a third-party provider does mean that you’re sharing the same IP address as other URL shorteners – but everyone you’re sharing with are probably people like you, running a private URL shortener, so the risk is much, much smaller than using a freely available public URL shortener service.)
These are fairly simple fixes for a problem that’s here today, and is going to get worse in the future.

Read More