Trawling through the junk folder

As a break from writing unit tests this morning I took a few minutes to go through my Mail.app junk folder, looking for false positives for mail delivered over the past six weeks.
trashcans
We don’t do any connection level rejection here, so any mail sent to me gets delivered somewhere. Anything that looks like malware gets dumped in one folder and never read, anything that scores a ridiculously high spamassassin score gets dumped in another folder and never read, mailing lists get handled specially and everything else gets delivered to Mail.app to deal with. That means that Mail.app sees less of the ridiculously obvious spam and is mostly left to do bayesian filtering, and whatever other magic Apple implemented.
There were about thirty false positives, and they were all B2C bulk advertising mail. I receive a lot of 1:1 mail, transactional mail and B2B marketing mail and there were no false positives at all for any of those.
All the false positives were authenticated with both SPF and DKIM. All of them were for marketing lists I’d signed up for while making a purchase. All of them were “greymail” – mail that I’d agreed to receive, and that was inoffensive but not compelling. While I easily spotted all of them as false positives via the from address and subject, none of them were content I’d particularly missed.
Almost all of the false positives were sent through ESPs I recognized the name of, and about 80% of them were sent through just two ESPs (though that wasn’t immediately obvious, as one of them not only uses random four character domain names, it uses several different ones – stop doing that).
If you’d asked me to name two large, legitimate ESPs from whom I recalled receiving blatant, blatant spam recently, it would be those same two ESPs. Is Mail.app is picking up on my opinions of the mail those ESPs are sending? It’s possible – details specific to a particular ESPs mail composition and delivery pipelines are details that a bayesian learning filter may well recognize as efficient tokens.

Related Posts

AHBL Wildcards the Internet

AHBL (Abusive Host Blocking List) is a DNSBL (Domain Name Service Blacklist) that has been available since 2003 and is used by administrators to crowd-source spam sources, open proxies, and open relays.  By collecting the data into a single list, an email system can check this blacklist to determine if a message should be accepted or rejected. AHBL is managed by The Summit Open Source Development Group and they have decided after 11 years they no longer wish to maintain the blacklist.
A DNSBL works like this, a mail server checks the sender’s IP address of every inbound email against a blacklist and the blacklist responses with either, yes that IP address is on the blacklist or no I did not find that IP address on the list.  If an IP address is found on the list, the email administrator, based on the policies setup on their server, can take a number of actions such as rejecting the message, quarantining the message, or increasing the spam score of the email.
The administrators of AHBL have chosen to list the world as their shutdown strategy. The DNSBL now answers ‘yes’ to every query. The theory behind this strategy is that users of the list will discover that their mail is all being blocked and stop querying the list causing this. In principle, this should work. But in practice it really does not because many people querying lists are not doing it as part of a pass/fail delivery system. Many lists are queried as part of a scoring system.
Maintaining a DNSBL is a lot of work and after years of providing a valuable service, you are thanked with the difficulties with decommissioning the list.  Popular DNSBLs like the AHBL list are used by thousands of administrators and it is a tough task to get them to all stop using the list.  RFC6471 has a number of recommendations such as increasing the delay in how long it takes to respond to a query but this does not stop people from using the list.  You could change the page responding to the site to advise people the list is no longer valid, but unlike when you surf the web and come across a 404 page, a computer does not mind checking the same 404 page over and over.
Many mailservers, particularly those only serving a small number of users, are running spam filters in fire-and-forget mode, unmaintained, unmonitored, and seldom upgraded until the hardware they are running on dies and is replaced. Unless they do proper liveness detection on the blacklists they are using (and they basically never do) they will keep querying a list forever, unless it breaks something so spectacularly that the admin notices it.
So spread the word,

Read More

Confusing the engineers

We went camping last weekend with a bunch of friends. Had a great time relaxing on the banks of the Tuolumne River, eating way too much and visiting.
On Saturday I was wearing a somewhat geeky t-shirt. It said 554: abort mission. (Thank you MessageSystems). At some point on Saturday every engineer came up to me, read my shirt and then looked at me and said “That’s not HTTP.”
That lead to various discussions about how their junior engineers don’t actually know SMTP at all. Why? Because the SMTP libraries just work. Apparently the HTTP libraries aren’t that great, so folks have to learn more about HTTP to troubleshoot and use them.
I’m sure there’s a joke in there somewhere: A Kindle engineer, an Android engineer and a robot engineer walk into a campsite…
EmailFilters_boxes_forblogIt did leave me thinking, though, about how it’s not that easy to run your own mail server these days. Gone are the days when running your own server was cost effective and easy. These days, there is just too much spam coming in. Crafting filters is a skilled job. It’s not that hard to run good filters. But to run good filters takes time to do well.
There are also a lot of challenges to sending mail. One of the discussions I had at the campsite was how hard it was to configure outbound mail. The engineer was helping a friend set up a website and trying to get the website to send notifications to the friend. But without setting up authentication the mail kept silently failing.
Of course, we do run our own mail server. But it’s our job and, in many ways, it keeps us honest. We don’t run many filters meaning we see what spammers are doing and can use our own experiences to better understand what commercial filters are dealing with.
For most people, though, I really think using a service is the right solution. Find one with filters that meet your needs and just pay them to deal with the headache.
 

Read More

June 2015: the Month in Email

Happy July! We are back from another wonderful M3AAWG conference and enjoyed seeing many of you in Dublin. It’s always so great for us to connect with our friends, colleagues, and readers in person. I took a few notes on Michel van Eeten’s keynote on botnets, and congratulated our friend Rodney Joffe on winning the prestigious Mary Litynski Award.
In anti-spam news, June brought announcements of three ISP-initiated CAN-SPAM cases, as well as a significant fine leveled by the Canadian Radio-television and Telecommunications Commission (CRTC) against Porter Airlines. In other legal news, a UK case against Spamhaus has been settled, which continues the precedent we’ve observed that documenting a company’s practice of sending unsolicited email does not constitute libel.
In industry news, AOL started using Sender Score Certification, and Yahoo announced (and then implemented) a change to how they handle their Complaint Feedback Loop (CFL). Anyone have anything to report on how that’s working? We also noted that Google has discontinued the Google Apps for ISPs program, so we expect we might see some migration challenges along the way. I wrote a bit about some trends I’m seeing in how email programs are starting to use filtering technologies for email organization as well as fighting spam.
Steve, Josh and I all contributed some “best practices” posts this month on both technical issues and program management issues. Steve reminded us that what might seem like a universal celebration might not be a happy time for everyone, and marketers should consider more thoughtful strategies to respect that. I wrote a bit about privacy protection (and pointed to Al Iverson’s post on the topic), and Josh wrote about when senders should include a physical address, what PTR (or Reverse DNS) records are and how to use them, testing your opt-out process (do it regularly!), and advice on how to use images when many recipients view email with images blocked.

Read More