Horses, not zebras

I was first introduced to the maxim “When you hear hoofbeats, think horses not zebras” when I worked in my first molecular biology lab 20-some-odd years ago. I’m no longer a gene jockey, but I still find myself applying this to troubleshooting delivery problems for clients.
It’s not that I think all delivery problems are caused by “horses”, or that “zebras” never cause problems for email delivery. It’s more that there are some very common causes of delivery problems and it’s a more effective use of time to address those common problems before getting into the less common cases.
This was actually something that one of the mailbox provider reps said at M3AAWG in SF last month. They have no problem with personal escalations when there’s something unusual going on. But, the majority of issues can be handled through the standard channels.
What are the horses I look for with delivery problems.

  1. Technical issues. These are actually getting rarer as companies move to the designed-for-bulk MTAs like MessageSystems and Port25. But they’re still worth checking and can contribute to delivery problems. Luckily, technical issues are often the easiest to solve, and once they’re solved usually stay solved.
  2. Content issues. These are getting much more common as ISPs start looking at all the URLs and links in emails, including the landing pages.  Gmail, for instance, does almost all their filtering based on content rather than originating IP. This is not that difficult to solve, but can be harder to solve than technical issues.
  3. Address collection issues. Most delivery problems start at the point of address collection, and these can be challenging problems to solve. Not only do you have to solve the problem moving forward, you also have to decide what to do with the addresses you’ve previously collected. And the problems are only solved as long as someone knows why things are done the way they are.

 

Related Posts

Weird mail problems today? Clear your DNS cache!

A number of sources are reporting this morning that there was a problem with some domains in the .com zone yesterday. These problems caused the DNS records of these domains to become corrupted. The records are now fixed. Some of the domains, however, had long TTLs. If a recursive resolver pulled the corrupted records, it could take up to 2 days for the new records to naturally age out.
Folks can fix this by flushing their DNS cache, thus forcing the recursive resolver to pull the uncorrupted records.
EDIT: Cisco has published some more information about the problem. ‘Hijacking’ of DNS Records from Network Solutions

Read More

What is an email address? (part two)

Yesterday I talked about the technical definitions of an email address. Eventually on Monday I’m going to talk about some useful day-to-day rules about email address acquisition and analysis, but first I’m going to take a detour into tagging or mailboxing email addresses.
Tagging an email address is something the owner of an email address can do to make it easier to handle incoming email. It works by adding an extra word to the local part of the email address separated by a special character, such as “+”, “=” or “-“. So, if my email address is steve@example.com, and I’m signing up for the MAAWG mailing lists I can sign up with the email address steve+maawg@example.com. When mail is sent to steve+maawg@example.com it will be delivered to my steve@example.com mailbox, but I’ll know that it’s mail from MAAWG. I can use that tag to whitelist that mail, to filter it to it’s own mailbox and a bunch of other useful things.
In some ways this is similar to recent disposable email address services, but rather than being a third party service it’s something that’s been built in to many mailservers for well over a decade. It doesn’t require me to create each new address at a web page, instead I can make tags up on the fly. And it works at my regular mail domain.
If you’re an ESP it can be interesting to look for tagged addresses in uploaded lists. If it’s a list owned by Kraft and you see the email address steve+gevalia@example.com in the list, that’s a strong sign that that email address at least was really volunteered to the list owner. If you see the email address steve+microsoft@example.com then it’s a strong sign that it wasn’t, and you might want to look harder at where the list came from.
One reason that this is relevant to email address capture is that tagged addresses are something that you should expect people, especially more sophisticated users of email, to use to sign up to mailing lists and that they’re something you don’t want to discourage. Yet many web signup forms forbid entering email addresses with a “+” or, worse, have bugs in them that map a “+” sign in the email address to a space – leading to the signup failing at best, or the wrong email address being added to the list at worst. This really annoys people who use tagged addresses to help manage their email, and they’re often exactly the sort of tech-savvy people who make a lot of online purchases you want to have on your lists.
More on Monday.

Read More

Ad-hoc analysis

I often pull emails into a database to analyze them, but sometimes I want something simpler. Emails are typically stored in one of two ways: mbox format, where an entire mailbox is stored in a single file, and maildir format, where a mailbox is a directory with one file in it for each email.
My desktop mail application is Mail.app on OS X, and it stores messages in a maildir-ish format, so I’m going to work with that here. If you’re using mbox format mailboxes it’s a little trickier (but you can use a tool called formmail to split an mbox style format into a maildir directory and go from there).
I want to gather some statistics on mail I’ve sent to abuse desks, so the first thing I do is open up a terminal window and change directory to where my “Sent Messages” mailbox is:
cd Library/Mail/V2/IMAP-steve@misc.wordtothewise.com/Sent Messages.mbox
(Tab completion is really useful for navigating through the mailbox hierarchy.)
Then I need to go through every email (file) in that directory, for each file find the “To:” header and check to see if it was sent to an abuse desk. If it was sent to an abuse desk I want to find the email address for each one, count how many times I see that email address and find the top twenty or so abuse desks I send reports to. I can do all that with a single command line:
find . -type f -exec egrep -m1 '^To:' {} ; | egrep -o 'abuse@[a-zA-Z0-9._-]+' | sort | uniq -c | sort -nr | head -20
(Enter that all as a single line, even though it’s wrapped into two here).
That’s a bit much to understand all at once, so lets redo that in several stages, with an intermediate file so we can see what’s going on.
find . -type f -exec egrep -m1 '^To:' {} ; >tolines.txt
The find command finds all the files in a directory and does something with them. In this case we start looking in the current directory (“.”), look just for files (“-type f”) and for each file we find we run that file through another command (“-exec egrep -m1 ‘^To:’ {} ;”) and write the result of that command to a file (“>tolines.txt”). The egrep command we run for each file goes through the file and prints out the first (“-m1”) line it finds that begins with “To:” (“‘^To:'”). If you run that and take a look at the file it creates you can see one line for each message, containing the “To:” header (or at least the first line of it).
The next thing to do is to go through that and pull out just the email addresses – and just the ones that are sent to abuse desks:
egrep -o 'abuse@[a-zA-Z0-9._-]+' tolines.txt
This uses egrep a second time, this time to look for lines that look like an email address (“‘abuse@[a-zA-Z0-9._-]+'”) and when it finds one print out just the part of the line that matched the pattern (“-o”).
Running that gives us one line of output for each email we’re interested in, containing the address it was sent to. Next we want to count how many times we see each one. There’s a command line idiom for that:
egrep -o 'abuse@[a-zA-Z0-9._-]+' tolines.txt | sort | uniq -c
This takes all the lines and sorts (“sort”, reasonably enough) them – so that identical lines will be next to each other – then counts runs of identical lines (“uniq -c”). We’re nearly there – the result of this is a count and an email address on each line. We just need to find the top 20:
egrep -o 'abuse@[a-zA-Z0-9._-]+' tolines.txt | sort | uniq -c | sort -nr | head -20
Each line begins with the count, so we can use sort again, this time telling it to sort by number, high to low (“sort -nr”). Finally, “head -20” will print just the first 20 lines of the result.
The final result is this:

Read More