The Physics of the Email Universe

We talk a lot about rules and best practices in email, but we’re mostly talking about “squishy” rules-of-thumb that are based on simplified models of how mail systems, spam filters, recipients, postmasters and blacklist operators behave. They’re the biology, ecology and sociology of the email ecosystem.
There’s another set of rules, though, that we tend to mention only in passing, if at all. They’re the steely, sharp-edged laws that control the email universe. They’re the RFCs that define how email works and make sure that mail systems written by hundreds of different people across the globe all work and all interoperate with each other.
Building a message from Zeros and Ones
RFC 5322 – Internet Message Format
This tells you everything you need to know about crafting a simple email, with a subject line, a sender, some recipients and a simple plain-text message. It’s also the foundation of all fancier emails. If you’re creating emails, this is where to start.
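To make that concrete, here is a minimal sketch of the kind of simple message RFC 5322 describes, built with Python's standard library. The addresses, subject and body are made-up examples. RFC 5322 itself only requires an origination date and a From field; the other fields here are just the usual ones. Note that Python's `set_content` also adds a few MIME headers, which come from the later RFCs rather than from RFC 5322.

```python
from email.message import EmailMessage
from email.utils import formatdate


def build_simple_message(sender, recipient, subject, body):
    """Build a plain-text message with the usual RFC 5322 header fields."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg["Date"] = formatdate(localtime=True)  # RFC 5322 style date-time
    # set_content stores the body and also adds Content-Type,
    # Content-Transfer-Encoding and MIME-Version headers.
    msg.set_content(body)
    return msg


# Example addresses and text are placeholders, not real mailboxes.
message = build_simple_message(
    "alice@example.com",
    "bob@example.com",
    "Lunch?",
    "Are you free at noon?\n",
)

# Headers first, then a blank line, then the body.
wire_text = message.as_string()
print(wire_text)
```

The printed text is the header block, a blank line, and then the body, which is the basic shape every fancier message builds on.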
A little more than plain ASCII
RFC 2047 – MIME Part 3: Message Header Extensions for Non-ASCII Text
RFC 2047 is one small part of the MIME (Multipurpose Internet Mail Extensions) suite of protocols that allow you to include pictures and attachments and prettily formatted text and Comic Sans in your email. This part defines how you can put things other than the plainest of plain text in your subject lines or in the “friendly from” of your message. It’s what allows you to put Hiragana, or Cyrillic, or umlauts, or cedillas, or properly matched double quotes in your subject line. It also lets you put hearts or smiley faces or other little pictograms there – but nothing this useful is going to be perfect.
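As a rough illustration, Python's standard `email.header` module can produce and read the RFC 2047 "encoded words" that carry non-ASCII text in headers. The subject line below is an invented example, and the choice of UTF-8 is an assumption; other character sets work the same way.

```python
from email.header import Header, decode_header, make_header

# A made-up subject with umlauts, a sharp s and a heart.
subject = "Grüße aus München ♥"

# Encode it as an RFC 2047 encoded word: pure ASCII on the wire,
# of the form =?charset?encoding?encoded-text?=
encoded = Header(subject, charset="utf-8").encode()
print(encoded)

# Decode it again, the way a mail client would before display.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

The encoded form is plain ASCII, so it survives mail systems that only understand the original header rules, and decoding gets the original text back.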
RFC 2045 – MIME Part 1: Format of Internet Message Bodies
This shows how to send an image, or a plain text mail in a different character set, or an HTML mail. It doesn’t tell you how to send both plain text and HTML in one message, or to send HTML with embedded images, or a message with an attached document. For that you need…
Finally, Modern Email
RFC 2046 – MIME Part 2: Media Types
This builds on RFC 2045 to allow you to have many different chunks in a message – this is what you need if you want to send “proper” HTML mail with a plain text alternative, or if you want embedded images or attachments.
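Here is a small sketch of that "many chunks" structure, again using Python's standard library. The addresses, text and the `report.csv` attachment are placeholders, and the `outline` helper is just something written for this example to show the nesting.

```python
from email.message import EmailMessage


def build_multipart_message():
    """Plain text plus an HTML alternative, plus one attachment."""
    msg = EmailMessage()
    msg["From"] = "news@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Monthly report"
    msg.set_content("Hello Bob,\n\nThe report is attached.\n")
    # Adding an alternative turns the message into multipart/alternative.
    msg.add_alternative(
        "<html><body><p>Hello Bob,</p>"
        "<p>The report is <b>attached</b>.</p></body></html>",
        subtype="html",
    )
    # Adding an attachment wraps everything in multipart/mixed.
    msg.add_attachment(
        b"month,sent\n2014-06,1200\n",
        maintype="application",
        subtype="octet-stream",
        filename="report.csv",
    )
    return msg


def outline(part, depth=0):
    """Return the content types of a message as an indented list."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.iter_parts():
            lines.extend(outline(child, depth + 1))
    return lines


message = build_multipart_message()
structure = outline(message)
print("\n".join(structure))
```

The outline shows a multipart/mixed container holding a multipart/alternative part (plain text and HTML versions of the same content) and the attachment beside it.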
Getting From A To B
RFC 5321 – Simple Mail Transfer Protocol
A message isn’t much use unless you send it somewhere. RFC 5321 explains the mysteries of actually sending that message over the wire to the recipient. If you need to know about the different phases of a message delivery, what “4xx” and “5xx” actually mean, why there’s not really any such thing as a hard or soft bounce defined, just temporary or permanent failures, or anything else about actually sending mail or diagnosing mail delivery, this is your starting point.
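The reply code classes are simple enough to sketch in a few lines. RFC 5321 groups replies by their first digit: 2yz is success, 3yz means the server is waiting for more, 4yz is a temporary failure and 5yz is a permanent one. The function name and the labels it returns are my own, not anything the RFC defines.

```python
import re

# First digit of the reply code -> what it means for the sender.
REPLY_CLASSES = {
    "2": "success",
    "3": "intermediate",
    "4": "temporary failure",  # try again later
    "5": "permanent failure",  # do not retry unchanged
}


def classify_smtp_reply(reply_line):
    """Classify one SMTP reply line by the first digit of its code."""
    # A reply starts with three digits, followed by a space, a hyphen
    # (for multi-line replies) or the end of the line.
    match = re.match(r"([2-5])\d\d(?:[ -]|$)", reply_line)
    if match is None:
        raise ValueError(f"not an SMTP reply line: {reply_line!r}")
    return REPLY_CLASSES[match.group(1)]


samples = [
    "250 OK",
    "354 Start mail input; end with <CRLF>.<CRLF>",
    "451 Requested action aborted: local error in processing",
    "550 Requested action not taken: mailbox unavailable",
]
results = {line[:3]: classify_smtp_reply(line) for line in samples}
print(results)
```

Notice there is no "hard bounce" or "soft bounce" anywhere in that table. Those are labels senders layer on top of the temporary and permanent classes.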
The Rest Of The Iceberg
I’ve only touched on the very smallest tip of the email iceberg here. There’s much, much more – both in RFCs and ad-hoc non-RFC standards. If you’re interested in more, this is a decent place to start.

Related Posts

Email filters

What makes the best email filter? There isn’t really a single answer to that question. Different people and different organizations have different tolerances for false positives versus false negatives. For instance, we’re quite sensitive to false positives here, so we run extremely conservative filtering and don’t block very much at the MTA level. Other people I know are very sensitive to false negatives and run more aggressive filtering and block quite a bit of mail at the MTA level.
For the major ISPs, the people who plan, approve, design and monitor the filters usually want to maximize customer happiness. They want to deliver as much real mail as possible while blocking as much bad mail as possible. Blocking real mail and letting through bad mail both result in unhappy customers and increase the ISP’s costs, either through customer churn or through support calls. And this is a process; filters are not static. ISPs roll out new filters all the time. Sometimes they are an improvement and sometimes they’re not. When they’re not, they’re pulled out of production. This works both for positive filters like Return Path and negative filters like blocklists.
Then there is mail filtering that doesn’t have to do with spam. Business filters, for instance, often block non-business mail. Permission of the recipient often isn’t even a factor. Companies don’t often go out of their way to block personal mail, but if personal mail gets blocked (say the vacation plane ticket or the Amazon receipt) they don’t often unblock it. When you think about why a business provides email, this makes perfect sense. The business provides email to further its own business goals. Some personal usage is usually OK, but if someone notices and blocks personal email then it’s unlikely the business will unblock it, even if the employee opted in.
In the case of email filters, the free market does work. Different ISPs filter mail differently. Some people love Gmail’s filters. Other people think Hotmail has the best filtering. There are different standards for filtering, and that makes email stronger and more robust. Consumers have choices in their mail provider and spam filtering.


Who leaked my address, and when?

Providing tagged email addresses to vendors is fascinating, and at the same time disturbing. It lets me track what a particular email address is used for, and also see where and when it has leaked to spammers.
I’d really like to know who leaked an email address, and when.
All my inbound mail is sorted into “spam” and “not-spam” by a combination of SpamAssassin, some static sieve rules and a learning spam filter in my mail client. That makes it fairly easy for me to look at my “recent spam”. That’s a huge amount of data, though, something like 40,000 pieces of spam a month.
Finding the needle of interesting data in that haystack is going to take some automation. As I’ve mentioned before, you can do quite a lot of useful work with a mix of some little Perl scripts and some command-line tools.
I’m interested in the first time a tagged address started receiving spam, so I start off with a Perl script that will take a directory full of emails, one per file, find the ones that were sent to a tagged address and print out that address and the time I received the email. I can’t rely on the Date: header, as that’s under the control of the spammer, and often bogus. But I can rely on the timestamp my server adds when it receives the email – and it records that in the first Received: header in the message.
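For readers who would rather not write Perl, here is a rough Python sketch of the same idea. It is not the script described above. It assumes tagged addresses look like `me+vendor@example.com` and that the recipient shows up in a To, Cc or Delivered-To header; both are assumptions you would adjust to your own setup.

```python
import re
from datetime import timezone
from email import message_from_bytes
from email.utils import getaddresses, parsedate_to_datetime
from pathlib import Path

# Hypothetical tag format (me+vendor@example.com); change to match yours.
TAGGED = re.compile(r"^me\+[^@]+@example\.com$", re.IGNORECASE)
# Assumption: the tagged address appears in one of these headers.
RECIPIENT_HEADERS = ("To", "Cc", "Delivered-To")


def received_time(msg):
    """Timestamp from the first (topmost) Received: header, or None."""
    received = msg.get("Received")
    if received is None:
        return None
    # The timestamp follows the last semicolon; unfold any line breaks.
    stamp = " ".join(str(received).rpartition(";")[2].split())
    try:
        when = parsedate_to_datetime(stamp)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:  # "-0000" parses as naive; treat it as UTC
        when = when.replace(tzinfo=timezone.utc)
    return when


def tagged_recipients(msg):
    """Set of tagged addresses found in the recipient headers."""
    values = []
    for name in RECIPIENT_HEADERS:
        values.extend(str(value) for value in msg.get_all(name, []))
    return {a.lower() for _, a in getaddresses(values) if TAGGED.match(a)}


def first_seen(messages):
    """Map each tagged address to the earliest time mail for it arrived."""
    earliest = {}
    for msg in messages:
        when = received_time(msg)
        if when is None:
            continue
        for addr in tagged_recipients(msg):
            if addr not in earliest or when < earliest[addr]:
                earliest[addr] = when
    return earliest


def load_directory(path):
    """Yield parsed messages from a directory holding one email per file."""
    for entry in sorted(Path(path).iterdir()):
        if entry.is_file():
            yield message_from_bytes(entry.read_bytes())
```

Calling `first_seen(load_directory(some_spam_folder))` gives one entry per tagged address, which you can then sort by time. The Date: header is never consulted, for the reason given above.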


Email is store and forward

Many of us are so used to email appearing instantaneous, we forget that the underlying protocol was never designed for instant messaging. When the SMTP protocol was originally proposed it was designed to support servers that may have had intermittent connectivity. The protocol allowed for email to be spooled to disk and then sent when resources were available. In fact, almost everyone who was around more than 10 years ago knows of a case where an email took weeks, months or even years to deliver.
These days we’re spoiled. We expect the email we send to friends and relatives to show up in their mailbox within moments of sending it. We expect that sales receipt or e-ticket to show up in our mailbox within instants of a purchase. We expect that our ISPs will get us email immediately, if not sooner.
But there are a lot of things that can slow down email delivery. At several points in the process an email may be spooled to disk. It stays on the spool until the next part of the delivery process can happen. Other points of slowdown include the various anti-spam, anti-virus and anti-phishing protections that ISPs must implement. Then add in the extreme volume of email (around 10 billion messages a day) and all of a sudden email delivery is slower than many senders and recipients expect it to be. This delay is not ideal, but the system is designed so that mail is not silently discarded.
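The spool-and-retry behavior described above can be sketched as a toy queue. This is only an illustration of the store-and-forward idea, not how any particular mail server works: the `run_queue` function, the outcome strings and the `max_attempts` limit are all invented for the example, and a real server would wait between attempts rather than retry immediately.

```python
from collections import deque


def run_queue(messages, attempt_delivery, max_attempts=5):
    """Deliver each message, re-spooling it after a temporary failure.

    attempt_delivery(message) must return "delivered",
    "temporary failure" or "permanent failure".
    Every message ends up delivered or bounced, never dropped.
    """
    spool = deque((message, 0) for message in messages)
    delivered, bounced = [], []
    while spool:
        message, attempts = spool.popleft()
        outcome = attempt_delivery(message)
        if outcome == "delivered":
            delivered.append(message)
        elif outcome == "permanent failure" or attempts + 1 >= max_attempts:
            bounced.append(message)  # tell the sender; do not discard
        else:
            spool.append((message, attempts + 1))  # keep it for later
    return delivered, bounced


# A pretend remote server: "slow" succeeds on its third attempt,
# "never" always defers, and "bad" is rejected outright.
attempts_seen = {}


def pretend_server(message):
    attempts_seen[message] = attempts_seen.get(message, 0) + 1
    if message == "bad":
        return "permanent failure"
    if message == "never":
        return "temporary failure"
    if message == "slow" and attempts_seen[message] < 3:
        return "temporary failure"
    return "delivered"


delivered, bounced = run_queue(
    ["ok", "slow", "bad", "never"], pretend_server, max_attempts=4
)
print(delivered, bounced)
```

The point of the sketch is the last branch: a deferred message goes back on the spool instead of disappearing, which is exactly why delivery can be slow but is rarely silent.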
While individual emails may be delayed, most users will rarely see that delay in the email that they send. Bulk senders, who may be sending thousands or hundreds of thousands of emails a day, may see more delays in a single send than the average user sees in years of sending one-to-one email.
Email is store and forward, not instant. Sometimes that means there is a delay in getting email into the recipient’s inbox. And sometimes there isn’t anything anyone can do to speed up delivery, except to adjust expectations of how email works.
