The anatomy of From:

Compared with some of the more complex pieces of the email protocol, the From: header seems deceptively simple. But over the past couple of months I’ve heard several people get confused about what it’s made up of, so I thought I’d dig a bit deeper into how it’s defined and how it’s used in practice.
Here’s a simple example:
 
[Image: an example From: header]
 
There are two interesting parts.
The first is what’s technically called the display-name, but more commonly known as the “friendly from” in the bulk email industry. It has no meaning within the email protocol; it’s just text that’s displayed to the recipient to describe who an email was sent by. Because it’s just text, you can put anything you like in there, but it’s usually either the name of the person who wrote the mail or the name of the company or brand that sent it.
The second is the actual email address, the thing with an at-sign in it. Surprisingly, this isn’t used at all during the actual delivery of the email; there’s a hidden field (called the return path, the 5321.MailFrom, the envelope sender or the bounce address) that’s used instead. For person-to-person email it’s usually the same address, but for bulk mail it’s often different.
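To make the two parts concrete, here is a minimal Python sketch that splits a From: value into its display-name and its email address using the standard library’s email.utils.parseaddr. The header value is made up for illustration.

```python
from email.utils import parseaddr

# Hypothetical From: value, invented for this example.
header_value = '"Example Sender" <sender@example.com>'

# parseaddr returns a (display-name, address) pair.
display_name, address = parseaddr(header_value)

print(display_name)  # the "friendly from"
print(address)       # the 5322.From address
```

The display-name comes back as plain text with the surrounding quotes removed, which is a reminder that it is purely descriptive.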
So what does the actual email address, the 5322.From, mean? For that we go to the document that specifies what email headers mean – RFC 5322, “Internet Message Format”. (RFC 5322 is the updated replacement for RFC 2822, which in turn replaced the older RFC 822 – and that’s why the actual email address is often called the 822.From or 5322.From when people are being precise about exactly which email address they’re talking about.)
RFC 5322 says “The From: field specifies the author of the message, that is, the mailbox of the person or system responsible for the writing of the message.” and “In all cases, the From: field SHOULD NOT contain any mailbox that does not belong to the author of the message”. It’s the email address of the author of the message.
(In some cases the email may have been written by the author, but then sent on their behalf by someone else. RFC 5322 says that in that situation the email address in the From field is still that of the author of the message. The person who sent the message gets their own field, “Sender:”.)
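As a rough illustration of that split between author and sender, here is a small Python sketch using the standard library’s email.message.EmailMessage. All of the names and addresses are hypothetical.

```python
from email.message import EmailMessage

msg = EmailMessage()
# The author of the message (hypothetical address).
msg["From"] = "Author Name <author@example.com>"
# The person who actually sent it on the author's behalf (hypothetical).
msg["Sender"] = "Assistant Name <assistant@example.com>"
msg["To"] = "recipient@example.net"
msg["Subject"] = "Quarterly report"
msg.set_content("Written by the author, sent by the assistant.")

print(msg["From"])
print(msg["Sender"])
```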
What is the 5322.From used for? During the delivery process it’s used for some sorts of filtering and authentication. In particular, if you’re reading about DMARC you’ll see “identifier alignment” mentioned a lot – which basically means “the only domain we care about authenticating is the one in the 5322.From”. It’s also the field that’s usually used in user-visible mail filtering, such as whitelisting email addresses that are in the user’s address book.
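Here is a simplified Python sketch of what an alignment check might look like: it compares the domain in the 5322.From with a domain that has already been authenticated some other way (for example a DKIM signing domain). The function names are hypothetical, and the “relaxed” mode uses a crude last-two-labels rule as a stand-in for the organizational domain; real implementations use the Public Suffix List, so treat this as an illustration only.

```python
from email.utils import parseaddr


def naive_org_domain(domain: str) -> str:
    """Crude stand-in for the organizational domain: the last two labels.

    ASSUMPTION: real DMARC implementations consult the Public Suffix List,
    so this gives wrong answers for suffixes such as co.uk.
    """
    return ".".join(domain.lower().split(".")[-2:])


def is_aligned(from_header: str, authenticated_domain: str, strict: bool = False) -> bool:
    """Check whether an authenticated domain lines up with the 5322.From domain."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return False
    from_domain = address.rpartition("@")[2].lower()
    auth_domain = authenticated_domain.lower()
    if strict:
        # Strict alignment: the two domains must match exactly.
        return from_domain == auth_domain
    # Relaxed alignment: the (approximated) organizational domains must match.
    return naive_org_domain(from_domain) == naive_org_domain(auth_domain)


print(is_aligned("Shop <news@example.com>", "mail.example.com"))
print(is_aligned("Shop <news@example.com>", "mail.example.com", strict=True))
```

Note that the display-name plays no part in the check at all; only the domain of the address matters.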
In the mail client itself the most obvious use of the 5322.From is that when you hit reply, that’s the email address your reply will go to by default. If the author of the mail wants different behaviour, they can override that by adding a Reply-To field containing one or more email addresses. The 5322.From is also commonly used to filter email and to group mails by author.
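The reply behaviour described above can be sketched in a few lines of Python. This is a simplification of what a mail client does, with a hypothetical function name and invented messages: use Reply-To if it is present, otherwise fall back to From.

```python
from email import message_from_string
from email.policy import default


def reply_addresses(raw_message: str) -> list:
    """Return the addresses a reply would go to by default."""
    msg = message_from_string(raw_message, policy=default)
    # Prefer Reply-To when the author supplied one, else use From.
    header = msg["Reply-To"] or msg["From"]
    if header is None:
        return []
    return [a.addr_spec for a in header.addresses]


# Hypothetical messages, invented for illustration.
plain = (
    "From: Author <author@example.com>\n"
    "To: you@example.net\n"
    "Subject: Hello\n"
    "\n"
    "Body\n"
)
with_reply_to = (
    "From: Author <author@example.com>\n"
    "Reply-To: replies@example.com, team@example.org\n"
    "To: you@example.net\n"
    "Subject: Hello\n"
    "\n"
    "Body\n"
)

print(reply_addresses(plain))
print(reply_addresses(with_reply_to))
```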
What’s displayed to the end user? Originally the entire content of the From: header was shown in the recipient’s mailbox, but it’s now fairly common to display just the friendly from, with no mention of the email address at all. That started in mobile clients, where space is at a premium and the friendly from is just, well, friendlier – but it’s spread to desktop and webmail clients too. In Yahoo webmail the 5322.From isn’t displayed anywhere at all unless you find the View Full Header menu option and dig through the raw headers, and my phone doesn’t display it anywhere obvious and only recently made it possible to see it at all.

Related Posts

Ad-hoc analysis

I often pull emails into a database to analyze them, but sometimes I want something simpler. Emails are typically stored in one of two ways: mbox format, where an entire mailbox is stored in a single file, and maildir format, where a mailbox is a directory with one file in it for each email.
My desktop mail application is Mail.app on OS X, and it stores messages in a maildir-ish format, so I’m going to work with that here. If you’re using mbox format mailboxes it’s a little trickier (but you can use a tool called formail to split an mbox-style mailbox into a maildir directory and go from there).
I want to gather some statistics on mail I’ve sent to abuse desks, so the first thing I do is open up a terminal window and change directory to where my “Sent Messages” mailbox is:
cd "Library/Mail/V2/IMAP-steve@misc.wordtothewise.com/Sent Messages.mbox"
(Tab completion is really useful for navigating through the mailbox hierarchy.)
Then I need to go through every email (file) in that directory, for each file find the “To:” header and check to see if it was sent to an abuse desk. If it was sent to an abuse desk I want to find the email address for each one, count how many times I see that email address and find the top twenty or so abuse desks I send reports to. I can do all that with a single command line:
find . -type f -exec egrep -m1 '^To:' {} \; | egrep -o 'abuse@[a-zA-Z0-9._-]+' | sort | uniq -c | sort -nr | head -20
(Enter that all as a single line, even though it’s wrapped into two here).
That’s a bit much to understand all at once, so let’s redo that in several stages, with an intermediate file so we can see what’s going on.
find . -type f -exec egrep -m1 '^To:' {} \; >tolines.txt
The find command finds all the files in a directory and does something with them. In this case we start looking in the current directory (“.”), look just for files (“-type f”), and for each file we find we run that file through another command (“-exec egrep -m1 '^To:' {} \;”) and write the result of that command to a file (“>tolines.txt”). The backslash before the semicolon stops the shell from treating it as a command separator, so that find sees it as the end of the -exec command. The egrep command we run for each file goes through the file and prints out the first (“-m1”) line it finds that begins with “To:” (“'^To:'”). If you run that and take a look at the file it creates you can see one line for each message, containing the “To:” header (or at least the first line of it).
The next thing to do is to go through that and pull out just the email addresses – and just the ones that are sent to abuse desks:
egrep -o 'abuse@[a-zA-Z0-9._-]+' tolines.txt
This uses egrep a second time, this time to look for text that looks like an abuse desk email address (“'abuse@[a-zA-Z0-9._-]+'”), and when it finds some, to print out just the part of the line that matched the pattern (“-o”).
Running that gives us one line of output for each email we’re interested in, containing the address it was sent to. Next we want to count how many times we see each one. There’s a command line idiom for that:
egrep -o 'abuse@[a-zA-Z0-9._-]+' tolines.txt | sort | uniq -c
This takes all the lines and sorts (“sort”, reasonably enough) them – so that identical lines will be next to each other – then counts runs of identical lines (“uniq -c”). We’re nearly there – the result of this is a count and an email address on each line. We just need to find the top 20:
egrep -o 'abuse@[a-zA-Z0-9._-]+' tolines.txt | sort | uniq -c | sort -nr | head -20
Each line begins with the count, so we can use sort again, this time telling it to sort by number, high to low (“sort -nr”). Finally, “head -20” will print just the first 20 lines of the result.
The final result is this:


More on the attack against Spamhaus and how you can help

While much of the attack against Spamhaus has been mitigated and their services and websites are currently up, the attack is still ongoing. This is the biggest denial of service attack in history, with as much as 300 gigabits per second hitting Spamhaus servers and their upstream links.
This traffic is so massive that it’s actually affecting the Internet, and web surfers in some parts of the world are seeing network slowdowns because of it.
While I know that some of you may be cheering at the idea that Spamhaus is “paying” for their actions, this does not put you on the side of the good. Spamhaus’ actions are legal. The actions of the attackers are clearly illegal. Not only is the attack itself illegal, but many of the sites hosted by the purported source of the attacks provide criminal services.
By cheering for and supporting the attackers, you are supporting criminals.
Anyone who thinks that an appropriate response to a Spamhaus listing is an attack on the very structure of the Internet is one of the bad guys.
You can help, though. This attack is due to open DNS resolvers which are reflecting and amplifying traffic from the attackers. Talk to your IT group. Make sure your resolvers aren’t open and if they are, get them closed. The Open Resolver Project published its list of open resolvers in an effort to shut them down.
Here are some resources for the technical folks.
Open Resolver Project
Closing your resolver by Team Cymru
BCP 38 from the IETF
Ratelimiting DNS
News Articles (some linked above, some coming out after I posted this)
NY Times
BBC News
Cloudflare update
Spamhaus dDOS grows to Internet Threatening Size
Cyber-attack on Spamhaus slows down the internet
Cyberattack on anti-spam group Spamhaus has ripple effects
Biggest DDoS Attack Ever Hits Internet
Spamhaus accuses Cyberbunker of massive cyberattack


The Physics of the Email Universe

We talk a lot about rules and best practices in email, but we’re mostly talking about “squishy” rules-of-thumb that are based on simplified models of how mail systems, spam filters, recipients, postmasters and blacklist operators behave. They’re the biology, ecology and sociology of the email ecosystem.
There’s another set of rules we tend to only mention in passing, if at all, though. They’re the steely, sharp-edged laws that control the email universe. They’re the RFCs that define how email works and make sure that mail systems written by hundreds of different people across the globe all work and all interoperate with each other.
Building a message from Zeros and Ones
RFC 5322 – Internet Message Format
This tells you everything you need to know about crafting a simple email, with a subject line, a sender, some recipients and a simple plain-text message. It’s also the foundation of all fancier emails. If you’re creating emails, this is where to start.
A little more than plain ASCII
RFC 2047 – MIME Part 3: Message Header Extensions for Non-ASCII Text
RFC 2047 is one small part of the MIME (Multipurpose Internet Mail Extensions) suite of protocols that allow you to include pictures and attachments and prettily formatted text and Comic Sans in your email. This part defines how you can put things other than the plainest of plain text in your subject lines or in the “friendly from” of your message. It’s what allows you to put Hiragana, or Cyrillic, or umlauts, or cedillas, or properly matched double quotes in your subject line. It also lets you put hearts or smiley faces or other little pictograms there – but nothing this useful is going to be perfect.
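To give a feel for what RFC 2047 does on the wire, here is a small Python sketch using the standard library’s email.header module. The subject line is invented; the exact encoded form depends on which encoding the library picks, so it is printed rather than spelled out here.

```python
from email.header import Header, decode_header, make_header

# Hypothetical subject line with an umlaut, a sharp s and a heart.
subject = "Grüße ♥"

# Encode it as an RFC 2047 "encoded-word": pure ASCII, safe in a header.
encoded = Header(subject, charset="utf-8").encode()
print(encoded)

# Decode it again, the way a mail client would before display.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

The encoded form starts with “=?utf-8?” and contains only ASCII characters, which is what lets it travel through mail systems that expect plain-text headers.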
RFC 2045 – MIME Part 1: Format of Internet Message Bodies
This shows how to send an image, or a plain text mail in a different character set, or an HTML mail. It doesn’t tell you how to send plain text and HTML, or to send HTML with embedded images, or a message with an attached document. For that you need…
Finally, Modern Email
RFC 2046 – MIME Part 2: Media Types
This builds on RFC 2045 to allow you to have many different chunks in a message – this is what you need if you want to send “proper” HTML mail with a plain text alternative, or if you want embedded images or attachments.
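As a sketch of what “many different chunks” looks like in practice, here is one way to build an HTML mail with a plain text alternative using Python’s standard email.message.EmailMessage. The addresses and content are hypothetical.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "Example Sender <sender@example.com>"  # hypothetical
msg["To"] = "recipient@example.net"                  # hypothetical
msg["Subject"] = "Hello"

# The plain text part goes in first...
msg.set_content("Hello in plain text.")
# ...then the HTML part is added as an alternative rendering of it.
msg.add_alternative("<p>Hello in <b>HTML</b>.</p>", subtype="html")

print(msg.get_content_type())
for part in msg.iter_parts():
    print(part.get_content_type())
```

The message itself becomes a multipart/alternative container, with the text/plain and text/html versions as its two parts.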
Getting From A To B
RFC 5321 – Simple Mail Transfer Protocol
A message isn’t much use unless you send it somewhere. RFC 5321 explains the mysteries of actually sending that message over the wire to the recipient. If you need to know about the different phases of a message delivery, what “4xx” and “5xx” actually mean, why there’s not really any such thing as a hard or soft bounce defined, just temporary or permanent failures, or anything else about actually sending mail or diagnosing mail delivery, this is your starting point.
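The temporary versus permanent distinction comes from the first digit of the SMTP reply code. A minimal Python sketch of that classification follows; the function name and the wording of the labels are my own, not taken from the RFC.

```python
def classify_reply(code: int) -> str:
    """Classify an SMTP reply code by its first digit."""
    if not 200 <= code <= 599:
        raise ValueError(f"not an SMTP reply code handled here: {code}")
    first_digit = code // 100
    return {
        2: "success",
        3: "intermediate (more input expected)",
        4: "temporary failure",
        5: "permanent failure",
    }[first_digit]


for code in (250, 354, 451, 550):
    print(code, classify_reply(code))
```

A sender would normally retry later after a temporary failure and give up after a permanent one; what the industry calls “soft” and “hard” bounces is layered on top of that.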
The Rest Of The Iceberg
I’ve only touched on the very smallest tip of the email iceberg here. There’s much, much more – both in RFCs and ad-hoc non-RFC standards. If you’re interested in more, this is a decent place to start.
