“Friendly From” addresses

When we’re looking at the technical details of email addresses there are two quite different contexts we talk about.

One is an “821 address” or “5321 address”. This is the email address as it’s used by the SMTP protocol, as part of the “MAIL FROM: <>” or “RCPT TO: <>” commands sent to the mailserver. It’s defined in RFC 821, now updated by RFC 5321, hence the name. If someone mentions the “envelope” or they’re talking about “bounce addresses”, this is the sort they mean. We’re not talking about them in this post.

The other is an “822 address” or a “5322 address”. They’re the ones the recipient sees in the To: or From: headers. They’re named after their RFC, RFC 822, now updated by RFC 5322. This is the sort of email address most folks mean by default, unless they’re explicitly talking about the envelope of an email, but if someone describes an email address as “visible” or “friendly” they definitely mean this flavour.

Back in 2014 (!) I talked about some of the obsolete-back-then formats of email address, but it’s 2023 now so I’m only going to discuss the “modern” (post-1982!) format.

That looks like this:

From: Steve Atkins <steve@wordtothewise.com>

Or like this:

From: "Steve Atkins" <steve@wordtothewise.com>

The email address itself is on the right, surrounded by angle brackets. It’s … an email address, user part, at sign, domain part. We all know how to use those. Boring.

The bit before that is the “display name”, which is the human readable text that’s displayed in the mail client. Some mail clients, especially on mobile, will show only the display name and not the email address itself.

This is often called the “friendly from” as marketers love being able to put a friendly face on their brand via the From: header, but you use exactly the same syntax to put your recipient’s real name in the To: or Cc: headers (and isn’t getting your recipient’s name right at least as important as getting your own right?)

The display name looks like simple human-readable text and that’s usually how it’s used. But the required syntax for it is anything but simple.

Ready? This isn’t going to be pretty.

Syntax

The display-name is a phrase that consists of one or more words separated by whitespace, where a word is either an atom or a quoted-string.

An atom consists of one or more ASCII characters that are letters of the alphabet, decimal digits or any of these characters: !, #, $, %, &, ', *, +, -, /, =, ?, ^, _, `, {, |, } or ~.

That means that an atom can’t contain any of these characters: (, ), <, >, [, ], :, ;, @, \, double-quote, comma or period. Amongst other things that means that an email address can never be an atom, as it contains an @ sign (and at least one period), so you can never use an email address in an unquoted display-name.
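If you want to check that rule in code, here’s a minimal Python sketch of the simplified definition above. The names ATEXT, is_atom and is_unquoted_phrase are mine, not from any library, and the sketch ignores comments, obsolete syntax and folding whitespace.

```python
import re
import string

# Characters allowed in an atom ("atext" in RFC 5322): ASCII letters,
# digits and this fixed set of punctuation.
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_atom(word: str) -> bool:
    """True if word is a non-empty run of atext characters."""
    return len(word) > 0 and all(ch in ATEXT for ch in word)


def is_unquoted_phrase(display_name: str) -> bool:
    """True if display_name is one or more atoms separated by spaces or tabs."""
    words = re.split(r"[ \t]+", display_name.strip(" \t"))
    return all(is_atom(w) for w in words)


print(is_unquoted_phrase("My Brand"))        # True
print(is_unquoted_phrase("June Specials:"))  # False, the colon is not atext
print(is_atom("steve@wordtothewise.com"))    # False, "@" and "." are not atext
```

Anything that fails this check needs to be a quoted-string or an encoded-word instead.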

Your other option for the display-name is a quoted-string. That’s much simpler, fortunately.

A quoted-string is a double quote, followed by zero or more ASCII characters except double-quote or backslash, or the pairs of characters \\ or \", followed by another double quote. So you can put any ASCII characters you like in a quoted-string; you just have to escape double quote or backslash with a backslash.
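As a rough sketch of that escaping rule in Python (quote_display_name is a hypothetical helper name, not a library function), assuming the name is plain ASCII and should never contain a line break:

```python
def quote_display_name(name: str) -> str:
    """Wrap an ASCII display name in double quotes, escaping \\ and "."""
    if not name.isascii():
        raise ValueError("non-ASCII names need RFC 2047 encoding instead")
    if "\r" in name or "\n" in name:
        raise ValueError("display names must not contain line breaks")
    # Escape backslashes first, so the backslashes added for the
    # double quotes are not escaped a second time.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print(quote_display_name("Steve Atkins, Esq."))   # "Steve Atkins, Esq."
print(quote_display_name('Steve "Word" Atkins'))  # "Steve \"Word\" Atkins"
```

Rejecting line breaks isn’t part of the simplified grammar above; it’s a defensive assumption on my part, so that user-supplied names can’t smuggle extra header lines into a message.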

That’s the simplified version that skips over obsolete constructs and folding whitespace. You don’t need to care about that unless you’re writing code to handle this sort of stuff, and if you are you have my sympathy and a link to RFC 5322.

“But, Steve,” you say, “what about our non-ASCII friends?”

We handle non-ascii display names by encoding them as strings of ascii gobbledygook.

RFC 2047 gives the details and the syntax to do that, but the end result is that if we want to write “ríomhphost” in a display-name we can write it as =?UTF-8?Q?r=C3=ADomhphost?=. This is called an encoded-word in the spec, and can only replace an atom; it can’t be used as part of a quoted-string. You don’t have to use UTF-8 encoding, you could use any of the other valid character sets, but try and use UTF-8 if you can. It’ll make everyone happier.
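Here’s a minimal sketch of the “Q” flavour of that encoding in Python (RFC 2047 also defines a base64 “B” flavour). q_encode_word is a hypothetical name of mine. It leaves only the characters RFC 2047 allows unencoded inside a display name (letters, digits, !, *, +, - and /), turns a space into an underscore, and hex-encodes every other byte. It doesn’t split long names into several encoded-words, so it simply refuses anything over the 75 character limit.

```python
import string

# Characters RFC 2047 lets you leave as-is in an encoded-word that is
# used in a display name.
Q_SAFE = set(string.ascii_letters + string.digits + "!*+-/")


def q_encode_word(text: str, charset: str = "UTF-8") -> str:
    """Encode text as a single RFC 2047 Q-encoded word."""
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch in Q_SAFE:
            out.append(ch)
        elif ch == " ":
            out.append("_")             # underscore stands for a space
        else:
            out.append("=%02X" % byte)  # everything else becomes =XX
    word = "=?%s?Q?%s?=" % (charset, "".join(out))
    if len(word) > 75:
        raise ValueError("too long for one encoded-word; split it up")
    return word


print(q_encode_word("ríomhphost"))   # =?UTF-8?Q?r=C3=ADomhphost?=
print(q_encode_word("Cú Chulainn"))  # =?UTF-8?Q?C=C3=BA_Chulainn?=
```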

Which means?

If you’re generating friendly display names for the To: or From: field as simple unquoted strings you’re probably doing it wrong. At some point your users are going to give you a name with one of the forbidden characters in it, you’re going to put it unchanged in your header, and you’re going to end up with a syntactically invalid email.
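Most languages have a library that will do the quoting for you, which beats gluing strings together. As one example, Python’s standard library has email.utils.formataddr. Note that it only adds quotes when the name contains a special character, so plain names still come out unquoted, and I’m deliberately not showing the exact encoded output for the non-ASCII case as the encoding it picks is up to the library.

```python
from email.utils import formataddr

# A name containing specials (comma, period) gets quoted automatically.
print(formataddr(("Steve Atkins, Esq.", "steve@wordtothewise.com")))
# "Steve Atkins, Esq." <steve@wordtothewise.com>

# A name made only of atoms is left unquoted.
print(formataddr(("My Brand", "brand@example.com")))
# My Brand <brand@example.com>

# A non-ASCII name comes back as an RFC 2047 encoded-word.
encoded = formataddr(("Cú Chulainn", "guarding@yourhouse.com"))
print(encoded.startswith("=?utf-8?"))  # True
```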

That invalid syntax may do a few different things. It might cause the mail, or at least the From and To headers, to be rendered oddly in some mail clients, and cause them to do the wrong thing when someone tries to reply or forward the mail (unquoted commas are particularly likely to break things). An intermediate mailserver might try and fix up the syntax error, likely invalidating any DKIM signature you had. And recipient mailbox providers may treat it as spam or outright reject it – increasingly likely as they tighten up protections against DKIM and DMARC bypasses.

So if you’re looking at the raw version of your email and the display name in the To: or From: field doesn’t start with “=?” and isn’t double-quoted you’re treading very dangerously.
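That rule of thumb is easy to turn into a quick check. This is a rough heuristic sketch, not a parser: display_name_looks_safe is a hypothetical name, and it assumes you hand it the value of a header holding a single address in the “display name, then address in angle brackets” form.

```python
def display_name_looks_safe(header_value: str) -> bool:
    """Apply the rule of thumb: quoted, encoded, or no display name at all."""
    value = header_value.strip()
    if value.startswith("<"):
        # Just an address in angle brackets, so there is no display name.
        return True
    return value.startswith('"') or value.startswith("=?")


print(display_name_looks_safe('"My Brand" <brand@example.com>'))      # True
print(display_name_looks_safe("My Brand <brand@example.com>"))        # False
print(display_name_looks_safe("June Specials: <brand@example.com>"))  # False
```

A False here doesn’t mean the header is invalid, only that it’s worth a closer look at how the display name is being generated.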

Correct:

From: "My Brand" <brand@example.com>

From: <brand@example.com>

To: "Steve Atkins, Esq." <steve@wordtothewise.com>

To: =?UTF-8?Q?C=C3=BA_Chulainn?= <guarding@yourhouse.com>

From: =?UTF-8?Q?My=20Brand?= <brand@example.com>

From: My =?UTF-8?Q?Brand?= <brand@example.com>

Correct, but suggest you could be doing something risky:

From: My Brand <brand@example.com>

From: brand@example.com (My Brand)

Syntactically correct, but definitely not what you wanted:

To: "=?UTF-8?Q?C=C3=BA_Chulainn?=" <guarding@yourhouse.com>

Incorrect:

From: June Specials: <brand@example.com>

From: brand@example.com <brand@example.com>

To: steve.atkins <steve@example.com>

To: "Cú Chulainn" <guarding@yourhouse.com>
