Idn
Some email related news
A couple links to relevant things that are happening in email.
M3AAWG released the Help! I’m on a Blocklist! (PDF link) doc this week. This is the result of four years’ worth of work by a whole lot of people at M3AAWG. I was a part of the working group (“doc champion” in M3AAWG parlance) and want to thank everyone who was involved and contributed to the process. I am very excited this was approved and published so people can take advantage of the collective wisdom of M3AAWG participants.
In other announcements, Gmail announced today on their Google+ page that they were putting a new “unsubscribe” link next to the sender name when mail is delivered to the Promotions, Social or Forums tab. This appears to be the official announcement of the functionality they announced at the SF M3AAWG last February. It likely means that all users are currently getting the “unsubscribe” link. What Gmail doesn’t mention in that blog post is that this functionality uses the “List-Unsubscribe” header, not the link in the email, but I don’t think anyone except bulk mailers really cares how it’s being done, just that it is.
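For the bulk mailers who do care, here is a minimal sketch of setting that header with Python’s standard `email` library. The addresses, subject and unsubscribe URL are made-up placeholders, and the header value (a `mailto:` and an `https:` URI, each in angle brackets) follows the general RFC 2369 format. It is an illustration of the header itself, not a description of how Gmail processes it.

```python
from email.message import EmailMessage


def build_bulk_message(recipient: str) -> EmailMessage:
    """Build a message carrying a List-Unsubscribe header (placeholder values)."""
    msg = EmailMessage()
    msg["From"] = "News <news@example.com>"      # hypothetical sender
    msg["To"] = recipient
    msg["Subject"] = "This week's offers"
    # RFC 2369 style: comma-separated URIs, each wrapped in angle brackets.
    msg["List-Unsubscribe"] = (
        "<mailto:unsubscribe@example.com?subject=unsubscribe>, "
        "<https://example.com/unsubscribe?id=12345>"
    )
    msg.set_content("Hello! Here are this week's offers.")
    return msg


message = build_bulk_message("reader@example.net")
print(message["List-Unsubscribe"])
```

The unsubscribe link in the body of the mail is still worth keeping; the header is simply the machine-readable version that a mail client can pick up.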
Also today Gmail announced they were going to recognize usernames with non-Latin or accented characters in the name. Eventually, they claim, they’ll also allow people to get Gmail addresses with accented characters.
Internationalisation (part 2)
In part 1 I talked about internationalised domain names, and how they were mapped onto ASCII strings.
For sending email there are four bits of the message where internationalisation might need to be considered.
Internationalisation (part 1)
There’s been a gentle bit of uproar recently about ICANN finally beginning the process of rolling out support for internationalized domain names (IDN) at the DNS root and the effect that may have on email senders. Even if you haven’t noticed the uproar, it’s still a subject you probably want to be familiar with if you’re sending email.
What are internationalised domain names?
An internationalised domain name is simply a domain name that uses non-ASCII characters – most anything other than a-z, 0-9 and ‘-’ – such as those used in these URLs: http://пример.испытание/ or http://例子.測試/. (If those links are unreadable or don’t work, it means that your browser isn’t handling IDN well or doesn’t have the appropriate fonts installed yet.)
They’re an obvious thing to want, especially if you’re from anywhere other than an anglophone country. But the Internet was originally built as an ASCII-only network, and under the covers it still is entirely ASCII-only, so layering non-ASCII characters on top has taken a lot of work and time to roll out. IDN development dates back to at least 1996 and it has been supported by some top level domains since 2003. So the recent announcement to support non-ASCII top level domains is just the latest step in a long and careful process.
Almost all of the underlying internet protocols are still ASCII-based though, including DNS and SMTP, so a lot of the internationalisation work involves mapping non-ASCII words onto ASCII strings before they’re passed to the network, and mapping them back again before they’re displayed to the user. This is done in a fairly ad-hoc way, and the mapping differs from protocol to protocol.
If you were to visit the Cyrillic URL I mentioned above then the first thing your web browser would do would be to take the Cyrillic string “пример.испытание” and translate it to the ASCII hostname “xn--e1afmkfd.xn--80akhbyknj4f”, then look that up in the DNS to find the server handling that URL.
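That hostname translation can be tried out with a short sketch using Python’s built-in `idna` codec. The helper names are my own, and the built-in codec follows the older IDNA 2003 rules, so treat it as an illustration of the mapping rather than what any particular browser does.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Map a Unicode hostname to its ASCII ("xn--") form, label by label."""
    return hostname.encode("idna").decode("ascii")


def to_unicode_hostname(hostname: str) -> str:
    """Map an ASCII ("xn--") hostname back to Unicode for display."""
    return hostname.encode("ascii").decode("idna")


ascii_name = to_ascii_hostname("пример.испытание")
print(ascii_name)                       # xn--e1afmkfd.xn--80akhbyknj4f
print(to_unicode_hostname(ascii_name))  # пример.испытание
```

Each dot-separated label is converted on its own, which is why both halves of the hostname get their own “xn--” prefix.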
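If you were to display that on a webpage or in an HTML email it might be converted to ASCII using HTML numeric character references, as “http://&#1087;&#1088;&#1080;&#1084;&#1077;&#1088;.&#1080;&#1089;&#1087;&#1099;&#1090;&#1072;&#1085;&#1080;&#1077;/”.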
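One common way for HTML to carry that URL in pure ASCII is with numeric character references, where each non-ASCII character becomes `&#NNNN;` with its Unicode code point. A minimal sketch in Python, assuming decimal references (hex ones such as `&#x43F;` are equally valid) and a helper name of my own choosing:

```python
import html


def to_html_ascii(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


encoded = to_html_ascii("http://пример.испытание/")
print(encoded)                 # pure ASCII, safe in any HTML document
print(html.unescape(encoded))  # what the reader ends up seeing
```

The browser or mail client reverses the mapping when it renders the page, so the reader sees the original Cyrillic.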
If you were to send it as part of a plain text email, encoded as UTF-8/quoted-printable, it would look like “http://=D0=BF=D1=80=D0=B8=D0=BC=D0=B5=D1=80.=D0=B8=D1=81=D0=BF=D1=8B=D1=82=D0=B0=D0=BD=D0=B8=D0=B5/” (a real encoder would also split a line this long with soft line breaks). If there are a lot of non-ASCII characters in the message then it’s more likely to be encoded as UTF-8/base64: “aHR0cDovL9C/0YDQuNC80LXRgC7QuNGB0L/Ri9GC0LDQvdC40LUvCg==”.
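Both transfer encodings can be reproduced with Python’s standard library. This is a rough sketch, not what any particular mail client does. Quoted-printable writes each non-ASCII byte of the UTF-8 text as `=XX`, unlike the `%XX` of URL percent-encoding. The encoder also inserts soft line breaks (`=` at the end of a line) to keep lines short, which the sketch strips out again purely for display. The base64 example assumes the line ends with a newline, which is what the trailing “Cg==” in the string above encodes.

```python
import base64
import quopri

url_line = "http://пример.испытание/"
utf8_bytes = url_line.encode("utf-8")

# Quoted-printable: non-ASCII bytes become =XX escapes.
qp = quopri.encodestring(utf8_bytes)
# Remove the soft line breaks ("=" + newline) so the result prints as one string.
qp_joined = qp.replace(b"=\n", b"").decode("ascii")

# Base64 of the same line, including its trailing newline.
b64 = base64.b64encode(utf8_bytes + b"\n").decode("ascii")

print(qp_joined)
print(b64)

# Decoding either form gets the original text back.
print(quopri.decodestring(qp).decode("utf-8"))
print(base64.b64decode(b64).decode("utf-8"), end="")
```

Quoted-printable leaves the ASCII parts readable, which is why it tends to be used for mostly-ASCII text, while base64 is more compact when most of the content is non-ASCII.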
And all of those will (or at least should) be displayed to the end user identically.
Confused yet? That’s fine. Internationalisation on the Internet is a very complex and inconsistent subject. In my next post I’ll try to narrow down which bits of it you need to worry about when it comes to sending email and to not upsetting phishing or spam filters at the recipient’s ISP.