The trouble with CNAMEs

When you query DNS for something, you ask your local recursive DNS resolver for all the records of a certain type it has for a hostname. If you’re going to a website, your browser asks your resolver for all records for “google.com” of type “A” (or “AAAA”, but that’s not important right now). The resolver will either return all the A records for google.com it has cached, or it will do the complex process of looking up the results from the authoritative servers, cache them for as long as the TTL field for the reply says it should, then return them to you.

There are dozens of different types of records: AAAA for IPv6 addresses, MX for mailservers, and TXT for arbitrary text, which is mostly used for various sorts of authentication (including SPF, DKIM and DMARC). And then there’s CNAME.

CNAME stands for “Canonical Name” and means “Go and ask this different question instead”. If you have a DNS record that looks like “www.example.com CNAME example.net” then any time you ask your DNS resolver for records of any type for www.example.com it will see that there’s a CNAME record and do a query of the same type for example.net instead. So queries for “www.example.com A” will return whatever the answer for “example.net A” is, queries for “www.example.com MX” will return the same thing as “example.net MX”.
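
As a rough illustration of that "ask a different question" behaviour, here is a minimal Python sketch of CNAME following over a toy record table. The table contents, the IP address and the `resolve` helper are all made up for illustration; a real resolver also deals with caching, TTLs and network queries.

```python
# Toy record table: (hostname, record type) -> list of values.
# The addresses and the MX target are illustrative assumptions.
RECORDS = {
    ("www.example.com", "CNAME"): ["example.net"],
    ("example.net", "A"): ["192.0.2.10"],
    ("example.net", "MX"): ["10 mail.example.net"],
}


def resolve(name, rtype, max_hops=8):
    """Follow CNAMEs until we can answer for the requested type."""
    for _ in range(max_hops):
        cname = RECORDS.get((name, "CNAME"))
        if cname and rtype != "CNAME":
            # "Go and ask this different question instead"
            name = cname[0]
            continue
        return RECORDS.get((name, rtype), [])
    raise RuntimeError("CNAME chain too long")


print(resolve("www.example.com", "A"))   # same answer as example.net A
print(resolve("www.example.com", "MX"))  # same answer as example.net MX
```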

For a long time the main use you saw for CNAMEs was making “www.” hostnames work for webhosting, with “www.example.com CNAME example.com” records so that the www version of your website resolved to the same IP address as the non-www version.

One important thing about CNAMEs is that you should never have both CNAME records and any other sort of record for the same hostname. It breaks things, and now that we rely on DNS for more and more complex configuration and authentication it can break things in complex, inconsistent and hard to diagnose ways.

The concrete example of this today was diagnosing why SPF was failing, despite DNS apparently being set up correctly.

There were two return paths, email1.example.com and email2.example.com. Both were for use at the same ESP, one that uses CNAMEs to make user onboarding easy.

email1.example.com 3600 CNAME esp.com
email2.example.com 3600 CNAME esp.com
esp.com             300 TXT "v=spf1 exists:%{i}._spf.esp.com"

Identical DNS was configured for both hostnames. Doing a dig from the command line gave the correct SPF record for both hostnames. And yet email2 randomly failed SPF while email1 always passed, even though both were being sent from the same IP address. That … shouldn’t happen.

My first thought was that there was some misconfiguration at esp.com such that it wasn’t handling email2 properly. But the only macro in that SPF record is “%{i}”, the IP address. So the ESP doesn’t know anything other than the sending IP address when answering that query, and it can’t give different answers for different hostnames. (If you do need that, %{d} is the SPF macro for the domain being checked; %{h} is the HELO hostname.)
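
To make that concrete, here is a small Python sketch of how the name in the “exists:” mechanism gets built. It handles only the %{i} macro and assumes an IPv4 sender; real SPF macro expansion has more macros and transformers than this. The `expand_i_macro` helper and the sending IP address are hypothetical.

```python
def expand_i_macro(domain_spec, sender_ip):
    """Expand only %{i}, assuming sender_ip is a dotted-quad IPv4 address."""
    return domain_spec.replace("%{i}", sender_ip)


spec = "%{i}._spf.esp.com"
sending_ip = "192.0.2.25"  # illustrative address, not from the real incident

for return_path in ("email1.example.com", "email2.example.com"):
    # The return path plays no part in the expansion: nothing in the
    # spec refers to it, so both hostnames produce the same lookup.
    print(return_path, "->", expand_i_macro(spec, sending_ip))
```

Both lines print the same lookup name, which is why the ESP's side of the SPF check couldn't be what was treating the two hostnames differently.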

After poking at the eight authoritative nameservers for the example.com zone, and being sidetracked by some other misconfigurations in their DNS, I found the answer. And, despite causing such weird symptoms, it was surprisingly simple.

Someone had added a google-site-verification TXT record for email2.example.com. That breaks the rule that you should never have a CNAME and any other DNS record for the same hostname. The failure works like this:

If I ask my DNS resolver for the SPF TXT record for email2 – “email2.example.com TXT” – and it doesn’t have it cached, then it will go ask one of the authoritative servers – ns04.example.com, say – for “email2.example.com TXT”. ns04 is being asked for a TXT record, and it has a matching TXT record, so it ignores the CNAME and returns the Google site validation record:

email2.example.com 300 TXT "google-site-verification=ZbTqQmfwO0C4..."

There’s no SPF TXT record in that response, so SPF fails. The resolver will hang on to that record for the next 300 seconds, and SPF will fail all that time.

But what if I query for something else, the MX record for email2.example.com – “email2.example.com MX”? Again, my resolver will go ask ns04 for the answer and it’ll get back something like this:

email2.example.com 3600 CNAME esp.com
esp.com            300  MX    mail.esp.com

The resolver will then cache that result, keeping the CNAME around for the next hour. So if I now ask for a TXT record again, “email2.example.com TXT”, my resolver will find the CNAME record in its cache and go “Alright, there’s a CNAME response so I should follow it to get the answer!”

email2.example.com 3600 CNAME esp.com
esp.com            300 TXT "v=spf1 exists:%{i}._spf.esp.com"

So now the answer I get has a validly formatted SPF TXT record in the response and so SPF passes for the message.

This means that, depending on the history of queries the recursive resolver at a mailbox provider has seen recently, it may have the (incorrect) TXT record cached, and return that, or it may have the (correct) CNAME record cached, and return that along with the (correct) set of TXT records. From the outside it looks like you get one or the other set of answers kind of at random. And just to make it more fun, different DNS resolvers may handle this in different ways.
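
The history dependence is easier to see in a toy model. The Python sketch below is a simplification under stated assumptions: one `AUTH` table stands in for the authoritative servers of both example.com and esp.com, the `Resolver` class always prefers a cached CNAME, and time is just a number of seconds passed in by the caller. Real resolvers differ in exactly these details.

```python
# Toy authoritative data: (name, type) -> (TTL, values).
# The clashing TXT record on email2 is the misconfiguration.
AUTH = {
    ("email2.example.com", "CNAME"): (3600, ["esp.com"]),
    ("email2.example.com", "TXT"): (300, ["google-site-verification=ZbTqQmfwO0C4..."]),
    ("esp.com", "TXT"): (300, ["v=spf1 exists:%{i}._spf.esp.com"]),
    ("esp.com", "MX"): (300, ["mail.esp.com"]),
}


class Resolver:
    """Hypothetical caching resolver; a cached CNAME always wins."""

    def __init__(self):
        self.cache = {}  # (name, type) -> (expiry time, values)

    def _cached(self, name, rtype, now):
        entry = self.cache.get((name, rtype))
        if entry and entry[0] > now:
            return entry[1]
        return None

    def _fetch(self, name, rtype, now):
        ttl, values = AUTH[(name, rtype)]
        self.cache[(name, rtype)] = (now + ttl, values)
        return values

    def query(self, name, rtype, now):
        target = self._cached(name, "CNAME", now)
        if target:
            return self.query(target[0], rtype, now)
        hit = self._cached(name, rtype, now)
        if hit is not None:
            return hit
        # Cache miss: the authoritative server answers with an exact
        # type match if it has one, and only otherwise with the CNAME.
        if (name, rtype) in AUTH:
            return self._fetch(name, rtype, now)
        if (name, "CNAME") in AUTH:
            target = self._fetch(name, "CNAME", now)
            return self.query(target[0], rtype, now)
        return []


cold = Resolver()
print(cold.query("email2.example.com", "TXT", now=0))   # Google record, SPF fails

warm = Resolver()
warm.query("email2.example.com", "MX", now=0)           # caches the CNAME
print(warm.query("email2.example.com", "TXT", now=10))  # SPF record, SPF passes
```

The same TXT question gets two different answers, and the only difference between the two resolvers is what they were asked earlier.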

So the morals of this story are:

  • Avoid CNAMEs when you can
  • Never have CNAMEs on the same hostname as any other sort of DNS record (which does mean you can never put them at the root of a zone, as they’ll always clash there)
  • If you have weird flaky maybe DNS related failures and a CNAME is involved, check for a clashing record

You can check for clashes like this, assuming you’re expecting to ask foo.example.com for a TXT record:

$ dig +short example.com ns
ns01.example.com
ns02.example.com

$ dig +noall +answer foo.example.com txt @ns01.example.com
foo.example.com. 3600 IN CNAME esp.com.

This is the response you hope to get – just a CNAME response, meaning there’s no conflicting TXT record. If instead you don’t get a CNAME response but do get a TXT record then that TXT record conflicts.
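
If you have an export of a zone rather than a single hostname to check, the same test can be run over the whole list. This is a minimal Python sketch, assuming you have already reduced the zone to (hostname, record type) pairs; the `find_cname_clashes` helper and the sample data are hypothetical, and it does not special-case DNSSEC record types, which are allowed alongside a CNAME.

```python
from collections import defaultdict


def find_cname_clashes(records):
    """Return {hostname: [other types]} for names that have a CNAME plus anything else.

    records: iterable of (hostname, record type) pairs from one zone.
    """
    types_by_name = defaultdict(set)
    for name, rtype in records:
        # DNS names are case-insensitive; ignore any trailing root dot.
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())
    return {
        name: sorted(types - {"CNAME"})
        for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    }


zone = [
    ("email1.example.com", "CNAME"),
    ("email2.example.com", "CNAME"),
    ("email2.example.com", "TXT"),
]
print(find_cname_clashes(zone))  # {'email2.example.com': ['TXT']}
```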
