Dodgy PDF handling at Gmail

We sent out some W-9s this week. For non-Americans and those lucky enough not to have to deal with IRS paperwork those are tax forms.
They’re simple single page forms with the company name, address and tax ID numbers on them. Because this is the 21st Century we don’t fill them in with typewriters and snail mail them out, we fill in a form online at the IRS website which gives us PDFs to download that we then send out via email.

We started to get replies from people we’d sent them to that we hadn’t included the tax ID number. Which was odd, because it was definitely there in the PDFs we’d sent.
The reports of missing numbers came from Google Apps users, so we sent a copy to one of our Gmail addresses to see. Sure enough, when you click on the attachment it’s mostly there, but some of the digits of the tax ID number are missing.

And all the spaces have been stripped from our address.

The rest of the form looked fine, but the information we’d entered was scrambled. Downloading the PDF from Gmail and displaying it – everything is there, and in the right place.
Weird. After a brief “Are gmail hiding things that look like social security numbers?” detour I realized that the IRS website was probably generating the customized forms using PDF annotations.
PDF is a very powerful, but very complex, file format. It’s not just an image, it’s a combination of different elements – images, lines, vector artwork, text, interactive forms, all sorts of things – bundled together into a single file. And you can add elements to an existing PDF file to, for example, overlay text on to it. These “annotations” are a common way to fill in a PDF form, by adding text in the right place over the top of an existing template PDF.
I cracked the PDF open with some forensics tools and sure enough, the IRS had generated the PDF form using annotations.
 

<< /Type /Annot /DV (Palo Alto, CA) /T (topmostSubform[0].Page1[0].Address[0].f1_8[0])
/Rect [ 57.6 539.968 388.8 553.969 ] /AP 81 0 R /FT /Tx /DA (/Helvetica-Bold 9 Tf 0 g)

And the Gmail PDF viewer isn’t rendering that annotated text correctly.
I’ve filed a bug sent feedback to Google, so hopefully it’ll be fixed. Meanwhile, if you’re sending customized content to recipients using PDF you should probably check that it renders correctly when previewed in Gmail.
 

Related Posts

Tell us about how you use Gmail Postmaster Tools

One of the things I hear frequently is that folks really want access to Google Postmaster Tools through an API. I’ve also heard some suggestions that we should start a petition. I thought a better idea was to put together a survey showing how people are using GPT and how high the demand is for an API.
They’re a data company, let’s give them data.

I’ve put together a survey looking at how people are using GPT. It’s 4 pages and average time to take the survey is around 7 minutes. Please give us your feedback on GPT usage.
I’m planning on leaving the survey open through the first week in November. Then I’ll pull data together and share here and with Google.

Read More

Google makes connections

One of the client projects I’m working on includes doing a lot of research on MXs, including some classification work. Part of the work involves identifying the company running the MX. Many of the times this is obvious; mail.protection.outlook.com is office365, for instance.

There are other cases where the connection between the MX and the host company is not as obvious. That’s where google comes into play. Take the domain canit.ca, it’s a MX for quite a few domains in this data set. Step one is to visit the website, but there’s no website there. Step 2 is drop the domain into google, who tells me it’s Roaring Penguin software.
In some cases, though, the domain wasn’t as obvious as the Roaring Penguin link. In those cases, Google would present me with seemingly irrelevant hosting pages. It didn’t make sense until I started digging through hosting documentation. Inevitably, whenever Google gave me results that didn’t make sense, they were right. The links were often buried in knowledge base pages telling users how to configure their setup and mentioning the domain I was searching for.
The interesting piece was that often it was the top level domain, not the support pages, that Google presented to me. I had to go find the actual pages. Based on that bit of research, it appears that Google has a comprehensive map of what domains are related to each other.
This is something we see in their handling of email as well. Gmail regularly makes connections between domains that senders don’t expect. I’ve been speaking for a while about how Gmail does this, based on observation of filtering behavior. Working through multiple searches looking at domain names was the first time I saw evidence of the connections I suspected. Gmail is able to connect seemingly disparate hostnames and relate them to one another.
For senders, it means that using different domains in an attempt to isolate different mainstreams doesn’t work. Gmail understands that domainA in acquisition mail is also the same as domainB in opt-in mail is the same as domainC in transactional mail. Companies can develop a reputation at Google which affects all email, not just a particular mail stream. This makes it harder for senders to compartmentalize their sends and requires compliance throughout the organization.
Acquisition programs do hurt all mail programs, at least at Gmail.
 

Read More

Warmup advice for Gmail

Getting to the Gmail inbox in concept is simple: send mail people want to receive. For a well established mail program with warm IPs and domains, getting to the inbox in practice is simple. Gmail uses recipient interaction with email to determine if an email is wanted or not. These interactions are easy when mail is delivered to the inbox, even if the user has tabs enabled.
When mail is in the bulk folder, even if it’s wanted, users are less likely to interact with the mail. Senders trying to change their reputation to get back to the inbox face an uphill battle. This doesn’t mean it’s impossible to get out of the bulk folder at Gmail, it’s absolutely possible. I have many clients who followed my advice and did it. Some of these clients were simply warming up new IPs and domains and needed to establish a reputation. Others were trying to repair a reputation. In both cases, the fixes are similar.

When I asked colleagues how they handled warmup at Gmail their answers were surprisingly similar to one another. They’re also very consistent with what I’ve seen work for clients.

Read More