C is for Cookie

Trekkie Monster. He’s obsessed by social media and isn’t owned by Children’s Television Workshop.

What is a Cookie?

I’m not talking about biscuits, nor about web cookies, at least not exactly.

When you’re talking to a protocol developer, a cookie is something you’re given, hang on to for a while, then give back. If you leave your suitcase with your hotel concierge, they’ll give you a paper ticket with a number on it. That ticket and its number have no intrinsic value, nor do they really mean anything. The only thing you can do with it is give it back to the concierge to get your suitcase back. The ticket is a cookie.

Conceptually a cookie isn’t something that’s meaningful except when you give it back to whoever gave it to you – so if you’re a client program and a server sends you a cookie you just hang on to it and later send it back to the server. The name came from fortune cookies – the server bakes a piece of information into the cookie, you accept the cookie and later send it back to the server. The server cracks the cookie open and has access to that bit of information again.

We’ll often talk about opaque cookies – cookies where the party who created them is not only the only one intended to have access to the information embedded in them, but also the only one who can crack them open and get at that information. The client that receives an opaque cookie from a server can’t see what it contains, and nor can anyone else who gets a copy of it. (If someone is talking about web cookies, they’re not opaque unless they’re explicitly described that way; if they’re talking about any other protocol, they probably are opaque. ¯\_(ツ)_/¯ )

One thing opaque cookies are very useful for is passing information to and from third parties that need to do things with that information but shouldn’t be able to read it. For example, when someone clicks on a link in an email you sent, you’d like to know which address that email was sent to. You could include their email address in the link – https://click.example.com/whatever.html?e=rishi@no10.gov.uk – but you really, really don’t want that personally identifiable information (PII) to be visible to your third-party vendors, your log files, anyone who sees the referer headers (yes, it’s spelled that way; don’t ask me why) sent from your web content, and so on.

So you use https://click.example.com/whatever.html?e=12345 instead. That can go through any third party tracking you need, and when you need to generate reports on who clicked where you can map “12345” back to “rishi@no10.gov.uk”. And only you can – nobody else can decode that directly.
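The “map 12345 back to the address” step can be as simple as a table lookup. Here is a minimal sketch, assuming an in-memory SQLite table standing in for your real recipient table; the `recipients` schema and the `cookie_for`, `email_for` and `tracking_link` helpers are hypothetical names for illustration. The row’s integer primary key is the cookie.

```python
import sqlite3
from typing import Optional
from urllib.parse import urlencode

# In-memory database standing in for your real recipient table (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE recipients (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")

def cookie_for(email: str) -> int:
    """Return the recipient's primary key, inserting a row the first time we see them."""
    db.execute("INSERT OR IGNORE INTO recipients (email) VALUES (?)", (email,))
    (row_id,) = db.execute("SELECT id FROM recipients WHERE email = ?", (email,)).fetchone()
    return row_id

def email_for(cookie: int) -> Optional[str]:
    """Crack the cookie open: only the holder of the table can do this."""
    row = db.execute("SELECT email FROM recipients WHERE id = ?", (cookie,)).fetchone()
    return row[0] if row else None

def tracking_link(email: str) -> str:
    # The link carries only the opaque key, never the address itself.
    return "https://click.example.com/whatever.html?" + urlencode({"e": cookie_for(email)})

link = tracking_link("rishi@no10.gov.uk")
print(link)  # https://click.example.com/whatever.html?e=1
```

Third parties and log files only ever see `e=1`; turning that back into an address needs access to your database.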

There are two common ways of creating this sort of opaque cookie. One is simply to use the primary key of that recipient’s row in your database; when you want to crack the cookie open, you look up the recipient with a database query. The other is to use encryption: you store the recipient data directly in the cookie, but encrypted with a key that only you possess.
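The encryption approach might look something like this. It’s a sketch, assuming the third-party `cryptography` package (`pip install cryptography`) and its Fernet recipe, which encrypts and authenticates a message under one secret key and emits URL-safe base64 text. The `make_cookie` and `open_cookie` helpers are hypothetical names.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Third-party package: pip install cryptography
from cryptography.fernet import Fernet, InvalidToken

# In real use, load a long-lived key from secret storage; generating one here is for illustration.
fernet = Fernet(Fernet.generate_key())

def make_cookie(email: str) -> str:
    # Encrypted and HMAC-authenticated, so the recipient can neither read nor alter it undetected.
    return fernet.encrypt(email.encode("utf-8")).decode("ascii")

def open_cookie(cookie: str) -> Optional[str]:
    try:
        return fernet.decrypt(cookie.encode("ascii")).decode("utf-8")
    except InvalidToken:
        return None  # tampered with, truncated, or created under a different key

link = "https://click.example.com/whatever.html?" + urlencode({"e": make_cookie("rishi@no10.gov.uk")})

# Later, when the click comes back, pull the cookie out of the URL and decrypt it.
cookie = parse_qs(urlparse(link).query)["e"][0]
print(open_cookie(cookie))  # rishi@no10.gov.uk
```

The trade-off versus a database key: no lookup is needed to crack the cookie open, but the links are much longer, and Fernet tokens also carry their creation time unencrypted.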

Increasingly companies are treating PII such as email addresses like they would toxic waste – they’ll handle it when they have to, but they really want to avoid doing so and they want to get rid of it as quickly as they can. As part of that they may look for things that look like PII – such as email addresses – in records they’re sent and scramble them as soon as they receive them, and provide the rest of their service based on that scrambled data. If you’re seeing garbled email addresses in reporting from your third party vendors that might be why.
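To see why an address might come back garbled, here is a rough sketch of the kind of filter a vendor might run on incoming records. The regex, the `pii-` prefix and the key are all assumptions for illustration, not any particular vendor’s rules. Using a keyed hash means the same address always scrambles to the same token, so per-recipient counts still work, but nobody without the key can reverse it.

```python
import hashlib
import hmac
import re

# Deliberately loose "looks like an email address" pattern (an assumption).
EMAIL_LIKE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical vendor-side secret: without it, nobody can rebuild the mapping
# by hashing a list of known addresses.
SCRUB_KEY = b"hypothetical-vendor-secret"

def _scramble(match: re.Match) -> str:
    address = match.group(0).lower()
    digest = hmac.new(SCRUB_KEY, address.encode("utf-8"), hashlib.sha256).hexdigest()
    return "pii-" + digest[:12]

def scrub(record: str) -> str:
    """Replace anything that looks like an email address with a keyed hash."""
    return EMAIL_LIKE.sub(_scramble, record)

print(scrub("GET /whatever.html?e=rishi@no10.gov.uk"))  # address replaced by pii-<12 hex chars>
print(scrub("GET /whatever.html?e=12345"))               # unchanged
```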

You can avoid a lot of annoyance, and potentially some legally unfortunate accidents, by never putting a recipient’s email address in an unsubscribe or click-tracking link. Always use an opaque cookie.
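One cheap way to enforce that rule is a check in whatever code builds your messages, refusing to emit any link that looks like it carries an address. A minimal sketch, assuming a hypothetical `check_link` helper and a deliberately loose regex; it decodes percent-escapes first so that `rishi%40no10.gov.uk` is caught too.

```python
import re
from urllib.parse import unquote

# Loose email pattern; an assumption, tune it for your own data.
EMAIL_LIKE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def check_link(url: str) -> str:
    """Return the URL unchanged, or raise if it appears to carry an email address."""
    if EMAIL_LIKE.search(unquote(url)):
        # Don't echo the URL in the error: that would leak the address into your logs.
        raise ValueError("link appears to contain an email address")
    return url

check_link("https://click.example.com/whatever.html?e=12345")  # passes
```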

Related Posts

FCC notice of proposed rulemaking

The FCC recently published a notice of proposed rulemaking that will have an impact on how we fight abuse on the internet. M3AAWG has submitted a comment on the proposal (pdf link). All submissions can be found on the FCC website.


Lorem Ipsum for PII

When you’re developing code to handle data it’s almost essential to have a decent sized set of test data, so you can build a test harness to check on functionality and performance as you go.
A common way of doing that is to take a snapshot of your production database and pull out an appropriate subset from there. That works pretty well in most cases, but it’s a really bad idea if the data you’re working with is personally identifiable information, such as email addresses, phone numbers, credit cards and so on.
Test data gets spread everywhere. It’s checked in to source control systems, copied to developers’ laptops, included in publicly visible bug reports, shared with mailing lists when asking questions and sent to that dodgy overseas outsourcing company your CTO is evaluating. And if the code you’re developing sends email or SMS messages then sooner or later you’re going to misconfigure your test platform and send test messages to the contacts in your test data. (I’ve only done that once, and it was a memorable experience.)
But test data needs to be similar to real data, and look plausible, or it’s hard for manual testers to identify problems using it.
Enter randomuser.me – a simple API for generating random user data – name, email address, birthdate, phone numbers, postal address, social security number, even photos.
Need something more configurable, that lets you create a fake API to test your code against? Try RandomAPI for a web API returning JSON, SQL, CSV or YAML.
Just need some test JSON files you can generate and paste into your test suite? Try JSON Generator.
Need bulk data, to load into your test database? Look at Mockaroo, DummyData or GenerateData.
Just don’t use your production PII, even if you plan on anonymizing it before use. Really.
