Clicktracking link abuse
- steve
- Best practices
- October 11, 2010
If you use redirection links in the emails you send out, where a click on the link goes to your server – so you can record that someone clicked – before redirecting to the real destination, then you’ve probably already thought about how they can be abused.
Redirection links are simple in concept – you include a link that points to your webserver in email that you send out, then when recipients click on it they end up at your webserver. Instead of displaying a page, though, your webserver sends what’s called a “302 redirect” to send the recipients web browser on to the real destination. How does your webserver know where to redirect to? There are several different ways, with different tradeoffs:
The simplest approach
The simplest sort of redirection link includes the final destination in the link itself – something like http://click.example.com/cnn.com/WORLD/. The webserver at click.example.com would simply strip off the first part of the link, and redirect to the remainder – cnn.com/WORLD/.
This is nice, because it’s fairly transparent to the recipient – when they hover over the link in their mail client or webmail it’ll be fairly clear where it’s going.
But it has several limitations. One is that you can’t really record very much data about the click – you know where it was redirecting to, but almost nothing else.
The bigger problem is that it’s very easy for a spammer to abuse – they can send out spam that has the link http://click.example.com/onlinepharmacy.ru/order.html, to hide their real link from spam filters, and your webserver will happily redirect recipients to go there. Or, worse, that can be used to redirect to a website hosting viruses. That can cause all sorts of problems for your reputation, up to and including having your redirection webserver blacklisted by antivirus and antiphishing organizations, meaning it’ll be blocked by many web browsers.
Add some metadata
Some of the things you might want to be able to record about a click would be which customers mail it was found in, which mailing campaign and which recipient it was sent to. This would let you do more sensible reporting and click-tracking, and also let you spot when a link is misused in some way (for example, thousands of clicks on a url that was sent to just one recipient).
That might look like http://click.example.com/123/456/789/cnn.com/WORLD/. Your webserver would strip off the first four parts, recording a click for customer 123, campaign 456 and recipient 789, then redirect to the remainder – cnn.com/WORLD/
This lets you do better reporting and is still fairly transparent to the recipient, but can still be abused in the same way.
Use a database
If you stored every link you wanted to redirect to in a database you could simply store a unique key for each link – so you might record that key 2718 means http://cnn.com/WORLD/. Then the redirection URL might look like http://click.example.com/123/456/789/2718
This lets you do good reporting and is much more difficult for spammers to abuse (but not impossible – if the spammer signs up for a free or demo account on your system, then sends a test email to themselves, they can then reuse the links that they received in that mail).
But it’s fairly opaque to the recipient – they have no idea where the link will go. And it requires maintaining a database of every link you’ve ever used, for as long as it’s valuable (which could easily be several years if a recipient goes back to an old newsletter) and requires a database lookup for every click – which adds a fair bit of infrastructure you need to keep working 24/7 just to make links work.
Use a database and a cosmetic link
You could take the database format and add the final destination link on the end – like this http://click.example.com/123/456/789/2718/cnn.com/WORLD/ – and then just ignore everything after the url key (2718). That’ll work exactly the same way, but the final destination will be fairly transparent to the recipient.
This still can’t be abused by spammers, as if they try to use http://click.example.com/123/456/789/2718/mypharmacy.ru, it’ll still just redirect to http://cnn.com/WORLD/ as the only meaningful bit of the redirection link is the “2718“.
Cryptographically sign your links
A different approach is to record all the information you need in the link and to also add a cryptographic signature to prevent people from misusing it. This is much simpler than the word “cryptography” suggests, you just need to use a magic word (we’ll use “albatross”) and know about the md5() function.
You start off with the same destination string we used in Add some metadata – “/123/456/789/cnn.com/WORLD/“. Then you add the magic word on the end, to give “/123/456/789/cnn.com/WORLD/albatross“, and take the md5 “hash” of that. That’s some cryptographic black magic that’ll give you a string of letters and numbers that’s a “fingerprint” of that string. It’ll look something like “609a78b941bdf9f045cadcfa2e09d54c“. Then you combine that with the destination string to look like this:
http://click.example.com/609a78b941bdf9f045cadcfa2e09d54c/123/456/789/cnn.com/WORLD/
Then, when your webserver sees this link it splits it into the hash (609a78b941bdf9f045cadcfa2e09d54c) and destination string (/123/456/789/cnn.com/WORLD/). It then does exactly the same thing you did when you created the link – appends the magic word to the destination string to give “/123/456/789/cnn.com/WORLD/albatross” and takes the md5 hash of that string. If the result of that matches the hash in the link, it knows it’s a valid redirection link and it can record the click-tracking data and forward to the destination link. If the result doesn’t match it knows that the link has been tampered with, and can return an error page.
To generate the link in PHP would be something like this:
$destination = "/$customerid/$campaignid/$recipientid/$link"; $clicktrack = 'http://click.example.com/' . md5($destination . 'albatross') . $destination;
This is much cheaper to generate and validate than using a database, even a typical in-memory database.
Which to use?
Don’t use the simple approach – it’ll get abuse sooner or later and you’ll regret it. Any of the database or cryptographic approaches work just fine, though the cryptographic approach may be easier to scale up and maintain. The database approaches make it easier to disable a link, or direct it to somewhere else at a later point, in case of abuse or some other need.
What else is it good for?
You can use the same sort of approach to validate unsubscription links and VERP return paths for bounce handling. And “open tracking” using these sort of links for image URLs, if you find that a useful metric to offer.