Who leaked my address, and when?

Providing tagged email addresses to vendors is fascinating, and at the same time disturbing. It lets me track what a particular email address is used for, and also see where and when it has leaked to spammers.
I’d really like to know who leaked an email address, and when.
All my inbound mail is sorted into “spam” and “not-spam” by a combination of SpamAssassin, some static sieve rules and a learning spam filter in my mail client. That makes it fairly easy for me to look at my “recent spam”. That’s a huge amount of data, though, something like 40,000 pieces of spam a month.
Finding the needle of interesting data in that haystack is going to take some automation. As I’ve mentioned before, you can do quite a lot of useful work with a mix of some little perl scripts and some commandline tools.
I’m interested in the first time a tagged address started receiving spam, so I start off with a perl script that will take a directory full of emails, one per file, find the ones that were sent to a tagged address and print out that address and the time I received the email. I can’t rely on the Date: header, as that’s under the control of the spammer, and often bogus. But I can rely on the timestamp my server adds when it receives the email – and it records that in the first Received: header in the message.

#!/usr/bin/perl
use strict;
use Date::Parse;
foreach my $file (@ARGV) {
    open(my $fh, '<', $file) or die "Failed to open '$file': $!\n";
    # Read the header block, joining folded continuation lines
    my @headers;
    while (<$fh>) {
        s/[\r\n]//g;
        last if /^$/;
        push @headers, '' unless /^\s/ && @headers;
        $headers[$#headers] .= $_;
    }
    close $fh;
    # The first Received: header is the one added by my own server
    my $date;
    my $timestamp;
    foreach my $header (@headers) {
        if ($header =~ /^Received:.*;([^(]+)/) {
            $date = $1;
            $timestamp = str2time($1);
            last;
        }
    }
    # Replace this regex with something that
    # matches your tagged addresses
    if (join(' ', @headers) =~
                 /(foo\+[a-z0-9]+@[a-z.-]+)/) {
        print "$timestamp $1 $date\n";
    }
}

Dates and times are annoying to work with on the command line, so I also use the perl Date::Parse module to convert the timestamp in the received header into epoch time – the number of seconds since January 1st, 1970. I use some unix commandline magic to run this against my two spam mailboxes and dump the results in a file.

find spamassassin/ | xargs stamp-address.pl >>junk.txt
find junk/ | xargs stamp-address.pl >> junk.txt

The end result is one line per email, with the epoch time, the tagged email address and the original format of the date and time. Something like this:

1300731078 cpan-tag@addr  Mon, 21 Mar 2011 11:11:18 -0700
1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700
1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700
1300732902 unicorn-tag@addr Mon, 21 Mar 2011 11:41:42 -0700
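
The epoch column can be sanity-checked against the human-readable date with a couple of lines of shell. This is only a sketch: it assumes GNU date (the -d option behaves differently, or is missing, on BSD and macOS), and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU date, whose -d option parses RFC 2822 style dates.

# to_epoch is a hypothetical helper: Received-style date -> epoch seconds
to_epoch() {
    date -d "$1" +%s
}

# from_epoch is a hypothetical helper: epoch seconds -> UTC date string
from_epoch() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# Check the first sample line above in both directions
to_epoch 'Mon, 21 Mar 2011 11:11:18 -0700'
from_epoch 1300731078
```

With GNU date the first call should print 1300731078, matching the first column of the sample output, and the second should print the same moment expressed in UTC.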

Next, I want to find the first occurrence of each tagged address.

#!/usr/bin/perl
use strict;
# Print only the first line seen for each tagged address
my %seen;
while (<>) {
    chomp;
    my ($stamp, $address) = split / /;
    unless (exists $seen{$address}) {
        print "$_\n";
        $seen{$address} = 1;
    }
}

I sort the list numerically by timestamp, so the oldest spam comes first, then use this script to display the first time each email address received spam:

sort -n 
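
As a cross-check, the same sort-then-first-occurrence step can be sketched in shell alone using awk. This is an alternative to the perl script, not a copy of it; the function name first_seen is hypothetical, and it assumes the input has the epoch time in the first field and the tagged address in the second, as in junk.txt.

```shell
#!/bin/sh
# Sketch only: first_seen is a hypothetical helper, not the perl script above.
# Input lines look like: <epoch> <tagged address> <original date...>
first_seen() {
    # Oldest first, then keep a line only the first time its address
    # (field 2) appears
    sort -n "$@" | awk '!seen[$2]++'
}

# Example using a few of the sample lines shown earlier, out of order
printf '%s\n' \
    '1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700' \
    '1300731078 cpan-tag@addr Mon, 21 Mar 2011 11:11:18 -0700' \
    '1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700' |
    first_seen
```

Called as first_seen junk.txt it reads the file directly; with no arguments it reads standard input, as in the example.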

That reduces the amount of data enough that I can look at it by hand. What did I find? Several interesting things, but I'm just going to mention one here.

1299111914 casemate-tag@addr Wed, 2 Mar 2011 16:25:14 -0800
1307104954 dell-tag@addr Fri, 3 Jun 2011 05:42:34 -0700
1307104986 codefast-tag@addr Fri, 3 Jun 2011 05:43:06 -0700 

Casemate and Codefast have only ever mailed me via iContact, so given iContact's history it seems likely that those leaks were via iContact.
Dell, on the other hand, have mailed me directly and through several ESPs - and I don't recall them using iContact. Looking at the timestamps (and the content of the spams) it's clear that the Dell and Codefast tagged addresses were both sent spam for the first time as part of the same spamrun - so it's almost certain that they leaked at the same time.
Looking for iContact's bounce domain (icpbounce.com) in my mailbox, I do find that Dell used them briefly, on May 4th. So that's pretty compelling evidence that iContact leaked all three addresses. (Which means my previous theory about Dell customer addresses leaking, based on misleading statements from Intervision, was wrong.)
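
A search like that can be sketched with grep. This assumes the mail is stored one message per file under a directory; the function name and the directory in the usage comment are hypothetical, and the match is on the domain anywhere in the message rather than only in the Return-Path header.

```shell
#!/bin/sh
# Sketch only: find_bounce_domain is a hypothetical helper, and the
# one-message-per-file directory layout is an assumption.
# Usage: find_bounce_domain DOMAIN DIR
#   e.g. find_bounce_domain icpbounce.com mail/
find_bounce_domain() {
    # -r recurse, -l list matching files only, -i ignore case,
    # -F treat the domain as a fixed string rather than a regex
    grep -rliF -- "$1" "$2"
}
```

The output is a list of message files, which can then be narrowed by hand or with a second grep for the tagged address.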
There's another thing that's interesting... iContact has had a history of email breaches. The data I have here (and it's matched by a couple of older data points, if I recall correctly) shows spam being sent to newly leaked addresses on the 2nd or 3rd of the month.
I wonder if iContact does a batch export to a subcontractor, or an offsite backup or something similar on the first of each month?
