GFI/SORBS considered harmful, part 2

Yesterday I talked about GFI’s responsiveness to queries and delisting requests about SORBS listings. Today I’m going to look at data accuracy.
The two issues are tightly intertwined – a blacklist that isn’t responsive to reports of false positive listings will end up with a lot of stale or inaccurate data, and a blacklist that has many false positives will likely be overwhelmed with complaints and delisting requests, and won’t be able to respond to them – leading to a spiral of dissatisfaction and inaccurate data feeding off each other.

Because it is so difficult to remove an IP address from the list, SORBS as a blacklist produces many false positives.
XS4ALL also consider SORBS harmful

GFI/SORBS maintains nine different IP address based blacklists, but they’re usually bundled together and treated as a single “Don’t accept email from this address” blacklist. Each of the nine lists has a somewhat different listing policy, though there’s been some scope creep and blurring of the lines over the years.
I’m going to focus on just one of them, dul.dnsbl.sorbs.net. This is intended to list “Dynamic IP Address ranges” – consumer internet connections, such as DSL lines, cable modems and dialup modem pools where the IP address assigned to a user will change over time. These systems don’t typically send legitimate email directly to recipients (rather they send mail via their ISP’s smarthost) and often contain a lot of consumer Windows machines, which tend to get infected and send viruses and spam, so declining to accept mail from this sort of address pool is a fairly sensible decision. (Spamhaus maintain a list with similar goals, the PBL, as do Trend Micro.)
GFI/SORBS have had a number of database accidents that have repeatedly caused false listings in a number of their lists, but because the DUL zone tends to list large ranges of IP addresses, data handling mistakes there tend to cause more visible problems.

the SORBS DUHL list has become badly broken, flagging thousands maybe millions of static IP’s as dynamic. This setting will flag as spam email from numerous legitimate sources incorrectly. Numerous attempts by mail admins the world over have failed to get Sorbs to fix the mess yet.
SmarterMail Support Forum

ISPs tell me that GFI/SORBS also refuse to accept notifications about false positive listings in their DUL zone. Or they do update their database, but then reload the bad data a few weeks later. And if the ISP asks GFI for a status update about a false listing, their policy is to move that request to “the bottom of the pile”, ensuring that the inaccurate data that’s causing noticeable problems continues to be published.

If an ISP has reported to SORBS that a CIDR is no longer dynamic, and the (repeat) notifications have been ignored for 6-12 months… at what point does it go from lack of responsiveness, to data quality, to negligence, to willful malice?
frustrated anonymous system administrator

The DUL list was originally seeded based on data acquired from the dynablock list in 2003. I’m told that stale data, possibly dating back to 2003, is repeatedly being loaded into the GFI/SORBS DUL list, leading to a huge number of false positives. I can’t tell whether that is the case, but there’s certainly a lot of bad data leading to false listings.

Your problem in researching bad data in SORBS is not going to be finding examples of false listings, it’s going to be whittling that forest down to a manageable stack of wood.
Comment from IRC

Very true. Let’s choose a particular example: “n2.bullet.mail.sp2.yahoo.com” aka 67.195.134.51. This is one of the mailservers for Yahoo Groups, and sends a lot of mailing list mail. There’s nothing at all to suggest that it’s an end-user machine on a dynamically assigned address. Just the opposite: it’s listed at dnswl.org as a Yahoo server that shouldn’t be blacklisted. It’s listed by ARIN as part of a /16 (65,536 addresses) assigned to Yahoo. There’s nothing in the hostname to suggest it’s dynamically assigned; it even has the word “mail” in the hostname, a common sign of a legitimate mailserver. McAfee TrustedSource list it as a clean mailserver with a history of sending significant volumes of email, as do SenderBase.
And yet it’s listed in the dnsbl.sorbs.net zone, with a return value of 127.0.0.10 meaning GFI/SORBS are claiming it’s a dynamic IP address.
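For readers who haven't looked at a DNSBL directly, here is a minimal Python sketch of how such a lookup is formed. It assumes the usual DNSBL convention of reversing the IPv4 octets and putting them in front of the zone name. The function names `dnsbl_query_name` and `dnsbl_answers` are hypothetical, and the only return code it knows about is the 127.0.0.10 value mentioned above.

```python
import ipaddress
import socket
from typing import List

SORBS_ZONE = "dnsbl.sorbs.net"
DUL_RETURN_CODE = "127.0.0.10"  # the value reported above for a "dynamic" listing


def dnsbl_query_name(ip: str, zone: str = SORBS_ZONE) -> str:
    """Build the DNS name to look up: reversed octets, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # also validates the address
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_answers(ip: str, zone: str = SORBS_ZONE) -> List[str]:
    """Return the A records published for ip, or an empty list if none.

    This does a live DNS lookup, so the result depends on whatever
    the zone is publishing at the time it is run.
    """
    try:
        _, _, addresses = socket.gethostbyname_ex(dnsbl_query_name(ip, zone))
    except OSError:
        return []  # no such name (or the lookup failed): treat as not listed
    return addresses


def listed_as_dynamic(answers: List[str]) -> bool:
    """True if any answer is the code used for dynamic address listings."""
    return DUL_RETURN_CODE in answers
```

With this sketch, `dnsbl_query_name("67.195.134.51")` produces `51.134.195.67.dnsbl.sorbs.net`, and a mail server that got back 127.0.0.10 for that name would treat the Yahoo Groups server as a dynamic address.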
Looking up that IP address on the SORBS website was a fairly painful exercise (I’ll go into more detail about that, and other SORBS operational problems on Monday) but this is what I found:
That IP address is categorized, wrongly, by GFI/SORBS as a dynamic address.
Why did I choose this particular server as an example, rather than one of the countless other false positives I could have picked? Well, it’s not just a single IP address that’s listed as a false positive: GFI/SORBS are listing all of 67.194.0.0/15 – that’s 131,072 Yahoo servers that are categorized wrongly. And I know that GFI staff were explicitly notified about that particular listing early yesterday morning. Yet GFI are still publishing that data (as well as at least dozens of other false positive listings of similar size).
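The arithmetic behind that range is easy to check. This is a small sketch using Python's standard `ipaddress` module; the variable names are just illustrative.

```python
import ipaddress

listed_range = ipaddress.ip_network("67.194.0.0/15")
yahoo_server = ipaddress.ip_address("67.195.134.51")

# A /15 leaves 32 - 15 = 17 host bits, so it covers 2 ** 17 addresses.
range_size = listed_range.num_addresses

# Membership test: does the example mailserver fall inside the listed range?
server_in_range = yahoo_server in listed_range

print(range_size, server_in_range)  # prints "131072 True"
```

So one listing of this size covers twice as many addresses as the /16 that contains the example server.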

An outage usually means someone works quickly to resolve it. Having it still be an issue after 5 days is gross negligence. #sorbs
Twitter

A blacklist should have checks in place that make it unlikely that badly wrong data is published, though even the best blacklists will very occasionally have a problem and publish bad data. How they respond to false positives is really important. If a blacklist is notified of false positive listings of this magnitude the safe thing to do is to pull all dubious listings from the published blacklist data (or if it can’t be narrowed down, pull all listings) until the problem is resolved.
That will eliminate the loss of legitimate email to the blacklist’s customers (and most of the spam the blacklist might have stopped will likely be blocked or filtered by other parts of the spam filters they use). GFI/SORBS have not done this; rather, they’re following the same practice they’ve used during previous database catastrophes – continuing to publish known bad data.

I do not doubt that there are mail admins rationally fearing for their jobs this week. I am lucky enough to no longer be in the sort of pathological enterprise where a burst of excess false positives is a risk to an admin’s employment, but I am sure that not everyone who was using the SORBS DUHL until this week is so fortunate.
Senior Security Consultant

More on Monday.
