Gmail, machine learning, filters

laura
Industry
February 18, 2019

I’m sure by now readers have seen the article from Gmail “Spam does not bring us joy — ridding Gmail of 100 million more spam messages with TensorFlow.” If you haven’t seen it, go read it. It’s not often companies write about their filtering philosophy and what tools they’re using to manage incoming bad mail.

There were a few parts of the article that confirmed some of my theories about Gmail and a few things that were unexpected.

Open source tools

It’s no surprise that Google uses a machine learning engine built in house. What I didn’t know was that it was called TensorFlow and that Google had open sourced it. Many companies in the email space open source some of their tools. ExactTarget open sourced Fuel UX long before they were SFMC, and they maintain a GitHub account with a number of tools. Mailchimp also maintains an account with its open source code. Steve releases a bunch of tools and code he writes, both for work and for fun.

Open source software runs a whole lot more of the internet than many people know. Some of the primary contributors do the work on their own time. But many companies, large and small, understand how vital open source tools are to their business. They hire and support open source developers to maintain and extend the software.

Catching the hard spam

Google catches a lot of spam, and they’re always trying to catch the stuff that falls through the cracks. The recent volume of calls I’ve been getting about mail going to spam at Gmail told me that Gmail had implemented some new filters. Many people were telling me that things were fine and then, with no change in what they were doing, mail started going to bulk. Other delivery folks were also talking about their customers getting caught up in filters.

We’ve gotten to the point, particularly with Google but also with the other webmail providers, where the bulk of egregious spam is blocked. What’s left is not some spammer sending 10MM messages, but a much more difficult problem. Spam that reaches the inbox is sent in much smaller quantities. It’s also heavily targeted. Spammers are trying to look like legitimate marketers but are still sending mail without permission.

This targeted spam is something I’ve been thinking about a lot lately, mostly because anti-spammers did a pretty good job making not-spamming look like it was beneficial to senders. Many deliverability recommendations boil down to “stop spamming,” phrased in a way that makes the advice more palatable. Much of the spam that’s getting caught in the new filters follows deliverability recommendations. The piece it misses is that it’s not being sent with the permission of the recipient.

Believe it or not, spam filters started out protecting users from mail they didn’t ask for. As the internet has grown and email has become a channel for crime, the focus of filters has changed. But, fundamentally, deep down, the original purpose of keeping mailboxes useful by stopping unsolicited mail is still there. The ML filters are giving Google, and others, better tools to actually address that mail.

The trend is clear. Filters are getting more and more able to address unsolicited email in a complex sender and user environment. Machine learning is driving a lot of that, and Google is at the front of the pack. They’re doing their best to stop the small-scale spammers that have avoided a lot of the last generation of filters.

Return Path on Content Filtering

Return Path has an interesting post up about content filtering. I like the model of 3 different kinds of filters; in fact, it’s one I’ve been using with clients for over 18 months. Spam filtering isn’t really about one number or one filter result. It’s a complex interaction of lots of different heuristics designed to answer the question: do recipients want this kind of mail?

Permission and B2B spam

Two of the very first posts I wrote on the blog were about permission (part 1, part 2). Re-reading those posts is interesting. Experience has taught me that recipients are much more forgiving of implicit opt-in than those posts imply.

The change in recipient expectations doesn’t mean, however, that permission isn’t important or required. In fact, The Verge reported on a chatbot that will waste the time of spammers. Users who are fed up with spam can forward their messages to Re:Scam and bots will answer the mail.

I cannot tell you how tempted I am to forward all those “Hey, just give me 10 minutes of your time…” emails I get from B2B spammers. I know, those are actually bots, but there is a lovely symmetry in bots bothering one another and leaving us humans out of it.

Speaking of those annoying emails, I tweeted about one (with horrible English…) last week. I tagged the company in question and they asked for an example. After I sent it, they did nothing, and I continued to get mail. Because of course I did.

These types of messages are exactly why permission is so critical for controlling spam. Way more companies can buy my email address and add me to their spam automation software than I can opt out of in any reasonable time frame. My inbox, particularly my business inbox, is where I do business. It’s where I talk with clients, potential clients, customers and, yes, even vendors. But every unsolicited email wastes my time.

It’s not even that the mail is simply unwanted. I get mail I don’t want regularly. Collecting white papers for my library, RSVPing to events and joining webinars all result in me getting added to companies’ mailing lists. That’s fair. I gave them my email address, and I can unsubscribe.

The B2B companies who buy my address are different. They’re spamming and they understand that. The vendors who sell the automation software tell their customers how to avoid spam filters. Spammers are told to use different domains for their unsolicited mail and their opt-in mail to avoid blocking. The software plugs into Google and G Suite accounts because very few companies will block Google IPs.

I’ve had many of these companies try to pay me to fix their delivery problems. But in this case, there’s nothing to fix. Yes, your mail is being blocked. No, I can’t help. There is nothing I can say to a filtering company, an ISP or any other company to make them lift that block. The mail is unwanted and it’s unsolicited.

The way to get mail unblocked is to demonstrate the mail is wanted. If you can’t do that, well, the filters are working as intended.

It's not fair

In the delivery space, stuff comes in cycles. We’re currently in a cycle where people are unhappy with spam filters. There are two reasons they’re unhappy: false positives and false negatives.

False positives are emails that the user doesn’t think are spam but that go into the bulk folder anyway.
False negatives are emails that the user does think are spam but that are delivered to the inbox.

I’ve sat on multiple calls over the course of my career, with clients and potential clients, where the question I cannot answer comes up: “Why do I still get spam?”

I have a lot of thoughts about this question: what it means for a discussion, how it should be answered and what the next steps are. But it’s important to understand that I, and most of my deliverability colleagues, hate this question. Yet we get it all the time. ISPs get it, too.

A big part of the answer is that spammers spend inordinate amounts of time and money trying to figure out how to break filters. In fact, back in 2006 the FTC fined a company almost a million dollars for using deceptive techniques to try to get past filters. One of the things this company did was have people manually create emails to test filters. Once they found a piece of text that would get into the inbox, they’d spam until the filters caught up. Then they’d start testing content again to see what would get past the filters. Repeat.

This wasn’t some fly-by-night company. They had beautiful offices in San Francisco with conference rooms overlooking Treasure Island. They were profitable. They were spammers. Of course, not long after the FTC fined them, they filed for bankruptcy and disappeared.

Other spammers create and cultivate vast networks of IP addresses and domains to be used in snowshoeing operations. Still other spammers commit crimes to hijack the reputation of legitimate senders and make it to the inbox.

Why do you still get spam? That’s a bit like asking why people speed or run red lights. You still get spam because spammers invest a lot of money and time into sending you spam. They’re OK with only a small percentage of emails getting through filters; they’ll just make it up in volume.

Spam still exists because spammers still exist.