DKIM2, Asynchronous Bounces and VERP
This is just expanding on some of the points Laura made last week.
There are two ways that we can be told that an email wasn’t delivered.
If the mailserver we’re sending to can tell immediately that it’s not going to deliver the mail it can respond with a 5xx or 4xx response during the SMTP transaction - a rejection (aka a synchronous bounce) or a deferral.
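SMTP reply codes make the synchronous cases easy to tell apart mechanically. Here is a minimal sketch, assuming you have the full reply line from the remote server. The function name and the outcome labels are illustrative, not from any particular MTA. Note that a 2xx after DATA only means "accepted". It says nothing about whether an asynchronous bounce will follow.

```python
def classify_reply(reply_line: str) -> str:
    """Map an SMTP reply line such as '550 5.1.1 User unknown' to an outcome."""
    code = int(reply_line[:3])  # the first three characters are the reply code
    if 200 <= code < 300:
        return "accepted"  # delivered for now, but may still bounce asynchronously
    if 400 <= code < 500:
        return "deferred"  # temporary failure: the sender should retry later
    if 500 <= code < 600:
        return "rejected"  # permanent failure: a synchronous bounce
    return "other"  # e.g. the 354 intermediate reply after DATA
```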
If the mailserver can’t - or doesn’t want to - make that decision during the delivery process then it can accept the mail and, later, send an asynchronous bounce. An asynchronous bounce is just a normal email sent to the address in the original email’s return path (aka envelope from, bounce domain or spf domain). That bounce email will usually include some part of the original email, with a bit of human-readable - if human-unfriendly - text wrapped around it.
Backscatter
One big problem with asynchronous bounces is that it’s pretty common for spammers to lie and put the email addresses of unrelated third parties in the return path, meaning that the asynchronous bounces are sent to those people. That leads to people getting a lot of bounces for mail they’d never sent. That’s called “backscatter” and it’s been a serious constraint on engineering email systems for decades.
Backscatter looks pretty much like spam to those who receive it, and so could lead to the mailbox provider that’s sending those bounces getting blocked. And it’s also an engineering faux pas - a mailbox provider that sends a lot of backscatter will be looked down on, and may not be invited to the good parties.
Best practice for decades has been to minimize backscatter to the lowest levels possible, by avoiding sending asynchronous bounces wherever reasonable.
One way to avoid asynchronous bounces is to do all the spam filtering and decision making while the sending ESP has the SMTP connection open, so that you won’t accept an email you’ll later want to bounce. That puts very tight time constraints on making those decisions, and means you need enough filtering capacity to handle the highest spikes in traffic (such as when every marketer on the planet decides to send their mail at 8am on a Monday so it’s “at the top of the inbox”, or at the top of the hour because they like nice round numbers). You also can’t use data that you don’t have at delivery time - such as future mail from the same sender - to make that decision.
Another is to receive the body of the email, look at it and, if you don’t have the filtering capacity right now - or, more likely, you don’t have the data to make the decision right now and want to wait until you’ve seen more mail from this mailstream - defer the email with a 4xx response. When the sender retries later you may have more data to make the delivery decision. And if not, you can always defer it again.
And a third way is just to drop mail on the floor. Don’t deliver it, don’t notify the sender you didn’t deliver it. For some sorts of email - malware, for instance - that’s a good decision. But for mail that might be important silently discarding it undermines the integrity of the email system. You accepted an email with an implicit promise that you’d do your best to either deliver it or notify the sender that you couldn’t. And then you broke that promise.
And another very common way is to deliver the mail to a spam or junk folder that’s never read. Technically not the same as dropping it on the floor, but there’s not a lot of difference for either the sender or the recipient.
In some circumstances and jurisdictions, regulatory requirements to notify the sender when an email could not be delivered make this worse.
DKIM2
By requiring that the return path align with the signing domain, and signing it, DKIM2 allows a mailbox provider to trust that a return-path really is the right place to send a bounce.
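To make "align with the signing domain" concrete, here is a minimal alignment check. It assumes a simple rule: the return-path domain must equal the signing domain or be a subdomain of it. That rule is an assumption for illustration. The DKIM2 drafts define the actual requirement, and DMARC-style relaxed alignment would compare organizational domains using the Public Suffix List instead. The function names are hypothetical.

```python
from typing import Optional


def return_path_domain(return_path: str) -> Optional[str]:
    """Domain part of an SMTP MAIL FROM address, or None for the null path <>."""
    addr = return_path.strip().strip("<>")
    if "@" not in addr:
        return None
    return addr.rsplit("@", 1)[1].lower().rstrip(".")


def is_aligned(return_path: str, signing_domain: str) -> bool:
    """Assumed rule: same domain as the signer, or a subdomain of it."""
    rp_domain = return_path_domain(return_path)
    if not rp_domain:
        return False  # the null return path has nothing to align
    signer = signing_domain.lower().rstrip(".")
    return rp_domain == signer or rp_domain.endswith("." + signer)
```

The `"." + signer` suffix test matters: a plain `endswith("example.com")` would wrongly accept `notexample.com`.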
And it allows privacy-preserving forwarding systems to rewrite and re-sign the return-path of a forwarded email so that they will receive the asynchronous bounce and can send it back to the (signed) return-path of the original sender.
All this is great. We can use asynchronous bounces again, and it removes some significant costs and constraints on mailbox provider and third-party email filter engineering.
Large consumer mailbox providers are eager to do this. And I’m sure the corporate spam filter appliances and services will jump on it too.
Some of this has been said explicitly by folks in the know; some of it is just the obvious implications to anyone who groks email architecture and has read the DKIM2 drafts.
I’m fairly sure a bunch of folk will be talking about this in the next few months. I know Al will be talking about it in the next few days over at SpamResource.
Asynchronous Bounce Handling
For legitimate senders of bulk email this is all likely to convert a lot of 5xx rejections, 4xx deferrals and deliveries to the spam folder to successful deliveries that lead to a delayed asynchronous bounce.
Many of those asynchronous bounces may be sent within a few minutes of delivery, as mailbox providers spool inbound messages for later processing as capacity permits. Some of them may be the next day, after a mailbox provider has gathered more data to make a decision. But there will be a lot more of them than you see today.
This won’t really affect ESP customers much, other than by changing how some reporting works, and maybe requiring a bit of a change in mindset around delivery rates and bounce data.
ESPs, though, will need to plan for async bounce handling capacity. It’s not clear yet how much this volume will increase, but the number of messages you sent that were rejected or deferred is probably a good number to start with.
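That starting estimate can be pulled straight from delivery logs. A rough sketch, assuming a hypothetical log format of one `(message_id, hour_of_send, smtp_reply_code)` tuple per delivery attempt. It counts each message once, however many times it was deferred, and also reports the busiest hour, since capacity has to cover spikes and not just daily totals. Async bounces may trickle in over minutes to a day, so the real peak may well be flatter than this.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def estimate_async_bounce_load(
    attempts: Iterable[Tuple[str, int, int]],
) -> Tuple[int, int]:
    """Return (messages ever rejected or deferred, count in the busiest send hour)."""
    first_hour: Dict[str, int] = {}
    for message_id, hour, code in attempts:
        # Any 4xx or 5xx reply counts; record only the first one per message.
        if 400 <= code < 600 and message_id not in first_hour:
            first_hour[message_id] = hour
    per_hour = Counter(first_hour.values())
    return len(first_hour), max(per_hour.values(), default=0)
```

For example, a message deferred twice and then delivered still counts once, because under DKIM2 that message might instead have been accepted and bounced asynchronously.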
If your existing async bounce automation can handle that sort of volume, or can easily be scaled up to handle it, then there’s not really anything you need to do.
If, though, you’re dropping asynchronous bounces on the floor, or you’re handling them manually, or you’re forwarding them on to customers or any other approach that’s not as smooth as your handling of 5xx rejections then it’s time to schedule a meeting with your platform engineers to chat about how to rework that bit of your process.
VERP
The other big problem with asynchronous bounces is the cost of handling them. In order to identify which original mail this asynchronous bounce was caused by you need to accept the async bounce, then parse the body of the async bounce to get the original mail, then find enough details in that mail to identify the recipient and the campaign and the customer and record that the mail bounced.
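When a bounce follows the standard delivery status notification format (RFC 3464), much of that digging can be done with Python’s standard `email` package. The parser splits a `message/delivery-status` part into one header block per section. This is a sketch only: plenty of real-world bounces are free-form text that needs heuristics, and the returned field names are illustrative.

```python
import email
from typing import Dict, List, Optional


def parse_dsn(raw: str) -> List[Dict[str, Optional[str]]]:
    """Extract per-recipient results from an RFC 3464 bounce.

    Returns an empty list for anything that isn't a standard DSN, such as
    spam sent to the bounce address or a free-form, non-standard bounce.
    """
    msg = email.message_from_string(raw)
    recipients: List[Dict[str, Optional[str]]] = []
    original_id: Optional[str] = None
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "message/delivery-status":
            blocks = part.get_payload()
            if not isinstance(blocks, list):
                continue
            # The first block holds per-message fields (Reporting-MTA etc.);
            # each later block describes one recipient.
            for block in blocks[1:]:
                final_rcpt = block.get("Final-Recipient", "")
                recipients.append({
                    "recipient": final_rcpt.split(";", 1)[-1].strip() or None,
                    "action": block.get("Action"),
                    "status": block.get("Status"),
                })
        elif ctype == "message/rfc822" and original_id is None:
            # The returned original message, parsed as a nested Message.
            original_id = part.get_payload(0).get("Message-ID")
        elif ctype == "text/rfc822-headers" and original_id is None:
            # Some servers return only the original headers, as text.
            text = part.get_payload(decode=True).decode("utf-8", "replace")
            original_id = email.message_from_string(text).get("Message-ID")
    for entry in recipients:
        entry["original_message_id"] = original_id
    return recipients
```

You would still need to map the original `Message-ID` back to a recipient, campaign and customer in your own records, which is exactly the lookup VERP lets you skip.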
A return path is just an email address, and it’ll get harvested by spammers like any other email address, and it’ll get spam sent to it like any other email address. So your asynchronous bounce automation will have to accept a lot of spam, and dig through it looking for actual bounces.
All this is why it really should be automated at anything but the tiniest scale.
VERP (Variable Envelope Return Path) avoids some of this overhead. It encodes the details you need to handle the bounce directly in the return-path of the email. That means you have all the data you need by the “RCPT TO:” stage of the SMTP session, so you can record the bounce and discard the async bounce message straight away, without any further processing or storage. You can also reject at RCPT time any mail that’s definitely not an asynchronous bounce, that claims to be an asynchronous bounce for mail sent a week ago, or that is any other sort of junk being sent to your bounce handler.
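Here is a minimal VERP sketch. Classic VERP (as in qmail) puts the recipient address itself into the return path, e.g. `bounces-alice=example.net@...`. This variant instead encodes an opaque internal message id, the send day and a truncated HMAC, so forged or stale addresses can be refused at RCPT time. The address layout, secret, bounce domain and seven-day window are all hypothetical choices for illustration.

```python
import hashlib
import hmac
import re
import time
from typing import Optional

SECRET = b"replace-with-a-real-secret"  # hypothetical signing key
BOUNCE_DOMAIN = "bounce.example.com"    # hypothetical bounce domain
MAX_AGE_DAYS = 7                        # refuse "bounces" for older mail

_ID_RE = re.compile(r"[0-9a-z]+")
_VERP_RE = re.compile(r"b-([0-9a-z]+)-(\d+)-([0-9a-f]{12})@(.+)")


def _day(now: Optional[float]) -> int:
    return int((time.time() if now is None else now) // 86400)


def _mac(message_id: str, day: int) -> str:
    data = f"{message_id}-{day}".encode()
    return hmac.new(SECRET, data, hashlib.sha256).hexdigest()[:12]


def make_return_path(message_id: str, now: Optional[float] = None) -> str:
    """Build a return path encoding our internal message id and the send day."""
    if not _ID_RE.fullmatch(message_id):
        raise ValueError("message_id must be lowercase alphanumeric")
    day = _day(now)
    return f"b-{message_id}-{day}-{_mac(message_id, day)}@{BOUNCE_DOMAIN}"


def check_rcpt(address: str, now: Optional[float] = None) -> Optional[str]:
    """At RCPT TO time: the message id for a valid VERP address, else None."""
    m = _VERP_RE.fullmatch(address.strip().strip("<>").lower())
    if not m or m.group(4) != BOUNCE_DOMAIN:
        return None  # not one of our bounce addresses: reject it
    message_id, day, mac = m.group(1), int(m.group(2)), m.group(3)
    if not hmac.compare_digest(mac, _mac(message_id, day)):
        return None  # forged or corrupted address
    age = _day(now) - day
    if age < 0 or age > MAX_AGE_DAYS:
        return None  # claims to bounce mail we sent too long ago
    return message_id
```

A non-`None` result from `check_rcpt` is enough to record the bounce against that message; everything else can be rejected before DATA, so spam to the bounce address never needs to be stored or parsed.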
VERP is great. It makes engineering asynchronous bounce automation much simpler and more reliable.
Because VERP is so good, some folks treat “VERP” and “Asynchronous Bounce Processing” as the same thing; some may never even have seen a bounce processing pipeline that doesn’t use VERP. They’re not the same thing.
You can engineer your bounce processing any way you want to. If your current approach works reliably without VERP there’s no need to change it. Just keep on doing what works.
TL;DR Management Summary
DKIM2 rollout is likely to increase asynchronous bounce volumes by at least several orders of magnitude over the next couple of years. Plan for that now.
VERP is a term you should know, but you do not need to use VERP to handle asynchronous bounces.
You should have robust asynchronous bounce automation in place by the end of 2026. Without it you will start missing bounces, and that will damage your compliance and delivery.