fanf

Penalising senders who use invalid recipient addresses

27th Oct 2009 | 22:18

Here's an idea for penalising senders who use out of date mailing lists: temporarily reject recipients depending on the relative proportion of valid and invalid recipients. The more invalid addresses they try to send mail to, the slower they'll be able to deliver mail.

This isn't suitable for use in all situations, because there are badly-run lists that nonetheless carry legitimate and desirable traffic. There are also well-run lists that will be unreasonably penalised by my site's annual bulk cancellations of departing users. But it might be useful in borderline situations, e.g. when the sender has passed the Spamhaus check but failed a DNSWL check.

The way to implement this is to use my rate measurement equation to keep track of each sender's valid and invalid recipients, giving two rates per sender. When a sender addresses an invalid recipient, update their invalid recipient rate and also calculate their decayed valid recipient rate (which is just a * r_old). When a sender addresses a valid recipient, do the complementary calculations. The proportion of valid recipients is r_ok / (r_bad + r_ok). Pick a random number between 0 and 1; if it's greater than that proportion, return a 450 (temporary rejection), otherwise return the usual 250 (ok) for a valid recipient or 550 (bad) for an invalid one.
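Here is a rough Python sketch of that bookkeeping. It is an illustration and not production code. The post only spells out the decay step (a * r_old); the full update used here, r_new = (1 - a) * r_inst + a * r_old with a = exp(-interval / period) and r_inst = 1 / interval, is an assumption about the shape of the rate equation. The names `SenderRates`, `rcpt_response` and `PERIOD`, and the one-hour period, are hypothetical.

```python
import math
import random

PERIOD = 3600.0  # assumed smoothing period in seconds (hypothetical value)


class SenderRates:
    """Decaying valid and invalid recipient rates for one sender."""

    def __init__(self, now):
        self.ok = 0.0
        self.bad = 0.0
        self.last = now

    def update(self, valid, now):
        # Clamp the interval so back-to-back RCPTs don't divide by zero.
        interval = max(now - self.last, 1e-3)
        a = math.exp(-interval / PERIOD)
        inst = 1.0 / interval  # instantaneous rate of this one event
        if valid:
            self.ok = (1 - a) * inst + a * self.ok
            self.bad = a * self.bad  # decay only
        else:
            self.bad = (1 - a) * inst + a * self.bad
            self.ok = a * self.ok  # decay only
        self.last = now

    def valid_proportion(self):
        total = self.ok + self.bad
        # A sender with no history is treated as fully valid (an assumption).
        return self.ok / total if total > 0 else 1.0


def rcpt_response(rates, valid, now, rng=random.random):
    """Return the SMTP reply code for one RCPT command."""
    rates.update(valid, now)
    if rng() > rates.valid_proportion():
        return 450  # temporary rejection, whether or not the address exists
    return 250 if valid else 550
```

Both rates share one timestamp because every recipient event touches both of them: one gets the full update and the other is just decayed.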

If you are feeling mean you can crank up the deferral rate faster than the invalid recipient rate. The rationale for this is that a list with a 50% validity rate is shockingly bad and probably deserves more like a 90% deferral rate (say). So calculate (r_ok / (r_bad + r_ok))^n for some n that increases with your meanness, and use the result for comparing with the random number.
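To get a feel for suitable values of n, here is a small Python sketch. The function names `deferral_threshold` and `meanness_for` are made up for illustration. Since a recipient is deferred when the random number exceeds the threshold, the deferral rate is one minus the threshold, so you can solve p^n = 1 - deferral for n.

```python
import math


def deferral_threshold(r_ok, r_bad, n=1.0):
    """Threshold to compare against the random number; n > 1 is meaner."""
    total = r_ok + r_bad
    p = r_ok / total if total > 0 else 1.0
    return p ** n


def meanness_for(p, target_deferral):
    """Exponent n such that validity proportion p gives the target deferral rate.

    Only meaningful for 0 < p < 1 and 0 < target_deferral < 1.
    """
    return math.log(1.0 - target_deferral) / math.log(p)


n = meanness_for(0.5, 0.9)
print(round(n, 2))                                # about 3.32
print(round(deferral_threshold(1.0, 1.0, n), 3))  # 0.1, i.e. 90% deferred
```

So an n of a little over 3 turns the 50% validity example into roughly the 90% deferral rate suggested above.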

If you are feeling kind you can pick your random numbers between 0 and slightly less than 1, so senders don't have to be whiter than white to avoid penalties. This tweak might make the technique less troublesome.
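The kind variant can be sketched like this, again in Python. The name `should_defer` and the 5% `slack` value are hypothetical; the only idea taken from the text is that the random number is drawn from a range that stops slightly short of 1.

```python
import random


def should_defer(threshold, slack=0.05, rng=random.random):
    """Defer if a random number in [0, 1 - slack) exceeds the threshold."""
    return rng() * (1.0 - slack) > threshold
```

With this slack, a sender whose threshold is 0.95 or higher is never deferred, however unlucky the draw.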

(I initially thought about deferring just valid recipients, but that has the effect of cleaning the invalid recipients out of the sender's queue, so when they retry the deferred valid recipients, the proportion of validity we measure for them will go up. Deferring both valid and invalid recipients solves this problem and makes the implementation simpler and more pleasingly symmetrical.)

This is actually fairly similar to some of the logic in SAUCE. It keeps a single annoyance number per sender which gets increased when they do something bad and decreased when they do something right, and which decays exponentially so past behaviour fades from memory. SAUCE's main penalty is teergrubing, though it will defer incoming mail if provoked enough, or if it is greylisting the sender. I don't think there's much point in teergrubing except to trigger bugs in spam bots (hence sendmail has a simple greet_pause feature, but no general teergrubing). However there might be some benefit from more intelligent dynamic greylisting.

Comments {4}

heliumbreath

from: heliumbreath
date: 28th Oct 2009 03:19 (UTC)

I'm not certain that normal people actually monitor their outbound queues at all; they may not notice the delay, or you may annoy your own users more if it was a mailing they want. Granted, delaying uncertain stuff does still up the chance it will be whacked or blacklisted by next retry, if it was spam.

Also, if you sometimes 450 departed accounts, that might be a good way of finding list-manager software that doesn't remove a user until N consecutive hard-550 responses. Could be entertaining, for bad values of entertaining.

Tony Finch

from: fanf
date: 28th Oct 2009 12:06 (UTC)

Right, the aim is to delay the dubious stuff in the hope that it'll get blacklisted while still queued by the sender. Or to throttle dictionary attacks. I guess it might be quite useful against phishing attacks (since phishing email addresses are often blacklisted quite promptly) though we'd have to apply it to sites in the DNSWL because a lot of phishing is sent from compromised webmail accounts. So the practicalities are indeed tricky.

Nicholas

from: nwhyte
date: 28th Oct 2009 09:19 (UTC)

I wonder how huge a problem this actually is? And if there are also more positive and proactive ways of encouraging and helping your users to keep their mailing lists up to date?

Tony Finch

from: fanf
date: 28th Oct 2009 12:03 (UTC)

Sorry, I should have been clearer about the purpose. As heliumbreath rightly points out above, it'll only annoy people if it is applied to legitimate traffic [1]. What I'm aiming at is the dubious traffic that isn't yet blocked, in the hope that if it turns out to be definitely bad then the delayed messages will be blocked later by one of the many databases of email badness.

Good mailing list software (like the stuff we run for our users) automatically prunes invalid addresses from mailing lists, so a lot of the stuff that will trip this will be dictionary attacks (guessing recipient addresses) or stale address lists used by spammers and phishers. The proportion of delivery failures is a pretty good first-order mechanism for detecting spam, though some care is needed. Richard Clayton enumerated some of the practicalities about five years ago in his paper http://www.cl.cam.ac.uk/~rnc1/extrusion.pdf

[1] Having said that, we recently had a problem with traffic from JISCMAIL, the national academic mailing list service. About half of it was being delayed for hours because of a configuration problem at their end (they weren't allowing enough delivery capacity, I think). No-one complained other than me despite thousands of messages being affected each day!
