Log in

No account? Create an account


Rate limiting DNS

« previous entry | next entry »
8th Jun 2012 | 09:09

We have a problem that our authoritative name servers are being used in a traffic amplification attack, along the lines of this SANS ISC report, except we have been sustaining 1000 - 3000 queries per second for nearly a week.

We need a proper solution to DNS amplification attacks. It is not enough to focus on the surface features of the current attack (recursion desired, QTYPE=*) because they are easy for attackers to change. For instance, an ANY query for cam.ac.uk returns 2KB of data, and an MX query (with DO DNSSEC) returns 1.5KB - not much less.

Some kind of per-client rate limit seems to be in order: legit recursive DNS servers should not be making high volumes of queries to our authoritative servers. Here is an outline of a rate limiting scheme thought up by Ian Jackson, Mark Wooding, and me. It uses integer arithmetic, requires a fixed amount of memory, and has a small per-query overhead.

Set up a Bloom filter, sized to give a suitable false positive rate. (There are about ten million recursive DNS servers on the Internet.) Each bucket contains an integer (rather than a bit) large enough to hold a reasonable per-interval packet count.

When a query arrives, hash its source network (/24 for IPv4, /64 for IPv6) with some random data which you obtained at startup. The hash function should be optimised for speed rather than strength. You need two independent hash values, h1 and h2. Increment buckets in the Bloom filter with indexes h1 + i*h2 for 0 <= i <= n. The client's rate is the minimum of the counts in these buckets. If this rate is too large, refuse the query.

Periodically, scan the Bloom filter and multiply each bucket by some number less than one. This ages out old data. (Or should it be a subtraction for leaky-bucket rate limiting?).

That's it. This has probably been invented before, but I haven't searched the literature yet. If anyone has any pointers I'd be grateful :-)

Edit: Paul Vixie and Vernon Schryver have produced a response rate limiting patch for BIND which deals with this problem rather sooner and more thoroughly :-)

| Leave a comment |

Comments {8}

Tony Finch

from: fanf
date: 8th Jun 2012 08:28 (UTC)

It turns out this is exactly the scheme described in section 8.1 of the survey of network applications of Bloom filters by Broder and Mitzenmacher, which includes a nice trick for reducing the bad effects of collisions.

Reply | Thread

Tony Finch

from: fanf
date: 8th Jun 2012 09:21 (UTC)

Direct link to the paper by Estan and Varghese: http://pages.cs.wisc.edu/~estan/publications/elephantsandmice.pdf

Reply | Parent | Thread

Tony Finch

from: fanf
date: 8th Jun 2012 09:17 (UTC)

Compare the Linux iptables hashlimit code which uses a normal hash table. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=net/netfilter/xt_hashlimit.c

Reply | Thread

Tony Finch

from: fanf
date: 8th Jun 2012 10:00 (UTC)

There's a question about what countermeasures to apply to over-limit clients: Drop the query? Send a REFUSED response? Colm MacCárthaigh points out that a minimum-size truncated reply is the same size as a REFUSED response, but has the advantage of not allowing an attacker to deny service to a legit resolver by spoofing UDP "from" them.

Reply | Thread

Ewen McNeill

DNS rate limiting

from: edm
date: 8th Jun 2012 11:01 (UTC)

Dropping the query seems safer as a default: if the IP is a genuine recursive resolver, it'll try one of the other nameservers it's already learnt about having the same authoritative information. And if the IP is a client making "recursion required" query then there's probably a second recursive resolver IP configured for it to fail over to when necessary. And it has the advantage of not adding to the backscatter, basically at the cost of enforcing a timeout on the forged IP. (You'd probably want reasonable anti-spoofing rules for your own IP blocks at your borders though, to at least ensure that outside entities can't deny service to your own IPs -- presumably inside spoofed traffic would be easier to track down and deal with in other ways, than outside originated traffic.)

Alternatively you could compromise and send a REFUSED response to 1/Nth of the dropped queries (either overall, or from a given IP/subnet). Either by counting them (which requires memory) or probabilistically (eg, hash of CPU tick count).


PS: IIRC F-root had a FreeBSD (ipf?) ruleset in front of it which dropped incoming DNS from a given IP (subnet?) over a certain packet volume (and counted the volume via the firewall packet counters). This was years back, so I'm not sure if they still do now they're anycast. And I suspect their normal traffic volumes might have been lower, and certainly the need to query the root legitimately is much lower. But I do remember Paul saying that some researchers doing bulk query load testing had complained his root server wasn't very fast, and he'd pointed out it was plenty fast, it was just ignoring their DoS attempt.

Reply | Parent | Thread

Tony Finch

from: fanf
date: 8th Jun 2012 10:06 (UTC)

Stéphane Bortzmeyer has a couple of blog posts on this topic: http://twitter.com/bortzmeyer/status/211012535321767936

Reply | Thread

Just a random swede


from: vatine
date: 8th Jun 2012 14:33 (UTC)

I think either a subtraction or a multiplication with <1 would work, you'd end up with different results in how sensitive you are to blocking and how long until you unblock.

For a constant query rate, the exponential model is more likely to have "can query / can not query" than a linear model.

In terms of operations, either is likely to disappear in cache fetching noise.

Reply | Thread

Tony Finch

Re: lytitl

from: fanf
date: 9th Jun 2012 14:36 (UTC)

The exponential model is the one I want since it allows the rate measurement code to be independent of the limits that you set. Here's a comment I just wrote in the code...

There is a periodic task to age old data out of the hash table. Every p seconds all the buckets are multiplied by an ageing factor 0 <= a < 1. If a client is querying at a constant rate r the measured value from the hash table r' comes from the sum of the resulting geometric progression, which is p * r / (1 - a). So to calculate the client's actual rate from the value in the hash table, r = r' * (1 - a) / p.

Edited at 2012-06-09 02:37 pm (UTC)

Reply | Parent | Thread