6th Dec 2004 | 23:29

Five years ago, when I was working with adns, one of the things I played with a bit was a Perl wrapper. adns is absolutely fantastic for bulk log processing: being able to do more than 10,000 concurrent queries, so that you're using all your CPU and not blocking on the network, is a godsend. However, C makes this more painful than it ought to be.

I never finished the perl wrapper because other things became more important, and when I next had the time and the inclination to look at it Net::DNS existed, so I thought there would be little point.

I've been paying gradually more and more attention to SpamAssassin recently, and it uses Net::DNS's background query feature to run all its DNS queries concurrently with its pattern matching. As a result of this I've found out that Net::DNS's background query handling is utterly stupid: it uses a separate socket for each query, rather than stuffing them all down the same socket and using the DNS protocol's query ID field to tie responses to queries.
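Here is a minimal sketch of the single-socket approach. It is written in Python rather than Perl, and it is not Net::DNS or adns code. Every query is given a distinct 16-bit ID, which also caps one socket at 65,536 outstanding queries. A table maps each outstanding ID back to the name that was asked about. Each response is matched by reading the ID from the first two bytes of its header. The names `encode_name`, `build_query` and `QueryMux` are hypothetical.

```python
import secrets
import struct

# Hypothetical helpers: a sketch of ID-based multiplexing, not Net::DNS or adns code.

def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels (RFC 1035 section 3.1)."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(qid: int, name: str, qtype: int = 1) -> bytes:
    """Build a recursive query with one question of class IN (qtype 1 = A)."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!HH", qtype, 1)

class QueryMux:
    """Tracks queries sent down one shared socket, keyed by their 16-bit ID."""

    def __init__(self):
        self.pending = {}  # query ID -> name queried

    def new_query(self, name: str) -> bytes:
        if len(self.pending) >= 0x10000:
            raise RuntimeError("all 65536 query IDs are in use on this socket")
        while True:
            qid = secrets.randbelow(0x10000)  # random IDs make spoofing harder
            if qid not in self.pending:
                break
        self.pending[qid] = name
        return build_query(qid, name)

    def match_response(self, packet: bytes):
        """Return the name a response answers, or None if it is unknown or not a response."""
        if len(packet) < 12:
            return None
        qid, flags = struct.unpack_from("!HH", packet, 0)
        if not flags & 0x8000:  # QR bit clear: this is a query, not a response
            return None
        return self.pending.pop(qid, None)
```

In use, every packet from `mux.new_query(name)` would be sent with `sendto` on the same UDP socket. A single `recvfrom` loop would then feed every incoming datagram to `match_response`, so responses can arrive in any order. A real resolver should also check the source address and the echoed question section before trusting a match.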

This causes excessive resource usage, which greatly restricts the number of concurrent queries it can handle, even on a sensible OS. On Windows it dies if the concurrency goes above about 350, which occasionally happens with SpamAssassin: http://bugzilla.spamassassin.org/show_bug.cgi?id=3924

So now I have the bit between my teeth. Must f1xx0r!


Comments (2)

James Antill

Probably because of Bind braindamage

from: illiterat
date: 7th Dec 2004 16:05 (UTC)

I presume you are talking about TCP sockets? I've done some experiments towards a decent LGPL resolver and DNSD, so I can probably save you some hair-pulling: the reason it uses one socket per query is that Bind starts the TCP connection timeout when you connect, and again for each query it parses. And it processes them synchronously (of course).

This means that if you send four queries down the TCP socket and Bind takes longer than the TCP connection timeout to process the second one (the easiest way to simulate this is a delegation to a single server that isn't pingable), it will "time out" the entire TCP connection. You then need to re-send the last two queries, and all the other queries will have had to wait out the timeout.
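Here is a rough sketch of the client-side bookkeeping this scenario implies. Over TCP, each DNS message is preceded by a 2-byte length field (RFC 1035 section 4.2.2), so the client has to split the received byte stream back into messages. When the server drops the connection, every query that was neither answered nor abandoned has to go out again. The function names and the "abandoned" policy are hypothetical.

```python
import struct

# Hypothetical helpers for pipelining DNS queries over one TCP connection.

def frame(msg: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte length, as required over TCP."""
    return struct.pack("!H", len(msg)) + msg

def split_frames(buf: bytes):
    """Split received bytes into complete DNS messages plus any leftover partial data."""
    msgs = []
    while len(buf) >= 2:
        (n,) = struct.unpack_from("!H", buf, 0)
        if len(buf) < 2 + n:
            break  # message not fully received yet; keep the bytes for later
        msgs.append(buf[2:2 + n])
        buf = buf[2 + n:]
    return msgs, buf

def to_resend(sent_ids, answered_ids, abandoned_ids=()):
    """Query IDs that must be re-sent on a new connection, in their original order."""
    done = set(answered_ids) | set(abandoned_ids)
    return [qid for qid in sent_ids if qid not in done]
```

In the four-query example, suppose query 1 was answered and query 2 is the slow one that gets abandoned after the timeout. Then `to_resend([1, 2, 3, 4], [1], [2])` gives `[3, 4]`, which are the last two queries. Each message carries its own ID, so the client would match responses by ID rather than by their position in the stream.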

I have a couple of ideas for how to do this sanely in my code, but the obvious one relies on Vstr so that you can copy in O(1) time. Another idea is to make sure I only send large numbers of TCP requests to a half-decent dnsd. Neither is really an option for SA :).


Tony Finch

Re: Probably because of Bind braindamage

from: fanf
date: 8th Dec 2004 06:18 (UTC)

No, the Net::DNS bgsend() function only supports UDP queries and doesn't do any kind of retry or TCP fallback. It's braindead in very many ways.
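For contrast, here is a small sketch of the two checks a UDP client would normally make. The first is whether the response has the TC ("truncated") bit set in its header flags (RFC 1035 section 4.1.1), which is the usual signal to retry the query over TCP. The second is a backoff schedule for re-sending queries that get no answer at all. The function names, and the doubling backoff with its default values, are assumptions for illustration and do not describe Net::DNS behaviour.

```python
import struct

QR_BIT = 0x8000  # set in responses
TC_BIT = 0x0200  # set when the message was truncated to fit the UDP datagram

def needs_tcp_fallback(response: bytes) -> bool:
    """True if a UDP response was truncated and the query should be retried over TCP."""
    if len(response) < 12:
        raise ValueError("DNS header is 12 bytes; packet too short")
    _, flags = struct.unpack_from("!HH", response, 0)
    return bool(flags & QR_BIT) and bool(flags & TC_BIT)

def retry_delays(initial: float = 1.0, tries: int = 4):
    """Assumed exponential backoff: seconds to wait before each re-send of an unanswered query."""
    return [initial * (2 ** i) for i in range(tries)]
```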

Bind also returns TCP responses in the same order that the queries arrived, which the DNS protocol doesn't require.

In practice the crapness of TCP DNS implementations doesn't affect the performance of bulk lookups much, because most of them run happily over UDP.
