fanf

The qmail ANY query bugs

12th Jun 2012 | 11:52

The main interop problem with DNSSEC is that it makes large DNS packets a lot more common. This leads to problems with misconfigured firewalls, and with qmail.

When delivering a message, qmail makes an ANY query on the recipient domain. This is not, as some people have speculated, a "clever" attempt to get the MX or fallback A and AAAA records in one go - and in fact if any MTA tried to do that then it wouldn't avoid queries or save time. If a DNS cache already has any records for a domain, an ANY query won't make its resolver fetch the other types. So if there are A records but no MX records in an ANY response, the MTA cannot assume that it should use the fallback-to-A implicit MX logic. It has to make an MX query to verify if MX records exist, so trying the ANY query has not actually reduced the number of queries. The code ends up more complicated and slower than straightforwardly making MX+A+AAAA queries as RFC 5321 specifies.
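As a rough illustration of the RFC 5321 logic, here is a minimal C sketch of the decision an MTA makes after an explicit MX query, plus the reason an ANY answer cannot stand in for that query. The enum names, the `any_answer` struct and both functions are hypothetical, invented for this example; real code would derive these outcomes from resolver return codes. It assumes, as is normal for DNS caches, that an RRset is cached whole, so an MX record appearing in an ANY answer means the MX set is complete.

```c
/* Outcome of an explicit MX query, as an MTA might classify it. */
enum mx_outcome {
    MX_FOUND,     /* answer section contains MX records */
    MX_NODATA,    /* name exists but has no MX records */
    MX_NXDOMAIN,  /* name does not exist */
    MX_TEMPFAIL   /* SERVFAIL, timeout, etc. */
};

/* What to do next, following RFC 5321 section 5.1. */
enum route {
    ROUTE_MX_HOSTS,     /* sort MXs by preference, then look up A/AAAA of each */
    ROUTE_IMPLICIT_MX,  /* treat the domain itself as a preference-0 MX host */
    ROUTE_BOUNCE,       /* permanent failure */
    ROUTE_RETRY         /* temporary failure: try again later */
};

enum route choose_route(enum mx_outcome r)
{
    switch (r) {
    case MX_FOUND:    return ROUTE_MX_HOSTS;
    case MX_NODATA:   return ROUTE_IMPLICIT_MX;
    case MX_NXDOMAIN: return ROUTE_BOUNCE;
    default:          return ROUTE_RETRY;
    }
}

/* Hypothetical summary of what an ANY answer from a cache contained. */
struct any_answer {
    int has_mx;
    int has_a_or_aaaa;
};

/* An ANY answer shows only what the cache happened to hold, so "no MX in
   the answer" is not the same as MX_NODATA. Even when has_a_or_aaaa is set,
   the implicit-MX fallback is not yet justified. Returns 1 if a separate MX
   query is still needed before choose_route() can be called. */
int any_answer_needs_mx_query(const struct any_answer *ans)
{
    return !ans->has_mx;
}
```

In the common case (A records cached, MX not yet seen) the sketch still demands an MX query, which is why trying ANY first saves nothing.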

So what are qmail's ANY queries for? There is exactly one point where it makes this query, which is when it is doing domain canonicalization of the envelope of outgoing messages. This is as specified by RFC 1123 section 5.2.2. However this requirement is obsolete and modern MTAs don't do it. You could fix qmail's ANY query bugs by just deleting the canonicalization code.

There are two bugs in the implementation which turn this unnecessary feature into an interoperability problem.

Originally qmail made a CNAME query in order to look up the canonical version of a domain, but this caused interop problems with BIND 4. This was replaced with an ANY query, which had fewer interop problems but is still wrong. Both of these queries are wrong because they don't trigger alias processing, so if there is a CNAME chain the response will not actually yield the canonical name. Because of this qmail has code that makes a series of queries to follow CNAME chains. If instead qmail made the correct query, an MX query (or A - it doesn't matter which), the response will include all the CNAME RRs that qmail wants to know about, and it would not need its inefficient CNAME chain handling code.
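To show the cost difference, here is a small C simulation (no real DNS involved) that counts round trips for the two approaches. The zone data uses the example chain one.example.com → two.example.com → three.example.com that fanf gives in the comments; `query_cname`, `canonical_client_side`, `canonical_one_query` and the `MAX_CHAIN` loop guard are hypothetical names for this sketch, not qmail's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical zone data: a two-link CNAME chain. */
struct cname_rr { const char *owner; const char *target; };

static const struct cname_rr zone[] = {
    { "one.example.com", "two.example.com" },
    { "two.example.com", "three.example.com" },
};
static const size_t zone_len = sizeof zone / sizeof zone[0];

/* Stand-in for a CNAME (or ANY) query: the server returns at most the one
   CNAME owned by this exact name and does no alias processing. */
static const char *query_cname(const char *name)
{
    for (size_t i = 0; i < zone_len; i++)
        if (strcmp(zone[i].owner, name) == 0)
            return zone[i].target;
    return NULL;
}

#define MAX_CHAIN 16  /* guard against CNAME loops */

/* Client-side chain following: one round trip per link, plus one more to
   learn that the last name has no CNAME. Returns NULL on a loop. */
const char *canonical_client_side(const char *name, int *queries)
{
    *queries = 0;
    for (int i = 0; i < MAX_CHAIN; i++) {
        const char *next = query_cname(name);
        ++*queries;
        if (next == NULL)
            return name;           /* no CNAME here: canonical name found */
        name = next;
    }
    return NULL;
}

/* Server-side alias processing, as for an MX or A query: the server walks
   the chain and returns every CNAME in one response, so the client reads
   the canonical name out of a single answer. */
const char *canonical_one_query(const char *name, int *queries)
{
    *queries = 1;                  /* a single round trip ... */
    for (int i = 0; i < MAX_CHAIN; i++) {
        const char *next = query_cname(name);   /* ... the walk is server-side */
        if (next == NULL)
            return name;
        name = next;
    }
    return NULL;
}
```

Both functions reach three.example.com, but the client-side version needs three queries for a two-link chain, and one more for every extra link.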

The other problem is that qmail uses a small DNS packet buffer, and does not resize and retry if a response is truncated. ANY queries make it much more likely for truncated-response failures to happen. The simplest fix is just to change the buffer size from 512 to 65536 (enough for the largest possible DNS message, which is 65535 bytes) and let the virtual memory system do lazy memory allocation. This one-line patch is enough to fix qmail's DNSSEC problems, but it doesn't fix its CNAME chain problem. (Edit: but see the comments for a patch that does fix it by disabling the canonicalization code. That is the patch you want to work around DNS providers that have disabled support for ANY queries.)

--- dns.c~      1998-06-15 11:53:16.000000000 +0100
+++ dns.c       2013-01-10 12:33:56.000000000 +0000
@@ -21,7 +21,7 @@
 static unsigned short getshort(c) unsigned char *c;
 { unsigned short u; u = c[0]; return (u << 8) + c[1]; }

-static union { HEADER hdr; unsigned char buf[PACKETSZ]; } response;
+static union { HEADER hdr; unsigned char buf[65536]; } response;
 static int responselen;
 static unsigned char *responseend;
 static unsigned char *responsepos;
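The one-line patch above avoids truncation by making the buffer big enough for any reply. For comparison, here is a minimal C sketch of the resize-and-retry approach that qmail lacks. It assumes the lookup function reports the full response length even when it could only copy part of it into the buffer; check your resolver's documentation before relying on that. `fake_lookup`, `simulated_size` and `lookup_with_retry` are hypothetical names, and the lookup is simulated rather than a real resolver call.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a resolver call. It copies as much of a
   simulated response as fits and returns the full response length,
   which may exceed buflen. */
static size_t simulated_size = 1500;

static long fake_lookup(unsigned char *buf, size_t buflen)
{
    size_t n = simulated_size < buflen ? simulated_size : buflen;
    memset(buf, 0xAB, n);          /* pretend these are response bytes */
    return (long)simulated_size;
}

#define DNS_MAX 65535              /* largest possible DNS message */

/* Start with a 512-byte buffer; if the reply did not fit, grow the buffer
   to the reported size and ask again. Returns the reply length, or -1 on
   failure. On success *bufp is a malloc'd buffer of *lenp bytes that the
   caller must free. */
long lookup_with_retry(unsigned char **bufp, size_t *lenp)
{
    size_t len = 512;
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    for (;;) {
        long n = fake_lookup(buf, len);
        if (n < 0 || n > DNS_MAX) {
            free(buf);
            return -1;
        }
        if ((size_t)n <= len) {    /* the whole reply fitted */
            *bufp = buf;
            *lenp = len;
            return n;
        }
        unsigned char *bigger = realloc(buf, (size_t)n);
        if (bigger == NULL) {
            free(buf);
            return -1;
        }
        buf = bigger;
        len = (size_t)n;
    }
}
```

This costs an extra query for large replies, which is why a static 64 KiB buffer, relying on lazy allocation, is the simpler choice for qmail.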

Comments {13}

Res facta quae tamen fingi potuit

from: pauamma
date: 12th Jun 2012 13:33 (UTC)

The 512-byte reply buffer size (which IIRC, comes from qmail's insistence on using UDP only for DNS) has been causing problems since aol.com started using more MXs than could fit in a single datagram.

Tony Finch

from: fanf
date: 12th Jun 2012 13:49 (UTC)

Actually, qmail uses the standard resolver API, and the standard resolver does proper TCP fallback.

You are of course right that this is not a DNSSEC-specific interop bug :-)

Edited at 2012-06-12 01:50 pm (UTC)

Res facta quae tamen fingi potuit

from: pauamma
date: 12th Jun 2012 15:33 (UTC)

Actually, qmail uses the standard resolver API
Hmm. Either that changed since I last looked at it, or I misremember the reason why it wouldn't accept responses larger than 512 bytes. (The latter is likely - it's been a long while since I looked at anything but postfix.)

UltraDNS starting to block UDP/ANY

from: anonymous
date: 9th Jan 2013 20:16 (UTC)

This problem gets amplified, as UltraDNS are now starting to block UDP ANY queries (answering with REFUSED). Any resolver that uses ANY queries (like qmail) and is trying to connect to a domain name with DNS from UltraDNS that doesn't have MX records (which are tried first) will be unable to resolve it, and will start bouncing mail.

Fixes (in addition to the one above to increase the response buffer)
- change the T_ANY to a T_CNAME lookup in the one place that qmail does it (dns.c)
- patch djbdns to force ANY queries via TCP (but this then breaks domains that don't listen on 53/tcp)

Ideally you probably want to do the first, and think about the second, as some major ISPs are braindead and their DNS doesn't listen on TCP (midco.net, are you listening?)

Tony Finch

Re: UltraDNS starting to block UDP/ANY

from: fanf
date: 10th Jan 2013 10:07 (UTC)

UltraDNS are being stupid here, because if they returned a minimal truncated response, their servers would be sending the same volume of responses without breaking qmail.
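To make "minimal truncated response" concrete: a server that does not want to send a large answer over UDP can echo the query's header and question with the QR and TC bits set and empty answer, authority and additional sections, and a standard resolver that sees TC retries over TCP. The C sketch below builds such a reply using the RFC 1035 header layout. `make_truncated_reply`, `question_end` and `reply_is_truncated` are hypothetical names; this is not UltraDNS's or any server's actual code, and it simply drops any EDNS OPT record from the query.

```c
#include <stddef.h>
#include <string.h>

#define DNS_HDR_LEN 12

/* Find the end of a single question with an uncompressed name. Returns the
   offset just past QTYPE and QCLASS, or 0 if the message is malformed. */
static size_t question_end(const unsigned char *msg, size_t len)
{
    size_t pos = DNS_HDR_LEN;
    while (pos < len) {
        unsigned char lab = msg[pos];
        if (lab == 0) {
            pos += 1 + 4;          /* root label, QTYPE, QCLASS */
            return pos <= len ? pos : 0;
        }
        if (lab & 0xC0)
            return 0;              /* compression pointers not expected here */
        pos += 1 + (size_t)lab;
    }
    return 0;
}

/* Build a minimal truncated reply: the query's header and question, with
   QR and TC set and no answer, authority or additional records. Returns
   the reply length, or 0 on error. */
size_t make_truncated_reply(const unsigned char *query, size_t qlen,
                            unsigned char *out, size_t outlen)
{
    if (qlen < DNS_HDR_LEN || query[4] != 0 || query[5] != 1)
        return 0;                  /* expect QDCOUNT == 1 */
    size_t end = question_end(query, qlen);
    if (end == 0 || end > outlen)
        return 0;
    memcpy(out, query, end);       /* ID, flags, counts, question */
    /* Byte 2 is QR|Opcode|AA|TC|RD: keep Opcode and RD, set QR and TC. */
    out[2] = (unsigned char)((query[2] & 0x79) | 0x80 | 0x02);
    out[3] = 0;                    /* RA, Z, RCODE = NOERROR */
    memset(out + 6, 0, 6);         /* ANCOUNT = NSCOUNT = ARCOUNT = 0 */
    return end;
}

/* What a client checks before falling back to TCP. */
int reply_is_truncated(const unsigned char *msg, size_t len)
{
    return len >= DNS_HDR_LEN && (msg[2] & 0x02) != 0;
}
```

A reply like this is no bigger than the query, so it costs the server no more than sending REFUSED, but a client that honours TC (including qmail via the standard resolver) can still get its answer over TCP.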

There was some (slightly badly informed) discussion of this on Twitter on the 5th January between me, @solardiz, @dakami, @jedisct1 - see https://twitter.com/jedisct1/status/287349996741861376 which includes a link to a patch to djb's dnscache to force upstream ANY queries to TCP.

Here is an alternative patch (untested!) which simply disables qmail's unnecessary canonicalization code.
--- qmail-remote.c~     1998-06-15 11:53:16.000000000 +0100
+++ qmail-remote.c      2013-01-10 10:02:18.000000000 +0000
@@ -374,7 +374,7 @@
   while (*recips) {
     if (!saa_readyplus(&reciplist,1)) temp_nomem();
     reciplist.sa[reciplist.len] = sauninit;
-    addrmangle(reciplist.sa + reciplist.len,*recips,&flagalias,!relayhost);
+    addrmangle(reciplist.sa + reciplist.len,*recips,&flagalias,0);
     if (!flagalias) flagallaliases = 0;
     ++reciplist.len;
     ++recips;

Re: UltraDNS starting to block UDP/ANY

from: anonymous
date: 10th Jan 2013 18:31 (UTC)


You can also patch qmail without disabling the whole canonicalization code, like this:
--- qmail-1.03-original/dns.c Mon Jun 15 10:53:16 1998
+++ qmail-1.03/dns.c Wed Mar 5 11:31:13 2003
@@ -196,7 +196,7 @@
if (!sa->len) return loop;
if (sa->s[sa->len - 1] == ']') return loop;
if (sa->s[sa->len - 1] == '.') { --sa->len; continue; }
- switch(resolve(sa,T_ANY))
+ switch(resolve(sa,T_CNAME))
{
case DNS_MEM: return DNS_MEM;
case DNS_SOFT: return DNS_SOFT;

and you can add @brynen to the tweet list, on the grounds that these are his original findings.

Tony Finch

Re: UltraDNS starting to block UDP/ANY

from: fanf
date: 10th Jan 2013 18:38 (UTC)

You can do that if you like but it's still horribly buggy for the reasons explained above. Remember the canonicalization code is completely worthless, so it is a waste of time to try to make it work when it is easy to disable.

Thanks for mentioning @brynen.

from: jrg.watching.org
date: 26th Mar 2013 12:10 (UTC)

There's actually already - and has been for a long time - ckd's "big dns" patch to dns.c, which makes the buffer dynamic. I have http://www.ckdhr.com/ckd/qmail-103.patch as the canonical location, and it still works.

Off to see whether the other issue is what's been "bugging" me for a while.

Tony Finch

from: fanf
date: 27th Mar 2013 11:14 (UTC)

That's a bit over-complicated :-)

from: anonymous
date: 21st Apr 2013 21:56 (UTC)

Hi,

===QUOTE BEGIN===
Originally qmail made a CNAME query in order to look up the canonical version of a domain, but this caused interop problems with BIND 4. This was replaced with an ANY query, which had fewer interop problems but is still wrong. Both of these queries are wrong because they don't trigger alias processing, so if there is a CNAME chain the response will not actually yield the canonical name. The correct query is an MX query (or A - it doesn't matter which); the response will include all the CNAME RRs that qmail wants to know about.
===QUOTE END===

I must admit that I've failed to see the point, other than MX/A queries and responses being shorter.
Could you elaborate on the ANY versus MX/A query with respect to qmail canonicalization?

Thanks in advance

Tony Finch

from: fanf
date: 24th Apr 2013 09:02 (UTC)

If you have a situation like the following

one.example.com CNAME two.example.com
two.example.com CNAME three.example.com
three.example.com MX mail.example.com

Then the canonical name is three.example.com. But qmail uses a query that suppresses alias processing in the name server: querying for one.example.com ANY (or CNAME) returns only the first record, not the whole chain. So to fix this problem qmail's DNS code makes multiple queries to get the CNAME chain. But this is a pointless and wasteful duplication of the alias processing logic in the name server: qmail could just make an MX query and get the whole CNAME chain in one go.

from: anonymous
date: 5th May 2013 19:12 (UTC)

Hello Tony!
Thanks for this post, but I've a question...
I've fixed my qmail with this:

--- qmail-1.03-original/dns.c Mon Jun 15 10:53:16 1998
+++ qmail-1.03/dns.c Wed Mar 5 11:31:13 2003
@@ -196,7 +196,7 @@
if (!sa->len) return loop;
if (sa->s[sa->len - 1] == ']') return loop;
if (sa->s[sa->len - 1] == '.') { --sa->len; continue; }
- switch(resolve(sa,T_ANY))
+ switch(resolve(sa,T_CNAME))
{
case DNS_MEM: return DNS_MEM;
case DNS_SOFT: return DNS_SOFT;

Can I use your patch with this "T_CNAME" change, or do I have to roll back to "T_ANY"?
With this patch I've solved my problem (for now...)
Thanks again

Tony Finch

from: fanf
date: 6th May 2013 13:29 (UTC)

Making a CNAME query is about as wrong as making an ANY query, for the reasons discussed above. I do not think it will be more broken than ANY, since BIND 4 is long gone, and BIND 4 was apparently what did not like CNAME queries.
