?

Log in

No account? Create an account

fanf

Even the Deathstation 9000 can't screw up the BIND 9.10.4 fix

« previous entry | next entry »
20th May 2016 | 01:52

There was an interesting and instructive cockup in the BIND 9.10.4 release. The tl;dr is that they tried to improve performance by changing the layout of a data structure to use less memory. As a side effect, two separate sets of flags ended up packed into the same word. Unfortunately, these different sets of flags were protected by different locks. This overlap meant that concurrent access to one set of flags could interfere with access to the other set of flags. This led to a violation of BIND's self-consistency checks, causing a crash.

The fix uses an obscure feature of C structure bitfield syntax. If you declare a bitfield without a name and with a zero width (in this case, unsigned int :0) any following bitfields must be placed in the next "storage unit". This is described in the C standard section 6.7.2.1p12.

You can see this in action in the BIND DNS red-black tree declaration.

A :0 solves the problem, right?

So a clever idea might be to use this :0 bitfield separator so that the different sets of bitfields will be allocated in different words, and the different words will be protected by different locks.

What are the assumptions behind this fix?

If you only use a :0 separator, you are assuming that a "word" (in a vague hand-wavey sense) corresponds to both a "storage unit", which the compiler uses for struct layouts, and also to the size of memory access the CPU uses for bitfields, which matters when we are using locks to keep some accesses separate from each other.

What is a storage unit?

The C standard specifies the representation of bitfields in section 6.2.6.1p4:

Values stored in bit-fields consist of m bits, where m is the size specified for the bit-field. The object representation is the set of m bits the bit-field comprises in the addressable storage unit holding it.

Annex J, which covers portability issues, says:

The following are unspecified: [... yada yada ...]

The alignment of the addressable storage unit allocated to hold a bit-field

What is a storage unit really?

A "storage unit" is defined by the platform's ABI, which for BIND usually means the System V Application Binary Interface. The amd64 version covers bit fields in section 3.1.2. It says,

  • bit-fields must be contained in a storage unit appropriate for its declared type

  • bit-fields may share a storage unit with other struct / union members

In this particular case, the storage unit is 4 bytes for a 32 bit unsigned int. Note that this is less than the architecture's nominal 64 bit width.

How does a storage unit correspond to memory access widths?

Who knows?

What?

It is unspecified.

So...

The broken code declared a bunch of adjacent bit fields and forgot that they were accessed under different locks.

Now, you might hope that you could just add a :0 to split these bitfields into separate storage units, and the split would be enough to also separate the CPU's memory accesses to the different storage units.

But you would be assuming that the code that accesses the bitfields will be compiled to use unsigned int sized memory accesses to read and write the unsigned int sized storage units.

Um, do we have any guarantees about access size?

Yes! There is sig_atomic_t which has been around since C89, and a load of very recent atomic stuff. But none of it is used by this part of BIND's code.

(Also, worryingly, the AMD64 SVABI does not mention "atomic".)

So what is this "Deathstation 9000" thing then?

The DS9K is the canonical epithet for the most awkward possible implementation of C, which follows the standard to the letter, but also violates common assumptions about how C works in practice. It is invoked as a horrible nightmare, but in recent years it has become a disastrous reality as compilers have started exploiting undefined behaviour to support advanced optimizations.

(Sadly the Armed Response Technologies website has died, and Wikipedia's DS9K page has been deleted. About the only place you can find information about it is in old comp.lang.c posts...)

And why is the DS9K relevant?

Well, in this case we don't need to go deep into DS9K territory. There are vaguely reasonable ABIs for which a small :0 fix would not actually work.

For instance, there might be a CPU which can only do 64 bit memory accesses, but which has a 32 bit int storage unit. This type representation would probably mean the C compiler has really bad performance, so it is fairly unlikely. But it is allowed, and there are (or were) CPUs which can't do sub-word memory accesses, and which have very little RAM so they want to pack data tightly.

On a CPU like this, the storage unit doesn't match the memory access, so C's :0 syntax for skipping to the next storage unit will fail to achieve the desired effect of isolating memory accesses that have to be performed under different concurrent access locks.

DS9K defence technology

So the BIND 9.10.4 fix does two things:

The most important change is to move one of the sets of bitfields from the start of the struct to the end of it. This means there are several pointers in between the fields protected by one lock and the fields protected by the other lock, so even a DS9K can't reasonably muddle them up.

Secondly, they used magical :0 syntax and extensive comments to (hopefully) prevent the erroneous compaction from happening again. Even if the bitfields get moved back to the start of the struct (so the pointers no longer provide insulation) the :0 might help to prevent the bug from causing crashes.

(edited to add)

When I wrapped this article up originally, I forgot to return to sig_atomic_t. If, as a third fix, these bitfields were also changed from unsigned int to sig_atomic_t, it would further help to avoid problems on systems where the natural atomic access width is wider than an int, like the lesser cousin of the DS9K I outlined above. However, sig_atomic_t can be signed or unsigned so it adds a new portability wrinkle.

Conclusion

The clever :0 is not enough, and the BIND developers were right not to rely on it.

| Leave a comment | Share

Comments {8}

ewx

from: ewx
date: 20th May 2016 08:02 (UTC)

C99 tells us almost nothing about 'storage units' - pretty much just that they are 'addressable'. No alignment restrictions, no requirement that they're all the same size, no requirement that each bitfield is contained in exactly one storage unit, no explicit prohibition on sharing them with non-bitfields. We can see that they can be adjacent to one another or in a 'next' relationship but that doesn't explicitly rule out gaps between them.
As such I think an implementation could state that the storage units of a given struct are bounded only by its endpoints and any :0 bit-fields. Yuck.

Reply | Thread

Gerald the cuddly duck

from: gerald_duck
date: 20th May 2016 11:46 (UTC)

Even sig_atomic_t is helpless in the face of multi-threading. Until comparatively recently, the C++ standard was entirely silent on how two threads manipulating the same memory would interact. C++11 started talking about it, and I assume recent C standards do, too?

But I'll bet Bind expects to compile and run on older compilers, too!

Multi-threading is an intensely subtle and perilous field. Getting it right keeps me in good money.

Reply | Thread

Tony Finch

from: fanf
date: 20th May 2016 11:50 (UTC)

C11 basically adopted the same memory model as C++11.

I'm not sure what BIND's portability target is; for instance, it includes its own atomics implementation, and I don't know what happens if you are on a platform they haven't ported to...

Reply | Parent | Thread

from: rsedmonds
date: 21st May 2016 20:38 (UTC)

The machine-specific atomics implementations are an optional optimization. (Heck, even threading is still an optional optimization in BIND.) I think it falls back to a generic implementation if you're on a platform they haven't ported it to. Personally, I think they're more trouble than they're worth -- see, e.g., https://bugs.debian.org/778720 -- and ought to be replaced with calls to the equivalent machine-independent compiler-provided intrinsics, on compilers that support those intrinsics.

I'm not sure if ISC has a formally defined portability target for BIND (at least, I don't think they did when I was there). But you'll occasionally see commits targeting systems that have long since been abandoned by their vendors, e.g.:

https://source.isc.org/cgi-bin/gitweb.cgi/?p=bind9.git;a=commitdiff;h=6f6b1abb106d970f1b0c5e959cddfabc9cc3b4ae

Reply | Parent | Thread

Tony Finch

from: fanf
date: 21st May 2016 21:32 (UTC)

Hmm yes, though apparently the custom locks are still being improved - https://source.isc.org/cgi-bin/gitweb.cgi?p=bind9.git;a=commitdiff;h=cf24cbd8

I am a bit surprised that the common platform libraries don't include good adaptive spinlock implementations, at least a lot better than 17 years ago...

Reply | Parent | Thread

Evan

from: mr_cellaneous
date: 22nd May 2016 06:15 (UTC)

BIND's portability target when it was originally written was C90 and POSIX.1, but there has unavoidably been some drift, as we can't test everywhere.

The practical answer is, if there's a portability problem on any current Linux, any current BSD, MacOSX, Solaris, or Windows 2008 and higher, we'll be in a position to notice and fix it.

Reply | Parent | Thread

cartesiandaemon

from: cartesiandaemon
date: 20th May 2016 12:37 (UTC)

How have I not read about DS9K before? That's a perfect description of what architecture you aim for (or preferably not aim for).

Is there a requirement that DS9K endianness is consistent...?

Reply | Thread

Tony Finch

from: fanf
date: 20th May 2016 12:55 (UTC)

David Malone found a copy of the Armed Response Technologies site on archive.org - https://twitter.com/dwmal1/status/733539073864785920 - https://web.archive.org/web/20140208152223/http://dspace.dial.pipex.com/town/green/gfd34/art/ - lots of DS9K jokes in there.

AFAICT, C requires endianness to be specified or implementation-defined, but it doesn't require different types to have consistent representations. In practice, middle-endian floating point exists in several real-world architectures; a DS9K could have an almost arbitrarily crazy mapping from bits to values.

The ART DS9K randomly changes the value of CHAR_BIT for each compilation of your program.

Reply | Parent | Thread