Top.Mail.Ru
? ?

fanf

21st century lighting: LED tubes

28th Aug 2020 | 00:28

on my Dreamwidth

| Leave a comment | | Flag

fanf

Open source as a byproduct

11th Jan 2019 | 01:53

On Dreamwidth: https://fanf.dreamwidth.org/130864.html

| Leave a comment | | Flag

fanf

Notes on DNS server upgrades

2nd Jan 2019 | 20:14

On my work blog: https://www.dns.cam.ac.uk/news/2019-01-02-upgrade-notes.html

| Leave a comment | | Flag

fanf

Boxing day ham with bubble and squeak

26th Dec 2018 | 20:48

On Dreamwidth: https://fanf.dreamwidth.org/130405.html

| Leave a comment | | Flag

fanf

Yak shaving

1st Aug 2018 | 23:37

On Dreamwidth: https://fanf.dreamwidth.org/127865.html

| Leave a comment | | Flag

fanf

Beer Festival week hacking notes - Tuesday

23rd May 2018 | 00:35

On Dreamwidth: https://fanf.dreamwidth.org/126764.html

| Leave a comment | | Flag

fanf

Beer Festival week! Hacking notes for Monday

22nd May 2018 | 01:00

On Dreamwidth: https://fanf.dreamwidth.org/126678.html

| Leave a comment | | Flag

fanf

More on web server smoke tests

11th May 2018 | 01:24

On Dreamwidth: https://fanf.dreamwidth.org/126334.html

| Leave a comment | | Flag

fanf

How to test a web server before pointing the DNS at it

9th May 2018 | 10:37

On Dreamwidth: https://fanf.dreamwidth.org/126057.html

| Leave a comment | | Flag

fanf

DNSSEC localized validation

9th Apr 2018 | 23:16

On Dreamwidth: https://fanf.dreamwidth.org/125531.html

| Leave a comment | | Flag

fanf

My comments on the ICANN root DNSSEC key rollover

28th Mar 2018 | 10:12

On Dreamwidth: https://fanf.dreamwidth.org/125335.html

| Leave a comment | | Flag

fanf

IPv6 DAD-die issues

27th Mar 2018 | 17:10

On Dreamwidth: https://fanf.dreamwidth.org/125001.html

| Leave a comment | | Flag

fanf

IETF 101 - Thursday and Friday

23rd Mar 2018 | 15:45

On Dreamwidth: https://fanf.dreamwidth.org/124833.html

| Leave a comment | | Flag

fanf

IETF 101 - Wednesday

22nd Mar 2018 | 14:15

On Dreamwidth: https://fanf.dreamwidth.org/124571.html

| Leave a comment | | Flag

fanf

IETF 101 - Tuesday

21st Mar 2018 | 13:48

On Dreamwidth: https://fanf.dreamwidth.org/124221.html

| Leave a comment | | Flag

fanf

IETF 101 - Monday

20th Mar 2018 | 10:09

On Dreamwidth: https://fanf.dreamwidth.org/124007.html

| Leave a comment | | Flag

fanf

IETF 101 Hackathon, day 2

18th Mar 2018 | 23:09

On Dreamwidth: https://fanf.dreamwidth.org/123738.html

| Leave a comment | | Flag

fanf

IETF 101 Hackathon, day 1

17th Mar 2018 | 23:36

On Dreamwidth: https://fanf.dreamwidth.org/123507.html

| Leave a comment | | Flag

fanf

Secure Let's Encrypt ACME DNS challenges with BIND

7th Mar 2018 | 23:15

On Dreamwidth: https://fanf.dreamwidth.org/123294.html

| Leave a comment | | Flag

fanf

Bloggone

17th Feb 2018 | 19:30

I’ve posted an entry at https://fanf.dreamwidth.org/122985.html

| Leave a comment | | Flag

fanf

Moving to Dreamwidth

6th Apr 2017 | 10:00

I am in the process of moving this blog to https://fanf.dreamwidth.org/ - follow me there!

| Leave a comment | | Flag

fanf

A review of Ansible Vault

30th Mar 2017 | 15:38

Ansible is the configuration management tool we use at work. It has built-in support for encrypted secrets, called ansible-vault, so you can safely store secrets in version control.

I thought I should review the ansible-vault code.

Summary

It's a bit shoddy but probably OK, provided you have a really strong vault password.

HAZMAT

The code starts off with a bad sign:

    from cryptography.hazmat.primitives.hashes import SHA256 as c_SHA256
    from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
    from cryptography.hazmat.backends import default_backend

I like the way the Python cryptography library calls this stuff HAZMAT but I don't like the fact that Ansible is getting its hands dirty with HAZMAT. It's likely to lead to embarrassing cockups, and in fact Ansible had an embarrassing cockup - there are two vault ciphers, "AES" (the cockup, now disabled except that for compatibility you can still decrypt) and "AES256" (fixed replacement).

As a consequence of basing ansible-vault on relatively low-level primitives, it has its own Python implementations of constant-time comparison and PKCS#7 padding. Ugh.

Good

Proper random numbers:

    b_salt = os.urandom(32)

Poor

Iteration count:

    b_derivedkey = PBKDF2(b_password, b_salt,
                          dkLen=(2 * keylength) + ivlength,
                          count=10000, prf=pbkdf2_prf)

PBKDF2 HMAC SHA256 takes about 24ms for 10k iterations on my machine, which is not bad but also not great - e.g. 1Password uses 100k iterations of the same algorithm, and gpg tunes its non-PBKDF2 password hash to take (by default) at least 100ms.

The deeper problem here is that Ansible has hard-coded the PBKDF2 iteration count, so it can't be changed without breaking compatibility. In gpg an encrypted blob includes the variable iteration count as a parameter.

Ugly

ASCII armoring:

    b_vaulttext = b'\n'.join([hexlify(b_salt),
                              to_bytes(hmac.hexdigest()),
                              hexlify(b_ciphertext)])
    b_vaulttext = hexlify(b_vaulttext)

The ASCII-armoring of the ciphertext is as dumb as a brick, with hex-encoding inside hex-encoding.

File handling

I also (more briefly) looked through ansible-vault's higher-level code for managing vault files.

It is based on handing decrypted YAML files to $EDITOR, so it's a bit awkward if you don't want to wrap secrets in YAML or if you don't want to manipulate them in your editor.

It uses mkstemp(), so the decrypted file can be placed on a ram disk, though you might have to set TMPDIR to make sure.

It shred(1)s the file after finishing with it.

| Leave a comment | | Flag

fanf

Better terminology

27th Mar 2017 | 17:25

Because I just needed to remind myself of this, here are a couple of links with suggestions foe less accidentally-racist computing terminology:

Firstly, Kerri Miller suggests blocklist / safelist for naming deny / allow data sources.

Less briefly, Bryan Liles got lots of suggestions for better distributed systems terminology. If you use a wider vocabulary to describe your systems, you can more quickly give your reader a more precise idea of how your systems work, and the relative roles of their parts.

| Leave a comment | | Flag

fanf

keepalived DNS health checker revamp

2nd Feb 2017 | 16:07

Today I rolled out a significant improvement to the automatic recovery system on Cambridge University's recursive DNS servers. This change was because of three bugs.

BIND RPZ catatonia

The first bug is that sometimes BIND will lock up for a few seconds doing RPZ maintenance work. This can happen with very large and frequently updated response policy zones such as the Spamhaus Domain Block List.

When this happens on my servers, keepalived starts a failover process after a couple of seconds - it is deliberately configured to respond quickly. However, BIND soon recovers, so a few seconds later keepalived fails back.

BIND lost listening socket

This brief keepalived flap has an unfortunate effect on BIND. It sees the service addresses disappear, so it closes its listening sockets, then the service addresses reappear, so it tries to reopen its listening sockets.

Now, because the server is fairly busy, it doesn't have time to clean up all the state from the old listening socket before BIND tries to open the new one, so BIND gets an "address already in use" error.

Sadly, BIND gives up at this point - it does not keep trying periodically to reopen the socket, as you might hope.

Holy health check script, Bat Man!

At this point BIND is still listening on most of the interface addresses, except for a TCP socket on the public service IP address. Ideally this should have been spotted by my health check script, which should have told keepalived to fail over again.

But there's a gaping hole in the health checker's coverage: it only tests the loopback interfaces!

In a fix

Ideally all three of these bugs should be fixed. I'm not expert enough to fix the BIND bugs myself, since they are in some of the gnarliest bits of the code, so I'll leave them to the good folks at ISC.org. Even if they are fixed, I still need to fix my health check script so that it actually checks the user-facing service addresses, and there's no-one else I can leave that to.

Previously...

I wrote about my setup for recursive DNS server failover with keepalived when I set it up a couple of years ago. My recent work leaves the keepalived configuration bascially unchanged, and concentrates on the health check script.

For the purpose of this article, the key feature of my keepalived configuration is that it runs the health checker script many times per second, in order to fake up dynamically reconfigurable server priorities. The old script did DNS queries inline, which was OK when it was only checking loopback addresses, but the new script needs to make typically 16 queries which is getting a bit much.

Daemonic decoupling

The new health checker is split in two.

The script called by keepalived now just examines the contents of a status file, so it runs predictably fast regardless of the speed of DNS responses.

There is a separate daemon which performs the actual health checks, and writes the results to the status file.

The speed thing is nice, but what is really important is that the daemon is naturally stateful in a way the old health checker could not be. When I started I knew statefulness was necessary because I clearly needed some kind of hysteresis or flap damping or hold-down or something.

This is much more complex

https://www.youtube.com/watch?v=DNb4VKln1uw

There is this theory of the Möbius: a twist in the fabric of space where time becomes a loop

  • BIND observes the list of network interfaces, and opens and closes listening sockets as addresses come and go.

  • The health check daemon verifies that BIND is responding properly on all the network interface addresses.

  • keepalived polls the health checker and brings interfaces up and down depending on the results.

Without care it is inevitable that unexpected interactions between these components will destroy the Enterprise!

Winning the race

The health checker gets into races with the other daemons when interfaces are deleted or added.

The deletion case is simpler. The health checker gets the list of addresses, then checks them all in turn. If keepalived deletes an address during this process then the checker can detect a failure - but actually, it's OK if we don't get a respose from a missing address! Fortunately there is a distinctive error message in this case which the health checker can treat as an alternative successful response.

New interfaces are more tricky, because the health checker needs to give BIND a little time to open its sockets. It would be really bad if the server appears to be healthy, so keepalived brings up the addresses, which the health checker tests before BIND is ready, causing it to immediately fail - a huge flap.

Back off

The main technique that the new health checker uses to suppress flapping is exponential backoff.

Normally, when everything is working, the health checker queries every network interface address, writes an OK to the status file, then sleeps for 1 second before looping.

When a query fails, it immediately writes BAD to the status file, and sleeps for a while before looping. The sleep time increases exponentially as more failures occur, so repeated failures cause longer and longer intervals before the server tries to recover.

Exponential backoff handles my original problem somewhat indirectly: if there's a flap that causes BIND to lose a listening socket, there will then be a (hopefully short) series of slower and slower flaps until eventually a flap is slow enough that BIND is able to re-open the socket and the server recovers. I will probably have to tune the backoff parameters to minimize the disruption in this kind of event.

Hold down

Another way to suppress flapping is to avoid false recoveries.

When all the test queries succeed, the new health checker decreases the failure sleep time, rather than zeroing it, so if more failures occur the exponential backoff can continue. It still reports the success immediately to keepalived, because I want true recoveries to be fast, for instance if the server accidentally crashes and is restarted.

The hold-down mechanism is linked to the way the health checker keeps track of network interface addresses.

After an interface goes away the checker does not decrease the sleep time for several seconds even if the queries are now working OK. This hold-down is supposed to cover a flap where the interface immediately returns, in which case we want exponential backoff to continue.

Similarly, to avoid those tricky races, we also record the time when each interface is brought up, so we can ignore failures that occur in the first few seconds.

Result

It took quite a lot of headscratching and trial and error, but in the end I think I came up with something resonably simple. Rather than targeting it specifically at failures I have observed in production, I have tried to use general purpose robustness techniques, and I hope this means it will behave OK if some new weird problem crops up.

Actually, I hope NO new weird problems crop up!

PS. the ST:TNG quote above is because I have recently been listening to my old Orbital albums again - https://www.youtube.com/watch?v=RlB-PN3M1vQ

| Leave a comment (4) | | Flag

fanf

Even more compact encoding of the leap seconds list

11th Jan 2017 | 20:54

Following on from my recent item about the leap seconds list I have come up with a better binary encoding which is half the size of my previous attempt. (Compressing it with deflate no longer helps.)

Here's the new binary version of the leap second list, 15 bytes displayed as hexadecimal in network byte order. I have not updated it to reflect the latest Bulletin C which was promulgated this week, because the official leap second lists have not yet been updated.

    001111111211343 12112229D5652F4

This is a string of 4-bit nybbles, upper nybble before lower nybble of each byte.

If the value of this nybble V is less than 0x8, it is treated as a pair of nybbles 0x9V. This abbreviates the common case.

Otherwise, we consider this nybble Q and the following nybble V as a pair 0xQV.

Q contains four flags,

    +-----+-----+-----+-----+
    |  W  |  M  |  N  |  P  |
    +-----+-----+-----+-----+

W is for width:

  • W == 1 indicates a QV pair in the string.
  • W == 0 indicates this is a bare V nybble.

M is the month multiplier:

  • M == 1 indicates the nybble V is multiplied by 1
  • M == 0 indicates the nybble V is multiplied by 6

NP together are NTP-compatible leap indicator bits:

  • NP == 00 indicates no leap second
  • NP == 01 == 1 indicates a positive leap second
  • NP == 10 == 2 indicates a negative leap second
  • NP == 11 == 3 indicates an unknown leap second

The latter is equivalent to the ? terminating the text version.

The event described by NP occurs a number of months after the previous event, given by

    (M ? 1 : 6) * (V + 1).

That is, a 6 month gap can be encoded as M=1, V=5 or as M=0, V=0.

The "no leap second" option comes into play when the gap between leap seconds is too large to fit in 4 bits. In this situation you encode a number of "no leap second" gaps until the remaining gap fits.

The recommended way to break up long gaps is as follows. Gaps up to 16 months can be encoded in one QV pair. Gaps that are a multiple of 6 months long should be encoded as a number of 16*6 month gaps, followed by the remainder. Other gaps should be rounded down to a whole number of years and encoded as a X*6 month gap, which is followed by a gap for the remaining few months.

To align the list to a whole number of bytes, add a redundant 9 nybble to turn a bare V nybble into a QV pair.

In the current leap second list, every gap is encoded as a single V nybble, except for the 84 month gap which is encoded as QV = 0x9D, and the last 5 months encoded as QV = 0xF4.

Once the leap second lists have been updated, the latest Bulletin C will change the final nybble from 4 to A.

The code now includes decoding as well as encoding functions, and a little fuzz tester to ensure they are consistent with each other. http://dotat.at/cgi/git/leapseconds.git.

| Leave a comment | | Flag

fanf

Named and optional function arguments in C99

10th Jan 2017 | 15:37

Here's an amusing trick that I was just discussing with Mark Wooding on IRC.

Did you know you can define functions with optional and/or named arguments in C99? It's not even completely horrible!

The main limitation is there must be at least one non-optional argument, and you need to compile with -std=c99 -Wno-override-init.

(I originally wrote that this code needs C11, but Miod Vallat pointed out it works fine in C99)

The pattern works like this, using a function called repeat() as an example.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    // Define the argument list as a structure. The dummy argument
    // at the start allows you to call the function with either
    // positional or named arguments.

    struct repeat_args {
        void *dummy;
        char *str;
        int n;
        char *sep;
    };

    // Define a wrapper macro that sets the default values and
    // hides away the rubric.

    #define repeat(...) \
            repeat((struct repeat_args){ \
                    .n = 1, .sep = " ", \
                    .dummy = NULL, __VA_ARGS__ })

    // Finally, define the function,
    // but remember to suppress macro expansion!

    char *(repeat)(struct repeat_args a) {
        if(a.n < 1)
            return(NULL);
        char *r = malloc((a.n - 1) * strlen(a.sep) +
                         a.n * strlen(a.str) + 1);
        if(r == NULL)
            return(NULL);
        strcpy(r, a.str);
        while(a.n-- > 1) { // accidentally quadratic
            strcat(r, a.sep);
            strcat(r, a.str);
        }
        return(r);
    }

    int main(void) {

        // Invoke it like this
        printf("%s\n", repeat(.str = "ho", .n = 3));

        // Or equivalently
        printf("%s\n", repeat("ho", 3, " "));

    }

| Leave a comment (2) | | Flag

fanf

QP trie news roundup

10th Jan 2017 | 00:39

Firstly, I have to say that it's totally awesome that I am writing this at all, and it's entirely due to the cool stuff done by people other than me. Yes! News about other people doing cool stuff with my half-baked ideas, how cool is that?

CZ.NIC Knot DNS

OK, DNS is approximately the ideal application for tries. It needs a data structure with key/value lookup and lexically ordered traversal.

When qp tries were new, I got some very positive feedback from Marek Vavrusa who I think was at CZ.NIC at the time. As well as being the Czech DNS registry, they also develop their own very competitive DNS server software. Clearly the potential for a win there, but I didn't have time to push a side project to production quality, nor any expectation that anyone else would do the work.

But, in November I got email from Vladimír Čunát telling me he had reimplemented qp tries to fix the portability bugs and missing features (such as prefix searches) in my qp trie code, and added it to Knot DNS. Knot was previously using a HAT trie.

Vladimír said qp tries could reduce total server RSS by more than 50% in a mass hosting test case. The disadvantage is that they are slightly slower than HAT tries, e.g. for the .com zone they do about twice as many memory indirections per lookup due to checking a nybble per node rather than a byte per node.

On balance, qp tries were a pretty good improvement. Thanks, Vladimír, for making such effective use of my ideas!

(I've written some notes on more memory-efficient DNS name lookups in qp tries in case anyone wants to help close the speed gap...)

Rust

Shortly before Christmas I spotted that Frank Denis has a qp trie implementation in Rust!

Sadly I'm still only appreciating Rust from a distance, but when I find some time to try it out properly, this will be top of my list of things to hack around with!

I think qp tries are an interesting test case for Rust, because at the core of the data structure is a tightly packed two word union with type tags tucked into the low order bits of a pointer. It is dirty low-level C, but in principle it ought to work nicely as a Rust enum, provided Rust can be persuaded to make the same layout optimizations. In my head a qp trie is a parametric recursive algebraic data type, and I wish there were a programming language with which I could express that clearly.

So, thanks, Frank, for giving me an extra incentive to try out Rust! Also, Frank's Twitter feed is ace, you should totally follow him.

Time vs space

Today I had a conversation on Twitter with @tef who has some really interesting ideas about possible improvements to qp tries.

One of the weaknesses of qp-tries, at least in my proof-of-concept implementation, is the allocator is called for every insert or delete. C's allocator is relatively heavyweight (compared to languages with tightly-coupled GCs) so it's not great to call it so frequently.

(Bagwell's HAMT paper was a major inspiration for qp tries, and he goes into some detail describing his custom allocator. It makes me feel like I'm slacking!)

There's an important trade-off between small memory size and keeping some spare space to avoid realloc() calls. I have erred on the side of optimizing for simple allocator calls and small data structure size at the cost of greater allocator stress.

@tef suggested adding extra space to each node for use as a write buffer, in a similar way to "fractal tree" indexes.. As well as avoiding calls to realloc(), a write buffer could avoid malloc() calls for inserting new nodes. I was totally nerd sniped by his cool ideas!

After some intensive thinking I worked out a sketch of how write buffers might amortize allocation in qp tries. I don't think it quite matches what tef had in mind, but it's definitely intriguing. It's very tempting to steal some time to turn the sketch into code, but I fear I need to focus more on things that are directly helpful to my colleagues...

Anyway, thanks, tef, for the inspiring conversation! It also, tangentially, led me to write this item for my blog.

| Leave a comment | | Flag

fanf

Compact encoding of the leap seconds list

8th Jan 2017 | 00:58

The list of leap seconds is published in a number of places. Most authoritative is the IERS list published from the Paris Observatory and most useful is the version published by NIST. However neither of them are geared up for distributing leap seconds to (say) every NTP server.

For a couple of years, I have published the leap second list in the DNS as a set of fairly human-readable AAAA records. There's an example in my message to the LEAPSECS list though I have changed the format to avoid making any of them look like real IPv6 addresses.

Poul-Henning Kamp also publishes leap second information in the DNS though he only publishes an encoding of the most recent Bulletin C rather than the whole history.

I have recently added a set of PHK-style A records at leapsecond.dotat.at listing every leap second, plus an "illegal" record representing the point after which the difference between TAI and UTC is unknown. I have also added a next.leapsecond.dotat.at which should be the same as PHK's leapsecond.utcd.org.

There are also some HINFO records that briefly explain the other records at leapsecond.dotat.at.

One advantage of using the DNS is that the response can be signed and validated using DNSSEC. One disadvantage is that big lists of addresses rapidly get quite unwieldy. There can also be problems with DNS messages larger than 512 bytes. For leapsecond.dotat.at the answers can get a bit chunky:

plainDNSSEC
A485692
AAAA8211028

Terse

I've thought up a simple and brief way to list the leap seconds, which looks like this:

    6+6+12+12+12+12+12+12+12+18+12+12+24+30+24+12+18+12+12+18+18+18+84+36+42+36+18+5?

ABNF syntax:

    leaps  =  *leap end
    leap   =  gap delta
    end    =  gap "?"
    delta  =  "-" / "+"
    gap    =  1*DIGIT

Each leap second is represented as a decimal number, which counts the months since the previous leap second, followed by a "+" or a "-" to indicate a positive or negative leap second. The sequence starts at the beginning of 1972, when TAI-UTC was 10s.

So the first leap is "6+", meaning that 6 months after the start of 1972, a leap second increases TAI-UTC to 11s. The "84+" leap represents the gap between the leap seconds at the end of 1998 and end of 2005 (7 * 12 months).

The list is terminated by a "?" to indicate that the IERS have not announced what TAI-UTC will be after that time.

Rationale

ITU recommendation TF.460-6 specifies in paragraph 2.1 that leap seconds can happen at the end of any month, though in practice the preference for the end of December or June has never been overridden. Negative leap seconds are also so far only a theoretical possibility.

So, counting months between leaps is the largest radix permitted, giving the smallest (shortest) numbers.

I decided to use decimal and mnemonic separators to keep it simple.

Binary

If you want something super compact, then bit-banging is the way to do it. Here's a binary version of the leap second list, 29 bytes displayed as hexadecimal in network byte order.

46464c4c 4c4c4c4c 4c524c4c 585e584c 524c4c52 52523c58 646a6452 85

Each byte has a 2-bit two's complement signed delta in the most significant bits, and a 6-bit unsigned month count in the least significant bits.

The meaning of the delta field is as follows (including the hex, binary, and decimal values):

  • 0x40 = 01 = +1 = positive leap second
  • 0x00 = 00 = 0 = no leap second
  • 0xC0 = 11 = -1 = negative leap second
  • 0x80 = 10 = -2 = end of list, like "?"

So, for example, a 6 month gap followed by a positive leap second is 0x40 + 6 == 0x46. A 12 month gap is 0x40 + 12 == 0x4c.

The "no leap second" option comes into play when the gap between leap seconds is too large to fit in 6 bits, i.e. more than 63 months, e.g. the 84 month gap between the ends of 1998 and 2005. In this situation you encode a number of "no leap second" gaps until the remaining gap is less than 63.

I have encoded the 84 month gap as 0x3c58, i.e. 0x00 + 0x3c is 60 months followed by no leap second, then 0x40 + 0x18 is 24 months followed by a positive leap second.

Compression

There's still quite a lot of redundancy in the binary encoding. It can be reduced to 24 bytes using RFC 1951 DEFLATE compression.

Publication

There is now a TXT record at leapsecond.dotat.at containing the human-readable terse form of the leap seconds list. This is gives you a 131 byte plain DNS response, or a 338 byte DNSSEC signed response.

I've published the deflated binary version using a private-use TYPE65432 record which saves 58 bytes.

There is code to download and check the consistency of the leapseconds files from the IERS, NIST, and USNO, generate the DNS records, and update the DNS if necessary, at http://dotat.at/cgi/git/leapseconds.git.

| Leave a comment (2) | | Flag

fanf

Domain transfers are shocking

19th Oct 2016 | 14:25

I have a bit of a bee in my bonnet about using domain names consistently as part of an organization's branding and communications. I don't much like the proliferation of special-purpose or short-term vanity domains.

They are particularly vexing when I am doing something security-sensitive. For example, domain name transfers. I'd like to be sure that someone is not trying to race with my transfer and steal the domain name, say.

Let's have a look at a practical example: transfering a domain from Gandi to Mythic Beasts.

(I like Gandi, but getting the University to pay their domain fees is a massive chore. So I'm moving to Mythic Beasts, who are local, friendly, accommodating, and able to invoice us.)

Edited to add: The following is more ranty and critical than is entirely fair. I should make it clear that both Mythic Beasts and Gandi are right at the top of my list of companies that it is good to work with.

</p>This just happens to be an example where I get to see both ends of the transfer. In most cases I am transferring to or from someone else, so I don't get to see the whole process, and the technicalities are trivial compared to the human co-ordination!</p>

First communication

    Return-Path: <opensrs-bounce@registrarmail.net>
    Message-Id: <DIGITS.DATE-osrs-transfers-DIGITS@cron01.osrs.prod.tucows.net>
    From: "Transfer" <do_not_reply@ns-not-in-service.com>
    Subject: Transfer Request for EXAMPLE.ORG

    https://approve.domainadmin.com/transfer/?domain=EXAMPLE.ORG

A classic! Four different domain names, none of which identify either of our suppliers! But I know Mythic Beasts are an OpenSRS reseller, and OpenSRS is a Tucows service.

Let's see what whois has to say about the others...

    Domain Name: REGISTRARMAIL.NET
    Registrant Name: Domain Admin
    Registrant Organization: Yummynames.com
    Registrant Street: 96 Mowat Avenue
    Registrant City: Toronto
    Registrant Email: whois@yummynames.com

"Yummynames". Oh kaaaay.

    Domain Name: YUMMYNAMES.COM
    Registrant Name: Domain Admin
    Registrant Organization: Tucows.com Co.
    Registrant Street: 96 Mowat Ave.
    Registrant City: Toronto
    Registrant Email: tucowspark@tucows.com

Well I suppose that's OK, but it's a bit of a rabbit hole.

Also,

    $ dig +short mx registrarmail.net
    10 mx.registrarmail.net.cust.a.hostedemail.com.

Even more generic than Fastmail's messagingengine.com infrastructure domain :-)

    Domain Name: HOSTEDEMAIL.COM
    Registrant Name: Domain Admin
    Registrant Organization: Tucows Inc
    Registrant Street: 96 Mowat Ave.
    Registrant City: Toronto
    Registrant Email: domain_management@tucows.com

The domain in the From: address, ns-not-in-service.com is an odd one. I have seen it in whois records before, in an obscure context. When a domain needs to be cancelled, there can sometimes be glue records inside the domain which also need to be cancelled. But they can't be cancelled if other domains depend on those glue records. So, the registrar renames the glue records into a place-holder domain, allowing the original domain to be cancelled.

So it's weird to see one of these cancellation workaround placeholder domains used for customer communications.

    Domain Name: NS-NOT-IN-SERVICE.COM
    Registrant Name: Tucows Inc.
    Registrant Organization: Tucows Inc.
    Registrant Street: 96 Mowat Ave
    Registrant City: Toronto
    Registrant Email: corpnames@tucows.com

Tucows could do better at keeping their whois records consistent!

Finally,

    Domain Name: DOMAINADMIN.COM
    Registrant Name: Tucows.com Co. Tucows.com Co.
    Registrant Organization: Tucows.com Co.
    Registrant Street: 96 Mowat Ave
    Registrant City: Toronto
    Registrant Email: corpnames@tucows.com

So good they named it twice!

Second communication

    Return-Path: <bounce+VERP@bounce.gandi.net>
    Message-ID: <DATE.DIGITS@brgbnd28.bi1.0x35.net>
    From: "<noreply"@domainnameverification.net
    Subject: [GANDI] IMPORTANT: Outbound transfer of EXAMPLE.ORG to another provider

    http://domainnameverification.net/transferout_foa/?fqdn=EXAMPLE.ORG

The syntactic anomaly in the From: line is a nice touch.

Both 0x35.net and domainnameverification.net belong to Gandi.

    Registrant Name: NOC GANDI
    Registrant Organization: GANDI SAS
    Registrant Street: 63-65 Boulevard MASSENA
    Registrant City: Paris
    Registrant Email: noc@gandi.net

Impressively consistent whois :-)

Third communication

    Return-Path: <opensrs-bounce@registrarmail.net>
    Message-Id: <DIGITS.DATE-osrs-transfers-DIGITS@cron01.osrs.prod.tucows.net>
    From: "Transfers" <dns@mythic-beasts.com>
    Subject: Domain EXAMPLE.ORG successfully transferred

OK, so this message has the reseller's branding, but the first one didn't?!

The web sites

To confirm a transfer, you have to paste an EPP authorization code into the old and new registrars' confirmation web sites.

The first site https://approve.domainadmin.com/transfer/ has very bare-bones OpenSRS branding. It's a bit of a pity they don't allow resellers to add their own branding.

The second site http://domainnameverification.net/transferout_foa/ is unbranded; it isn't clear to me why it isn't part of Gandi's normal web site and user interface. Also, it is plain HTTP without TLS!

Conclusion

What I would like from this kind of process is an impression that it is reassuringly simple - not involving loads of unexpected organizations and web sites, difficult to screw up by being inattentive. The actual experience is shambolic.

And remember that basically all Internet security rests on domain name ownership, and this is part of the process of maintaining that ownership.

Here endeth the rant.

| Leave a comment (1) | | Flag

fanf

Version number comparisons in Apache and GNU coreutils

5th Oct 2016 | 23:07

So I have a toy DNS server which runs bleeding edge BIND 9 with a bunch of patches which I have submitted upstream, or which are work in progress, or just stupid. It gets upgraded a lot.

My build and install script puts the version number, git revision, and an install counter into the name of the install directory. So, when I deploy a custom hack which segfaults, I can just flip a symlink and revert to the previous less incompetently modified version.

More recently I have added an auto-upgrade feature to my BIND rc script. This was really simple, stupid: it just used the last install directory from the ls lexical sort. (You can tell it is newish because it could not possibly have coped with the BIND 9.9 -> 9.10 transition.)

It broke today.

BIND 9.11 has been released! Yay! I have lots of patches in it!

Unfortunately 9.11.0 sorts lexically before 9.11.0rc3. So my dumb auto update script refused to update. This can only happen in the transition period between a major release and the version bump on the master branch for the next pre-alpha version. This is a rare edge case for code only I will ever use.

But, I fixed it!

And I learned several cool things in the process!

I started off by wondering if I could plug dpkg --compare-versions into a sorting algorithm. But then I found that GNU sort has a -V version comparison mode. Yay! ls | sort -V!

But what is its version comparison algorithm? Is it as good as dpkg's? The coreutils documentation refers to the gnulib filevercmp() function. This appears not to exist, but gnulib does have a strverscmp() function cloned from glibc.

So look at the glibc man page for strverscmp(). It is a much simpler algorithm than dpkg, but adequate for my purposes. And, I see, built in to GNU ls! (I should have spotted this when reading the coreutils docs, because that uses ls -v as an example!)

Problem solved!

Add an option in ls | tail -1 so it uses ls -v and my script upgrades from 9.11.0rc3 to 9.11.0 without manual jiggery pokery!

And I learned about some GNU carnival organ bells and whistles!

But ... have I seen something like this before?

If you look at the isc.org BIND distribution server, its listing is sorted by version number, not lexically, i.e. 10 sorts after 9 not after 1.

They are using the Apache mod_autoindex IndexOptions VersionSort directive, which works the same as GNU version number sorting.

Nice!

It's pleasing to see widespread support for version number comparisons, even if they aren't properly thought-through elaborate battle hardened dpkg version number comparisons.

| Leave a comment (1) | | Flag

fanf

Aperiodic shower curtain

1st Oct 2016 | 17:09

We have a periodic table shower curtain. It mostly uses Arial for its lettering, as you can see from the "R" and "C" in the heading below, though some of the lettering is Helvetica, like the "t" and "r" in the smaller caption.


(bigger) (huge)

The lettering is very inconsistent. For instance, Chromium is set in a lighter weight than other elements - compare it with Copper

(bigger) (huge) (bigger) (huge)

Palladium is particularly special


(bigger) (huge)

Platinum is Helvetica but Meitnerium is Arial - note the tops of the "t"s.

(bigger) (huge) (bigger) (huge)

Roentgenium is all Arial; Rhenium has a Helvetica "R" but an Arial "e"!

(bigger) (huge) (bigger) (huge)

It is a very distracting shower curtain.

| Leave a comment (7) | | Flag

fanf

DNS ADDITIONAL data is disappointingly poorly used

26th Sep 2016 | 11:08

Recently there was a thread on bind-users about "minimal responses and speeding up queries. The discussion was not very well informed about the difference between the theory and practice of addidional data in DNS responses, so I wrote the following.

Background: DNS replies can contain "additional" data which (according to RFC 1034 "may be helpful in using the RRs in the other sections." Typically this means addresses of servers identified in NS, MX, or SRV records. In BIND you can turn off most additional data by setting the minimal-responses option.

Reindl Harald <h.reindl at thelounge.net> wrote:

additional responses are part of the inital question and may save asking for that information - in case the additional info is not needed by the client it saves traffic

Matus UHLAR - fantomas <uhlar at fantomas.sk>

If you turn mimimal-responses on, the required data may not be in the answer. That will result into another query send, which means number of queries increases.

There are a few situations in which additional data is useful in theory, but it's surprisingly poorly used in practice.

End-user clients are generally looking up address records, and the additional and authority records aren't of any use to them.

For MX and SRV records, additional data can reduce the need for extra A and AAAA records - but only if both A and AAAA are present in the response. If either RRset is missing the client still has to make another query to find out if it doesn't exist or wouldn't fit. Some code I am familiar with (Exim) ignores additional sections in MX responses and always does separate A and AAAA lookups, because it's simpler.

The other important case is for queries from recursive servers to authoritative servers, where you might hope that the recursive server would cache the additional data to avoid queries to the authoritative servers.

However, in practice BIND is not very good at this. For example, let's query for an MX record, then the address of one of the MX target hosts. We expect to get the address in the response to the first query, so the second query doesn't need another round trip to the authority.

Here's some log, heavily pruned for relevance.

2016-09-23.10:55:13.316 queries: info:
        view rec: query: isc.org IN MX +E(0)K (::1)
2016-09-23.10:55:13.318 resolver: debug 11:
        sending packet to 2001:500:60::30#53
;; QUESTION SECTION:
;isc.org.                       IN      MX
2016-09-23.10:55:13.330 resolver: debug 10:
        received packet from 2001:500:60::30#53
;; ANSWER SECTION:
;isc.org.               7200    IN      MX      10 mx.pao1.isc.org.
;isc.org.               7200    IN      MX      20 mx.ams1.isc.org.
;; ADDITIONAL SECTION:
;mx.pao1.isc.org.       3600    IN      A       149.20.64.53
;mx.pao1.isc.org.       3600    IN      AAAA    2001:4f8:0:2::2b
2016-09-23.10:56:13.150 queries: info:
        view rec: query: mx.pao1.isc.org IN A +E(0)K (::1)
2016-09-23.10:56:13.151 resolver: debug 11:
        sending packet to 2001:500:60::30#53
;; QUESTION SECTION:
;mx.pao1.isc.org.               IN      A

Hmf, well that's disappointing.

Now, there's a rule in RFC 2181 about ranking the trustworthiness of data:

5.4.1. Ranking data

[ snip ]
Unauthenticated RRs received and cached from the least trustworthy of those groupings, that is data from the additional data section, and data from the authority section of a non-authoritative answer, should not be cached in such a way that they would ever be returned as answers to a received query. They may be returned as additional information where appropriate. Ignoring this would allow the trustworthiness of relatively untrustworthy data to be increased without cause or excuse.

Since my recursive server is validating, and isc.org is signed, it should be able to authenticate the MX target address from the MX response, and promote its trustworthiness, instead of making another query. But BIND doesn't do that.

There are other situations where BIND fails to make good use of all the records in a response, e.g. when you get a referral for a signed zone, the response includes the DS records as well as the NS records. But BIND doesn't cache the DS records properly, so when it comes to validate the answer, it re-fetches them.

In a follow-up message, Mark Andrews says these problems are on his to-do list, so I'm looking forward to DNSSEC helping to make DNS faster :-)

| Leave a comment | | Flag

fanf

Listing zones with jq and BIND's statistics channel

9th Sep 2016 | 13:04

The jq tutorial demonstrates simple filtering and rearranging of JSON data, but lacks any examples of how you might combine jq's more interesting features. So I thought it would be worth writing this up.

I want a list of which zones are in which views in my DNS server. I can get this information from BIND's statistics channel, with a bit of processing.

It's quite easy to get a list of zones, but the zone objects returned from the statistics channel do not include the zone's view.

	$ curl -Ssf http://[::1]:8053/json |
	  jq '.views[].zones[].name' |
	  head -2
	"authors.bind"
	"hostname.bind"

The view names are keys of an object further up the hierarchy.

	$ curl -Ssf http://[::1]:8053/json |
	  jq '.views | keys'
	[
	  "_bind",
	  "auth",
	  "rec"
	]

I need to get hold of the view names so I can use them when processing the zone objects.

The first trick is to_entries which turns the "views" object into an array of name/contents pairs.

	$ curl -Ssf http://[::1]:8053/json |
	  jq -C '.views ' | head -6
	{
	  "_bind": {
	    "zones": [
	      {
	        "name": "authors.bind",
	        "class": "CH",
	$ curl -Ssf http://[::1]:8053/json |
	  jq -C '.views | to_entries ' | head -8
	[
	  {
	    "key": "_bind",
	    "value": {
	      "zones": [
	        {
	          "name": "authors.bind",
	          "class": "CH",

The second trick is to save the view name in a variable before descending into the zone objects.

	$ curl -Ssf http://[::1]:8053/json |
	  jq -C '.views | to_entries |
		.[] | .key as $view |
		.value' | head -5
	{
	  "zones": [
	    {
	      "name": "authors.bind",
	      "class": "CH",

I can then use string interpolation to print the information in the format I want. (And use array indexes to prune the output so it isn't ridiculously long!)

	$ curl -Ssf http://[::1]:8053/json |
	  jq -r '.views | to_entries |
		.[] | .key as $view |
		.value.zones[0,1] |
		"\(.name) \(.class) \($view)"'
	authors.bind CH _bind
	hostname.bind CH _bind
	dotat.at IN auth
	fanf2.ucam.org IN auth
	EMPTY.AS112.ARPA IN rec
	0.IN-ADDR.ARPA IN rec

And that's it!

| Leave a comment | | Flag

fanf

A regrettable rant about politics

4th Sep 2016 | 00:39

Oh good grief, reading this interview with Nick Clegg.

http://www.theguardian.com/politics/2016/sep/03/nick-clegg-did-not-cater-tories-brazen-ruthlessness

I am cross about the coalition. A lot of my friends are MUCH MORE cross than me, and will never vote Lib Dem again. And this interview illustrates why.

The discussion towards the end about the university tuition fee debacle really crystallises it. The framing is about the policy, in isolation. Even, even! that the Lib Dems might have got away with it by cutting university budgets, and making the universities scream for a fee increase!

(Note that this is exactly the tactic the Tories are using to privatise the NHS.)

The point is not the betrayal over this particular policy.

The point is the political symbolism.

Earlier in the interview, Clegg is very pleased about the stunning success of the coalition agreement. It was indeed wonderful, from the policy point of view. But only wonks care about policy.

From the non-wonk point of view of many Lib Dem voters, their upstart radical lefty party suddenly switched to become part of the machine.

Many people who voted for the Lib Dems in 2010 did so because they were not New Labour and - even more - not the Tories. The coalition was a huge slap in the face.

Tuition fees were just the most blatant simple example of this fact.

Free university education is a symbol of social mobility, that you can improve yourself by work regardless of parenthood. Educating Rita, a bellwether of the social safety net.

And I am hugely disappointed that Clegg still seems to think that getting lost in the weeds of policy is more important than understanding the views of his party's supporters.

He says near the start of the interview that getting things done was his main motivation. And in May 2010 that sounded pretty awesome.

But the consequence was the destruction of much of his party's support, and the loss of any media acknowledgment that the Lib Dems have a distinct political position. Both were tenuous in 2010 and both are now laughable.

The interviewer asks him about tuition fees, and still, he fails to talk about the wider implications.

| Leave a comment (16) | | Flag

fanf

An exciting visit to the supermarket

2nd Sep 2016 | 17:35

Towards the end of this story, a dear friend of ours pointed out that when I tweeted about having plenty of "doctors" I actually meant "medics".

Usually this kind of pedantry is not considered to be very polite, but I live in Cambridge; pedantry is our thing, and of course she was completely correct that amongst our friends, "Dr" usually means PhD, and even veterinarians outnumber medics.

And, of course, any story where it is important to point out that the doctor is actually someone who works in a hospital, is not an entirely happy story.

At least it did not involve eye surgeons!

Happy birthday

My brother-in-law's birthday is near the end of August, so last weekend we went to Sheffield to celebrate his 30th. Rachel wrote about our logistical cockups, but by the time of the party we had it back under control.

My sister's house is between our AirB&B and the shops, so we walked round to the party, then I went on to Sainsburys to get some food and drinks.

A trip to the supermarket SHOULD be boring.

Prosectomy!

While I was looking for some wet wipes, one of the shop staff nearby was shelving booze. He was trying to carry too many bottles of prosecco in his hands at once.

He dropped one.

It hit the floor about a metre from me, and smashed - exploded - something hit me in the face!

"Are you ok?" he said, though he probably swore first, and I was still working out what just happened.

"Er, yes?"

"You're bleeding!"

"What?" It didn't feel painful. I touched my face.

Blood on my hand.

Dripping off my nose. A few drops per second.

He started guiding me to the first aid kit. "Leave your shopping there."

We had to go past some other staff stacking shelves. "I just glassed a customer!" he said, probably with some other explanation I didn't hear so clearly.

I was more surprised than anything else!

What's the damage?

In the back of the shop I was given some tissue to hold against the wound. I could see in the mirror in the staff loo it was just a cut on the bridge of my nose.

I wear specs.

It could have been a LOT worse.

I rinsed off the blood and Signor Caduto Prosecco found some sterile wipes and a chair for me to sit on.

Did I need to go to hospital, I wondered? I wasn't in pain, the bleeding was mostly under control. Will I need stitches? Good grief, what a faff.

I phoned my sister, the junior doctor.

"Are you OK?" she asked. (Was she amazingly perceptive or just expecting a minor shopping quandary?)

"Er, no." (Usual bromides not helpful right now!) I summarized what had happened.

"I'll come and get you."

Suddenly!

I can't remember the exact order of events, but before I was properly under control (maybe before I made the phone call?) the staff call bell rang several times in quick succession and they all ran for the front of house.

It was a shoplifter alert!

I gathered from the staff discussions that this guy had failed to steal a chicken and had escaped in a car. Another customer was scared by the would-be-thief following her around the shop, and took refuge in the back of the shop.

Departure

In due course my sister arrived, and we went with Signor Rompere Bottiglie past large areas of freshly-cleaned floor to get my shopping. He gave us a £10 discount (I wasn't in the mood to fight for more) and I pointedly told him not to try carrying so many bottles in the future.

At the party there were at least half a dozen medics :-)

By this point my nose had stopped bleeding so I was able to inspect the damage and tweet about what happened.

My sister asked around to find someone who had plenty of A&E experience and a first aid kit. One of her friends cleaned the wound and applied steri-strips to minimize the scarring.

It's now healing nicely, though still rather sore if I touch it without care.

I just hope I don't have another trip to the supermarket which is quite so eventful...

| Leave a comment (8) | | Flag

fanf

Single-character Twitter usernames

9th Aug 2016 | 21:45

Thinking of "all the best names are already taken", I wondered if the early adopters grabbed all the single character Twitter usernames. It turns out not, and there's a surprisingly large spread - one single-character username was grabbed by a user created only last year. (Missing from the list below are i 1 2 7 8 9 which give error responses to my API requests.)

I guess there has been some churn even amongst these lucky people...

         2039 : 2006-Jul-17 : w : Walter
         5511 : 2006-Sep-08 : f : Fred Oliveira
         5583 : 2006-Sep-08 : e : S VW
        11046 : 2006-Oct-30 : p : paolo i.
        11222 : 2006-Nov-01 : k : Kevin Cheng
        11628 : 2006-Nov-07 : t : ⚡️
       146733 : 2006-Dec-22 : R : Rex Hammock
       628833 : 2007-Jan-12 : y : reY
       632173 : 2007-Jan-14 : c : Coley Pauline Forest
       662693 : 2007-Jan-19 : 6 : Adrián Lamo
       863391 : 2007-Mar-10 : N : Naoki  Hiroshima
       940631 : 2007-Mar-11 : a : Andrei Zmievski
      1014821 : 2007-Mar-12 : fanf : Tony Finch
      1318181 : 2007-Mar-16 : _ : Dave Rutledge
      2404341 : 2007-Mar-27 : z : Zach Brock
      2890591 : 2007-Mar-29 : x : gene x
      7998822 : 2007-Aug-06 : m : Mark Douglass
      9697732 : 2007-Oct-25 : j : Juliette Melton
     11266532 : 2007-Dec-17 : b : jake h
     11924252 : 2008-Jan-07 : h : Helgi Þorbjörnsson
     14733555 : 2008-May-11 : v : V
     17853751 : 2008-Dec-04 : g : Greg Leding
     22005817 : 2009-Feb-26 : u : u
     50465434 : 2009-Jun-24 : L : L. That is all.
    132655296 : 2010-Apr-13 : Q : Ariel Raunstien
    287253599 : 2011-Apr-24 : 3 : Blair
    347002675 : 2011-Aug-02 : s : Science!
    400126373 : 2011-Oct-28 : 0 : j0
    898384208 : 2012-Oct-22 : 4 : 4oto
    966635792 : 2012-Nov-23 : 5 : n
   1141414200 : 2013-Feb-02 : O : O
   3246146162 : 2015-Jun-15 : d : D
(Edited to add @_)

| Leave a comment (6) | | Flag

fanf

Domain registry APIs and "superglue"

28th Jul 2016 | 16:01

On Tuesday (2016-07-26) I gave a talk at work about domain registry APIs. It was mainly humorous complaining about needlessly complicated and broken stuff, but I slipped in a few cool ideas and recommendations for software I found useful.

I have published the slides and notes (both PDF).

The talk was based on what I learned last year when writing some software that I called superglue, but the talk wasn't about the software. That's what this article is about.

What is superglue?

Firstly, it isn't finished. To be honest, it's barely started! Which is why I haven't written about it before.

The goal is automated management of DNS delegations in parent zones - hence "super" (Latin for over/above/beyond, like a parent domain) "glue" (address records sometimes required as part of a delegation).

The basic idea is that superglue should work roughly like nsdiff | nsupdate.

Recall, nsdiff takes a DNS master file, and it produces an nsupdate script which makes the live version of the DNS zone match what the master file says it should be.

Similarly, superglue takes a collection of DNS records that describe a delegation, and updates the parent zone to match what the delegation records say it should be.

A uniform interface to several domain registries

Actually superglue is a suite of programs.

When I wrote it I needed to be able to update parent delegations managed by RIPE, JISC, Nominet, and Gandi; this year I have started moving our domains to Mythic Beasts. They all have different APIs, and only Nominet supports the open standard EPP.

So superglue has a framework which helps each program conform to a consistent command line interface.

Eventually there should be a superglue wrapper which knows which provider-specific script to invoke for each domain.

Refining the user interface

DNS delegation is tricky even if you only have the DNS to deal with. However, superglue also has to take into account the different rules that various domain registries require.

For example, in my initial design, the input to superglue was going to simply be the delegations records that should go in the parent zone, verbatim.

But some domain registries to not accept DS records; instead you have to give them your DNSKEY records, and the registry will generate their preferred form of DS records in the delegation. So superglue needs to take DNSKEY records in its input, and do its own DS generation for registries that require DS rather than DNSKEY.

Sticky problems

There is also a problem with glue records: my idea was that superglue would only need address records for nameservers that are within the domain whose delegation is being updated. This is the minimal set of glue records that the DNS requires, and in many cases it is easy to pull them out of the master file for the delegated zone.

However, sometimes a delegation includes name servers in a sibling domain. For example, cam.ac.uk and ic.ac.uk are siblings.

    cam.ac.uk.  NS  authdns0.csx.cam.ac.uk.
    cam.ac.uk.  NS  ns2.ic.ac.uk.
    cam.ac.uk.  NS  sns-pb.isc.org.
    ; plus a few more

In our delegation, glue is required for nameservers in cam.ac.uk, like authdns0; glue is forbidden for nameservers outside ac.uk, like sns-pb.isc.org. Those two rules are fixed by the DNS.

But for nameservers in a sibling domain, like ns2.ic.ac.uk, the DNS says glue may be present or omitted. Some registries say this optional sibling glue must be present; some registries say it must be absent.

In registries which require optional sibling glue, there is a quagmire of problems. In many cases the glue is already present, because it is part of the sibling delegation and, therefore, required - this is the case for ns2.ic.ac.uk. But when the glue is truly optional it becomes unclear who is responsible for maintaining it: the parent domain of the nameserver, or the domain that needs it for their delegation?

I basically decided to ignore that problem. I think you can work around it by doing a one-off manual set up of the optional glue, after which superglue will work. So it's similar to the other delegation management tasks that are out of scope for superglue, like registering or renewing domains.

Captain Adorable

The motivation for superglue was, in the short term, to do a bulk update of out 100+ delegations to add sns-pb.isc.org, and in the longer term, to support automatic DNSSEC key rollovers.

I wrote barely enough code to help with the short term goal, so what I have is undocumented, incomplete, and inconsistent. (Especially wrt DNSSEC support.)

And since then I haven't had time or motivation to work on it, so it remains a complete shambles.

| Leave a comment (2) | | Flag

fanf

Uplift from SCCS to git, again

19th Jul 2016 | 16:32

Nearly two years ago I wrote about my project to convert the 25-year history of our DNS infrastructure from SCCS to git - "uplift from SCCS to git"

In May I got email from Akrem ANAYA asking my SCCS to git scripts. I was pleased that they might be of use to someone else!

And now I have started work on moving our Managed Zone Service to a new server. The MZS is our vanity domain system, for research groups who want domain names not under cam.ac.uk.

The MZS also uses SCCS for revision control, so I have resurrected my uplift scripts for another outing. This time the job only took a couple of days, instead of a couple of months :-)

The most amusing thing I found was the cron job which stores a daily dump of the MySQL database in SCCS...

| Leave a comment | | Flag

fanf

Tactile paving addenda

15th Jun 2016 | 09:38

I have found out the likely reason that tactile paving markers on dual-use paths are the wrong way round!

A page on the UX StackExchange discusses tactile paving on dual-use paths and the best explanation is that it can be uncomfortable to walk on longitudinal ridges, whereas transverse ones are OK.

(Apparently the usual terms are "tram" vs "ladder".)

I occasionally get a painful twinge in my ankle when I tread on the edge of a badly laid paving stone, so I can sympathise. But surely tactile paving is supposed to be felt by the feet, so it should not hurt people who tread on it even inadvertently!

| Leave a comment (4) | | Flag

fanf

Tactile paving

13th Jun 2016 | 12:24

In Britain there is a standard for tactile paving at the start and end of shared-use foot / cycling paths. It uses a short section of ridged paving slabs which can be laid with the ridges either along the direction of the path or across the direction of the path, to indicate which side is reserved for which mode of transport.

Transverse ridges

If you have small hard wheels, like many pushchairs or the front castors on many wheelchairs, transverse ridges are very bumpy and uncomfortable.

If you have large pneumatic wheels, like a bike, the wheel can ride over the ridges so it doesn't feel the bumps.

Transverse ridges are better for bikes

Longitudinal ridges

If you have two wheels and the ground is a bit slippery, longitudinal ridges can have a tramline effect which disrupts you steering and therefore balance, so they are less safe.

If you have four wheels, the tramline effect can't disrupt your balance and can be nice and smooth.

Longitudinal ridges are better for pushchairs

The standard

So obviously the standard is transverse ridges for the footway, and longitudinal ridges for the cycleway.

(I have a followup item with a plausible explanation!)

| Leave a comment (31) | | Flag

fanf

Experimenting with _Generic() for parametric constness in C11

1st Jun 2016 | 19:40

One of the new features in C99 was tgmath.h, the type-generic mathematics library. (See section 7.22 of n1256.) This added ad-hoc polymorphism for a subset of the library, but C99 offered no way for programmers to write their own type-generic macros in standard C.

In C11 the language acquired the _Generic() generic selection operator. (See section 6.5.1.1 of n1570.) This is effectively a type-directed switch expression. It can be used to implement APIs like tgmath.h using standard C.

(The weird spelling with an underscore and capital letter is because that part of the namespace is reserved for future language extensions.)

When const is ugly

It can often be tricky to write const-correct code in C, and retrofitting constness to an existing API is much worse.

There are some fearsome examples in the standard library. For instance, strchr() is declared:

    char *strchr(const char *s, int c);

(See section 7.24.5.2 of n1570.)

That is, it takes a const string as an argument, indicating that it doesn't modify the string, but it returns a non-const pointer into the same string. This hidden de-constifying cast allows strchr() to be used with non-const strings as smoothly as with const strings, since in either case the implicit type conversions are allowed. But it is an ugly loop-hole in the type system.

Parametric constness

It would be much better if we could write something like,

    const<A> char *strchr(const<A> char *s, int c);

where const<A> indicates variable constness. Because the same variable appears in the argument and return types, those strings are either both const or both mutable.

When checking the function definition, the compiler would have to treat parametric const<A> as equivalent to a normal const qualifier. When checking a call, the compiler allows the argument and return types to be const or non-const, provided they match where the parametric consts indicate they should.

But we can't do that in standard C.

Or can we?

When I mentioned this idea on Twitter a few days ago, Joe Groff said "_Generic to the rescue", so I had to see if I could make it work.

Example: strchr()

Before wrapping a standard function with a macro, we have to remove any existing wrapper. (Standard library functions can be wrapped by default!)

    #ifdef strchr
    #undef strchr
    #endif

Then we can create a replacement macro which implements parametric constness using _Generic().

    #define strchr(s,c) _Generic((s),                    \
        const char * : (const char *)(strchr)((s), (c)), \
        char *       :               (strchr)((s), (c)))

The first line says, look at the type of the argument s.

The second line says, if it is a const char *, call the function strchr and use a cast to restore the missing constness.

The third line says, if it is a plain char *, call the function strchr leaving its return type unchanged from char *.

The (strchr)() form of call is to avoid warnings about attempts to invoke a macro recursively.

    void example(void) {
        const char *msg = "hello, world\n";
        char buf[20];
        strcpy(buf, msg);

        strchr(buf, ' ')[0] = '\0';
        strchr(msg, ' ')[0] = '\0';
        strchr(10,20);
    }

In this example, the first call to strchr is always OK.

The second call typically fails at runtime with the standard strchr, but with parametric constness you get a compile time error saying that you can't modify a const string.

Without parametric constness the third call gives you a type conversion warning, but it still compiles! With parametric constness you get an error that there is no matching type in the _Generic() macro.

Conclusion

That is actually pretty straightforward, which is nice.

As well as parametric constness for functions, in the past I have also wondered about parametric constness for types, especially structures. It would be nice to be able to use the same code for read-only static data as well as mutable dynamic data, and have the compiler enforce the distinction. But _Generic() isn't powerful enough, and in any case I am not sure how such a feature should work!

| Leave a comment (8) | | Flag

fanf

Even the Deathstation 9000 can't screw up the BIND 9.10.4 fix

20th May 2016 | 01:52

There was an interesting and instructive cockup in the BIND 9.10.4 release. The tl;dr is that they tried to improve performance by changing the layout of a data structure to use less memory. As a side effect, two separate sets of flags ended up packed into the same word. Unfortunately, these different sets of flags were protected by different locks. This overlap meant that concurrent access to one set of flags could interfere with access to the other set of flags. This led to a violation of BIND's self-consistency checks, causing a crash.

The fix uses an obscure feature of C structure bitfield syntax. If you declare a bitfield without a name and with a zero width (in this case, unsigned int :0) any following bitfields must be placed in the next "storage unit". This is described in the C standard section 6.7.2.1p12.

You can see this in action in the BIND DNS red-black tree declaration.

A :0 solves the problem, right?

So a clever idea might be to use this :0 bitfield separator so that the different sets of bitfields will be allocated in different words, and the different words will be protected by different locks.

What are the assumptions behind this fix?

If you only use a :0 separator, you are assuming that a "word" (in a vague hand-wavey sense) corresponds to both a "storage unit", which the compiler uses for struct layouts, and also to the size of memory access the CPU uses for bitfields, which matters when we are using locks to keep some accesses separate from each other.

What is a storage unit?

The C standard specifies the representation of bitfields in section 6.2.6.1p4:

Values stored in bit-fields consist of m bits, where m is the size specified for the bit-field. The object representation is the set of m bits the bit-field comprises in the addressable storage unit holding it.

Annex J, which covers portability issues, says:

The following are unspecified: [... yada yada ...]

The alignment of the addressable storage unit allocated to hold a bit-field

What is a storage unit really?

A "storage unit" is defined by the platform's ABI, which for BIND usually means the System V Application Binary Interface. The amd64 version covers bit fields in section 3.1.2. It says,

  • bit-fields must be contained in a storage unit appropriate for its declared type

  • bit-fields may share a storage unit with other struct / union members

In this particular case, the storage unit is 4 bytes for a 32 bit unsigned int. Note that this is less than the architecture's nominal 64 bit width.

How does a storage unit correspond to memory access widths?

Who knows?

What?

It is unspecified.

So...

The broken code declared a bunch of adjacent bit fields and forgot that they were accessed under different locks.

Now, you might hope that you could just add a :0 to split these bitfields into separate storage units, and the split would be enough to also separate the CPU's memory accesses to the different storage units.

But you would be assuming that the code that accesses the bitfields will be compiled to use unsigned int sized memory accesses to read and write the unsigned int sized storage units.

Um, do we have any guarantees about access size?

Yes! There is sig_atomic_t which has been around since C89, and a load of very recent atomic stuff. But none of it is used by this part of BIND's code.

(Also, worryingly, the AMD64 SVABI does not mention "atomic".)

So what is this "Deathstation 9000" thing then?

The DS9K is the canonical epithet for the most awkward possible implementation of C, which follows the standard to the letter, but also violates common assumptions about how C works in practice. It is invoked as a horrible nightmare, but in recent years it has become a disastrous reality as compilers have started exploiting undefined behaviour to support advanced optimizations.

(Sadly the Armed Response Technologies website has died, and Wikipedia's DS9K page has been deleted. About the only place you can find information about it is in old comp.lang.c posts...)

And why is the DS9K relevant?

Well, in this case we don't need to go deep into DS9K territory. There are vaguely reasonable ABIs for which a small :0 fix would not actually work.

For instance, there might be a CPU which can only do 64 bit memory accesses, but which has a 32 bit int storage unit. This type representation would probably mean the C compiler has really bad performance, so it is fairly unlikely. But it is allowed, and there are (or were) CPUs which can't do sub-word memory accesses, and which have very little RAM so they want to pack data tightly.

On a CPU like this, the storage unit doesn't match the memory access, so C's :0 syntax for skipping to the next storage unit will fail to achieve the desired effect of isolating memory accesses that have to be performed under different concurrent access locks.

DS9K defence technology

So the BIND 9.10.4 fix does two things:

The most important change is to move one of the sets of bitfields from the start of the struct to the end of it. This means there are several pointers in between the fields protected by one lock and the fields protected by the other lock, so even a DS9K can't reasonably muddle them up.

Secondly, they used magical :0 syntax and extensive comments to (hopefully) prevent the erroneous compaction from happening again. Even if the bitfields get moved back to the start of the struct (so the pointers no longer provide insulation) the :0 might help to prevent the bug from causing crashes.

(edited to add)

When I wrapped this article up originally, I forgot to return to sig_atomic_t. If, as a third fix, these bitfields were also changed from unsigned int to sig_atomic_t, it would further help to avoid problems on systems where the natural atomic access width is wider than an int, like the lesser cousin of the DS9K I outlined above. However, sig_atomic_t can be signed or unsigned so it adds a new portability wrinkle.

Conclusion

The clever :0 is not enough, and the BIND developers were right not to rely on it.

| Leave a comment (8) | | Flag

fanf

6 reasons I like Wintergatan

13th May 2016 | 01:06

  1. If you have not yet seen the Wintergatan Marble Machine video, you should. It is totally amazing: a musical instrument and a Heath Robinson machine and a marble run and a great tune. It has been viewed nearly 19 million times since it was published on the 29th February.

    Wintergatan have a lot of videos on YouTube most of which tell the story of the construction of the Marble Machine. I loved watching them all: an impressive combination of experimentation and skill.

  2. The Starmachine 2000 video is also great. It interleaves a stop-motion animation (featuring Lego) filmed using a slide machine (which apparently provides the percussion riff for the tune) with the band showing some of their multi-instrument virtuosity.

    The second half of the video tells the story of another home-made musical instrument: a music box driven by punched tape, which you can see playing the melody at the start of the video. Starmachine 2000 was published in January 2013, so it's a tiny hint of their later greatness.

    I think the synth played on the shoulder is also home-made.

  3. Sommarfågel is the band's introductory video, also featuring stop-motion animation and the punched-tape music box.

    Its intro ends with a change of time signature from 4/4 to 6/8 time!

    The second half of the video is the making of the first half - the set, the costumes, the animation. The soundtrack sounds like it was played on the music box.

  4. They have an album which you can listen to on YouTube or buy on iTunes.

    By my reckoning, 4 of the tracks are in 3/4 time, 3 in 4/4 time, and 2 in 6/8 time. I seem to like music in unusual time signatures.

  5. One of the tracks is called "Biking is better".

  6. No lyrics.

| Leave a comment (3) | | Flag

fanf

Apple calendar apps amplifying .ics VALARM ACTION:EMAIL

8th May 2016 | 15:30

This took me hours to debug yesterday so I thought an article would be a good idea.

The vexation

My phone was sending email notifications for events in our shared calendar.

Some of my Macs were trying and failing to send similar email notifications. (Mac OS Mail knows about some of my accounts but it doesn't know the passwords because I don't entirely trust the keychain.)

The confusion

There is no user interface for email notifications in Apple's calendar apps.

  • They do not allow you to create events with email notifications.
  • They do not show the fact that an event (created by another client) has an email notification configured.
  • They do not have a preferences setting to control email notifications.

The escalation

It seems that if you have a shared calendar containing an event with an email notification, each of your Apple devices which have this calendar will separately try to send their own duplicate copy of the notification email.

(I dread to think what would happen in an office environment!)

The explanation

I couldn't get a CalDAV client to talk to Fastmail's CalDAV service in a useful way. I managed to work out what was going on by exporting a .ics file and looking at the contents.

The VEVENT clause saying ACTION:EMAIL was immediately obvious towards the end of the file.

The solution

Sadly I don't know of a way to stop them from doing this other than to avoid using email notification on calendar events.

| Leave a comment (7) | | Flag

fanf

Belated job update

8th May 2016 | 13:53

I mentioned obliquely on Twitter that I am no longer responsible for email at Cambridge. This surprised a number of people so I realised I should do a proper update for my non-Cambridge friends.

Since Chris Thompson retired at the end of September 2014, I have been the University's hostmaster. I had been part of the hostmaster team for a few years before then, but now I am the person chiefly responsible. I am assisted by my colleagues in the Network Systems team.

Because of my increased DNS responsibilities, I was spending a lot less time on email. This was fine from my point of view - I had been doing it for about 13 years, which frankly is more than enough.

Also, during the second half of 2015, our department reorganization at long last managed to reach down from rarefied levels of management and started to affect the technical staff.

The result is that at the start of 2016 I moved into the Network Systems team. We are mainly responsible for the backbone and data centre network switches and routers, plus a few managed institution networks. (Most departments and colleges manage their own switches.) There are various support functions like DHCP, RADIUS, and now DNS, and web interfaces to some of these functions.

My long-time fellow postmaster David Carter continues to be responsible for Hermes and has moved into a new team that includes the Exchange admins. There will be more Exchange in our future; it remains to be decided what will happen to Hermes.

So my personal mail is no longer a dodgy test domain and reconfiguration canary on Hermes. Instead I am using Fastmail which has been hosting our family domain for a few years.

| Leave a comment | | Flag

fanf

A colophon for my link log

6th May 2016 | 02:36

The most common question I get asked about my link log (LJ version) is

How do you read all that stuff?!

The answer is,

I don't!

My link log is basically my browser bookmarks, except public. (I don't have any private bookmarks to speak of.) So, for it to be useful, a link description should be reasonably explanatory and have enough keywords that I can find it years later.

The links include things I have read and might want to re-read, things I have not read but I think I should keep for future reference, things I want to read soon, things that look interesting which I might read, and things that are aspirational which I feel I ought to read but probably never will.

(Edited to add) I should also say that I might find an article to be wrong or problematic, but it still might be interesting enough to be worth logging - especially if I might want to refer back to this wrong or problematic opinion for future discussion or criticism. (end of addition)

But because it is public, I also use it to tell people about links that are cool, though maybe more ephemeral. This distorts the primary purpose a lot.

It's my own miscellany or scrapbook, for me and for sharing.

Failures

A frustrating problem is that I sometimes see things which, at the time, seem to be too trivial to log, but which later come up in conversation and I can't cite my source because I didn't log it or remember the details.

Similarly I sometimes fail to save links to unusually good blog articles or mailing list messages, and I don't routinely keep my own copies of either.

Frequently, my descriptions lack enough of the right synonym keywords for me to find them easily. (I'm skeptical that I would be able to think ahead well enough to add them if I tried.)

I make no effort to keep track of where I get links from. This is very lazy, and very anti-social. I am sorry about this, but not sorry enough to fix it, which makes me sorry about being too lazy to fix it. Sorry.

Evolution

What I choose to log changes over time. How I phrase the descriptions also changes. (I frequently change the original title to add keywords or to better summarize the point. My usual description template is "Title or name: spoiler containing keywords.")

Increasingly in recent years I have tried to avoid the political outrage of the day. I prefer to focus on positive or actionable things.

A lot (I think?) of the negative "OMG isn't this terrible" links on my log recently are related to computer security or operations. They fall into the "actionable" category by being cautionary tales: can I learn from their mistakes? Please don't let me repeat them?

Sources

Mailing lists. I don't have a public list of the ones I subscribe to. The tech ones are mail / DNS / network / time related - IETF, operations, software I use, projects I contribute to, etc.

RSS/Atom feeds. I also use LJ as my feed reader (retro!) and sadly they don't turn my feed list into a proper blog roll, but you might find something marginally useful at http://fanf.livejournal.com/profile?socconns=yfriends

Hacker News. I use Colin Percival's Hacker News Daily to avoid the worst of it. It is more miss than hit - a large proportion of the 10 "best" daily links are about Silly Valley trivia or sophomoric politics. But when HN has a link with discussion about something technical it can provide several good related links and occasionally some well-informed comments.

Mattias Geniar's cron.weekly tech newsletter is great. Andrew Ducker's links are great.

Twitter. I like its openness, the way it is OK to follow someone who might be interesting. On Facebook that would be creepy, on Linkedin that would be spammy.

| Leave a comment (6) | | Flag

fanf

Capability myths demolished

29th Apr 2016 | 02:16

Blogging after the pub is a mixed blessing. A time when I want to write about things! But! A time when I should be sleeping.

Last week I wrote nearly 2000 boring words about keyboards and stayed up till nearly 4am. A stupid amount of effort for something no-one should ever have to care about.

Feh! what about something to expand the mind rather than dull it?

CAPABILITIES

Years and years ago I read practically everything on http://erights.org - the website for the E programming language - which is a great exposition of how capabilities are the way to handle access control.

I was reminded of these awesome ideas this evening when talking to Colin Watson about how he is putting Macaroons into practice.

One of the great E papers is Capability Myths Demolished. It's extremely readable, has brilliant diagrams, and really, it's the whole point of this blog post.

Go and read it: http://srl.cs.jhu.edu/pubs/SRL2003-02.pdf

Ambient authority

One of the key terms it introduces is "ambient authority". This is what you have when you are able to find a thing and do something to it, without being given a capability to do so.

In POSIX terms, file descriptors are capabilities, and file names are ambient authority. A capability-style POSIX would, amongst other things, eliminate the current root directory and the current working directory, and limit applications to openat() with relative paths.

Practical applications of capabilities to existing systems usually require intensive "taming" of the APIs to eliminate ambient authority - see for example Capsicum, and CaPerl and CaJa.

From the programming point of view (rather than the larger systems point of view) eliminating ambient authority is basically eliminating global variables.

From the hipster web devops point of view, the "12 factor application" idea of storing configuration in the environment is basically a technique for reducing ambient authority.

Capability attenuation

Another useful idea in the Capability Myths Demolished paper is capability attenuation: if you don't entirely trust someone, don't give them the true capabilty, give them a capability on a proxy. The proxy can then restrict or revoke access to the true object, according to whatever rules you want.

(userv proxies POSIX file descriptors rather than passing them as-is, to avoid transmitting unwanted capabilities - it always gives you a pipe, not a socket, nor a device, etc.)

And stuff

It is too late and my mind is too dull now to do justice to these ideas, so go and read their paper instead of my witterings.

| Leave a comment (1) | | Flag

fanf

Synergy vs xmodmap: FIGHT!

22nd Apr 2016 | 03:41

Why you can't use xmodmap to change how Synergy handles modifier keys

As previously mentioned I am doing a major overhaul of my workstation setup, the biggest since 1999. This has turned out to be a huge can of worms.

I will try to explain the tentacular complications caused by some software called Synergy...

I use an Apple trackpad and keyboard, and 1Password

I made a less-big change several years ago - big in terms of the physical devices I use, but not the software configuration: I fell in love with Apple, especially the Magic Trackpad. On the downside, getting an Apple trackpad to talk directly to a FreeBSD or Debian box is (sadly) a laughable proposition, and because of that my setup at work is a bit weird.

(Digression 1: I also like Apple keyboards. I previously used a Happy Hacking Keyboard (lite 2), which I liked a lot, but I like the low profile and short travel of the Apple keyboards more. Both my HHKB and Apple keyboards use rubber membranes, so despite appearances I have not in fact given up on proper keyswitches - I just never used them much.)

(Digression 2: More recently I realised that I could not continue without a password manager, so I started using 1Password, which for my purposes basically requires having a Mac. If you aren't using a password manager, get one ASAP. It will massively reduce your password security worries.)

At work I have an old-ish cast-off Mac, which is responsible for input devices, 1Password, web browser shit, and disractions like unread mail notifications, iCalendar, IRC, Twitter. The idea (futile and unsuccessful though it frequently is) was that I could place the Mac to one side and do Real Work on the (other) Unix workstation.

Synergy is magic

The key software that ties my workstations together is Synergy.

Synergy allows you to have one computer which owns your input devices (in my case, my Mac) which can seamlessly control your other computers. Scoot your mouse cursor from one screen to another and the keyboard input focus follows seamlessly. It looks like you have multiple monitors plugged into one computer, with VMs running different operating systems displayed on different monitors. But they aren't VMs, they are different bare metal computers, and Synergy is forwarding the input events from one to the others.

Synergy is great in many ways. It is also a bit too magical.

How I want my keyboard to work

My main criterion is that the Meta key should be easy to press.

For reference look at this picture of an Apple keyboard on Wikimedia Commons.

(Digression 3: Actually, Control is more important than Meta, and Control absolutely has to be the key to the left of A. I also sometimes use the key labelled Control as part of special key chord operations, for which I move my fingers down from the home row. In X11 terminology I need ctrl:nocaps not ctrl:swapcaps. If I need caps it's easier for me to type with my little finger holding Shift than to waste a key on Caps Lock confusion . Making Caps Lock into another Control is so common that it's a simple configuration feature in Mac OS.)

I use Emacs, which is why the Meta key is also important. Meta is a somewhat abstract key modifier which might be produced by various different keys (Alt, Windows, Option, Command, Escape, Ctrl+[, ...) depending on the user's preferences and the vagaries of the software they use.

I press most modifier keys (Ctrl, Shift, Fn) with my left little finger; the exception is Meta, for which I use my thumb. For comfort I do not want to have to curl my thumb under my palm very much. So Meta has to come from the Command key.

For the same reasons, on a PC keyboard Meta has to come from the Alt key. Note that on a Mac keyboard, the Option key is also labelled Alt.

So if you have a keyboard mapping designed for a PC keyboard, you have Meta <- Alt <- non-curly thumb, but when applied to a Mac keyboard you get Meta <- Alt <- Option <- too-curly thumb.

This is an awkward disagreement which I have to work around.

X11 keyboard modifiers

OK, sorry, we aren't quite done with the tedious background material yet.

The X11 keyboard model (at least the simpler pre-XKB model) basically has four layers:

  • keycodes: numbers that represent physical keys, e.g. 64

  • keysyms: symbols that represent key labels, e.g. Alt_L

  • modifiers: a few kesyms can be configured as one of 8 modifiers (shift, ctrl, etc.)

  • keymap: how keysyms plus modifiers translate to characters

I can reset all these tables so my keyboard has a reasonably sane layout with:

    $ setxkbmap -layout us -option ctrl:nocaps

After I do that I get a fairly enormous variety of modifier-ish keysyms:

    $ xmodmap -pke | egrep 'Shift|Control|Alt|Meta|Super|Hyper|switch'
    keycode  37 = Control_L NoSymbol Control_L
    keycode  50 = Shift_L NoSymbol Shift_L
    keycode  62 = Shift_R NoSymbol Shift_R
    keycode  64 = Alt_L Meta_L Alt_L Meta_L
    keycode  66 = Control_L Control_L Control_L Control_L
    keycode  92 = ISO_Level3_Shift NoSymbol ISO_Level3_Shift
    keycode 105 = Control_R NoSymbol Control_R
    keycode 108 = Alt_R Meta_R Alt_R Meta_R
    keycode 133 = Super_L NoSymbol Super_L
    keycode 134 = Super_R NoSymbol Super_R
    keycode 203 = Mode_switch NoSymbol Mode_switch
    keycode 204 = NoSymbol Alt_L NoSymbol Alt_L
    keycode 205 = NoSymbol Meta_L NoSymbol Meta_L
    keycode 206 = NoSymbol Super_L NoSymbol Super_L
    keycode 207 = NoSymbol Hyper_L NoSymbol Hyper_L

These map to modifiers as follows. (The higher modifers have unhelpfully vague names.)

    $ xmodmap -pm
    xmodmap:  up to 4 keys per modifier, (keycodes in parentheses):

    shift       Shift_L (0x32),  Shift_R (0x3e)
    lock      
    control     Control_L (0x25),  Control_L (0x42),  Control_R (0x69)
    mod1        Alt_L (0x40),  Alt_R (0x6c),  Meta_L (0xcd)
    mod2        Num_Lock (0x4d)
    mod3      
    mod4        Super_L (0x85),  Super_R (0x86),  Super_L (0xce),  Hyper_L (0xcf)
    mod5        ISO_Level3_Shift (0x5c),  Mode_switch (0xcb)

How Mac OS -> Synergy -> X11 works by default

If I don't change things, I get

    Command -> 133 -> Super_L -> Mod4 -> good for controlling window manager

    Option -> 64 -> Alt_L -> Mod1 -> meta in emacs

which is not unexpected, given the differences between PC and Mac layouts, but I want to swap the effects of Command and Option

Note that I get (roughly) the same effect when I plug the Apple keyboard directly into the PC and when I use it via Synergy. I want the swap to work in both cases.

xmodmap to the rescue! not!

Eliding some lengthy and frustrating debugging, the important insight came when I found I could reset the keymap using setxkbmap (as I mentioned above) and test the effect of xmodmap on top of that in a reproducible way. (xmodmap doesn't have a reset-to-default option.)

What I want should be relatively simple:

    Command -> Alt -> Mod1 -> Meta in emacs

    Option -> Super -> Mod4 -> good for controlling window manager

So in xmodmap I should be able to just swap the mappings of keycodes 64 and 133.

The following xmodmap script is a very thorough version of this idea. First it completely strips the higher-order modifier keys, then it rebuilds just the config I want.

    clear Mod1
    clear Mod2
    clear Mod3
    clear Mod4
    clear Mod5

    keycode  64 = NoSymbol
    keycode  92 = NoSymbol
    keycode 108 = NoSymbol
    keycode 133 = NoSymbol
    keycode 134 = NoSymbol
    keycode 203 = NoSymbol
    keycode 204 = NoSymbol
    keycode 205 = NoSymbol
    keycode 206 = NoSymbol
    keycode 207 = NoSymbol

    ! command -> alt
    keycode 133 = Alt_L
    keycode 134 = Alt_R
    add Mod1 = Alt_L Alt_R

    ! option -> super
    keycode 64 = Super_L
    keycode 108 = Super_R
    add Mod4 = Super_L Super_R

WAT?! That does not work

The other ingredient of the debugging was to look carefully at the output of xev and Synergy's debug logs.

When I fed that script into xmodmap, I saw,

    Command -> 64 -> Super_L -> Mod4 -> good for controlling window manager

    Option -> 133 -> Alt_L -> Mod1 -> Meta in emacs

So Command was STILL being Super, and Option was STILL being Alt.

I had not swapped the effect of the keys! But I had swapped their key codes!

An explanation for Synergy's modifier key handling

Synergy's debug logs revealed that, given a keypress on the Mac, Synergy thought it should have a particular effect; it looked at the X11 key maps to work out what key codes it should generate to produce that effect.

So, when I pressed Command, Synergy thought, OK, I need to make a Mod4 on X11, so I have to artificially press a keycode 113 (or 64) to have this effect.

This also explains some other weird effects.

  • Synergy produces one keycode for both left and right Command, and one for both left and right Option. Its logic squashes a keypress to a desired modifier, which it then has to map back to a keysym - and it loses the left/right distinction in the process.

  • If I use my scorched-earth technique but tell xmodmap to map keysyms to modifiers that Synergy isn't expecting, the Command or Option keys will mysteriously have no effect at all. Synergy's log complains that it can't find a mapping for them.

  • Synergy has its own modifier key mapping feature. If you tell it to map a key to Meta, and the X11 target has a default keymap, it will try to create Meta from a crazy combination of Shift+Alt. The reason why is clear if you work backwards from these lines of xmodmap output:

    keycode  64 = Alt_L Meta_L Alt_L Meta_L
    mod1        Alt_L (0x40),  Alt_R (0x6c),  Meta_L (0xcd)
    

My erroneous mental model

This took a long time to debug because I thought Synergy was mapping keypresses on the Mac to keycodes or keysyms on X11. If that had been the case then xmodmap would have swapped the keys as I expected. I would also have seen different keysyms in xev for left and right. And I would not have seen the mysterious disappearing keypresses nor the crazy multiple keypresses.

It took me a lot of fumbling to find some reproducible behaviour from which I could work out a plausible explanation :-(

The right way to swap modifier keys with Synergy

The right way to swap modifier keys with Synergy is to tell the Synergy server (which runs on the computer to which the keyboard is attached) how they keyboard modifiers should map to modifiers on each client computer. For example, I run synergys on a Mac called white with a synergyc on a PC called grey:

    section: screens
        grey:
            super = alt
            alt = super
        white:
            # no options
    end

You can do more complicated mapping, but a simple swap avoids craziness.

I also swap the keys with xmodmap. This has no effect for Synergy, because it spots the swap and unswaps them, but it means that if I plug a keyboard directly into the PC, it has the same layout as it does when used via Synergy.

Coda

I'm feeling a bit more sad about Synergy than I was before. It isn't just because of this crazy behaviour.

The developers have been turning Synergy into a business. Great! Good for them! Their business model is paid support for open source software, which is fine and good.

However, they have been hiding away their documentation, so I can't find anything in the current packages or on the developer website which explains this key mapping behaviour. Code without documentation doesn't make me want to give them any money.

| Leave a comment (2) | | Flag

fanf

Warning, incoming Dilbert

15th Apr 2016 | 18:16

Just a quick note to say I have fixed the dilbert_zoom and dilbert_24 feeds (after a very long hiatus) so if you follow them there is likely to be a sudden spooge of cartoons.

| Leave a comment (6) | | Flag

fanf

Using a different X11 window manager with XQuartz

13th Apr 2016 | 22:42

(In particular, i3 running on a remote machine!)

I have been rebuilding my workstation, and taking the opportunity to do some major housecleaning (including moving some old home directory stuff from CVS to git!)

Since 1999 I have used fvwm as my X11 window manager. I have a weird configuration which makes it work a lot like a tiling window manager - I have no title bars, very thin window borders, keypresses to move windows to predefined locations. The main annoyance is that this configuration is not at all resolution-independent and a massive faff to redo for different screen layouts - updating the predefined locations and the corresponding keypresses.

I have heard good things about i3 so I thought I would try it. Out of the box it works a lot like my fvwm configuration, so I decided to make the switch. It was probably about the same faff to configure i3 the way I wanted (40 workspaces) but future updates should be much easier!

XQuartz and quartz-wm

I did some of this configuration at home after work, using XQuartz on my MacBook as the X11 server. XQuartz comes with its own Mac-like window manager called quartz-wm.

You can't just switch window managers by starting another one - X only allows one window manager at a time, and other window managers will refuse to start if one is already running. So you have to configure your X session if you want to use a different window manager.

X session startup

Traditionally, X stashes a lot of configuration stuff in /usr/X11R6/lib/X11. I use xdm which has a subdirectory of ugly session management scripts; there is also an xinit subdirectory for simpler setups. Debian sensibly moves a lot of this gubbins into /etc/X11; XQuartz puts them in /opt/X11/lib/X11.

As well as their locations, the contents of the session scripts vary from one version of X11 to another. So if you want to configure your session, be prepared to read some shell scripts. Debian has sensibly unified them into a shared Xsession script which even has a man page!

XQuartz startup

XQuartz does not use a display manager; it uses startx, so the relevant script is /opt/X11/lib/X11/xinit/xinitrc. This has a nice run-parts style directory, inside which is the script we care about, /opt/X11/lib/X11/xinit/xinitrc.d/98-user.sh. This in turn invokes scripts in a per-user run-parts directory, ~/.xinitrc.d.

Choose your own window manager

So, what you do is,

    $ mkdir .xinitrc.d
    $ cat >.xinitrc.d/99-wm.sh
    #!/bin/sh
    exec twm
    ^D
    $ chmod +x .xinitrc.d/99-wm.sh
    $ open /Applications/Utilities/XQuartz.app

(The .sh and the chmod are necessary.)

This should cause an xterm to appear with twm decoration instead of quartz-wm decoration.

My supid trick

Of course, X is a network protocol, so (like any other X application) you don't have to run the window manager on the same machine as the X server. My 99-wm.sh was roughly,

    #!/bin/sh
    exec ssh -AY workstation i3

And with XQuartz configured to run fullscreen this was good enough to have a serious hack at .i3/config :-)

| Leave a comment (2) | | Flag