dotat

Tony Finch

Friday 5th February 2010

Generalized literal syntax for programming languages

For a while I have had a fantasy programming language syntax idea that I have been failing to write up. This week I found out that it has already been implemented twice, which is cheering news :-)

The main inspiration is the special support for regular expression literals in languages such as Perl and Javascript. It's annoying that regexes are privileged with special syntax, but library writers can't define domain-specific syntax for their own purposes.

In fact Perl has several special literal syntaxes: single-quotes and q{} for verbatim strings; double quotes and qq{} for backslash-escaped interpolated strings; backticks and qx{} for shell commands; qw{} for word lists; slashes, m{}, s{}{}, and qr{} for regular expressions; tr{}{} for character substitution; and << for "here" documents.

The D programming language also has some extra flavourful string literals: r"" or `` for verbatim strings; "" for backslash-escaped strings; x"" for hex-encoded data; and bare backslash escape sequences.

What I'd like to be able to do is define a library for handling my special syntax. It would work as a plugin for the compiler that would parse and check my special literals at compile time (no run-time syntax errors!) and emit code that implements their special semantics. This framework means that, instead of being built into the language, libraries can provide features like converting backslash escape sequences into control characters, turning interpolated strings into a series of concatenations, arranging for regular expressions to be compiled once, and so forth.

You could then provide support for (say) XML literals, XPath expressions, better pattern matching syntax, or whatever else you might fancy.

One possibility that is very enticing is to make string interpolation context-aware, so that interpolated strings can be automatically escaped properly. The mixture of languages and syntaxes in a web page makes this fiendishly complicated. Different SQL engines have different escaping requirements. If this kind of hard-won knowledge can be implemented once in a library, security vulnerabilities such as cross-site scripting and SQL injection would be easier to avoid. The Caja project includes a proposal for secure string interpolation in Javascript which explains these issues very well.

The syntax I had in mind is inspired by Perl (and D's letter prefixes are along similar lines), for example, re$/^ *#/ is a regular expression for matching comment lines. A literal starts off with an identifier that specifies the literal's compiler. These identifiers have their own namespace so they can be very terse without clashing with variable names. The identifier is followed by a $ which indicates this is a string. The contents of the literal are delimited in the same way that Perl's generic literals, either with nesting {} () <> [] brackets or with matching punctuation. There's no way to escape delimiters within the literal, so that the literal can be passed unmodified to its compiler. For longer literals, or if single character delimiters are awkward, you can use $$ instead of $ and the rest of the line forms the delimiter, a bit like a here document. A literal compiler may or may not support interpolation, and defines its own syntax for doing so.

This week I was pleased to find out that the Glasgow Haskell Compiler supports my generalized literal idea. They call it "quasiquoting" after Lisp. The syntax it supports looks like [$re|^ *#|] though this is likely to change to [re|^ *#|]. The brackets are based on Template Haskell's [| |] quotation brackets. The quasiquoting paper has lots of good rationale for the feature and great examples of how easy Haskell makes it to write certain kinds of literal compilers. It also allows literal compilers to define their own interpolation syntax.

The E programming language has a feature called "quasi-literals" which look like rx`^ *#` with back-ticks for delimiters. The secure string interpolation document criticizes them for being executed at run-time, not compile time, so perhaps they aren't quite what I have in mind; also I hope they are wrong to say that the feature can only compile quasi-literals to parse trees. The E documentation is sparse so it's hard to tell.

I should also mention Lisp here, since it's the king of metaprogramming. I expect you could implement a feature like this using Common Lisp reader macros...

Do any other languages have a feature like this?

(14 comments | Leave a comment)

More coroutines

Following my previous entry I have put up a git repository containing a few little coroutine implementations. They are all pure ANSI/ISO C 89 and don't depend on magic stack pointer arithmetic, but instead maintain a special coroutine at the top of the stack which is responsible for allocating space for new coroutines.
(2 comments | Leave a comment)

Friday 22nd January 2010

Coroutines in less than 20 lines of standard C

Here's a bit of evil hackery to brighten your weekend.

It's quite simple to switch between coroutines using standard C: use setjmp() to save the state of the current coroutine, and lonjmp() to jump to the target coroutine. It's also useful to be able to transfer a bit of data while transferring control. The function coto() (short for "coroutine goto") does this. First it saves its argument somewhere the target coroutine can find it, then it calls setjmp() which saves the current coroutine's state and returns 0, so it goes on to longjmp() to the target coroutine, where setjmp() returns 1 so coto() returns the saved argument.

    static void *coarg;

    void *coto(jmp_buf here, jmp_buf there, void *arg) {
	coarg = arg;
        if (setjmp(here)) return(coarg);
	longjmp(there, 1);
    }

The other thing you need to be able to do is create coroutines. This requires allocating space for the coroutine's stack, initializing a stack frame there, and jumping to it. This is usually considered impossible to do without special support from the operating system or the run-time system - for example, see this paper (postscript). However if we can assume the stack is laid out contiguously in memory and that variable length arrays are allocated on the stack, we can do it using pure standard C99. (If your compiler doesn't support variable length arrays, alloca() might be available as an alternative.)

Each coroutine's stack is a chunk of the memory area used for the normal process stack. To allocate a new chunk of stack, we add the chunk size to the current top of stack (which allows space for future function calls by the current topmost coroutine) to give us a target location for the new topmost coroutine's stack. We then need to move the stack pointer to this new location, which we can do by creating a variable length array of the appropriate length. We can use the address of a local variable to approximate the current stack pointer when calculating this length. Finally, a simple function call creates the coroutine's initial stack frame and jumps to it.

The function cogo() (short for "coroutine start") implements this. It also saves the current coroutine's state before starting the new one.

    #define STACKDIR - // set to + for upwards and - for downwards
    #define STACKSIZE (1<<12)

    static char *tos; // top of stack

    void *cogo(jmp_buf here, void (*fun)(void*), void *arg) {
        if (tos == NULL) tos = (char*)&arg;
        tos += STACKDIR STACKSIZE;
        char n[STACKDIR (tos - (char*)&arg)];
        coarg = n; // ensure optimizer keeps n
        if (setjmp(here)) return(coarg);
	fun(arg);
	abort();
    }

Here's a little test program to demonstrate these functions in action.

    #define MAXTHREAD 10000

    static jmp_buf thread[MAXTHREAD];
    static int count = 0;

    static void comain(void *arg) {
	int *p = arg, i = *p;
	for (;;) {
		printf("coroutine %d at %p arg %p\n", i, (void*)&i, arg);
		int n = arc4random() % count;
		printf("jumping to %d\n", n);
		arg = coto(thread[i], thread[n], (char*)arg + 1);
	}
    }

    int main(void) {
	while (++count < MAXTHREAD) {
		printf("spawning %d\n", count);
		cogo(thread[0], comain, &count);
	}
	return 0;
    }
(6 comments | Leave a comment)

Wednesday 16th December 2009

stupid email disclaimers

Dear morons, I'm glad to know that this important marketing message is confidential, and that I shouldn't tell anyone about your branding change even though it's obvious because you are changing the signs on your buildings.

PS. if you are going to use an email service provider to send this shit, at least you could hire one that is able to delete your misleading disclaimers first.

Received: from mail.highford.com ([213.210.16.63]:53904)
    by ppsw-6.csi.cam.ac.uk (mx.cam.ac.uk [131.111.8.146]:25)
Message-Id: <curu/yjjezx3xvv8q6i5ew@hearfrom.com>
From: Bradford & Bingley <news@savings.bradford-bingley.co.uk>
To: Mr Finch <dot@dotat.at>
Subject: Update Bradford & Bingley Savings changing to Santander

Dear Mr Finch,

As of 11 January 2010 Abbey National plc which includes the
Bradford & Bingley savings business will change its name to
Santander UK plc and operate under the brand name of Santander.

[...]

This email and any files transmitted with it are confidential
and intended solely for the use of the individual or entity to
whom they are addressed. If you have received this email in
error please notify the system manager.

[...]
(3 comments | Leave a comment)

Friday 11th December 2009

Signing the root zone.

If you are a DNS, network, or firewall operator, you need to be aware that the root zone of the DNS is going to be signed with DNSSEC in stages during the first half of 2010.

You need to ensure that any packet filters between your recursive DNS resolvers and the public Internet do not block UDP DNS packets larger than 512 bytes, and that they do not block fragmented UDP packets, and that they do not block ICMP "fragmentation needed" packets, and that they do not block DNS-over-TCP.

The reason for this is that DNSSEC makes DNS packets larger, since as well as the answer they must also contain a cryptographic proof that the answer is correct. Misconfigurations that are benign with insecure DNS can cause problems with the move towards DNSSEC. The DNS Operations and Analysis Research Centre has a reply size tester which you can use to check that your systems are compatible with large DNS reply packets.

See these presentation slides for some details on the process of signing the root zone. See this blog post from RIPE, operators of the K root server, for some information about how they are preparing for the change.

ICANN have published a paper about the predicted effects of DNSSEC on broadband routers and firewalls. Gaurab Raj Upadhaya has published a few slides about EDNS0, the DNS extension protocol that enables large packets.

Please go out and check your DNS resolvers before they break!

(1 comment | Leave a comment)

Wednesday 9th December 2009

iCalendar is wrong

(This article is a much-expanded version of a comment I wrote months ago on mathew's blog.)

There's a programmers' rule of thumb that timestamps should always be stored in a form that's unambiguously inter-convertible with UTC, or some reasonable approximation such as POSIX time_t. In particular, you should never store local time without also storing its timezone, and you should represent timezones as UTC offsets instead of using a familiar but ambiguous abbreviation. For textual representations the right answer is usually ISO 8601 / RFC 3339.

This rule of thumb is good if you are storing the times of events that happened in the past, such as in logs or in message headers. However it isn't good for events that happen in the future, when those events have any bearing on time as used by people in the outside world. The reason for this is the instability of time zones.

Bad solutions to timezone problems )
My solution )
Complications )
Conclusion

Sadly it seems that the world is stuck with iCalendar, and when timezone-related problems occur calendar programmers blame politics or DST, rather than their inadequate data model. What is worse is that it appears to be very unlikely that a properly designed calendar program could interoperate with iCalendar data without loads of ad-hockery and lossage because of the mis-match between the data models.

How annoying.

(12 comments | Leave a comment)

Thursday 26th November 2009

FreeBSD unifdef(1) and factor(6)

Earlier this week I got a bug report for unifdef from Jonathan Nieder, who now maintains the Debian package. This prompted me to do some work on it, which it sorely needed - I hadn't touched it since March last year, and I had a bunch of unincorporated patches from Anders Kaseor, and I haven't backported the fix for an embarrassing bug from FreeBSD-8 to FreeBSD-7.

I've dealt with all but the last of these, and the latest version unifdef-1.188 is available from http://dotat.at/prog/unifdef/. I have also committed it to FreeBSD-9-CURRENT. When the code freeze for FreeBSD-8-STABLE is lifted (FreeBSD-8.0 should be released RSN!) I'll backport unifdef to the 8-STABLE and 7-STABLE branches.

This evening I also did a bit of work on FreeBSD's version of factor. I noticed a while back that it has a performance bug: it's very slow to factorize some numbers, for example my phone number. I got its bignum factoring code from NetBSD back in 2002, so I went back there to see if they knew anything about this bug. It turns out that jsm28 fixed the bug in 2004. Five and a half years later I've finally committed the fix to FreeBSD...

(2 comments | Leave a comment)

Saturday 7th November 2009

More spam bot signatures

There's another spam bot heuristic which is the exact complement to the one described in my prevous post. The idea is to keep rack of how many different HELO domains an SMTP client uses.

  defer
    message   = Probable spam bot HELO varies between $sender_rate domains
    # whitelist checks go here
    ratelimit = 2.2 / 1w / per_conn / strict \
      / unique=$sender_helo_name / $sender_host_address

This is mostly the same as the code in my previous post, but the lookup key is the client's IP address, and we only increase the measured rate if the client uses a different unique HELO domain.

This check also works very well at detecting spam bots. When testing it I noticed that one particular bot likes to use a HELO domain consisting entirely of random upper-case letters, and it only talks to one of our servers with the lowest IP address. So I added a specific (cheaper) check to deal with it. This causes the bot to go away, and legitimate senders will retry with a different server.

  defer
    message   = Probable spam bot HELO - please try another server
    condition = ${if and{{ eq{$primary_hostname}{ppsw-0.csi.cam.ac.uk} } \
                              { match{$sender_helo_name}{^[A-Z]+\$} }} }

There is a small risk of false positives with the variable HELO domain test. Some outgoing mail server clusters are behind a NAT, so we see HELO domains from multiple servers coming from the same IP address. I also found a number of false positives for the popular HELO domain check (most prominently rediffmail.com and easyjet.com). The way I'm dealing with them (which I hope will work in the long term) is as follows.

For the popular HELO domain check described in my previous post, I maintain a lookup table of legitimate HELO domains that trigger the check. The following replaces the hard-coded check for localhost.localdomain that appears in my previous entry.

    condition = ${if !match_domain{$sender_helo_name}{cdb;DB/helo_ok.cdb} }

As well as allowing through clients whose HELO domain can be verified, I now also check dnswl.org for well-known legitimate senders, and I maintain a table of sending hosts that slip through the other checks.

  ! verify   = helo
  ! hosts    = +helo_ok
  ! dnslists = list.dnswl.org

To avoid problems with false positives, my anti-spam-bot checks return a temporary error code ("defer" in Exim-speak instead of "deny"). I then have a nightly audit script which looks for hosts that appear to be repeatedly retrying a messages and which should be added to my whitelist tables. It might be possible to automate this table maintenance, again using the ratelimit feature, but I expect that will require a bit more experience to determine the right thresholds.

(2 comments | Leave a comment)

Thursday 5th November 2009

Spam bot signatures

Recently I have been investigating spam bot signatures, specifically the characteristic domain names they choose to put in their SMTP HELO commands. A lot of spam bots use the same HELO domains from lots of different compromised PCs, which makes them quite easy to spot and block without any risk of blocking legitimate email. This kind of block can take care of about 15%-20% of spam without relying on 3rd party services like DNSBLs. Of course, this is one of the techniques used to populate the Spamhaus XBL so the only advantage of doing it yourself is if you want to spot spam bots that have not yet been spotted by the Spamhaus guys.

Steve Champeon's Enemieslist service is the highly developed commercial implementation of this idea. His patterns are much more comprehensive, covering spam bot signatures, domestic IP connectivity (like the Spamhaus PBL), and spam-infested netblocks.

Yesterday evening I was thinking about how to automatically identify spam bot signatures, when I realised that I had already written the code to do the job! I wanted to count how many different IP addresses were using the same HELO domain, and block connections that used excessively popular domains. All I needed was a few lines of Exim configuration:

  deny
    message   = Probable spam bot HELO seen from $sender_rate networks
    condition = ${if !eqi{localhost.localdomain}{$sender_helo_name} }
  ! verify    = helo
    ratelimit = 4 / 1w / per_conn / strict \
      / unique=${mask:$sender_host_address/24} / ${lc:$sender_helo_name}

Let's unpack this in reverse order.

We're measuring the rate of use of HELO domains, so the ratelimit key is ${lc:$sender_helo_name}. It's forced to lower case so that SERVER and server are treated as the same thing.

But we don't care about the total usage rate, only the rate of uses from different unique IP addresses. The unique= option invokes the Bloom filter code to avoid counting each spam bot more than once. In fact we only count different unique /24 network blocks, in order to avoid false positives from mail clusters in which all servers use the same name. For example, Facebook's MTAs all say HELO mx-out.facebook.com though they are spread across about 100 IP addresses on a couple of /24 networks.

The strict option means keep counting even when the measured rate has passed the limit. The per_conn option means only count once for each connection (which mainly helps with efficiency).

The smoothing period is set to one week, which should mean that Exim doesn't easily forget which HELO domains have been abused.

I've currently got the limit set to 4 blocks of /24. It might even be reasonable to reduce this to 3. I'll need to run it a bit longer to see if any more odd false positives sneak out of the woodwork.

We do not apply this check if the DNS agrees with the HELO domain. There are some legitimate host names which are being heavily abused by spam bots, such as mail.aol.com and mx54.mail.com, so we want to block them if the connection comes from anywhere other than the host itself. (Sadly Facebook's MTAs are misconfigured so they don't pass this check.)

The only exception to this (so far) is localhost.localdomain which is the result of a popular misconfiguration (or lack of configuration) on legitimate Unix MTAs. If I find any other false positives they'll get checked in a similar way.

ETA rediffmail.com also needs whitelisting - it's the mail service of rediff.com which is a portal for Indian expats. Also easyjet.com.

This heuristic seems to catch about three different kinds of spam bot behaviour.

  • The HELO domain is the same as the domain in the MAIL FROM address. Spammers like to forge email "from" surprisingly few popular sites.
  • The HELO domain is one of a few bare hostnames, such as pc or computer - I guess they use the name of the compromised host.
  • The HELO domain is a parent domain of an ISP's edge network, e.g. telesp.net.br.

I'm really pleased by how easy and effective this has turned out to be. The only annoyance is that it took me 20 months to realise that my Bloom filter ratelimit code could do this! Also I hope there aren't too many lurking gotchas that I haven't spotted yet.

This check really shows up a long-standing weakness in Exim's hints database implementation. It just uses a local DBM file to store ratelimit data, so each individual server in my SMTP cluster has to accumulate data on spam bot HELO domains without being able to benefit from the experience of the rest of the cluster. I suppose I should spend some quality time with Tokyo Tyrant...

(6 comments | Leave a comment)

Friday 30th October 2009

bogon spam

Someone from Craigslist posted to NANOG saying they were suffering from a lot of spam from unallocated IP address space. So I wrote a program which downloads the RIR allocation statistics tables, compiles a list of allocated /24 blocks, and scans Exim's logs for connections from unallocated space. It turns out we are not suffering from a lot of spam from unallocated IP address space - we get on average less than one bogon connection each day.

(2 comments | Leave a comment)

ENHANCEDSTATUSCODES FAIL

I sort of expect an SMTP server that claims to support enhanced status codes to, y'know, put enhanced status codes in its responses...

220 bill.internal.[redacted] ESMTP Symantec Mail Security
EHLO ppsw-0.csi.cam.ac.uk
250-bill.internal.[redacted] says EHLO to 10.0.64.17:58126
250-ENHANCEDSTATUSCODES
250-PIPELINING
250 8BITMIME
MAIL FROM:<dot@dotat.at>
250 MAIL FROM accepted
RCPT TO:<asldkja@sjkfsajhg>
554 Recipient address rejected: User unknown
RSET
250 RSET OK
QUIT
221 bill.internal.[redacted] closing connection
(2 comments | Leave a comment)

Tuesday 27th October 2009

Penalising senders who use invalid recipient addresses

Here's an idea for penalising senders who use out of date mailing lists: temporarily reject recipients depending on the relative proportion of valid and invalid recipients. The more invalid addresses they try to send mail to, the slower they'll be able to deliver mail.

This isn't suitable for use in all situations, because there are badly-run lists that nonetheless carry legitimate and desirable traffic. There are also well-run lists that will be unreasonably penalised by my site's annual bulk cancellations of departing users. But it might be useful in borderline situations, e.g. when the sender has passed the Spamhaus check but failed a DNSWL check.

The way to implement this is using my rate measurement equation to keep track of each sender's valid and invalid recipients, two rates per sender. When a sender addresses an invalid recipient, update their invalid recipient rate and also calculate their decayed valid recipient rate (which is just a * rold). When a sender addresses a valid recipient, do the complementary calculations. The proportion of valid recipients is rok / (rbad + rok). Pick a random number between 0 and 1 and if it's greater than that proportion, return a 450 (temporary rejection) otherwise return 250 (ok) or 550 (bad).

If you are feeling mean you can crank up the deferral rate faster than the invalid recipient rate. The rationale for this is that a list with a 50% validity rate is shockingly bad and probably deserves more like a 90% deferral rate (say). So calculate (rok / (rbad + rok))n for some n that increases with your meanness, and use the result for comparing with the random number.

If you are feeling kind you can pick your random numbers between 0 and slightly less than 1, so senders don't have to be whiter than white to avoid penalties. This tweak might make the technique less troublesome.

(I initially thought about deferring just valid recipients, but that has the effect of cleaning the invalid recipients out of the sender's queue, so when they retry the deferred valid recipients, the proportion of validity we measure for them will go up. Deferring both valid and invalid recipients solves this problem and makes the implementation simpler and more pleasingly symmetrical.)

This is actually fairly similar to some of the logic in SAUCE. It keeps a single annoyance number per sender which gets increased when they do something bad and decreased when they do something right, and which decays exponentially so past behaviour fades from memory. SAUCE's main penalty is teergrubing, though it will defer incoming mail if provoked enough, or if it is greylisting the sender. I don't think there's much point in teergrubing except to trigger bugs in spam bots (hence sendmail has a simple greet_pause feature, but no general teergrubing). However there might be some benefit from more intelligent dynamic greylisting.

(4 comments | Leave a comment)

Wednesday 16th September 2009

DNSSEC

Delegation, delegation! Delegation,
that's what you need.
If you want your names to nest,
and you don't want to host the rest,
ooh-ooh, delegation's what you need!

So last night I laughed out loud at a DNSSEC RFC. I blame Douglas Adams.

Near the start of Life, the Universe, and Everything, Arthur Dent and Ford Prefect escape from prehistoric Earth on Eddy's time-travelling sofa, and find themselves at Lord's cricket ground. There, shortly before all hell breaks loose, they spot Slartibartfast's spaceship hidden with a SEP. An S.E.P. is a kind of cloaking device which works at a psychological level. It prevents you from seeing something, or rather it forces you to disregard it because it is Somebody Else's Problem and nothing you need to worry about.

In DNSSEC there are (by convention) two kinds of key. Key Signing Keys are as secure as possible (the keys are big and their private parts can be kept offline) so that they can have a long lifetime. Zone Signing Keys are smaller to reduce packet sizes and CPU usage, and their private parts need to be kept online to support dynamic DNS; to compensate for their relative lack of security they have a shorter lifetime. The KSK's public part is used as a "trust anchor" that you give to other people so they can authenticate the contents of your zone. You want it to have a long lifetime so you don't have to keep bothering them with updates. Although the KSK/ZSK arrangement is just a convention, it turns out to be useful to have some explicit signalling in the protocol to distinguish them, so they added a flag bit to DNSKEYs for this purpose. However they chose a different name for the bit to emphasize that you can manage your keys in unconventional ways, so it is called the "secure entry point" bit. This indicates the key is intended to be used as a trust anchor (e.g. in RFC 5011 rollover).

To me the SEP bit is the "somebody else's problem" bit. But it has exactly the opposite effect of Douglas Adams's SEP, since it's the bit you set to make other people pay extra attention. I thought this was funny, but then I had been drinking Ardbeg.

(1 comment | Leave a comment)

Wednesday 2nd September 2009

Firewall fallout / Exchange oddity

This morning my colleagues turned off their Cisco PIX SMTP fuxup mode and most of the failing email finally made it through. But not all. So I broke out tcpdump again.

BTW, a handy way to split a tcpdump output file called dump into separate TCP connections is something like the following. This assumes that one of the endpoints of each connection is always the same port on the same server IP address.

    tcpdump -vvv -r dump |
    sed '/.* clientname[.]\([0-9]*\) .*/!d;s//\1/' |
    sort -u |
    while read p; do tcpdump -r dump -w dump.$p port $p; done

I looked at the connections which ended with a timeout during the data transmission phase. It turned out that just two messages were still having problems. And what strange beasts they were. The following is a paraphrase of the key features of the messages.

MAIL FROM:<mailserver@dept.cam.ac.uk> SIZE=12345
250 OK
RCPT TO:<exchange@dept.cam.ac.uk>
250 Accepted
DATA
354 Enter message, ending with "." on a line by itself
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: application/ms-tnef;
        name="winmail.dat"
Content-Transfer-Encoding: binary
Subject: Service Unavailable
Date: Wed, 2 Sep 2009 12:34:56 +0100
Message-ID: <i6wnzxxs3ctmf9qb@mailserver.dept.cam.ac.uk>
X-MS-Has-Attach:
X-MS-TNEF-Correlator: <i6wnzxxs3ctmf9qb@mailserver.dept.cam.ac.uk>
Thread-Topic: Service Unavailable
Thread-Index: GV7mWrV6mdBQLdgGFAfkSCsA
From: "/O=IT Support/OU=Department/cn=Configuration"
      "/cn=Servers/cn=MAILSERVER/cn=Microsoft Public MDB"
      <mailserver@dept.cam.ac.uk>

... loads of binary TNEF guff with lots of UTF-16 ...

The X.400 address is funny, but what was actually causing the problem was the attempt to send a binary message. Our servers don't support the BINARYMIME extension, so this is verboten. The reason it caused a timeout is that (being binary rather than lines of text) the message didn't end with a CRLF newline sequence, so the DATA dot-CRLF terminator did not appear at the start of a line, so our server thought it was part of the data and continued waiting for more.

Bizarre. Why is the message being sent to my servers when it should have been delivered internally? (Its recipient is a departmental address.) Why isn't Exchange downgrading the binary content as required by the specification? Alternatively, why is it using DATA to send a binary message when you can only do that using the BDAT command?

Very strange.

(3 comments | Leave a comment)

Tuesday 1st September 2009

More firewall hate

I've spent most of this afternoon staring at tcpdump traces trying to work out what is causing connections from an Exchange server to my Exim servers to be lost.

The original symptom was that our sending rate measurements for the server were going crazy - increasing massively seemingly without a corresponding increase in traffic. This happens because (in our configuration) if a connection is going OK (valid sender and recipient addresses, etc.) Exim doesn't log anything until a message is accepted. Our rate measurement calculations are performed when processing RCPT commands, before the connections were lost, and also before they got logged. (This is, obviously, less than ideal.) So the Exchange server's sending rate seemed to go up for no reason.

I temporarily increased our logging which verified that this was the correct explanation. But why were the connections being lost? I ran tcpdump for a bit and looked at the contents of the connections from the problem host. This showed that a couple of messages with lots of recipients were being repeatedly retried, only to fail because the connection was lost mid-transaction. Most messages were getting through OK.

I suspected a broken firewall, so I sent a message to the relevant staff giving them some details of the problem and asking if they have a firewall that might be causing it. I turned off tcpdump to stop it filling the disk :-) and kept half an eye on the logs. Soon a new and useful bit of information turned up: (I've edited this log extract a bit.)

2009-09-01 18:00:20 +0100
    SMTP protocol synchronization error
    (next input sent too soon: pipelining was advertised):
    rejected "XXXX TO:<abc123@cam.ac.uk>"
    H=mailserver.dept.cam.ac.uk [131.111.999.999]:50555 I=[131.111.8.131]:25
    next input="RCPT TO:<abc123@cam.ac.uk>\r\nRCPT TO:<abc123@cam.ac.uk>\r\n"

Exim is complaining about the XXXX command because only certain specific commands may be pipelined and XXXX isn't one of them. So why is the Exchange server sending gibberish XXXX (sic!) instead of RCPT? In fact it isn't. I took another tcpdump and found that the problem command fell across a packet boundary: one packet ended with @cam.ac.uk>\r\nXX and the next one started with XX TO:<.

This is an absolutely classic signature of the utterly braindamaged Cisco PIX firewall's SMTP fuxup mode. Oh it aggravates me so much. It replaces SMTP commands that it doesn't know with XXXXes, and it's too stupid to handle packet boundaries correctly.

I sent another message to the department's IT staff telling them to fix their firewall ASAP. I shall probably send similar instructions to another department with a misconfigured PIX, since they brushed off my previous request to fix it. Both departments are also having problems with email from us to them, about which our servers can log more details, so I hope I have enough evidence to persuade them to heed my advice.

(a previous story of packet boundary screwups)

(1 comment | Leave a comment)

Tuesday 11th August 2009

Duplex printers

I'm annoyed by how slow our office printer is at printing on both sides of the paper. It occurs to me that it ought to be possible to make it much faster (a similar speed to single-sided printing) without too much extra complexity.

At the moment it handles one sheet at a time, in the sequence feed / print / reverse / print / eject. The feed and eject phases can be overlapped but there's otherwise little opportunity for pipelining.

If the duplex mechanism includes a paper path that goes from the print mechanism's exit to its entrance while turning over the sheet, then the path through the printer can be one-way (except inside the duplexer) and the duplexer can operate at the same time as the printer. The whole thing can then handle two sheets at once. The sequence goes like this:

  • feed sheet 1
  • print page 2 on sheet 1
  • sheet 1 into duplexer / feed sheet 2
  • sheet 1 through duplexer / print page 4 on sheet 2
  • sheet 1 into printer / sheet 2 into duplexer
  • print page 1 on sheet 1 / sheet 2 through duplexer
  • sheet 1 exits / sheet 2 into printer
  • print page 3 on sheet 2
  • sheet 2 exits / feed sheet 3
  • ...

I have assumed a C-shaped or S-shaped paper path, where the printer prints on the top of the sheet which turns over before dropping into the hopper, so that you end up with the stack of paper face down in the correct order. Our printer's duplex mode presumably prints pages in even/odd order (i.e. page 2 then page 1 on sheet 1, page 4 then page 3 on sheet 2, etc.) so the output stack still ends up in the right order. My duplexer's 2/4/1/3 page ordering puts greater demands on the RIPper's memory but that shouldn't be a big deal these days. (On the other hand, why does our printer appear to wait while RIPping? Shouldn't that be instant nowadays?)

So I wonder if anyone makes a printer with a duplexer that works this way. It should be fairly obvious from the duplex / duplex / exit / exit rhythm.

(9 comments | Leave a comment)

Job Ad - VOIP sysadmin

The University of Cambridge Computing Service is currently looking for a second VOIP sysadmin. We've just finished replacing our old analogue phone switch with a Cisco-based setup serving over 16,000 lines. There are more details (including salary level and waffle) in the job ad.

(Leave a comment)

Monday 10th August 2009

Sainsbury's self-checkout fail

I hate plastic carrier bags, so I take a canvas bag when I go shopping (unless I forget). The city-centre Sainsbury's in Cambridge recently replaced their express tills with self-checkout tills, which use weighing scales under the bags to check that scanned things have been bagged properly.

If you put an unexpected object in the bagging area it flashes up an annoying message. This screen has a button on it saying "Using your own bag?" which I am pretty sure used to cancel the message and allow you to scan your shopping. It is now non-functional: if you try pressing it a member of staff has to come over and explain the correct procedure. If you want to use your own bag, before you put it on the bagging area you have to go into the "select an item" menu (for bananas and fresh bread etc.) and choose the "bag re-use" icon. Obviously.

(19 comments | Leave a comment)

Thursday 30th July 2009

More on O(log(log(n))) searching

Thanks to everyone for the informative comments on my previous post.

I found a bug in my simulation code which meant that many of the tests were performed on arrays that were half-full of zeroes (oops). This distorted the stats a little, and more for larger N because I tested fewer different arrays for large N. This bug caused the increase in s.d. for large N: the corrected code has the same s.d. for all N. Another effect was to increase the maximum number of iterations required to find an entry. After the fix the maximum number of iterations goes down to log(log(N))+12 for the pure secant method, and (log(N)+14)/2 for Junio's mixed secant/bisection method. Altogether much more well-behaved.

(5 comments | Leave a comment)

Tuesday 28th July 2009

Searching a sorted array faster than O(log(N))

The usual way to find an element in a sorted array is using a binary search, which takes log(n) time, where logs are understood to be in base 2 and N is the size of the array.

Linus Torvalds made the clever observation that you can do better than log(N) if you know that the contents of the array are uniformly distributed. For instance, git's pack files store multiple objects identified by the SHA-1 hashes of their contents. Each pack file has an index containing a list of the pack's SHA-1 IDs and their objects' locations in the pack. The index is sorted by SHA-1 ID. Since SHA-1 is a cryptographic hash, we can assume its output is uniformly distributed.

Linus described his technique as "Newton-Raphson" which is a bit of a misnomer, since N-R works on smooth differentiable curves whereas what we have is a straight line with some stochastic variations. What we're actually doing is an iterated linear interpolation. If the SHA-1 IDs were perfectly evenly distributed then a single linear interpolation would land us right on the target item, but the random variation means we will be off by some amount, so we need to continue searching.

How far off will we be? It turns out (based on Monte Carlo simulation) that the expected error is about 0.31 * sqrt(N) with a standard deviation of about 0.26 * sqrt(N). This is a really promising result since it implies that each iteration reduces the search space to N1/2 whereas an iteration of binary search reduces it to N/2. So we should expect a complete search to take O(log(log(N))) iterations.

I wrote a simulation to try this out, and it matches this prediction: in fact the number of iterations was about 1 + log(log(N)). However what is the variation around this expected result? In my tests it turned out that the maximum number of probes was log(N) though for small N it bottomed out at about 16. When testing lots of different randomly filled arrays, the standard deviation was about 1.2 for all values of N, but when I tested fewer arrays this number ramped up.

Junio Hamano's implementation of Linus's idea is included in git but disabled by default. He added a tweak that biases the linear interpolation towards the centre of the search range, so it's kind of a balance between binary search and linear interpolation search. In my simulator this tweaked version required (log(N)+3)/2 iterations on average with a standard deviation of 0.8. The maximum number of iterations was again log(N) but it bottomed out at about 12. Overall it's a bit slower but better behaved.

In git, where a large repository might contain two million objects, and where pack index lookups are not particularly performance-critical, this improved lookup code doesn't provide a noticeable advantage. Still, I think it's interesting and the idea might be useful in other situations. Note that unlike a binary search, which can just use comparisons returning greater / equal / less, the linear interpolation search needs to know the absolute values of the elements. Git's code actually uses a lexicographic variant that ignores any common prefix shared by the elements in the search range, and uses only the next two bytes for the interpolation.

To finish, here's a bit of code. In this example, 0.0 <= array[k] < 1.0, and I use k for keys and v for values of array elements. We are searching for vtarg.

	/* all bounds are exclusive */
	double vlo = -DBL_MIN, vhi = +1.0;
	int klo = -1, khi = N;
	while(klo - khi > 1) {
		int kmid = klo + (khi-klo) * (vtarg-vlo) / (vhi-vlo);
		/* ensure rounding does not put us out of bounds */
		if(guess_k <= min_k) guess_k = min_k + 1;
		if(guess_k >= max_k) guess_k = max_k - 1;
		double vmid = array[kmid];
		if(vmid == vtarg) return(kmid);
		if(vmid < vtarg) klo = kmid, vlo = vmid;
		if(vmid > vtarg) khi = kmid, vhi = vmid;
	}
	return(-1);

Addendum: there are a few corrections in a follow-up post.

(6 comments | Leave a comment)

Tuesday 9th June 2009

Tempting Fate, and getting her unwanted attention

Yesterday I dealt with a fairly routine question from a computer officer about sending email from a web server. It's always nice when someone asks for advice, even when they expect the answer won't contain any surprises, just in case it does. I gave our usual advice about the safest ways to set up web forms that send email. The bit that tempted fate was saying that web forms have caused spam problems for us "in the past" - and I had been reflecting recently that we haven't had a big web spam problem for quite a long time.

Our recent problems have all been account compromises of one kind or another. One that will be familiar to other postmasters is caused by users sending their login credentials in reply to a phishing message. We seem to have a relatively clued-up userbase, based on tales from other universities; very few reply to most phishing spams and most of those replies do not contain passwords - they are more or less skeptical and sometimes taunt the spammer. However there's still the occasional idiot, and even a particularly special one who had to be re-educated twice! If any passwords do escape and accounts get used to spam, our strict rate limits for remote email users shut the unwanted flows down pretty quick.

More recently we have had problems with compromised accounts on email systems other than our own. These seem to have been due to infected PCs on networks with a Microsoft Windows domain and Exchange server - I don't think remote access (e.g. via Outlook Web Access) was to blame in these cases. We have been reasonably successful applying rate limits to mail servers that relay outgoing mail through us. We set the limit for each server to be about 10% above their normal peak usage, so it should not trigger unless something suspicious is happening, and we have a monitoring script to warn us if they stray into (or past!) the 10% headroom. The first time this mechanism caught a spam run it stopped an 80,000 message flood about 2% of the way through.

So Fate observed my complacency about web form compromises and handed us a big plate full of crap today. Shortly before I got into work, one of our departments started spewing email and fairly swiftly whacked into their rate limits. I was foolishly trusting of their competence and aware of their occasional need to send large mailshots, so I tried to let the mail through - but it rapidly became clear that tens of thousands of messages was way beyond normal and their arrival rate of hundreds per second was going to cause us problems even if it was legit. Silly me for not looking at a sample message. I got in touch with the department's techies and it rapidly became apparent there was a compromise, and we went into lock-down and clean-up mode. Between us we had to delete about 100,000 queued messages, with the added joy that we had to avoid deleting the couple of dozen legitimate messages that had the same sender address as the spam. Sadly my error meant that 20,000 messages escaped (plus 6000 with invalid recipient addresses) whereas it should have been only two or three thousand :-(

As I said yesterday, there are two ways to ensure that a web form sends email safely. The first is to fix the recipient address, for example in "send feedback to the webmaster" forms, or mail to registered users of your site. The second is to fix the contents of the message, for example in signup or purchase confirmation messages. Either of these is enough to make the form useless to spammers.

Today's excitement was caused by a "mail a copy of this story" form, which should have been implemented safely in the second style. However the form allowed remote users to add their own comments above the story, and gave them plenty of space to do so - enough for a typically lengthy 419 scam.

It's fair to say this has been a learning experience, more painful for some than for others :-) I should remember that technical cockups are one thing, but even if you are technically competent political requirements can still screw the pooch.

(3 comments | Leave a comment)

Tuesday 2nd June 2009

CRSIDs and email addresses

I've recently written a three page memo about the advantages of our username scheme compared to "friendly name" email addresses in large domains. I have put a copy of the PDF on the web which you can read if you like.

(9 comments | Leave a comment)

Friday 15th May 2009

Use your bonce

Vesta includes a purely functional programming language for specifying build rules. It has an interesting execution model which avoids unnecessary rebuilds. Unlike make, it automatically works out dependencies in a way that is independent of your programming language or tools - no manually maintained dependencies or parsing source for #include etc. Also unlike make, it doesn't use timestamps to decide if dependencies are still valid, but instead uses a hash of their contents; it can do this efficiently because of its underlying version control repository. Vesta assumes that build tools are essentially purely functional, i.e. that their output files depend only on their input files, and that any differences (e.g. embedded timestamps) don't affect the functioning of the output.

I've been wondering if Vesta's various parts can be unpicked. It occurred to me this morning that its build-once functionality could make a quite nice stand-alone tool. So here's an outline of a program called bonce that I don't have time to write.

bonce is an adverbial command, i.e. you use it like bonce gcc -c foo.c. It checks if the command has already been run, and if so it gets the results from its build results cache. It uses Vesta's dependency cache logic to decide if a command has been run. In the terminology of the paper, the primary key for the cache is a hash of the command line, and the secondary keys are all the command's dependencies as recorded in the cache. If there is a cache miss, the command is run in dependency-recording mode. (Vesta does this using its magic NFS server, which is the main interface to its repository.) This can be done using an LD_PRELOAD hack that intercepts system calls, e.g. open(O_RDONLY) is a dependency and open(O_WRONLY) is probably an output file, and exec() is modified to invoke bonce recursively. When the command completes, its dependencies and outputs are recorded in the cache.

bonce is likely to need some heuristic cleverness. For example, Vesta has some logic that simplifies the dependencies of higher-level build functions so that the dependency checking work for a top-level build invocation scales less than linearly with the size of the project. It could also be useful to look into git repositories to get SHA-1 hashes and avoid computing them.

It should then be reasonable to write very naive build scripts or makefiles, with simplified over-broad dependencies that would normally cause excessive rebuilds - e.g. every object file in a module depends on every source file - which bonce can reduce to the exact dependencies and thereby eliminate redundant work. No need for a special build language and no need to rewrite build scripts.

(15 comments | Leave a comment)

Thursday 14th May 2009

Define SCM

Here's an abbreviation to avoid, because its various meanings overlap so heavily.

  • Source Code Manager: synonym for version control system, e.g. as used by the Linux crowd in the early days of git.
  • Software Configuration Manager: usually incorporates version control, build system, archival of build products, and CASE methodology baggage, e.g. ClearCASE and Vesta.
  • System Configuration Manager: for system administrators rather than developers, e.g. cfengine and Puppet.
(1 comment | Leave a comment)

Wednesday 13th May 2009

Never delete anything

How long will it be before it becomes normal to archive everything? It's already normal in some situations, and I think that's increasing. It's been the norm in software development for a long time. There's an increase in append-mostly storage systems (i.e. append-only with garbage collection) which become never-delete systems if you replace the GC with an archiver. Maybe the last hold-outs for proper deletion will be high data volume servers...

Anyway, I feel like listing some interesting append-only and append-mostly systems. A tangent that I'm not going to follow is the rise of functional programming and immutability outside the field of storage. Many of these systems rely on cryptographic hashes to identify stuff they have already stored and avoid storing it again, making append-only much more practical.

  • All version control systems, and software configuration management systems even more so. The former archive source code whereas the latter archive build tools and build products as well. DEC's Vesta SCM is particularly interesting, being based on a purely functional build language designed to maximize memoization - i.e. minimize unnecessary rebuilds. It's sort of ccache on steroids since it caches the results of entire module builds, not just individual source file compiles.
  • Nix is a purely functional package manager. Unlike most packaging systems like dpkg or rpm, Nix packages do not conflict with each other: you upgrade by installing new packages alongside your existing ones, then you stop running the old ones and start running the new ones.
  • Archival / backup systems, like Venti which is Plan 9's append-only filesystem. Apple's Time Machine isn't nearly as clever.
  • Most filesystems don't use hash-based uniquification. Append-mostly filesystems often provide cool undelete features like snapshots, e.g. NetApp's WAFL or Sun's ZFS. Early filesystems of this kind, e.g. BSD LFS tried to avoid wasting space, so didn't make old data available as snapshots, and sacrificed performance to eager garbage collection. More recently, DragonFly BSD's Hammer filesystem doesn't even have an in-kernel garbage collector, and running it is entirely optional.
  • Email archives: gmail's ever-increasing quotas, cyrus delayed expunge.
(6 comments | Leave a comment)

Thursday 23rd April 2009

Some thoughts about git

I was originally planning to witter about distributed version control vs. centralized version control, especially the oft-neglected problem of breaking up a large cvs / svn / p4 repository. This was partly triggered by Linus's talk about git at Google in which he didn't really address a couple of questions about how to migrate a corporate source repository to distributed version control. But in the end I don't think I have any point other than the fairly well-known one that distributed version control systems work best when your systems are split into reasonably modestly-sized and self-contained modules, one per repository. Most systems are modular, even if all the modules are in one huge central repository, but the build and system integration parts can often get tightly coupled to the repository layout making it much harder to decentralize.

Instead I'm going to wave my hands a bit about the ways in which git has unusual approaches to distributed version control, and how bzr in particular seems to take diametrically opposing attitudes. I'm not saying one is objectively better than the other, because most of these issues are fairly philosophical and for practical purposes they are dominated by things like quality of implementation and documentation and support.

Bottom-up

Git's design is very bottom-up. Linus started by designing a repository structure that he thought would support his goals of performance, semantics, and features, and worked upwards from there. The upper levels, especially the user interface, were thought to be of secondary importance and something that could be worked on and improved further down the line. As a result it has a reputation for being very unfriendly to use, but that problem is pretty much gone now.

Other VCSs take a similar approach, for example hg is based on its revlog data structure, and darcs has its patch algebra. However bzr seems to be designed from the top down, starting with a user interface and a set of supported workflows, and viewing its repository format and performance characteristics as of secondary importance and something that can be improved further down the line. As a result it has a reputation for being very slow.

Amortization

Most VCSs have a fairly intricate repository format, and every operation that writes to the repository eagerly keeps it in the canonical efficient form. Git is unusual because its write operations add data to the repository in an unpacked form which makes writing cheaper but makes reading from the repository gradually less and less efficient - until you repack the repo in a separate heavy-weight operation to make reads faster again. (Git will do this automatically for you every so often.) The advantage of this is that the packed repository format isn't constrained by any need for incremental updates, so it can optimise for read performance at the expense of greater pack write complexity because this won't slow down common write operations. Bzr being the opposite of git seems to do a lot more up-front work when writing to its repository than other VCSs, e.g. to make annotation faster.

Thus git has two parallel repository formats, loose and packed. Other VCSs may have multiple repository formats, but only one at a time, and new formats are introduced to satisfy feature or performance requirements. Repository format changes are a pain and happily git's stabilized very early on - unlike bzr's.

Laziness

As well as being slack about how it writes to its repository, git is also slack about what it writes. There has been an inclination in recent VCSs towards richer kinds of changeset, with support for file copies and renames or even things like token renames in darcs. The bzr developers think this is vital. Git, on the other hand, doesn't bother storing that kind of information at all, and instead lazily calculates it when necessary. There are some good reasons for this, in particular that developers will often not bother to be explicit about rich change information, or the information might be lost when transmitting a patch, or the change might have come from a different VCS that doesn't encode the information. This implies that even VCSs that can represent renames still need to be able to infer them in some situations.

Git's data structure helps to make this efficient: it identifies files and directories by a hash of their contents, so if the hash is the same it doesn't need to look any closer to find differences because there aren't any - and this implies a copy or rename. This means that you should not rename or copy a file and modify it in the same commit, because that makes git's rename inference harder. Similarly if you rename a directory, don't modify any of its contents (including renames and permissions changes) in the same commit.

Mercurial also uses hashes to identify things, but they aren't pure content hashes: they include historical information, so they can't be used to identify files with the same contents but different histories. Thus efficiency forces hg to represent copies explicitly.

Any more?

I should say that I know very little about bzr, and nothing about tla, mtn, or bk, so if any of the above is off the mark or over-states git's weirdness, then please correct me in a comment!

(6 comments | Leave a comment)

Wednesday 8th April 2009

LISTSERV crapness

LISTSERV uses a null return path (RFC821 MAIL FROM:<>) on its administrative mail, and some mail hosts reject this. I discard message with a null return path that do not match a few simple heuristics, so I lose things like subscription confirmations from services like JISCfail. This makes me cross.

L-Soft claim that LISTSERV is following the specifications, and they cite a couple of paragraphs from RFC 821 (published in 1982) and RFC 1123 (1989). However they fail to cite text from RFC 2821 (2001) which explicitly forbids what they are doing.

There are several types of notification messages which are required by existing and proposed standards to be sent with a null reverse path, namely non-delivery notifications, other kinds of Delivery Status Notifications (DSNs), and also Message Disposition Notifications (MDNs). [...]

All other types of messages (i.e., any message which is not required by a standards-track RFC to have a null reverse-path) SHOULD be sent with with a valid, non-null reverse-path.

The only other permitted use of null return paths that I know of is vacation notifications, described in RFC 3834 (published in 2004).

L-Soft needs to get a grip, read some RFCS, and fix their software.

(6 comments | Leave a comment)

Thursday 2nd April 2009

Configuration management

We've been discussing configuration management in the office this week. None of us are happy with the way we're doing things...

On Hermes, we do most configuration management using rdist. On our management server we have a number of file trees containing files that differ from the default - a smattering of stuff in /etc, the contents of /home and /opt, and a few other bits scattered around. These trees are cloned and hacked for each flavour of machine (ppswitch, cyrus, lists, webmail, etc.) and each version of the underlying OS. These trees are mostly kept under revision control.

This setup has the advantage of simplicity, and it's easy to push out small changes. One key feature is rdist's testing mode, which makes it easy for us to ensure that the servers are in sync with the configuration tree without changing anything. We often run a test across a cluster of 10 or 16 machines in parallel. It's easy to selectively push a change to an idle machine for testing before rolling it out to the rest of the cluster. For more tricky changes I do a phased rollout so I can check for unwanted changes in behaviour without breaking the entire service at once.

Of course it has some serious disadvantages. We have to be root on the management host to be able to push changes out to the servers. We can't easily keep file ownership and permissions under revision control. This mechanism misses significant parts of the configuration, such as installed base OS packages and which rc scripts are enabled. There's also a lot of scope for improving our initial OS install scripts to reduce the amount that they get out of sync with the rdist configuration.

So we'd like something better. Our colleagues have different system management setups with different problems, and they are also looking for something better.

A couple of my colleagues have looked at Puppet but weren't happy with it. I dislike its basic design. Managed servers pull their configurations from the master, which means you must never screw up a change on the master, and it's harder to test changes - you have to explicitly set up test server profiles. Yes, you can run Puppet in push mode but it's often a bad idea to work against a program's basic architecture. Puppet also has its own security mechanisms, whereas I'd prefer to avoid multiplying channels of trust. Finally, I really hate writing configuration files for programs that write configuration files. It's a waste of brain cells to understand this superfluous abstraction layer: I just want to write the underlying configuration file directly.

Aside: this is why I really really hate autoyast, which uses an XML configuration file to control a program that writes configuration files that control rc scripts that manipulate the underlying configuration files for the programs you actually care about. It takes hours to work out how to make it produce the correct results.

I spent some time working on a program called pstx which was to be a suped-up replacement for rdist. It was going to be an sftp client (so it would require nothing special on the target servers) that had a simple configuration file to specify which directory trees to copy where, including ownerships and permissions (bringing them under revision control and removing the need to be root on the master), and possibly with added bells and whistles like remote diff, and a reverse (pull) mode. I also intended to make it easer to combine collections of files and thereby share common files. Sadly it's only about a third written and likely won't get much further.

One of the conversations this week was about how to reduce the chance of mistakes when rolling out changes, in which we discussed the use of revision control to help with testing and rollback. At the moment Hermes mostly uses CVS and other unixy bits of the CS share a Subversion repository, so the conversation very much assumed a central repository model - which fits in well with a master configuration server. Recently I have also been investigating git more seriously, so I thought it might be part of a solution.

The key things that git provides include a flexible and efficient network protocol for moving files around (flexible in that it can use ssh and http as well as git's native protocol, and efficient in that it's incremental and compressed), it can tell us what differs between a directory tree and the state of the repository, changes can be pushed and pulled, etc. The distributed version control functionality is way better at all this than pstx was ever going to be.

The big missing part is the ability to track ownerships and permissions: git only supports normal development checkouts which should be done using the developer's uid and umask. There are also some low-level problems with the way git performs checkouts: it removes the destination file before writing the updated version in its place. I would like a special deployment command which checks all the changed files out under temporary names, fixes their ownerships and permissions, then renames them into place. The first stage almost exists in the form of git-checkout-index --temp d, though it does weird things with symlinks. It doesn't look like much work to add the other stages.

This could provide some really nice workflows. A configuration change is created on a small topic branch, then pushed to the configuration repositories on all the machines - which changes none of the live configuration. You can then switch a test machine to the new branch with a single git deploy branch, or roll back by re-deploying the master branch. You can find out the current configuration of a machine in summary using git status or git diff to make sure the summary isn't a lie.

One interesting possibility might be to tame the clone-and-hack problem by using a shared repository for all the different classes of machines. This should make it easier to see the differences between each kind of machine and spot spurious ones.

One thing that is very manual in our current setup is hupping daemons to make them load an updated configuration. It might be feasible to use a post-deploy hook to do this, provided the hook script is given enough information that it can avoid unnecessary restarts.

Now I just need to find some time to turn the idea into code...

(11 comments | Leave a comment)

Wednesday 25th March 2009

Ada Lovelace Day

This is a bit late, but I've been inspired to post after reading about the women other people admire. Here are a few of mine.

  • Margo Seltzer: developer of the BSD log-structured filesystem and Berkeley DB. An open source entrepreneur - founder and CTO of Sleepycat Software - who also has a successful academic career.
  • Dina Katabi invented XCP, the eXplicit Congestion control Protocol, which is a brilliant way for routers to make each individual flow adjust to the level of congestion without the need for any per-flow state. I think it was one of the first papers I read about advanced transport protocol research, and I have continued to follow the subject since then.
  • Lisa Dusseault is a WebDAV expert and IETFer, currently one of the Applications Area Directors and therefore a member of the IESG. She was one of the funner people I met at the Paris IETF meeting a few years ago. I like the new ideas she's brought to the IESG, such as monthly updates on her work as Apps AD.
  • And of course [info]rmc28, who got the University Card working properly and now slaves away on the student information system.
(4 comments | Leave a comment)

Sunday 15th March 2009

John Taylor talks about the Corpus clock

I posted a couple of articles (one, two) about the Corpus clock soon after it was unveiled, and I optained a copy of the Chronophage book soon after it became available. The book is a bit lacking in technical detail, so I was pleased to find out that John Taylor would be talking about it for the Cambridge Science Festival and I attended his talk yesterday. The lecture theatre was packed.

The talk was in three parts: a bit of autobiography, then a quick overview of the clock and its development, then questions. The talk revealed some interesting facts which do not appear in the book.

Taylor's story of how he came to make a fortune in kettle thermostats is relevant to the clock not just because that's how he made the money: his horological hero, John Harrison, invented the grasshopper escapement that inspired the Chronophage, and also invented the bimetal on which Taylor's thermostats rely. He explained his snap-action bimetallic disc, which has a C cut out to form a tongue in the middle; this tongue moves further than the centre of a disc without the cut-out does, which loosens the tolerances required to manufacture a thermostat. The calligraphic flourish engraved below his name on the clock's pendulum bob forms the shape of his bimetallic disc. (Sorry, I can't find a picture on line though it's visible in the book.)

ETA: I have found a large picture of the pendulum bob in which you can see the shape of Taylor's bimetallic disc under the date. It's overdrawn by some of the flourish so it's a bit obscure unless you know what you are looking for.

There were a couple of hints at the clock's workings. It is usual for clocks with chiming mechanisms to have a second set of drive springs or weights for the chimes. The Corpus clock has only one spring, so its hourly death-rattle is driven in an unusual way. This is why the clock ticks back and forth while rattling the chain in the coffin. There are two main mechanisms inside the Chronophage itself. It contains a mechanical pseudo-random mechanism to control the blinking and biting. The sting, however, is regular: it slowly rises then jabs down on each quarter hour.

He also mentioned that the clock "has fun" four times a year: on 24th March, which is the date of Harrison's birth and also his death; on 25th November, Taylor's birthday; on new year's eve and new year's day; and on Corpus Christi day, the Church's festival that occurs on the Thurdsday after Trinity Sunday, the 8th Sunday after Easter. The clock is away for maintenance this week, so I hope it'll be back in time to muck around on Tuesday week. The fun seems to involve even more erratic ticking than usual, with more colourful lights.

The clock's spring has to be particularly strong in order to overcome the momentum of the large escape wheel that goes around the outside of the clock. When prototyping the clock this caused a serious problem: the force of the spring is transmitted through the escapement to the pendulum, making it swing higher and higher and eventually breaking the clock. Taylor overcame this problem by adding a regulator, which also serves two other functions: it produces the clock's erratic behaviour that plays tricks with observers, and it listens to the MSF time signal to synchronize the clock every five minutes. (Taylor confirmed to me after the talk that these are all functions of the same mechanism. He also said that the hollow pendulum bob is not in fact "massive and weighty" as the book says, which answered my question about conservation of momentum.) I'm not sure how this is consistent with his assertions that the clock is purely mechanical - would it work if the computer regulator broke? News reports about the clock's teething problems suggest that it's pretty vital. I wonder if the maintenance periods this week and back in January are for software patches rather than mechanical adjustment?

I asked Taylor if he would publish more technical details about the clock, but he thinks that the mystery makes it fun. I disagree :-/

(5 comments | Leave a comment)

Thursday 5th March 2009

Job Ad

My colleagues over in MISD (the University's administrative computing unit, as opposed to the CS which is the University's ISP and IT training department) are looking for another DBA to help look after the Oracle running under exciting services like the financial system and student information system.

http://www.admin.cam.ac.uk/offices/hr/jobs/vacancies.cgi?job=4781
(Leave a comment)

Saturday 21st February 2009

Microblogging

Since April 2002 I have been keeping a link log: a web page where I record links to other pages that are notable in some way, with a brief comment about each one. It's gatewayed to Delicious, LiveJournal, and Twitter. The Twitter feed is new.

I've been on Twitter since March 2007, when it made a big splash at the SXSW festival. I joined to see what the fuss was about but soon stopped using it. None of my social circle were using it much, and there wasn't a sensible way of finding people in the more distant reaches of my social network. For 18 months my last tweet said " Grumbling about the difficulty of finding out people's twitter usernames".

Recently Twitter has taken off in the UK, since a fairly large contingent of journalists and celebrities have started using it and talking about it. (Reminds me in a small way of the rise of the web in 1993/4.) Coincidentally at about the same time I decided to get back into using Facebook and Twitter a bit more, to keep in better touch with my family's activities. (They're on Facebook but I linked my Twitter and Facebook statuses together for convenience.)

The celebrity thing helped me to get more out of Twitter in a significant way. I had thought of it as more like Facebook or LiveJournal: a way of keeping up with the activities of people I already know. But in fact it's more like the wider blogging culture where it's normal to follow someone's writing just because they are interesting. (Yes, LJ blurs into this kind of blogging too.) In fact the even the Twitter guys didn't realise this at first: until June 2007 Twitter sent me email saying "You are foo's newest friend!" but the following month it started sending me email saying "foo is now following you on Twitter!". I now feel free to follow well-known people just because they are interesting, and I found more interesting people by looking through the lists of people they follow.

Part of the Twitter culture is heavy use of short URL services when posting interesting links. I decided that my link log ought to be gatewayed to Twitter to join in the fun, so I embarked on a bit of hacking. Previously it had been just a static HTML page, with a command-line script that I used to insert a new link. There was another script to turn the HTML into an RSS feed for LJ, and some code for uploading the results to chiark. I had a rudimentary click tracker so that I could get some idea of what links other people followed, but it did not shorten URLs. (That meant it was an open redirector until I found out what evil that made it vulnerable to.)

I replaced all the old static HTML and RSS gubbins with a CGI script which can format my link log in HTML or Atom, as well as providing short URLs for all the links. I decidided to host my own short URLs partly to continue tracking what people find interesting, and partly because I have a vanity domain that's just right for this job :-) The command line script for adding new links remained basically the same, apart from new code to create short tags and to post the link to Twitter.

After a couple of weeks of this new regime I decided I needed to be able to save links found when reading Twitter or LiveJournal on my iPod Touch. Its UI makes it easy to email links but not much else, so I set up a gateway from email to my link log. I send links to a specially tagged email address which is filtered to a special mailbox. A script polls this mailbox over IMAP every few minutes, looking for messages that match a number of paranoid checks. From these extracts a URL and a comment which it passes on to the old command-line script. (I still use the latter when I'm on a computer with a keyboard.)

The end result is pretty neat: I can easily save interesting links that I see in places where that was not previously practical. It's probably a good thing I don't have an iPhone because then I'd be able to spod everywhere.

PS. a brief hate: web sites that redirect deep links to the mobile version of their front page. Utter cWAP. They'll get no Google Juice from me. (In fact the iPhone browser is good enough that mobile sites are usually NOT an improvement.)

(Leave a comment)

The joy of lpeg

I've recently started playing with lpeg, a parsing library for Lua. It is based on "Parsing Expression Grammars", which were recently popularized by the prolific Bryan Ford. PEGs have some nice properties: they're suitable for unified parsers that handle both the low-level lexical syntax as well as higher-level hierarchial syntax; they have much simpler operational semantics than traditional extended regexes or context-free grammars; and as well as familiar regex-like and CFG-like operators they have nice features for controlling lookahead and backtracking. PEGs were originally developed alongside a cute algorithm for linear-time parsing which unfortunately also requires space linear in the input size with a fairly large multiplier. Lpeg instead uses a simple parsing machine, implemented somewhat like a bytecode interpreter. Its performance is quite competitive: the long paper says it has similar performance to pcre and glibc's POSIX regex implementation, and the short paper says it has similar performance to lex and yacc.

Lpeg actually consists of two modules. The core lpeg module is written in C and allows you to compose parser objects using operator overloading, building them up from primitives returned from tersely named constructor functions. The resulting syntax is rather eccentric. On top of that is the re module which provides a more normal PEG syntax for parsers, which despite the name of the module are rather different from regular expressions. This module is written in Lua, using an lpeg parser to parse PEGs and construct lpeg parsers from them. The PEG syntax is extended so that you can define "captures". Captures are the really nice thing about lpeg. You can use them like captures in Perl regexes to just extract substrings of the subject, but you can often do better. Lpeg captures are more like the semantic actions that you can attach to rules in parser generators like yacc. So, where in Perl you would do the match then fiddle around with $1, $2, etc, with lpeg the match can incorporate the fiddling in a nice tidy way. (In fact, probably the closest comparison is with Perl 6 rules, but they're not yet practically usable.)

The program I was writing with lpeg was to process some logs. I needed to convert the timestamps from ISO 8601 format into POSIX time_t which implied converting 8 fields from strings to numbers. Rather than having to convert each capture individually, or loop over the captures, I could write a single grammar rule to match a pair of digits and convert it to a number, then refer to that rule elsewhere in the grammar. (In fact Lua will coerce strings to numbers implicitly in most - but not all - circumstances. I happened to be tripped up trying to compare a number with a string, which doesn't coerce.) In the end it's nicest to let the parser drive all the program's activity through its semantic actions.

code )
(Leave a comment)

Thursday 5th February 2009

Impressive display of security clue from the Student Loans Company

The Student Loan Company executive management board minutes from a meeting just over a year ago says the following in section 6, "update on data security processes":

RSJ provided an update on Data Security and advised that information which was being received from external sources confirmed that the transfer of data on removable media devices was now unacceptable. He stated that there was a need to consult with HEI’s as to the method of transferring Attendance Confirmation Reports as SLC now had PGP encryption software available which could replace the previous method of transferring the data via CD’s. He also stated that the PGP software which SLC were using should be checked to ensure that it was on the US Government list of standard encryption as HEI’s are only permitted to use PGP software from this list.

Not shipping media is good. Using end-to-end encryption is good. (Unlike banks which seem to like SMTP over TLS, which provides no additional security for inter-domain communication.) I wonder why the choice of PGP instead of S/MIME - I believe that PGP usually requires an add-on whereas S/MIME is often built in to MUAs. Perhaps they've been nobbled by a vendor.

(6 comments | Leave a comment)

Thursday 29th January 2009

Meanies

In my previous post I said I was using the standard formula for the exponentially weighted moving average. This is not entirely true because the standard formula uses a complementary smoothing constant (α = 1 − a) which allows you to write the formula as an incremental adjustment to the previous value.

It's also possible to express the standard unweighted mean in a similar manner.

I was quite pleased when I found out about this analogy :-)

(Thanks to wikipedia for rendering the maths.)

(3 comments | Leave a comment)

Tuesday 27th January 2009

Luck or judgment?

In 2005 I developed a mathematical model for measuring average event rates which became the core of a new rate limiting feature for Exim. The model has a particularly useful property which I did not expect it to have and which I did not (until recently) fully understand.

The model

The central formula is the standard exponentially weighted moving average:

rnew = (1 − a) ∗ rinst + a ∗ rold

We use this to calculate the new average rate from the instantaneous rate and the result of the previous average calculation. These rates are all measured in events per some configurable time period. The smoothing factor a is a value between 0 and 1 which determines how slowly the model forgets about past behaviour.

We calculate rinst and a from the raw inputs to the formula, which are p, a time period configured by the postmaster, and i, the time interval between the previous event and this event. By dividing them we get

rinst = p / i

In this formula, p determines the per time unit used for measuring rates, e.g. events per hour or events per day.

The exponentially weighted moving average is usually used to smooth samples that are measured at fixed intervals, in which case the smoothing factor, a, is also fixed. In our situation events occur at varying intervals, so the smoothing factor needs to be varied accordingly.

a = e−i / p

In this formula, p determines the smoothing period, i.e. the length of time it takes to forget 63% of past behaviour.

A useful property

When developing the model, I needed to understand how it reacts to changes in the rate of events. It's fairly simple to see that if the rate drops, the average decays exponentially towards the new rate. It's less clear what happens when the rate increases. A particular practical question is what happens if there's a sudden burst of messages? How much of the burst gets through before the average rate passes some configured maximum?

I did some trial computations and I found that when the interval is very small (i.e. the rate is high) the average rate increases by nearly one for each event (the smaller the interval, the closer to one). This means that the maximum rate is also the maximum burst size. How wonderfully simple! The postmaster can configure the model with two numbers, a maximum number of events per a time period, which also directly specifies the units of measurement, the smoothing period, and the maximum burst size.

(The maximum burst size is larger for slower senders, increasing to infinity for those sending below the maximum rate. However you don't have to be much above the maximum for your average to hit the limit within one period.)

This property produces a simple user interface, but I did not understand how it works. It's obvious that when i is small, a ≈ 1, but it is not so clear why (1 − a) ∗ rinst ≈ 1. It seems I landed directly on a mathematical sweet spot without the fumbling around that might have led me to understand the situation better.

Arbitrary choices

I made two choices when developing the model which at the time seemed arbitrary, but which both must be right to get the max rate = burst size property.

Firstly, I decided to re-use the configured period for two purposes: as the smoothing period and as the per time unit of rates. I could instead have made them independently configurable, but this seemed to give no benefits that compensated for the extra complexity. Alternatively, I could have used a fixed value instead of p in the formula for rinst, but that seemed liable to be confusing or awkward, and to require more mental arithmetic to configure.

Secondly, I decided to use e as the base of the exponent. I could have used 2, in which case the smoothing period would have been a half life, or 10, so that 90% of past behaviour would be forgotten after one period. There seemed to be no clear way to choose, so I split the difference and went with e on the basis of mathematical superstition and because exp() has fewer arguments than pow().

How it works

Looking back over my old notes this week, I had a revelation that the max rate = burst size property comes from the fact that the gradient of ex is 1 when x is 0. To show why this is so, I first need to define a bit of notation:

y ≡ f(x) ≡ ex

δy ≡ f(x + δx) − f(x)

δx ≡ −i / p

This allows us to say, when x is zero and δx is small,

(1 − a) ∗ rinst
= (1 − e−i/p) ∗ p/i
= (1 − eδx) / −δx
= (eδx − 1) / δx
= (e0 + δx − e0) / δx
= ( f(0 + δx) − f(0) ) / δx
= δy / δx
dy / dx
= ex
= 1

Sweet.

(Leave a comment)

Wednesday 21st January 2009

The MacBook saga

Three months ago, Apple released their new "unibody" aluminium MacBook laptop, and just over a month before that they released the second-generation iPod touch. I was in the market for a new Mac laptop, and at some point I planned to upgrade my iPod when a 64GB Touch or iPhone became available. However when I found out that Apple were offering £95 "back to school" rebates to those who bought an iPod Touch and a MacBook, I couldn't wait any longer - this was the 30th October and the offer ceased at the end of the month.

Unfortunately I made the order with a rather crusty old version of Firefox which handled Apple's customization JavaScript incorrectly - it discarded all the options I chose, and because I wasn't familiar with the order process I didn't know that the final description screen should have included more details than it did. This wasn't a problem for the bits and pieces that I could also buy at the local greengrocer Apple Store, but one important option I chose was the US keyboard layout. I wanted it partly because I normally use a Happy Hacking keyboard which has a similar layout, and partly because the ISO layout has thin return key that is awkward to press as well as being ugly.

When I discovered this upon unboxing the new Shiny! my heart sank. You can't change the keyboard on the current MacBooks without replacing the entire machine. But I phoned Apple to see what I could do about it, and I was pleasantly surprised to find out that they would replace the machine with the correct model for free, no questions asked.

I discovered Apple's FAIL when the second MacBook arrived. I had been sent the "International English" model, which is identical to the "British" layout apart from a couple of currency symbols. They also sent me a continental power plug instead of a BS 1363 plug. When I phoned Apple they were suitably contrite, even offering me £70 compensation for the error!

Until this point (two weeks after the original order) I was very happy with Apple's support - no dumb scripts, reasonably competent staff. Sadly this didn't last. While the phone staff continued to be helpful and informative about what was going on, I had to keep phoning to get an update because the supervisor who was supposed to be dealing with the problem was too busy doing other stuff. The cause seemed to be that the department which handled replacements didn't believe that US keyboards were available in the UK (despite what the online store said) and the support people were incapable of fixing or working around this failure.

What was worse was that it took another three weeks to come to this conclusion. (I was away in France for one of them, during which I didn't chase Apple so nothing happened.) I couldn't use my new iPod because I didn't want to have the pain of re-pairing and re-populating it with music etc. This all made me extremely cross.

In the end I had to return the second MacBook for a refund and buy a third one with the correct configuration, which finally arrived six weeks after the original order. Fortunately this didn't affect my eligibility for the rebate - in fact they had already sent me the £95, though I hadn't sent in the proof-of-purchase paperwork they said they needed. (Um, I didn't notice the extra money until this month!) Unfortunately they cancelled the compensation they promised me! They fixed that after I made an irate phone call, but sheesh, I still get angry when remembering it.

But I suppose it's all OK in the end. The MacBook and iPod Touch are things of astounding beauty and utility. The iPod utterly blows away the Nokia 770 I previously used for spodding, and the MacBook replaces both my elderly PC laptop and my Mac Mini (though the latter will probably become a backup server). It's really nice to be able to hack away on the comfy sofa, and I'm playing choonz via the Airport Express a lot more than I did.

(Leave a comment)

Friday 16th January 2009

RJ45 is too fat

One notable feature of the ports on the new MacBook is that the RJ45 Ethernet socket only just fits - it wouldn't if the machine were much thinner. In fact the MacBook Air notoriously doesn't have any wired ethernet connection at all, relying on IEEE 802.11n WiFi for connectivity.

Of course other manufacturers are making expensive thin laptops. The Dell Adamo neatly solves the RJ45 thickness problem by putting the port behind the display hinge, where the machine is as thick as the body plus the display.

The Voodoo Envy's solution is to put the ethernet socket in the power supply, which acts as a WiFi hot spot which the laptop uses for connectivity. This is cute but it means you only get 54Mb/s because the base station doesn't have 802.11n support.

A cheaper and faster alternative would be to pass a gigabit Ethernet connection through from an RJ45 on the power supply to a combined power network port on the laptop. The connector could be much thinner than RJ45 without sacrificing compatibility.

ETA: The other advantage this design would have is that the combined power and ethernet connector can use the magsafe idea, so you don't have the problem of one safe and one unsafe connector when tethered.

(10 comments | Leave a comment)

Monday 1st December 2008

Weird network bug

I just spent a couple of hours debugging a strange network problem, which I haven't seen before.

The initial description was that email relayed from one departmental email server to another via my email server sometimes failed with a timeout error. This only occurred with messages with multiple recipients.

My initial suspicion was that it might be related to an old (now fixed) Exim callout verification bug, which had caused us some problems in the past. If a callout's target server was slow, and there were many recipients needing callout verification, and the client used pipelining, the total time for the callout delays could add up to more than the client's timeout, so it wouldn't see Exim's pipelined replies. Exim now flushes its replies before doing a callout to avoid this problem.

However, some testing showed that the relevant servers were quite swift, which cast some doubt on this diagnosis.

The problem report included some good details which were unfortunately too old for me to be able to correlate with our logs. I had a look to see if there was anything more recent that I could examine. It turned out that there were a couple of messages on our queues that had been delayed by this problem. I ran a test delivery of one of them and it did in fact time out between us sending the message envelope and them sending the reply. Consistent with my guess, but not consistent with other testing.

I tried a delivery with full debugging from Exim, but there was no enlightenment. I tried to send a copy of the problem message's envelope manually. This worked. Um, WTF? What was the difference between what Exim did and what I did?

I ran tcpdump to analyze the two connections. It turned out that Exim was sending the whole envelope (MAIL, RCPT, RCPT, ..., DATA) in one packet, whereas my cut-and-paste into telnet split the envelope into a packet for the MAIL command and one for the rest. So I created a file containing the envelope and used a little shell script to drive telnet. Bingo. I had reproduced the problem.

At this point I thought it might be an MTU-related problem, e.g. blocking ICMP "Fragmentation Needed" messages. I tried pinning down the packet size at which the problem started occurring, with a guess of something related to 512 bytes. After a couple of goes I noticed that the problem message had an envelope of 513 bytes, and an envelope of 512 bytes worked OK.

Then I tried a larger envelope, and - boggle - it worked. 512 or less worked, 513 failed, 514 or greater worked. This also explained why the problem affected message envelopes but not message data. The usual symptom of MTU firewall problems is timeouts at the message data stage, not during the envelope, and in this case message data was getting through fine. (The overhead of message headers makes it very unlikely that a message will have as little as 513 bytes of data.)

A couple of friends suggested testing 1025 as well, and it demonstrated similar but weirder problems. Again 1024 worked, 1025 failed, and 1026 worked. However in the failure case, the timeout occurred after I received the destination's replies to the first half of the envelope - enough commands to fit in 512 bytes. A closer look at both the 513 and 1025 cases revealed that I was getting TCP ACKs for all the data I sent, but something was going wrong after that.

I guess the problem is a firewall that's doing some kind of TCP interception and re-segmentation, and getting it wrong. The ACKs would therefore have been generated by the firewall and not by the actual destination. Someone needs to be given a good whipping with a copy of end-to-end arguments in system design before having their firewall programming licence revoked.
(10 comments | Leave a comment)

Tuesday 18th November 2008

Licence revoked

I'm happy to say that the problem I ranted about in my previous entry has been fixed. The general guidelines for bulk email have been restored, and the new guidelines for intraspam large internal mailing lists have been published on their own page.

(5 comments | Leave a comment)

Thursday 6th November 2008

Licence to spam

Over a year ago we had some discussions inside the Computing Service about better provision for large mailing lists within the University. At the moment we rely too much on paper spam, and electronic spam is distributed to staff by departmental administrators who manually forward irrelevant utterances from the Old Schools. My hope was that officially sanctioned mailing lists for internal communications would improve this situation because, as well as being more efficient, there would be better moderation and recipients would be able to unsubscribe. We approved a document which I thought was quite good, and which I expected would live alongside our existing, very general, bulk email guidelines. This then went to the IT Syndicate for approval.

Unfortunately the IT Syndicate was disbanded at about this time to be replaced by an Information Services and Strategy Syndicate which has a broader remit. The large list policy got dropped for several months as the committee rebooted. When the work item was picked up again it somehow got corrupted. It was rewritten with absolutely no input from us, and it replaced the bulk email policy instead of being adopted alongside it. As a result there is now almost no policy against spam intentionally sent by University members to recipients inside and outside the University. I am extremely annoyed.

The new policy is on the ISSS web site and I'll put a copy of the old policy behind a cut to preserve it after Google's cache expires.

Old bulk email policy )
(6 comments | Leave a comment)

Wednesday 5th November 2008

REST FAIL

This afternoon my colleague Andy gave a talk about Microsoft Exchange 2007, which I attended since I need to know what the competition has to offer and I have to provide support for its SMTP interfaces. One thing he mentioned which perked up my interest was "autodiscovery" for Outlook 2007. The unnecessary difficulty of configuring email software has irritated me for a long time, so after the talk I immediately went to find out more. It turns out to be a mixture of good stuff and utter FAIL.

The documentation describes three ways that Outlook 2007 can configure user accounts automatically - server names, security requirements, etc. If you are logged into a Windows Domain then it first tries querying the Active Directory. If that succeeds then it can find everything out by itself. Nice. If not, it falls back to an open protocol, which any standards-based mail server can implement, that configures the server settings automatically given an email address. More about this below.

If server-supported autodiscovery doesn't work, Outlook tries to guess the settings by attempting various combinations of host name, port number, TLS or not, POP or IMAP, etc. and stopping when it finds something that works. I think this is a great idea, so it's a damn shame that it prefers to use the lame POP not the studly IMAP if both are available. By itself that is enough to make it worth our while to implement the autodiscover protocol; but because our primary mail domain is cam.ac.uk but our service name is hermes.cam.ac.uk, Outlook's guesses will fail. This has the advantage of preventing users walking blindly into bad configurations, but the disadvantage that they're less likely to be able to configure it at all.

And so to the autodiscovery protocol itself. How can something so simple be fucked up in so many ways? It looks to me like it was bodged together by a clueless intern over two or three summer breaks, each time getting worse as the requirements evolved. The result is a case study in what RESTful protocol design is not.

The essence is simple: the client sends a request containing the user's email address, and the server sends back an XML document containing the configuration settings. The request is sent using HTTP-over-SSL to a URL derived from the domain part of the user's email address. (In fact it tries https://domain/autodiscover/autodiscover.xml first, then tries https://autodiscover.domain/autodiscover/autodiscover.xml.) So far, so good.

The first mistake is that the request is a POST with the email address contained in another XML document in the request body. It would have been more RESTful to use a GET with the email address encoded in the URL. This mistake turns into FAIL when you realise that most email services have the same settings for all users, in which case autodiscover.xml can be a static file, but many web servers do not allow POST requests to static files. If they had done it right, using a GET request with the email address in the query string would have Just Worked for both CGI scripts and static files.

It gets worse when they bodge around this mistake. A POST to a file commonly results in a "405 Method Not Allowed" error. Microsoft specify that if this is the case for your web server, you should configure it to send the autodiscover.xml file in the error reply, as if the request was successful. A foul perversion of the protocol.

It gets worse when they add support for virtual domains. The requirement seems to be that the service provider wants to host the autodiscover.xml script centrally, and doesn't want to fork out for an SSL certificate for every virtual domain. So if both of the https URLs fail, Outlook tries a GET request to the autodiscover.domain URL, which can redirect to the central autodiscover.xml. However that's not secure enough so there must also be an _autodiscover._tcp.domain SRV record in the DNS pointing to the same central web server. However that's still not secure enough so Outlook also admonishes the user to click OK in a popup dialogue box.

It gets worse when you look at the autodiscover.xml "schema". It isn't a schema, it's a commented skeleton fill-in-the-blanks autodiscover.xml file. That's just everyday lameness; the real FAIL is to be found in the various elements described therein. The redirect I mentioned above isn't an HTTP redirect, it's an autodiscover.xml document containing various redirection settings. There are also settings for controlling the expiry time of the document, because of course configuring these in a .htaccess file is too hard.

Ugh. I think that just about exhausts the rantage this stuff has triggered. I am most of the way to implementing it; I just need to get the necessary SSL certificates. It turns out to be easier than I expected because newish Apache does not get upset by POST requests aimed at static files: it just throws away the request body and returns the file with a "200 OK". No need to fart around with ErrorDocument 405 autodiscover.xml. With any luck it'll make Outlook trivial to get to work with Hermes.

(9 comments | Leave a comment)

Wednesday 8th October 2008

LOLauditors

A colleague tells me that our auditors are quizzing our system administrators about our backup schedules. A wag asked them if they were interested in restores too. The auditors replied, "No, just backups."

PS. I bought a copy of the book about the Chronophage from Corpus Christi plodge for £8. It is a glossy A4 booklet of 44 pages with large type and lots of colourful pictures.

(2 comments | Leave a comment)

Tuesday 30th September 2008

More on the Corpus Christi Chronophage clock

  • I said the clock has three circles of 60 slits. In fact it has two circles of 60 and one of 48 - the hour circle has a slit for every quarter hour.
  • A website about the clock is going live soon.
  • A book about the clock is being published, and it will be available from the Corpus Christi porters' lodge amongst other places.
(Leave a comment)

Monday 22nd September 2008

The Corpus Christi Chronophage Clock

I have been admiring the new clock on the corner of Trumpington St and Bene't St. Instead of having hands, the clock has three concentric rings hidden behind its face. The outer ring marks seconds, the middle ring marks minutes, and the inner ring marks hours. Each ring is driven by the next outer ring via a gearing mechanism that makes the rings tick from one mark to the next. It is fairly normal for the second hand's motion to be quantized in this manner, but more unusual for the minute and hour hands to tick like this too.

Each ring sits in front of a circle of 60 blue LEDs, and the face has three circles of 60 slots which would allow the LEDs to shine through if the rings weren't in the way. The position of each disk is indicated by a slot which allows one LED to shine through the disk and the face to be seen by the viewers.

In fact, each ring has 61 slots, one of which is usually aligned with a slot in the face to indicate the time, and 60 of which are normally out of alignment. When a ring ticks from one position to the next, its 60 normally-unaligned slots line up with the face in turn before the normally-aligned slot lines up with the next slot in the face. This has the effect of making it look like the ring has flown around 366° whereas it only moved by 6°

I have made a little animation which shows how this works. It ticks once every 4 seconds and it only has 12 slots on the face and 13 on the disk, so that you can clearly see the mechanism behind the effect. I have drawn the face's slots in the outer circle and the ring's slots in the inner circle. It uses the new-fangled <canvas> HTML element, so you may need to upgrade your browser to view it.

There is a video of the clock where some of its features are described by the designer, John Taylor, who made his fortune from his invention of the kettle thermostat.

(13 comments | Leave a comment)

Monday 15th September 2008

The date of the count

Turning a linear day number into a date is about twice as difficult as the reverse calculation, both from the computer's point of view and the programmer's. The overview of the function is to (1a) calculate the year (1b) subtract the day count for previous years (2a) calculate the month (2b) subtract the day count for previous months. These pairs of operations are essentially calculating quotients and remainders to produce successive components of the dates.

The trickiest part is dealing with the Gregorian correction. You can work out the Julian year in which a day number occurs using d*4/1461 which is a straight-forward inverse of the corresponding part of the reverse calculation (though you need to fiddle with the epoch to make the leap years fall in the right place). The equivalent for the Gregorian calendar, dividing by the average length of a year using integer arithmetic, is d*400/146097. However this produces leap year intervals of 4 or 5 years, not 4 or 8 years. Fortunately the book describes a neat trick which allows us to use this formula to estimate the day's year, then adjust it to fix any error.

Errors are detected using the forwards conversion from years to day numbers, y*1461/4 - y/100 + y/400. This formula produces the number of days from the start of 1 A.D. up to the end of year y inclusive, according to the conventional numbering. My code internally adjusts the year to start with March instead of January; the formula works for adjustments like this so long as the numbering of February doesn't change. (Note that the formula in my earlier post adjusts the year number so that it refers to the end of the previous February, to which is added a day count corresponding to the fraction of the current year that has passed.)

So what we do is calculate d*400/146097 and add one to estimate the year according to the conventional numbering. The result is always an underestimate (i.e. the formula guesses that years start up to two days late) so we may in fact get the previous year's number. We calculate the count of days up to the end of the estimated year and compare with the original day number. The original day number should normally be before the calculated end of the year. If the estimate was wrong, we need to increment the year number again.

We then need to obtain the remainder of days within this year by subtracting the day count up to the end of the previous year. To get this count we subtract one from the year and do the forward calculation again. If we combine the decrement with the estimate correction, the resulting code looks like this, where dn is day number and dy is day of year.

	y = dn*400/146097 + 1;
	y += (dn >= y*1461/4 - y/100 + y/400) - 1;
	dy = dn - y*1461/4 + y/100 - y/400;

The month calculation is more awkward than I would like because of a mismatch between the forward m*153/5 and reverse dy*5/153 rounding patterns, as I've illustrated below. In each column I've italicised the five-month cycle that corresponds to March - July.

month
number
forward
length
reverse
length
03031
13131
23030
33131
43130
53031
63131
73030
83131

The result is that the code needs a series of different administrative fiddles to work out the month and the day of the month. First add one month of days to line up with the italic part of the reverse pattern, then calculate the month giving March = 1 through to February = 12. Then add three months to line up with the italic part of the forward pattern, March = 4 to February = 15, and calculate the number of days before the month. Subtract this from the day of the year, then add four months of days to compensate for the fiddles, to finally get the day of the month.

	m = (dy + 31)*5/153 + 3;
	d = dy - m*153/5 + 123;

The last step is to restore the year to its usual January - December alignment. This, at least, is a simple inverse of the corresponding code in the forwards direction. There's also a preparatory adjustment of 10 months at the start of the code to set up the March - February alignment in the first place. After a few tweaks, the final code is:

void F(int d, int *py, int *pm, int *pd) {
	int y, m;
	d += 305;
	y = d*400/146097 + 1;
	y -= y*1461/4 - y/100 + y/400 > d;
	d -= y*1461/4 - y/100 + y/400 - 31;
	m = d*5/153 + 3;
	d -= m*153/5 - 92;
	if (m < 14) m -= 1;
	else m -= 13, y += 1;
	*py = y, *pm = m, *pd = d;
}
  • 305 is 10 months (March - December) minus the epoch (1st January 1 A.D. is day 1)
  • 400 is the number of years in a Gregorian cycle and 146097 is the number of days
  • 1461 is the number of days in 4 years, usually
  • 153 is the number of days in 5 consecutive months, except February
  • 31 is one month of days to align the reverse month calculation, from 31*5/153 == 1
  • 3 aligns the forward month calculation
  • 92 is three months of days to compensate for the three months added in the previous line, from 3*153/5 plus an extra one because we count days within a month from 1 not 0
  • actually 92 = 4*153/5 - 31 + 1 where the extra 31 is the same 31 we added previously

Edited to add...

It's possible to make the forward and reverse patterns line up by adjusting the 153/5 factors, which saves a couple of additions. The smallest denominator for which this works is 17, which produces the following code. Before the final fix-up, months are numbered Mar=1 - Feb=12.

void G(int d, int *py, int *pm, int *pd) {
	int y, m;
	d += 305;
	y = d*400/146097 + 1;
	y -= y*1461/4 - y/100 + y/400 > d;
	d -= y*1461/4 - y/100 + y/400 - 31;
	m = d*17/520;
	d -= m*520/17;
	if (m < 11) m += 2;
	else m -= 10, y += 1;
	*py = y, *pm = m, *pd = d;
}
(Leave a comment)

Symbolic links

What happens if you try to create a symlink to a zero-length target, as in ln -s "" foo?

Linux: ENOENT.

Mac OS X 10.4: EINVAL.

FreeBSD: works, and any attempt to resolve the symlink returns ENOENT.

Solaris: works, and the resulting symlink behaves the same as ln -s . foo

(10 comments | Leave a comment)

Wednesday 10th September 2008

Counting the days

Nearly 10 years ago I developed a neat function for converting a Gregorian date into a count of days since some epoch. There was nothing particularly new about it; I just tweaked the numbers until the formula was as simple as possible. I've re-used this code a fair number of times since then without re-examining it. This evening I worked out that it can be distilled even more.

The function adds together some expressions that count the number of days indicated by the three components of the date, plus a constant to set the epoch.

The number of normal days before the year y (full four digit year) is just y*365.

The number of leap days up to and including the year y is y/4-y/100+y/400. This is the usual "add a leap day in every fourth year, but not in every hundredth year, but do in every 400th year" rule. I'm using integer division that discards any remainder, i.e. rounds down.

The previous two paragraphs are inconsistent about including or excluding the current year. This is because leap days occur inside a leap year, not at the end, so their contribution to the day count does not depend solely on the year. We can fix this by re-assigning January and February to the previous year using a small administrative fiddle, as follows (where m counts months starting from 1). Then a leap day falls just before the start of years that are multiples of four (etc.) and the expression in the previous paragraph magically does the right thing. The fiddle also requires an adjustment to the epoch constant.

        if (m > 2) m -= 2; else m += 10, y -= 1;

Months are the part that I spent most time fiddling with. Section 1.12 of Calendrical Calculations develops various equations for calendars (such as the Julian calendar) that spread their leap years evenly, like Bresenham's line drawing algorithm spreads out raster steps. The equations also work for month lengths in the Julian and Gregorian calendars, if you treat 30-day months as normal years and 31 day months as leap years, and ignore February. We want to calculate the number of days in a year before the start of a given month, which is calculated by equation 1.73:

        (L*y - L + D*L % C) / C + (y - 1) * N

where L is the number of leap years in a cycle, C is the total number of years in a cycle, D is the offset of the start of the cycle, N is the number of days in a normal year, and years are actually months. Yuck. It turns out that if we perform the right administrative fiddling, we can arrange that D=0, and we can substitute m=y-1. This gives us the greatly simplified:

        m*L/C + m*N = m * (N*C+L) / C

For Julian and Gregorian months, N=30, C=12, and L=7, giving m*367/12. This pretends that February has 30 days, but the pretence doesn't matter because we have moved February to the end of the year so we never count days after February 28th or 29th . The complete function is then as follows (with the epoch set so that 1st January, 1 A.D. is day 1).

    int A(int y, int m, int d) {
        if (m > 2) m -= 2; else m += 10, y -= 1;
        return y*365 + y/4 - y/100 + y/400 + m*367/12 + d - 336;
    }

In fact you don't have to treat a whole year as a cycle for the purpose of the month calculation. The most basic pattern is a five month cycle containing three long months, as in March-July and August-December. (January and February are the start of a third truncated cycle.) However when I tried to make m*153/5 work in 1999 I could not eliminate the D offset to keep the equations simple. My problem was that I only allowed myself to count months starting from zero or one, and I was flailing around without the sage guidance of messrs. Dershowitz and Reingold. (I didn't use their equation 1.73 when I was developing this code, but it's a good example of what I was trying to avoid.)

It turns out that we can play around with the administrative fiddle to adjust D so that the cycle falls in the right place, so long as we compensate by adjusting the epoch constant. For the Julian and Gregorian five-month cycle D=4, i.e. we need to shift March (the start of the cycle) to month number 4, which gives us the following code.

    int B(int y, int m, int d) {
        if (m > 2) m += 1; else m += 13, y -= 1;
        return y*365 + y/4 - y/100 + y/400 + m*153/5 + d - 428;
    }

I mentioned above that equation 1.73 works for Julian leap years, so we can use it for years as well as months. This combines first two terms of the return expression leaving the Gregorian correction separate. No additional fiddling is necessary. We can't merge in the Gregorian correction as well, because evenly spreading the leap years gives intervals of 4 or 5, not 4 or 8.

    int C(int y, int m, int d) {
        if (m > 2) m += 1; else m += 13, y -= 1;
        return y*1461/4 - y/100 + y/400 + m*153/5 + d - 428;
    }

There's a related function for calculating the number of days in a month. It uses the remainders from the divisions to find the length of the current month, instead of using the results to count the preceding days. My old code re-used the same administrative fiddle:

    int D(int y, int m) {
        if (m > 2) m -= 2; else m += 10;
        return m*367%12 > 4 ? 31
             : m != 12     ? 30
             : y % 4      ? 28
             : y % 100   ? 29
             : y % 400  ? 28
                       : 29;
    }

In fact you don't need a fiddle in this case, because there's no need to move February to the end of the year. You can just pick values of C and L which make the pattern fall in the right place.

    int E(int y, int m) {
        return m*275%9 > 3 ? 31
             : m != 2     ? 30
             : y % 4     ? 28
             : y % 100  ? 29
             : y % 400 ? 28
                      : 29;
    }
(9 comments | Leave a comment)

Thursday 4th September 2008

Faster LIAR (Life in a register)

This week I have been discussing the Game of Life with the amazingly enthusiastic Brice Due, after he commented on my original post about LIAR. He put me in touch with Tom Rokicki who, as well as writing probably the best implementation of Life, recently featured in New Scientist for his work on God's solution to Rubik's Cube.

Tom pointed me at a better bit-twiddling implementation of Life which comes in at 26 operations compared to my code's 34 - a significant improvement. However his code is ultra-compressed and really hard to understand. So I wrote a comprehensible version. This code has some rather clever features.

	left = bmp << 1;
	right = bmp >> 1;
	side1 = left ^ right;
	side2 = left & right;
	mid1 = side1 ^ bmp;
	mid2 = side1 & bmp;
	mid2 = side2 | mid2;

In the first stage he uses a normal full adder to do the horizontal 3-to-2 compression which counts the number of cells in each row (bits mid1 and mid2). The first two ops of the full adder form a half adder which he uses to count the cells to either side of the middle cell (bits side1 and side2). Thus he gets four useful outputs from just five ops!

	upper1 = mid1 << 8;
	lower1 = mid1 >> 8;
	upper2 = mid2 << 8;
	lower2 = mid2 >> 8;

The side bits are kept in the middle row, whereas the mid bits are shifted to become the sums for the upper and lower rows.

#define REDUCE(g,f,a,b,c) do {	\
		uint64_t d, e;	\
		d = a ^ b;	\
		e = b ^ c;	\
		f = c ^ d;	\
		g = d | e;	\
	} while(0)

	REDUCE(sum12, sum13, upper1, side1, lower1);
	REDUCE(sum24, sum26, upper2, side2, lower2);

The second stage is to compress the three rows into separate sums of the low bits and the high bits. Tom does not use a normal adder to do this. It turns out that there are 24 adder-like boolean functions of three arguments, which produce two values that together encode the number of input bits. Only one of them requires four operations - the others require five or more. The encoding of the result is a bit strange:

countresult
0f=0 g=0
1f=1 g=1
2f=0 g=1
3f=1 g=0

As well as saving two ops compared to using a full adder, this function makes the final twiddling particularly brief.

	tmp = sum12 ^ sum13;
	bmp = (bmp | sum13) & (tmp ^ sum24) & (tmp ^ sum26);
	return(bmp & 0x007E7E7E7E7E7E00);

I wondered if it's possible to use the 4-op adder-alike in the first stage too. The problem with adding together all nine cells is overflowing the three bit result. This is OK when using a full adder because the overflow happens to confuse a very large count (therefore a dead result) with a very small count (again a dead result) so it makes no difference. The modified adder confuses a middling small count (live result) with a middling big count (dead result) so it fails. The dual-purpose half+full-adder trick neatly avoids overflow problems.

Despite successfully unpicking this code I still find it rather unintuitive, especially the final twiddle. I am in awe of people who can invent such cunning tricks.

(Leave a comment)

Thursday 28th August 2008

More postcodes

Thanks for all the interesting comments on my previous post.

The reason I'm investigating this is to work around false positives caused by SpamAssassin's obfuscation rules. These are intended to match deliberate misspellings of commonly spammed goods such as Viagra. The specific instance that caused the bug report was a Reading postcode being treated as an obfuscated Rolex.

Therefore I'm not particularly worried about missing out obscure special cases like GIR 0AA and the overseas territories AAAA 1ZZ. However it might be worth tightening up the outcode regex, based on the list of UK postcode areas, to reduce the chance of matching a bogus postcode.

Also, the Post Office's postcode FAQ mentions that only London uses the ANA and AANA outcode formats. (In fact it's only the E, EC, SW, W, WC areas.) I managed to find a list of postcode districts which includes these outcodes (Wikipedia omits them) and it shows that the third position rule is wrong: it says M does not appear there but there is a poscode district London W1M. Rule Three also allows A and E which are not in fact used.

qr{\b
  ([BGLMNS][1-9][0-9]?
  |[A-PR-UWYZ][A-HK-Y][1-9]?[0-9]
  |([EW]C?|NW?|S[EW])[1-9][0-9A-HJKMNPR-Y]
  )[ ]{0,2}
  ([0-9][ABD-HJLNP-UW-Z]{2})
\b}x
(3 comments | Leave a comment)
Previous 50
Powered by LiveJournal.com