dotat

Tony Finch

Wednesday 9th May 2012

Transparently auditable automatic vote counting

Electronic voting is impossible to implement in a secure manner, by which I mean there must be a way to audit the result after the fact in a way that is independent of the electronics. If you cannot perform an independent audit then hidden or compromised mechanisms in the electronics can lie without detection.

This audit requirement essentially means you need to make a paper record of each vote that is verified by the voter. That is, you need ballot papers.

Given you need ballot papers, you might as well keep the traditional method of voting (no voting machines) and only use automation for counting. Ideally the process of counting should be transparent, so that observers can see it proceeding correctly. It is not good enough for the ballot papers to disappear into a black box and a result pop out the other end.

It is possible to do this with mechanical collaters and counters - the kind of device that was used for data processing on punched cards. That way you can see the papers being split into a stack for each candidate, just as happens in the manual process. Observers can riffle through the stacks to verify correctness.

To count votes, why not weigh the stacks with precision scales?

So I wonder if you could make a ballot collating machine cheap enough and reliable enough that it is more cost-effective than manual counting. Could you perhaps use off-the-shelf printer/scanner/copier equipment?

(24 comments | Leave a comment)

Wednesday 2nd May 2012

A couple of interesting networking papers

I read a couple of interesting papers last night, both from last month's USENIX symposium on networked systems design and implementation.

The first (linked to by Wes Felter) is "Jellyfish: Networking Data Centers Randomly" which very thoroughly shows that a random topology outperforms a structured topology, and is more easy to grow incrementally. The challenge is for higher layers to make effective use of a Jellyfish network; key technologies probably include OpenFlow at layer 3 and Multipath TCP at layer 4.

The second is "How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP". This is about the pragmatics of MPTCP - previous papers have described the principles behind its load balancing and congestion control. A few aspects of the paper stood out. Firstly, they gathered a lot of data on the real-world behaviour of TCP-mangling middleboxes in order to work out what changes MPTCP could get away with, and what mechanisms it needed for falling back to traditional TCP. They very frequently found that port 80 was much more likely to be mangled than other ports. They also did a lot of work to mitigate the problems caused by buffer bloat (though they don't use that term) to the extent that an MPTCP connection over a combination of WiFi and 3G has lower latency than a traditional TCP connection over WiFi!

(By coincidence both of these papers have a vague Cambridge connection. The Jellyfish paper cites Béla Bollobás's results on random graphs. The Multipath TCP group at UCL included Damon Wischik who was at Cambridge when I was; his brother Lucian was a fellow ucam.chatterer.)

(Leave a comment)

Wednesday 18th April 2012

Staff seminar on version control systems

This morning Jon Warbrick and I gave a pair of talks to our colleagues about version control systems. Jon did an introduction and overview of the topic aimed at people who aren't particularly technical. I followed with descriptions of the particular version control systems we are using in the Computing Service, organized historically. You can see my slides and my notes.

(2 comments | Leave a comment)

Friday 13th April 2012

Political engagement

I have been perplexed recently by the way my political engagement responds to external stimuli.

Several months ago Rachel persuaded me to go to the Lib Dem spring conference, on the basis of her enjoyment of the autumn conference. I said OK, expecting I could treat it like a fan convention and spend the weekend shooting shit with like-minded people over a few beers. Unfortunately it coincided with the peak of the argument over the NHS bill, probably this government's most important*controversial bit of legislation so far.

I ended up spending the conference utterly disengaged from any meaningful discussion, and not really understanding why. I care deeply about the NHS! I have strong opinions about it! But I felt unable to get involved.

So I have been surprised by my reaction to the CCDP government snooping proposal: much more engaged and active in the arguments against it. Why?

The most obvious thing is that I know more about the technicalities. I can join in wonkish arguments in a way I couldn't over the NHS.

The second thing is that I am surrounded by like-minded people who also want to defend civil liberties and kill CCDP. My MP understands and cares about these kinds of things and has influence over them, being on the home affairs select committee.

So yes, it's much easier for me to get involved and feel like I'm helping to improve the situation. But it doesn't explain the feeling of futility I had about the NHS.

I have been struck by the force of the moral arguments around government snooping. This is quite well illustrated by the Swedish wiretapping law. Their security services made the same argument as ours: "we have permission to invade some people's privacy within this limited technical context, so we should have the ability to invade everyone's privacy without restriction". Discussing the technicalities of what snooping is feasible or sensible is wrong because it implicitly acknowledges that it is morally acceptable to violate privacy and legal process in this way, though we'll only do it a little bit, honest.

The advantage of arguing from the basis of firm political principles came across to me most effectively from one of the older local activists, I think because she is the least like a cipherpunk of anyone I have encountered in this context.

This is all very jolly when you have a shared political framework, and everyone agrees it is reasonable to argue that an individual's rights to privacy and freedom from arbitrary forced search outweigh the MI5 security blanket fear of fear. But what if you don't share such a framework?

Looking back, it seems to me that the arguments over the NHS bill were wonking about the technicalities of how, or how much, or where to outsource health provision. A lot of discussion about improving outcomes, but there was no chance of credibly standing up to say, we should take a public-sector approach to fixing the weaknesses in the NHS.

So I think useful political engagement comes when you agree with the general consensus, and know about the details, and want to help fine-tune; or when you feel the consensus is about to head off-course but can be pushed back into line. That is the realm of mainstream politics.

But if you think the consensus is wrong (drug prohibition! market speculation! crisis austerity!) you have to campaign from the side-lines until the consensus shifts. Fine if some group exists to pursue such a campaign. However, political disengagement, or disaffection, or even extremism come when people disagree with the consensus at a fundamental level and there is no party up there arguing their point of view and representing their opinion.

(1 comment | Leave a comment)

Wednesday 4th April 2012

UK communications monitoring / advance notification to Ofcom

If you follow my link log (also [info]dotaturls) you'll see several links recently about the Home Office's "communications capabilities development programme". This is the rebranding of the "interception modernisation programme" which started under the previous government. The key difference between them is replacing a centralized database of communications metadata with a distributed one, hosted by ISPs.

There are technical questions about what data can reasonably be obtained, especially when the media keep using Skype as an example of what the government want to track, though it isn't technically feasible for ISPs to do so. Much communication is mediated by American businesses (Google, Microsoft, Twitter, Facebook) over encrypted connections so access to those logs would require co-operation from the US. (I suppose that's what the "special relationship" is for.) Despite these technicalities, there are also oddities in the legal coverage, at least in the current law.

For example, the EU communications data retention directive requires public communications providers to retain logs for a year, in case law enforcement agencies want access to them. This does not apply to private services, such as businesses or universities. At Cambridge the central email services keep logs for four weeks. We occasionally (less than once a year) get a request to take a snapshot of a particular account which we retain on DVD in a safe until a warrant requires us to hand it over.

Heading off on a tangent, I was reading bits of legislation about obligations on non-public communications providers when I found section 33 of the communications act 2003:

33 Advance notification to OFCOM
  (1) A person shall not—
    (a) provide a designated electronic communications network,
    (b) provide a designated electronic communications service, or
    (c) make available a designated associated facility,
  unless, before beginning to provide it or to make it available, he has given a notification to OFCOM of his intention to provide that network or service, or to make that facility available.

The definition of electronic communications networks and services is so broad that it covers home networks and personal web sites, for which it would be insane to require advance notification, so I wondered what subset of these Ofcom had designated.

I gather that the overall shape of the Communications Act 2003 was in response to an EU directive that required a technology-neutral regime, hence consolidating Oftel and the Radio Authority into one organisation. However the Ofcom web site is still somewhat divided into technology silos, so there's a section for radio licensing and a section for telephony. Another effect of the 2003 act was to liberalise the telephony licensing regime, replacing it with a General Authorization regime in which no licence is required though providers must follow the regulations. On the surface these so-called General Conditions are couched in technology-neutral terms, but in fact they are specific to telephony (e.g. condition 16 requires support for DTMF dialling). On the other hand, Condition 1 seems to apply to Internet peering within the EU.

Having failed to find anything on the Ofcom website about how section 33 of the act applies to internet services, I decided to make an FOI request asking them to explain what kinds of networks, services, and facilities are designated as requiring advance notification.

(Leave a comment)

Tuesday 27th March 2012

Pogonotomy

I've been shaving with a traditional double-edge safety razor for about 20 months now. It's good. There are a couple of reasons I switched.

The main one is the insultingly exploitative business model of the shaving gear manufacturers. Since the late 1960s they have been ratcheting up the complexity and cost of razors in order to extract more profit from their customers. It's a classic example of patent-driven innovation: they have to keep coming up with new gimmicks that they can monopolize, and use bulshytt to convince people to buy these "better" razors instead of choosing on the basis of objective value or quality. This process has been obviously ridiculous practically since the introduction of cartridge razors, and has long since passed the stage of blatant self-parody.

When Gillette introduced the Mach 3, I stayed with the Sensor; a few years later I decided to see if a Boots own-brand Contour clone was any good despite being a lot cheaper. It turned out than any difference in the razors was dwarfed by variance in my shaving technique, so I switched to the cheaper one. There are now other options in the "less eye-watering than Gillette" segment of the market, such as the King of Shaves Azor or the Dollar Shave Club, but they still buy into the Trac LXXVI bulshytt.

The secondary reason was that although Boots own-brand razors do the job, they are a bit crappy and ugly. I had a vague desire for something more elegant which involved chucking less plastic in the bin. Partly based on satisfied reports from Tom, I invested £40 in a new old-fashioned razor and some consumables.

The Edwin Jagger DE89L is a lovely object. It is made in Sheffield from chrome-plated brass, and has a nice heft: it weighs 76g which is more than four times as much as my old plastic razor. It has a wonderful economy of design (Occam would approve) with only three parts each of which serves multiple functions. The handle is threaded to screw the blade clamp together; the bottom of the clamp includes a safety guard; and the top of the clamp acts as a guide to the angle of the blade against the skin. It's just the right shape for safely shaving under my nose.

ETA: For blades I'm currently using Feather Japanese blades, mainly on the basis of hearsay and prejudice, er, I mean their reputation for quality and sharpness. I don't think it's possible to make a meaningful comparison without fitting different blades to identical razors and using them both during the same shave, repeatedly. (See above about variability of technique.) And I only have one DE razor at the moment.

I've also switched from using a shaving oil to a shaving cream and badger brush. Taylor of Old Bond St, Court Hairdressers are awfully posh but a 150ml bowl of their shaving cream costs less than £7 and you only need one or two ml for a shave. More fun to use and easier than oil to clean up afterwards.

So now my morning shaves are still inexpensive but much more luxurious. Very satisfying :-)

(9 comments | Leave a comment)

Tuesday 28th February 2012

Path names in a rootless DNS

Names in the DNS always appear as "fully qualified domain names" in queries, answers, and name server configurations, specifying the complete path from the root to the leaf. A surprisingly small change would be enough to make query names relative rather than absolute, and this change would have interesting and far-reaching consequences.

The first change (and the key) is to the resolution algorithm. When given a referral, instead of repeating the same question at the replacement name servers, trim off the leading labels of the query name, leaving everything up to and including the leftmost label of the delegation NS records.

Authoritative servers will have to distinguish zones by just their apex label, because that's all that is available in incoming queries. This means that, unlike at present, a nameserver will not be able to serve different zones for example.com and example.net.

This modification means that names now trace paths in a graph, rather than being hierarchial addresses. The graph can be cyclic, for example, if zone A has a delegation to zone B which in turn has a delegation back to A, then names can have an arbitrarily long sequence of A.B.A.B.A cycles round the loop.

How does resolution start in this setting, when there is no root? You (or your ISP) would configure your recursive name server with one or more well-known starting zones, which would function rather like top-level domains.

The key difference between this arrangement and the root zone is that it allows diversity and openness. The decision about which zones are starting points for resolution is dispersed to name server vendors and operators (not concentrated in ICANN and the US DOC) and they need not all choose the same set. They can include extra starting zones that are popular with their users, or omit ones that they disapprove of.

Unlike the hierarchial DNS, you can still resolve names in a zone even if it isn't in your starting set. It will be normal for zones to have delegations from multiple parents, ensuring that everyone can reach a name by relying on redundant links instead of global consistency. So the berlin zone might be generally available as a starting point / TLD in Germany, but if you are in Britain you might have to refer to it as berlin.de.

Instead of a political beauty contest, to establish a new TLD you would probably start by obtaining the same label as a subdomain of many existing TLDs, to establish a brand and presence in your target markets. Then as your sales and marketing efforts make your zone more popylar you can negotiate with ISPs and resolver vendors to promote your zone to a TLD instead. I expect this will force DNS registry business models to be more realistic.

Users may be able to augment their ISP's choice of TLDs by configuring extra search paths in their stub resolvers. However this is likely to lead to exciting RFC 1535 ambiguity problems.

In some respects the relationship between vendors and rootless TLDs is a bit like the situation for X.509 certification authorities. ISPs will have to judge whether DNS registries are operating competently and ethically, instead of relying on ICANN to enforce their regulations.

Trust anchor management cannot rely on policies decided by a central authority, and it will need to cope with a greater failure rate due to the much larger and more diverse population of resolution starting points. Perhaps RFC 5011 automated DNSSEC trust anchor management would be sufficient. Alternatively it might be possible to make use of a zone's redundant delegations as witnesses to changes of key along the lines of a proposal I wrote up last year.

These thoughts are partly inspired by the Unmanaged Internet Architecture's user-relative personal names. And bang paths (in the opposite order) were used to refer to machines in the UUCP network. Some other background is Zooko's Triangle and Clay Shirky's essay on domain names. The PetName system described by Mark Miller is also interesting, and similar in some ways to UIA names.

The rootless DNS doesn't quite reach all the corners of Zooko's triangle. The names are as human-meaningful as a crowded namespace can allow. Names are only global to the extent that network effects promote zones as popular TLDs worldwide - but you can work around this by providing alternate names. Names are secure to the extent that you trust the intermediaries described by the path - and if that doesn't satisfy you, you can promote important names to be trust anchors in your setup.

(13 comments | Leave a comment)

Friday 3rd February 2012

Adventures with IPv6 DNS hosting

Last summer I upgraded my DNS setup to support IPv6 and DNSSEC. After a bit of searching around I settled on using puck.nether.net and GratisDNS as my new secondary servers. GratisDNS has very good DNS server diversity though the web site is entirely in Danish. (Google Translate to the rescue.) Puck gives me a bit of organizational diversity, and supported the remarkably shiny NOTIFY+IXFR for fast update propagation. (But that doesn't seem to be working any more.)

The DNSSEC side of the upgrade has been pretty trouble-free. The main problem is that nic.at do not yet support DNSSEC, so I am relying on the ISC DLV to provide a chain of trust to my zone. The .at zone was actually signed towards the end of last year, though they do not yet have a secure delegation from the root. Hopefully I will be able to get a secure delegation from them before very long.

IPv6 has been a bit more difficult. When I was changing my zone's delegation records in June, I was not able to put more than one IP address for each name server. I could have one IPv4 address or one IPv6 address, not both. I reported this problem to nic.at, and to work around it I created lots of aliases for my name servers for use in the delegation NS records:

 black.ns4.dotat.at. A     131.111.11.130
 gdk3.ns4.dotat.at.  A     194.0.2.6
 puck.ns4.dotat.at.  A     204.42.254.5
 black.ns6.dotat.at. AAAA  2001:630:212:100:646f:7461:742e:6174
 gdk3.ns6.dotat.at.  AAAA  2001:678:5::6
 puck.ns6.dotat.at.  AAAA  2001:418:3f4::5

However a few days later I got an email from GratisDNS complaining that they wanted me to list their name servers by their canonical names in my zone or they would cease slaving it. So I ended up with a delegation NS RRset in the .at zone looking like this, to appease nic.at:

 dotat.at. NS black.ns4.dotat.at.
 dotat.at. NS gdk3.ns4.dotat.at.
 dotat.at. NS puck.ns4.dotat.at.
 dotat.at. NS black.ns6.dotat.at.
 dotat.at. NS gdk3.ns6.dotat.at.
 dotat.at. NS puck.ns6.dotat.at.

And an an apex NS RRset in the dotat.at zone looking like this, to appease GratisDNS:

 dotat.at. NS ns1.gratisdns.dk.
 dotat.at. NS ns2.gratisdns.dk.
 dotat.at. NS ns3.gratisdns.dk.
 dotat.at. NS ns4.gratisdns.dk.
 dotat.at. NS ns5.gratisdns.dk.
 dotat.at. NS puck.nether.net.
 dotat.at. NS black.dotat.at.

This was rather ugly but it worked - mostly. Last week I got a report from Sevan Janiyan that OpenDNS was unable to resolve dotat.at. I reported the problem to OpenDNS and I was pleased to see that they were interested in fixing it.

This prompted me to see if I could make the delegation records for dotat.at less insane, and I was happy to find out that nic.at had fixed the bug I found in June. So I changed the delegation NS RRset to match the apex RRset and deleted the superfluous ns4 and ns6 aliases. Once these changes had taken effect OpenDNS was able to resolve my domain again. Hooray!

However this made it difficult for the OpenDNS techies to reproduce and debug the problem I reported, so I set up a test domain fanf2.ucam.org with a copy of dotat.at's old weird delegation. This allowed them to find the bug, and they are in the process of rolling out a fix. They have even promised to send me some swag in thanks!

So it has been a bit bumpy but it is nice to see the rough edges being rubbed off.

(3 comments | Leave a comment)

Thursday 2nd February 2012

Tennent's correspondence principle, closures and continuations.

Tennent's correspondence principle is a powerful programming language design guideline. It says that the meaning of an expression or statement should not change when an extra level of block structure is added around it. Following this principle strictly leads to profound consequences, especially for control structures and closures.

There are some implications for variable scoping, too. If the extra level of block structure is a binding construct (such as let or for) then the new variable may shadow an outer variable and change the meaning of inner uses of that name. This is usually treated as a style warning rather than an error as Tennent would suggest.

Instead of forbidding shadowing, another way to follow Tennent might be to specify that local variables have function scope not block scope, as Javascript does. Then all uses of a variable name in a function refer to the same object. However this causes trouble if the language also has nested functions, because this brings back nested scopes and the shadowing problem. The combination of closures and function-scoped variables that look like block-scoped variables leads to a well known Javascript gotcha, so it's probably best to stick with block structure.

Tennent's principle gets more interesting when you look at control structures. Firstly, it says that C-style break and continue are a bad idea. If the language instead has labelled loops, so you write break label then the break still refers to the same loop even if you add an intermediate loop.

Strict adherents of structured programming say that you should follow Tennent by abolishing constructs like break and goto. However this is going too far: to code within this restriction you often have to add extra state variables to produce the control flow that you want, which is less readable and less efficient.

But this pragmatism blows up in your face if your language has nested functions. Ideally you would like users of the language to be able to define their own control structures by writing appropriate higher-order functions. Then the bodies of these control structures are nested functions. And within these control structures, break, goto, and return should work as they do with built-in control structures. The problem with this is it allows nested functions to capture their continuations. In fact, you can define call-with-current-continuation as follows (using a Lua-ish syntax with a named return statement analogous to labelled break):

  function callcc(f)
    local function k(r)
      callcc returns r
    end
    callcc returns f(k)
  end

First-class continuations cause enormous difficulties for language implementations and code that uses them is often extremely hard to understand. How can we escape from this trap?

There is the option of hair-shirt structured programming which bans early returns. This prevents inner functions from making non-local jumps to outer functions. There is the option of not supporting nested functions and higher-order programming. But neither of these are very interesting.

Continuations cause problems when they are not used in a stack-like manner. It is possible to keep nested functions to a stack discipline if you allow them to be used in only two ways: they can be called, and they can be passed as a function argument. They cannot be copied into variables in outer scopes or data structures on the heap, and they cannot be returned up the stack. You can lift these restrictions if the closure does not capture its continuation, which is easy to check statically by looking at its jumpy statements.

Smalltalk-style blocks are enjoying a renaissance at the moment, having been re-popularized by Ruby. Instead of making static restrictions, Smalltalk relies on a dynamic check that prevents callcc from working. Mozilla's Rust programming language implements has second-class "stack closures" (though it isn't clear to me if you can jump out of them) as well as first-class closures that cannot capture their continuation.

It seems to me that this approach is a good way to support expressive higher-order programming and embedded domain-specific languages with a conventional stack-based language implementation.

(3 comments | Leave a comment)

Tuesday 31st January 2012

On the safety of SSHFP records.

Proper SSH hygiene is to go through a manual procedure the first time you log into a host to verify that its public key is the one you expect. RFC 4255 specifies a way to use the DNS to verify ssh host keys so you can skip the manual process. The host key fingerprint is published in an SSHFP record in the DNS and is authenticated using DNSSEC.

OpenSSH supports SSHFP lookups if you turn on the VerifyHostKeyDNS option (documented towards the end of ssh_config(5)). OpenSSH does not do any DNSSEC validation of its own, and instead relies on the "authenticated data" flag (the AD bit) in the response from a validating resolver. Its SSHFP checking logic is a bit more complicated than you might expect:

  • If the host key does not match then ssh puts up its usual scary banner, regardless of the AD bit.
  • If the host key matches and the AD bit is set in the DNS response, then ssh connects happily, bypassing the manual check.
  • If the host key matches but the AD bit is clear, ssh goes into the usual unknown host procedure, with an added note about the presence of a matching SSHFP record.

As the RFC explains, none of this is safe unless you are running a validating resolver on the same machine as ssh (or with some other secure communication channel), and the resolver has a chain of trust to your target host's zone.

I am not entirely sure that ssh's behaviour in the no-AD situation is wise. It won't happen in a correct setup, but it's the accidentally incorrect setups that worry me. In particular, if someone has configured their machine to use one of Cambridge University's central validating resolvers and they turn on VerifyHostKeyDNS, then ssh will see the AD bit in DNS replies but it will not be trustworthy because it travelled over the network between resolver and ssh client without verification. There are also quite a lot of non-validating resolvers in the University (e.g. used by the Linux timesharing servers, and handed out by the Eduroam WiFi DHCP servers) and users of those will get ssh's note about matching SSHFP records, and this might give them a false sense of security since it doesn't mention the lack of verification.

So I'm in two minds about deploying SSHFP records. If we add them for hosts such as hermes.cam.ac.uk we need to be confident that people who turn on VerifyHostKeyDNS also run their own validating resolver. However there is no warning in the OpenSSH manual pages that this is necessary - in fact no mention of DNSSEC at all, nor any explanation of the difference between secure and insecure DNS fingerprints.

These worries would not exist if OpenSSH did its own DNSSEC validation of SSHFP records, and if it did not do questionable things with untrusted SSHFP records.

(2 comments | Leave a comment)

Monday 9th January 2012

Contents of my pot of small change

I have an earthenware pot into which I dump my small change: any coins worth less than 50p. (It's like a pot of gold that has suffered hyperinflation.) It was nearly full, so I took the coins to my bank in a very sturdy bag to convert them back to bits. Happily HSBC have a coin counting machine which is free for customers. Here are the results:

coin count value weight
1p (3.56g)  309 £  3.09  1100.04g
2p (7.12g)  163 £  3.26  1160.56g
5p (3.25g)  283 £ 14.15   919.75g
10p (6.5g)  294 £ 29.40  1911.00g
20p (5.0g)  426 £ 85.20  2130.00g
total 1475 £135.10  7221.35g

The machine also found US$1.91, NZ$0.50, HK$0.50, and €0.02.

(9 comments | Leave a comment)

Wednesday 7th December 2011

nsdiff 1.33

Last month I did some work on nsdiff based on bug reports and feature requests from Piete Brooks at the University of Cambridge Computer Laboratory. The changes became a major overhaul, though I have managed to keep the program short. Here's a copy of the announcement I sent to a few mailing lists...

nsdiff is an add-on tool for BIND that compares old and new versions of a zone and generates an nsupdate script which turns the old version into the new version. It is designed to bridge the gap between static master files and dynamic DNS updates, making it easier to use auto-dnssec maintain.

http://www-uxsup.csx.cam.ac.uk/~fanf2/hermes/conf/bind/bin/nsdiff

This update includes an important fix to deal with replacing a CNAME with other RRtypes or vice versa. The DNS update protocol requires that all the old RRs are removed before adding the new RRs if any of them are CNAMEs. If you violate this requirement part of the update will be ignored, with the only sign of a problem being a message in BIND's logs.

Other changes include: configurable SOA serial number formats and verbosity; more control over how large numbers of changes are split into multiple update requests; and fewer restrictions on DNS record owner name syntax.

(Previously, previously, previously, previously.)

(Leave a comment)

Monday 5th December 2011

Mail switch naming and addressing at Cambridge

A postmaster at another university asked me why Cambridge has just one MX record pointing to a host name with multiple IP addresses, and what our experiences are with this setup. I thought I would post my answer in public since it might be of general interest.

Our current setup dates from 2004, though we reshuffled it a bit in 2010. It still has some historical artifacts which it would be nice to fix, but which aren't all that important.

Until 2004 our mail hub ppsw.cam.ac.uk (named after the infamous JANET email relay software) handled both incoming and outgoing email. Since approximately the dawn of time ppswitch has been scaled to multiple servers by giving the name multiple IP addresses. (PPswitch dates from 1991; I don't know when it was first scaled to multiple hosts - mid 1990s?.) We've generally depended on hardware and software reliability rather than fancy load-balancing fail-over appliances; this has been a very cheap and effective strategy for the last 10 years, though it didn't work so well when we were running PP :-)

By 2004 ppswitch was also providing a message submission service on smtp.hermes.cam.ac.uk, which ran on a different set of IP addresses on the same machines. (Plus POP+IMAP proxies which aren't really relevant to this post.) At that time the Exim configuration was a bit unsatisfactory because it did not clearly distinguish between the different classes of traffic - incoming, outgoing, submission - which meant it was not possible to take aggressive SMTP-time anti-spam measures without affecting internal email service.

So we created mx.cam.ac.uk to replace the use of ppsw.cam.ac.uk in MX records, keeping the traditional name ppsw.cam.ac.uk for outgoing relay service. Since then each ppswitch machine has had three public IP addresses, one for each type of service. Exim is configured to behave differently depending on the IP address that the sender connected to. The delivery logic is the same regardless of how messages arrive.

The setup of mx.cam.ac.uk was basically a copy of ppsw.cam.ac.uk and smtp.hermes.cam.ac.uk, which is why it is configured like a scaled service host name rather than making use of the extra indirection that MX records allow. This simple arrangement has never really been a problem for us. The load is not perfectly balanced - we tend to get more on the lowest IP address - but it has never been impossibly out of whack. The extra traffic tends to be easily-rejected spam and we have enough headroom that it isn't a problem.

Last year we made a change that improves ppswitch's managability and robustness - more the first than the second in practice, but auditors like to hear about disaster recovery plans. Now, each ppswitch machine by default only has a management IP address (and since this is the system's default IP address it is also used for outgoing connections). Machines in service or testing also have three service IP addresses for incoming connections.

The service addresses can be brought up on any of the physical servers, so if one of them dies we can bring up its addresses on a spare server. We can also use this for potentially disruptive configuration changes: put the new configuration on a spare server, flip the IP addresses over, and in case of cockup back out with a reverse flip. This is considerably better than relying on DNS changes to move service between machines, as we used to do!

This year we did IPv6 day, and we're in the process of putting IPv6 into full service on ppswitch. The IPv6 setup is basically the same as the v4 one, except that we have allocated separate addresses for the IMAP and POP proxies in v6 whereas they share the message submission address on v4. So a dual stack machine has 5 v6 and 3 v4 service addresses plus a v4 and v6 management address.

You can see how all this appears in the DNS if you run

dig axfr cam.ac.uk @authdns0.csx.cam.ac.uk | grep ppsw | grep -v RRSIG

That should give you some idea of how we have laid out ppswitch's names and IP addresses. The public service host names are: ppsw.cam.ac.uk (outgoing relay), mx.cam.ac.uk (incoming anti-spam gateway), smtp.hermes.cam.ac.uk (secure message submission), and {pop,imap}.hermes.cam.ac.uk (message store access).

We have well-defined IP address ranges to accommodate parts of the University with strict packet filters: 131.111.8.128/27 and 2001:630:212:8::e:0/112.

The way the (numbered) physical hosts and the (lettered) virtual service addresses fit into the v4 range is complicated. The final decimal digit tells you whether it's a physical host (0,1 = on site, 2,3 = off site) or virtual service address (4,5 = testing, 6-9 = live), and the penultimate digit defines which kind of service (3 = ppsw, 4 = mx, 5 = hermes).

What could be improved?

I would quite like to rename all the hosts into a mail.cam.ac.uk subdomain, instead of using the generic Computing Service Internal domain.

I have occasionally wished for an MX host name like mx0.mail.cam.ac.uk, so we have the option of more flexibility without polluting our top level namespace. But the only thing that might have benefited from the ability to add MX records was the possibility of fake low-priority anti-spam MXs.

The current naming scheme for the physical and service addresses is confusing and not as helpful in practice as I thought it might be. But I haven't come up with a scheme that is better enough to be worth the effort of renaming.

(4 comments | Leave a comment)

Thursday 1st December 2011

Some notes on git hosting software

My colleague Ben Harris has been working on a configuration management system, based on my git-deploy idea with added cryptographic security. Ben's "starling" system will ensure that servers will only deploy configurations that have been gpg signed by a trusted sysadmin.

We have not yet got a git repository hosting setup, though it is an obviously required part of the system. So this week I have been looking at what software is out there. To make it interesting we would like to go beyond the usual corporate deployment and see if we can do something useful as a university-wide service. "Something useful" unpacks into the following shopping list of features, which I am posting here in case anyone outside the CS has good suggestions.

Basics
Read/write access via ssh, with per-repository and per-branch access controls. Browse repositories via the web.
Delegated access control
We would like to delegate repository creation and management to groups (such as the computing service itself, other University departments, research groups, etc.) and we would like group managers to be able to delegate repository access control to repository managers.
Repositories for individuals
Each user should have an automatically provisioned group of their own (like github).
Public and private repositories
Repository managers should be able to allow anonymous read-only access via the git protocol and the web repository browser.
Authenticated browsing
Users who have read access to a repository should be able to browse it via the web.
External collaborators
Allow repository managers to give access to users without University accounts.

What software can do this for us? Here's a quick review of the candidates that I know of. Any other suggestions are welcome. (I have not included gitosis since it was made obsolete by gitolite.)

github
The obvious outsourcing option. Has all the features we want, I think, though we would have to pay, and they don't advertise prices for the scale we would need just for the computing service.
gitolite
A set of perl scripts that just does access control to repositories via ssh. Management is done by commits to an admin repository. This model implies a petty bureaucracy of people who have commit access to the admin repository. For delegated management we would need a gitolite install per group. It doesn't support delegating access control decisions per repository. Individual setups are probably not feasible - point them at github instead? Web access is anonymous-only. External collaborators are easy since the repoman has complete control over which ssh keys have what access.
gerrit
The web-based code review tool developed for the Android project. As such its focus is on a feature we don't particularly care about. A big Java program with its own ssh and git implementations. It allows access control delegation per repository, but it does not allow delegation of repository creation. It supports web access controls and has hooks for web single sign-on, by default using OpenID but the "siteminder" support can probably be used with Raven.
gitorious
Affero-GPL source for a github competitor. Big Ruby-on-Rails app. Designed to allow users to do their own access control and set up their own groups. Big downside is lack of support for private repositories (but see this merge request). Bonus wiki feature.

I think the choice is between gitolite and gitorious. Gitolite has the advantage of simplicity at the cost of several desirable features. Gitorious would require us to maintain a fork.

(16 comments | Leave a comment)

Tuesday 29th November 2011

DNS DNAME interoperability problems

David Blacka recently posted a complaint about the limited usefulness of DNAME sub-domain aliases in the DNS. Everything he says is right (except perhaps his linkbait title!) but I have a few points to add.

It's worth noting that the IETF has been working on updates and clarifications to RFC 2672 which should soon be published as an RFC.

David points out the awkwardness of DNAME only aliasing sub-domains and not the name itself. This was one of the main points of discussion last year when the IETF dnsext working groups was talking about better support for spelling variations. There were a few proposals to address this problem. One option was to relax the restriction that a CNAME may not coexist with any other RRs, so that you can have both CNAME+DNAME at a name. Alternatively there is the proposed BNAME RR type which acts as both a CNAME and a DNAME. These are all options for the long term, and the whole discussion has been on hold for several months while clearer requirements are gathered from the IDN experts for who this feature is intended.

There is not very much deployment of DNAME out there. Ian Eiloart asked if any UK Universities use DNAME to do NRS-style long form / short form aliasing. I did a quick survey and found five DNAME RRs at the apices of zones under ac.uk.:

cant.ac.uk.             300     IN      DNAME   canterbury.ac.uk.
king.ac.uk.             28800   IN      DNAME   kingston.ac.uk.
sund.ac.uk.             3600    IN      DNAME   sunderland.ac.uk.
oxford-brookes.ac.uk.   28800   IN      DNAME   brookes.ac.uk.
oxfordbrookes.ac.uk.    28800   IN      DNAME   brookes.ac.uk.

Cambridge's chief hostmaster Chris Thompson pointed out to me that there is currently one top-level domain with an apex DNAME record, using it for variant spellings of internationalized domain names as David Blacka described:

xn--kprw13d.		86293	IN	DNAME	xn--kpry57d.

De-punycoded, this aliases everything under 台湾 to the corresponding name under 台灣, which are respectively simplified and traditional Chinese for Taiwan.

At Cambridge we are using DNAME to consolidate 128 reverse DNS domains, {128-255}.232.128.in-addr.arpa, into a single reverse zone in-addr.arpa.cam.ac.uk. The class B IP address block 128.232.0.0/16 is delegated to the Computer Laboratory which has in turn delegated the top half 128.232.128.0/17 to the Computing Service for use by the rest of the University. The DNAME trick slightly simplifies the Computer Lab's reverse zone, and massively reduces the number of zones that the Computing Service has to run. It is essentially classless reverse DNS for large CIDR blocks.

This is almost exactly what David Blacka calls the "canonical use" for DNAME. However all is not sweetness and light. We have found that DNAME in the reverse DNS causes occasional interoperability problems. There are two cases I know of, both of which are due to software that strictly checks DNS packet syntax and is upset by unexpected DNAME RRs.

  • The University Press's mail exchangers have IP addresses in 128.232.233.0/24, in our DNAME range. They were having problems getting mail through to Comcast's mail servers, which dropped connections from the Press with a 421 temporary error because of a "Reverse DNS failure".

    Because of this, that reverse DNS block contains 256 CNAME records instead of one DNAME record.

  • The glibc resolver code bleats into syslog whenever it encounters unexpected RR types, including DNAME. The message it logs comes from the AskedForGot() macro on line 98. In fact the glibc code is disgracefully out of date and poorly maintained: for instance, it has some ancient support for skipping DNSSEC records, but it doesn't know about the DNSSEC-bis RR types introduced in 2004 with RFC 3755.

    This is mostly benign, apart from putting a lot of unnecessary noise in the system logs.

I expect that any serious attempt to use DNAME in the forward DNS will encounter many more interop problems, especially with MTAs (which often have custom resolver code to deal with MX records) and crappy DNS proxies in consumer routers and captive portals. A quick Google fails to find anything on the topic published by the four universities I listed above. Has anyone else published their experiences?

ETA: Doug Barton reminded me of the other proposals that had been suggested to support IDN variants. They avoid DNAME's interop problems and somewhat reinforce David Blacka's argument that DNAME is useless. The most straightforward suggestion was that no protocol support is needed, if you add zone clone support to master servers. However this doesn't make it easier to provision cloned zones on slave servers. Doug made a more sophisticated proposal for a CLONES RR which allows authoritative servers to auto-provision alias zones, and allows clued-up resolvers to avoid duplicate cache entries for a zone and its clones.

(2 comments | Leave a comment)

Tuesday 11th October 2011

Time on Terra Nova

There is a new sf series on Fox in the US called "Terra Nova". The premise is that the protagonists have travelled back 85 million years in time from 2149 to escape environmental collapse. Back then in the time of the dinosaurs the length of the day was one or two percent shorter than it is today. This led Daniel Tobias to ask the leapsecs mailing list how time should be handled by the colonists. A Mars colony would also have to answer a similar question, because they would have to cope with a day that is a couple of percent longer.

The problem is that our base unit of time does not conveniently divide the length of the day. This is inevitable however we choose to fix our units, because the length of the day is not constant. Travellers to other times or planets have to deal with an exaggerated version of the problem, but it's also true on present-day Earth. The length of the second is based on the length of the day over 100 years ago, and it is now off by about 10-8. As well as tidal slowing, there are also periodic and random variations in the length of day, which also happen to be on the order of 10-8.

There are about five ways of dealing with this problem which I'll divide into two and a half categories..

Digital schemes

When we're dividing up longer time periods into days we use calendars. We can also view schemes for dividing days into seconds as simple calendars.

Observational calendars

The simplest calendars are based on observation of astronomical phenomena. For instance in the Islamic calendar, when the new moon occurs the day counter is reset and the month counter is incremented.

Our current system of leap seconds in UTC is essentially an observational calendar, where what is being observed is the difference between mean solar time (aka UT1) and atomic time. Leap seconds are inserted to keep this difference less than 0.9 seconds.

A time or space colony could use a similar system, though their day is unlikely to be close to a round number of seconds in length, so they will probably need to switch back and forth frequently between short and long days. The disadvantage of observational calendars is that they are not predictable, so it is not possible to schedule events in the future with any precision.

Arithmetical calendars

If you have enough astronomical sophistication to measure periods of rotation and revolution accurately, you can set up a calendar with fixed arithmetic rules. For example, the Julian and Gregorian calendars.

There has been a lengthy discussion over the last ten years on the possible discontinuation of leap seconds. This would turn UTC into a very simple arithmetic calendar.

Arithmetic calendars tend to drift out of sync, either because of errors in their initial setup, or because the relevant periods are no longer what they were. Reforming a calendar to fix it is incredibly painful - think of the 350 year transition period required by the Gregorian reform.

There is a work-around available when you are dealing with a mis-match between the nominal length of day and the actual length of day, which is not available for normal calendars. Provided the difference is small enough, less than about 10-5, you can accommodate mismatches by adjusting timezone boundaries. This is easy to cope with if your timezone system is already handling random political fluctuations, and will probably happen without the need for any central co-ordination.

The difficulty with this scheme is that time of day is not a good approximation of planetary angle of rotation relative to the sun, so astronomical and navigational systems will need a source of UTC-UT1 data (aka DUT1).

Fractional schemes

In Kim Stanley Robinson's Mars trilogy the colonists stop the clocks between 24:00 and 00:00 to allow for the extra forty minutes in a Martian day beyond 24 hours. (I presume they don't actually stop the clocks inside their support systems since they still need useful telemetry logging during this "timeslip", amongst other things.) There is a similar arrangement in David Weber's Honor Harrington books.

You could perhaps allow for a partial second at the end of each day, to make the nominal length of day exactly match the actual length of day. The disadvantage is this would cause an awkward glitch in time and frequency reference broadcasts.

ETA: The fractional second idea is another trick that time-of-day calendrical systems can use which more common calendars can't, because the second is an artificial unit of time not a measurement of the position of a celestial body.

Analogue schemes

Instead of having a variable number of fixed-length seconds in a day, we can have a fixed number of variable-length seconds in a day. There are a couple of ways of doing this.

Rubber seconds

In the 1960s, time and frequency reference broadcasts were matched to the length of day using a combination of frequency adjustments and occasional jumps of 0.1 or 0.2 seconds. This allowed them to track UT2 more closely than modern UTC tracks UT1.

However it had the disadvantage of requiring difficult adjustments to the broadcast equipment make the frequency changes, and it made it more difficult for users of the broadcasts to obtain a precise reference frequency. So it was abandoned in favour of the simpler UTC scheme.

A modern variant of this is smoothed leap seconds or leap smear, where rubber seconds are used temporarily to avoid glitches caused by leap seconds.

Two timescales

All the above schemes start with a timescale based on seconds, and try to accommodate the variable length of days within this timescale (except for rubber seconds which are the other way round). Instead of trying to reconcile the irreconcilable, we could instead work with two separate timescales.

For precise time and frequency applications, establish an atomic timescale that is as stable as possible. It might be sensible to use a different base unit, say the Planck time, or rename the atomic second to say the essen, in order to avoid confusion with subdivisions of the day. Instants in this timescale should be labelled as a count of seconds since an epoch, not in YYYY-MM-DD HH:MM:SS form to avoid confusion with time of day.

For civil time of day, use UT1. All calculations involving civil time should be done inside this system, ignoring its relationship to atomic time. This timescale is not suitable for high precision applications, since there's an inherent instability of about 10-8. It retains all the properties of pre-atomic time: fixed number of seconds per day and synchronization with Earth rotation.

Time and frequency broadcasts should be based on the atomic timescale. In order to obtain civil time, these broadcasts should include the atomic time when the current civil day started, and its length in atomic seconds (and possibly also the current rate of change of the length of day).

In terms of the POSIX clock_gettime() interface, the atomic timescale roughly corresponds to CLOCK_MONOTONIC, and civil time corresponds to CLOCK_REALTIME i.e. time_t seconds since the epoch.

Back in the real world

I think the two timescales arrangement is the best way to model what is actually going on. However it is difficult to see how it could be deployed. It requires us to separate out the high-precision time and civil time functions of a lot of critical systems, such as the MSF / DCF77 / WWV / etc. time signals, GPS and other navigation systems, and NTP.

We should find out within the next year what will happen to leap seconds. I think time geeks underestimate the problems they cause by breaking deeply embedded cultural assumptions about time. The notation we use for time of day is many thousands of years old, and changing it is possibly a reform of Gregorian scope. So I tend to be in favour of abolishing them even though it'll cost a lot for astronomers to fix. Cheaper than fixing POSIX time, though.

(Previously: iCalendar is wrong.)

(6 comments | Leave a comment)

Wednesday 14th September 2011

How my link log works

My link log is a fairly horrible combination of three perl scripts. It's "symbiosisware", a quick hack specifically for my needs and not designed for general usage. (I think I got that term from Simon Tatham.) Still, people occasionally ask how it works, so here goes.

The visible part is a 35,000 line CGI script which consists almost entirely of a data structure containing my log of nearly 7000 links. (This is perhaps a bit wasteful - the CGI uses about a quarter of a second of CPU time.) This program produces the HTML web page, the atom feed, and does redirections for the short versions of the URLs. There's also a periodic log analysis job (some vile seddery) which counts how many times each short URL has been requested in the last couple of weeks.

New URLs are added to the CGI script using a command line utility which also feeds the links to IRC, Twitter, and Delicious. (I posted previously about scripting Twitter using Jef Poskanzer's utilities.) This script also chooses the random tag used in the short version of the URL.

Finally, I don't have convenient access to the command line when I am using my iPhone, so I use its build-in facility to mail a link to a particular address. These messages are delivered to a special mailbox (using Sieve) which is polled periodically to extract the links and feed them to the command-line tool. The cron script is mostly a shonky IMAP implementation and RFC 822 message parser...

You can see I'm not particularly fond of digging through CPAN to find which of the million libraries is worth using...

(1 comment | Leave a comment)

Monday 12th September 2011

Lua Workshop 2011

I spent Thursday and Friday last week at FiBL, an organic agriculture research centre, in Frick, a small town between Basel and Zurich. I was there for the Lua Workshop 2011. There were a bit more than 40 attendees, of which two were women, with a good spread of ages from about 20-65. Some themes came up repeatedly: live code upgrades, distributed systems, Lua internals, LaTeX Beamer slides, and really good demos. Here's my summary of the talks. The slides should appear on the workshop website in due course.

The first day started with Alexander Gladysh, who talked about his efforts to find a good strategy for implementing declarative embedded domain specific languages in Lua. Lua was originally designed to be good as a data description language, so it has quite nice lightweight syntax for EDSLs. Alexander found that implementing the infrastructure for EDSLs often got bogged down in spaghetti code and boilerplate. In his talk he explained how he improved the situation using common data structuring conventions and traversal algorithms. It was a very high bandwidth talk with over 60 slides (most containing code) in less than 40 minutes. I need to go back to read the slides at leisure.

Next was Fabien Fleutot from Sierra Wireless. They make embedded system monitoring equipment, for checking the performance and status of things like wind turbines and street lights. The communication between these in-the-field assets and the back-end monitoring systems is usually via GPRS, so it's important for them to minimise bandwidth usage, and they mostly have to rely on the assets to initiate outgoing connections since their connections are often down or behind NAT. Their monitoring devices run embedded Lua and make heavy use of coroutines, with a nice IPC framework and scheduler. Fabien described how this makes it easier for them to do in-field upgrades and quickly adapt their products for new customers. They're planning to release their framework as open source.

After coffee, Gaspard Bucher talked about and showed videos of his exploits using computers to support live dance and music performance, including turning his body into a MIDI device so he could use music to make his muscles contract involuntarily, and attaching motion sensors to dancers so their movements can be translated into sounds. The latter required a lot of custom hardware and research into using support vector machines to distinguish different kinds of movement. He explained how the pressures of performance seriously damaged the maintainability of his code. He's now working on Lubyk which is a Lua system with many bundled libraries, intended to provide a basis for his computer-assisted performances. It is a distributed development environmnet, using Bonjour to tie the components together, with a graphical editor to link the parts and allow you to live-edit the code on remote systems. Very swish, but still a bit raw.

He was followed by Roberto Ierusalimschy, the principal author of Lua. He talked about some of the design considerations the Lua team use when evolving the language, and discussed some of the areas that are particularly troublesome - table length, varargs, bitwise operations on floating point numbers.

After lunch Wim Couwenberg of Océ talked about using Lua as a diagnostic tool for the controller software for their large printers. The controller software itself is a set of C++ processes running on a Windows PC attached to the printer. They have an IPC framework with an XML-based IDL, which includes extensive logging of the activity of the system as a whole. Wim replaced their log processing tools with a much smaller Lua script which translates an IDL specification to Lua and runs it to process a log. He extended the XML IDL with Lua snippets that can verify the interface requirements have been followed. This improved their ability to debug the system so much that they are using more IPC to get better visibility into the system. A nice success using just base Lua without add-ons.

Next was Valerio Schiavoni of Neuchatel University, talking about the "Splay" system for making distributed application research easier. It runs on PlanetLab and provides some higher level facilities for distributed IPC, and makes it easy to push Lua code out to a selection of machines, run it in a sanbox, collect the data, and produce visualizations of the results. He did a very smart live demo of a virus propagation simulation on 200 machines around the world.

After coffee was Peter Cawley, an undergraduate at Oxford University. He talked about his work finding and exploiting holes in Lua's type system using cleverly crafted bytecode. By default Lua is compiled without the assertions that do thorough type checking of values passed by C embedding/extending code to the Lua API. This means that (for instance) you can make the low-level table index function try to interpret a string as a table. The bytecode interpreter is mostly typesafe but there are a few holes which can be exploited, which Peter described in some detail. Lua 5.1 has a bytecode verifier which is supposed to make it safe to load untrusted bytecode, but because of Peter's exploits, Lua 5.2 has no verifier and instead makes it easy to restrict loaded code to source only, which is safe provided the library is suitably restricted. Peter has gone on to write a couple of replacement bytecode verifiers. The first is based on a type inference algorithm, but this turned out to be too slow to be practical. The second is based on a simpler analysis of which stack slots are variables or temporaries or unused, and verifying that they are used consistently.

The last talk on Thursday was Erik Hougaard talking about the series of robots called "Crazy Ivan" which he built with his brother-in-law. They compete in the Danish Technical University RoboCup, which is an annual obstacle course for autonomous robots. Erik compared the structure of the competition to a Formula 1 race, in that the competitors have time to set up their robots to suit the course, and there is a qualification stage before the final competition. Crazy Ivan's software is customized during the competition to follow the specific course and win points by performing tricky tasks. They use Lua to make it faster to adapt the software, in particular the high level plan of the course and tasks. As well as videos of previous competition performances, Erik did a live demo of the current version of Crazy Ivan, whose control computer is a complete PC running Windows XP with on-board copies of Visual Studio and PIC programming tools for the robot's peripherals. He ran a remote desktop session from his laptop to the robot to show the computer vision algorithms it uses to follow the track and spot golf balls etc. Lots of fun!

On Thursday evening we had an organic wine tasting at FiBL's winery, with a talk on the diffrences between organic and traditional viticulture. After that we went for a meal at a local restaurant.

The first speaker the next morning was Gaetan Morice who also works at Sierra Wireless. He was speaking about the Lua development tools they are putting together. They are keen to open up their devices for their customers to program, for which an attractive development environment is necessary. They have put together a lot of open source components, starting with Eclipse and adding JnLua, MetaLua, LuaDoc, and a lot of their own code to produce a pretty swish IDE. They are planning to release it as open source under the umbrella of the Eclipse Foundation, since they want to avoid the appearance of trying to lock their cusomers into a proprietary system. His live demo included lots of nice features like code completion, highlighting instances of the same variable, and their debugger.

The second talk of the morning was Roberto again, filling in for another speaker who was unfortunately ill. This talk was about new features in Lua 5.2, including more flexible coroutine yields, emergency garbage collection when memory is low, ephemeron tables to avoid some memory leaks, lexically scoped global tables, light C functions, the bitwise operation library, and the goto statement. The talk covered the rationale for the features and the implementation challenges for the trickier ones. He's always pleased when an improvement allows him to delete code.

After coffee, Henning Diedrich talked about d'Arc, which is a performance oriented extension to the Lua API. He bypassed the standard API in order to make his JSON serializer faster, since it was a bottleneck in his game engine. The original PUC Rio implementation of Lua and LuaJIT are similar enough that Henning could make d'Arc work with both of them. The main feature of d'Arc is a faster table traversal function. In the standard Lua API tables are traversed using the lua_next() function which has to re-check its arguments and re-find its iteration point on every call. darc_traverse() inverts control so it keeps state and calls down to a fold function for each table element. There was some discussion between several people about the wisdom of bypassing the API and whether Lua's tables were keeping their performance promises.

The next talk was by Ashwin Hirschi, talking about the Reflexis Flow framework for web applications. This is a declarative EDSL for describing a workflow between dynamic web pages, intended to be accessible to non-programmers. Ashwin did an impressive live coding demo putting together a little web app with a form taking a name, height, and weight, and calculating the person's BMI. He then added some bells and whistles including a Google Gauge widget to make the BMI number look more pretty, and a database search for looking up names from a list of workshop attendees. It can also draw a graph to visualise the traversals between pages. He then showed off the traversal graph for a larger application, which is a meta-recursive web-based editor for Reflexis Flow applications.

After lunch, Patrick Rapin of Olivetti described LuaDura, which they use for diagnostics and servicing of their printers. Their firmware is written in C++, and they have a nice reflection system that allows you to make arbitrary virtual method calls via RPC over the service port. LuaDura ldownloads a description of all the printer's functionality from the service port into Lua on a controlling computer. It includes some really nice readline support so you can tab-complete over the printer firmware API. It's also multi-threaded, and can control multiple printers at once, or run in the printer firmware itself. LuaDura is used a lot for development of printers, and in production for things like flashing the printer's serial number. Patrick finished his talk by abusing a printer, making its scanner LEDs shine with random colours, and making the paper and head motors play the William Tell Overture in two tone polyphony.

Then Francesco Santorini from the University of Basel Hospital talked about using Lua for processing images from MRI scanners. The development environment for their Siemens MRI scanner does not encourage experimentation or ad-hoc image processing pipelines. Their practice before Francesco developed IceLuva was to transport images by sneakernet to a system running MATLAB. After plugging Lua into the Siemens software as a module, they were able to easily translate their MATLAB scripts into Lua and fit them neatly into the existing system, displaying the output of the pipeline on the scanner's console. Francesco was sadly unable to bring the scanner with him to do a live demo, but did have a recording of the scanner playing Smoke on the Water.

After coffee, Peter Odding spoke about Lua/APR. He started work on binding the Apache Portable Runtime library to Lua in 2007, at which time he did not know how to program in C, but approached the project with the attitude, "how hard can it be?". After learning about manaual memory management, segmentation faults, and Windows calling conventions, he released Lua/APR in 2010. One of the problems he had to overcome is the mismatch between APR's pool-based allocation strategy and Lua's garbage collector; his solution is to allocate a pool for each APR object, which is simple if not all that efficient. Lua/APR includes some shaky prototype multithreading support, which is still under development. Peter wants to get it to the stage of being able to implement a performant web server in Lua.

Finally, the last talk was given by one of the organizers of the workshop, Marc Balmer. He talked about his project to run Lua inside the NetBSD kernel. His goals are to use it for rapid prototyping of drivers, including reverse engineering undocumented hardware. Other possibilities include scripted autoconfiguration. At the moment he has the infrastructure for creating Lua states in the kernel, autoloading Lua modules, and running scripts using this context. His next goal is to write a working driver for some simple watchdog hardware. One other significant area of difficulty is multi-processor safety. He has promised to present a talk at FOSDEM in February on his experiences writing drivers in Lua.

(Leave a comment)

Friday 2nd September 2011

Version 1.13 of nsdiff

I have updated nsdiff to support wildcards and _underscore tags in owner names. Wildcards were prompted by Ian Jackson's desire to test resolver behaviour for his IP-over-DNS implementation. SRV and DKIM records use _underscore tags. Alan Clegg of the ISC says nsdiff is "very cool" :-)

(Previously, previously, previously.)

(Leave a comment)

Wednesday 31st August 2011

Web browser stats

Here are some numbers from our webmail service. These cover the dates 2011-08-03 - 2011-08-29 inclusive, during which time there were 1320906 logins by 23087 users. Edit: I have corrected & clarified the Mobile Safari breakdown.

users : platform

20130   Windows
 7193   Mac OS
 2671   iOS
 1714   Linux (desktop)
 1054   Android
  393   BlackBerry
  313   Symbian
   92   Samsung
   89   Kindle
   69   SonyEricsson
   57   Nokia
   54   J2ME
   36   SunOS
   11   LG

users : browser

12965   MS IE
12063   Firefox
 7190   Chrome
 5165   Safari (desktop)
 4014   Safari (mobile)
  446   other mobile
  406   Opera
  109   Opera Mini

 9052   MS IE 8
 5277   MS IE 7
 2960   MS IE 9
 1477   MS IE 6
   12   MS IE 5 for Mac

 6566   Firefox 5
 6392   Firefox 3.6
 4226   Firefox 6
 1457   Firefox 4
  823   Firefox 3.5
  775   Firefox 3.0
  322   Firefox 2
   66   Firefox 7
   48   Firefox 1
   31   Firefox non-standard

 6390   Chrome 13
 4132   Chrome 12
  176   Chrome 14
  124   Chrome 11
  112   Chrome 10
   49   Chrome 8
   42   Chrome 6
   41   Chrome 9
   32   Chrome 7
   32   Chrome 5
   28   Chrome 4
   25   Chrome 15

 4974   Safari 53x
  194   Safari 52x
   61   Safari 3xx
   61   other desktop WebKit
   14   Safari 4xx

 3445   Mobile Safari 53x
  554   Mobile Safari 52x
  136   Mobile Safari 41x
   40   Mobile Safari 42x

 2466   iOS     WebKit 53x
  960   Android WebKit 53x
  258   iOS     WebKit 52x
  257   other   WebKit 52x
  170   other   WebKit 53x
   55   Android WebKit 52x
(3 comments | Leave a comment)

Tuesday 30th August 2011

MUA stats

Following a request from a colleague, I have compiled some statistics on email software used with our service. It's well over a year since the last time I did this. The numbers this time are generally larger because I analyzed a longer period of time. The main difference is the massive increase in the number of iOS users. The main WTF is where are all the Android users?

You need to be careful when interpreting these numbers. They come from User-Agent: and X-Mailer: headers in messages sent through our message submission service, smtp.hermes. It does not cover the parts of the University that have their own independent mail services. We aren't able to obtain stats from IMAP and POP connections, since there is no way for MUA software to provide a user agent string in that context, so we have to assume that there isn't a gross mismatch between software used to read and send email. Also, not all MUAs include a User-Agent: or X-Mailer: header, though the popular ones do.

The following data covers the dates 2011-07-29 - 2011-08-25 inclusive. In that period Hermes users sent 1430479 messages of which 1370145 (95.8%) included a user agent string. The total number of distinct users using the service is about 25,000. (Apart from xmas, August is our quietest time of year; we have over 30,000 users in term.) The numbers below count distinct users running each identified variety of software. The detailed per-version and per-OS breakdowns do not add up to the overall totals because some users use more than one version / OS.

users software

17391 Prayer (webmail.hermes)
 3868 Thunderbird
 3271 Apple Mail
 2454 Microsoft Outlook
 2131 iPhone / iPad / iPod
  465 Windows (Live) Mail / Outlook Express
  260 Mulberry
  236 Entourage / MacOutlook
  221 Eudora
  114 Evolution
   86 Pine / Alpine
   81 Android
   40 EPOC
   21 Opera
   19 KMail
   13 Mutt
   10 sparrow

Thunderbird breakdown

 1253 Thunderbird/6.0
 1815 Thunderbird/5.0
 1868 Thunderbird/3.1
  157 Thunderbird/3.0
  570 Thunderbird 2.0
   25 Eudora/3.0

 3099 Windows
  491 Macintosh
  468 Linux/X11

  510 Windows NT 6.1; WOW64;
  795 Windows NT 6.1;
   29 Windows NT 6.0; WOW64;
  310 Windows NT 6.0;
    1 Windows NT 5.2; WOW64;
    5 Windows NT 5.2;
 1433 Windows NT 5.1;
    4 Windows NT 5.0;

   39 Intel Mac OS X 10.7
  299 Intel Mac OS X 10.6
  101 Intel Mac OS X 10.5
   22 Intel Mac OS X 10.4
   18 PPC Mac OS X

Apple breakdown

  587 Apple Mail 5
 2048 Apple Mail 4
  802 Apple Mail 3

 1777 iPhone
  394 iPad
  117 iPod

Microsoft breakdown

 629 Outlook 14
1335 Office Outlook 12
 502 Office Outlook 11
  56 Office Outlook, Build 11
  56 Outlook, Build 10
  20 Outlook IMO, Build 9

 197 Outlook Express 6
  94 Windows Mail 6
 137 Windows Live Mail 15
  43 Windows Live Mail 14
   3 Windows Live Mail 12

  56 MacOutlook 14
 142 Entourage 12
  37 Entourage 11
(19 comments | Leave a comment)

Saturday 20th August 2011

We cannot use Google+

If you follow my link log you will know that I have been following the "nymwars" saga with some interest. (The kind of interest one has in watching someone do something bone-headedly self destructive, accompanied by popcorn and preferably beer.) I have in fact signed up for Google+, but I don't use it except for following the occasional link to a post there. I don't have any particular interest in spending the time to work out how to get a decent amount of benefit from it - perhaps that will change once others have beaten the path (as happened with Twitter). That combined with the nymwars led me to delete the circles that I set up in the first few days of the service.

The Google+ name policy means it would be foolish of me to invest any effort in the service. Although I "use the name [my] friends, family or co-workers usually call [me]" this is not the same as the name I use for formal purposes. If anyone were to take exception to me and flag my G+ account, I would not be able to prove to Google that Tony Finch is a valid name for me. In fact I think the name that would be acceptable to their reinstatement process would not be recognisable to most people who know me since no-one refers to me by my first name.

Rachel's name also violates the policy though in a different way. She has chosen to delete her G+ account altogether, because she doesn't want a terms-of-service violation to affect her usage of other more important Google services.

(8 comments | Leave a comment)

Tuesday 9th August 2011

New version of nsdiff

A few people have shown interest in nsdiff which is more than I expected for a quick hack. In fact, Terry Burton at the University of Leicester is planning to put it into production! He also reported a bug (it got TTL changes wrong) and said they needed TSIG support so they can control which view a zone is transferred from. So I have fixed those problems and even added documentation! Maybe I should give it a home page...

(Previously, previously.)

(Leave a comment)

Friday 8th July 2011

unifdef and getline()

I get email from Philip Paeps, FreeBSD hacker at large:

I'm spending some quality time with Linux kernels and buildroot this week. Fun fun. Not.

Every time I get thrown in this space, one of the first things I have to do, is fix unifdef.c to spell 'getline()' differently so as not to clash with the libc version. (I've got a patch in my $HOME/patches that I apply time and time again -- trivial but tedious).

So I've been wondering: is there a reason unifdef.c spells getline() like that? "If you can't make the tool compile, you shouldn't be using it?" or is it just sadism? :-) Inquisitive minds...

Remind me to continue to buy you beers for unifdef. :)

Well, I started working on unifdef on 2002-04-25 and one of the earliest changes was this one:

  commit ae32111731291ff29a80c51c8405fe3a6e886e78
  Author: Tony Finch <dot@dotat.at>
  Date:   2002-04-27 17:23:47 +0000

    spell getlin() with an e

The commit message is a reference to the footnote on page 204 of Kernighan and Pike:

Ken Thompson was once asked what he would do differently if he were redesigning the UNIX system. His reply: "I'd spell creat with an e."

So that's where it comes from and at the time this spelling fix was not a problem. I didn't expect glibc and POSIX to steal the name from me! Compare this list of functions beginning with "g" in POSIX 2004 with with the corresponding one from POSIX 2008. WTF! You can't add a function with such a simple name to stdio.h!

So I fixed it in my repository nearly two years ago. I was perhaps a bit slow to do so - I had not been on top of unifdef maintenance for a while.

  commit c018c45e1e9372c428028dc333142467678c41ae
  Author: Tony Finch <dot@dotat.at>
  Date:   2009-11-24 11:58:41 +0000

    Rename getline() to parseline() to avoid clashing with a glibc function.

FreeBSD got the fix the following day:

  r199813 | fanf | 2009-11-25 20:23:18 +0000 (Wed, 25 Nov 2009) | 21 lines

  Update unifdef to my upstream version 1.188

The Linux copy was fixed earlier the same year:

  commit d15bd1067b1fcb2b7250d22bc0c7c7fea0b759f7
  Author: Justin P. Mattock <justinmattock@gmail.com>
  Date:   2009-03-07 13:31:29 +0100

    kbuild: fix C libary confusion in unifdef.c due to getline()

And before then Linux had been using unifdef happily for 2.5 years:

  commit 01f1c8799ad8b23c190d59cf1c9e28e6fed390a4
  Author: Sam Ravnborg <sam@mars.ravnborg.org>
  Date:   2006-07-23 20:39:59 +0200

    kbuild: add unifdef

I do not understand why incompatible versions of unifdef have persisted for so long. Part of it seems to be old kernel branches that are still used but not well maintained, but it is rather surprising that this patch hasn't been backported if those old branches are still in use. But then I don't understand the Linux kernel branching & maintenance model. I wonder if I can prod someone to improve the situation.

(2 comments | Leave a comment)

Thursday 9th June 2011

IPv6 day stats

Here are some numbers from yesterday's activity on the University of Cambridge's central mail relays. We advertised AAAA records for our services between about 08:20 and 20:30 local time (+01:00). We did not make any changes to the A records which point at IPv4-only servers; the added AAAA records pointed at a couple of dual-stack servers.

Incoming mail: mx.cam.ac.uk

  • 223651 messages over IPv4
  • 2204 messages over IPv6
  • 1.0% ipv6
A lot of nerdy senders. Prominent on the list were NANOG, the IETF, FreeBSD, Debian, Haskell, UKNOF, Exim, Lua, ISC, Tor. The stats are somewhat distorted by my own personal mailing list subscriptions! Apart from that we got mail over IPv6 from several universities: Imperial, Vienna, Reading, TU-Berlin, Southampton (ECS), Valencia, Leicester, Oslo, Malta. Notable service providers include Retrosnub, chiark, Mythic Beasts, Fastmail, Andrews & Arnold, Bytemark.

Outgoing smart host: ppsw.cam.ac.uk

  • 122384 messages over IPv4
  • 3915 messages over IPv6
  • 3.1% IPv6
This traffic reflects which parts of the University have IPv6 connectivity. Almost all came from the SRCF (which handles a lot of mail) and the Institute of Astronomy.

Message submission service: smtp.hermes.cam.ac.uk

  • 90015 messages over IPv4
  • 226 messages over IPv6
  • of which 185 were over 6to4
  • and 6 from outside the University
  • 0.25% IPv6
  • 0.20% 6to4
Again almost all this traffic is internal to the University so it reflects the proportion of IPv6 connectivity on our edge networks. A lot of these have poor layer 2 security, in particular little protection against rogue router advertisements.

Outgoing mail from the dual-stack servers

  • 6103 outgoing deliveries total
  • 5907 over IPv4
  • 196 over IPv6
  • 3.2% IPv6
  • 1759 to internal University destinations
  • 117 over IPv6
  • 6.7% IPv6
  • 4344 messages delivered to external destinations
  • 79 over IPv6
  • 1.8% IPv6
All this mail arrived over IPv6. You can see the effect of the SRCF and Astronomy again.

Mail reader access webmail/imap/pop.hermes.cam.ac.uk

The rightmost columns below are v6 logins from outside the University.

v4    v6 % 6to4 % ext %
webmaillogins 72149 525 0.7% 194 37% 141 27%
clients 15696 135 0.9% 83 61% 18 13%
imapslogins 361956 752 0.2% 407 54% 144 19%
clients 10380 101 1.0% 56 55% 20 20%
imap-tlslogins 117109 295 0.25% 27 9% 76 26%
clients 4312 41 0.9% 10 24% 10 24%
popslogins 162738 128 0.1% 41 32% 55 43%
clients 2629 6 0.2% 4 67% 1 17%
pop-tlslogins 27741 0
clients 453 0
(1 comment | Leave a comment)

Wednesday 8th June 2011

In which IPv6 day turns out to be unexpectedly exciting

As part of IPv6 Day we have enabled IPv6 on the important parts of our mail service: mx.cam (incoming mail), ppsw.cam (outgoing), and smtp/imap/pop/webmail.hermes. Most of the University is IPv4-only, apart from the Computing Service's staff network, the Institute of Astronomy, the CUSU/SRCF network, and a smattering elsewhere. Today has been mostly unexciting for us, which is cheerful. Mostly.

This afternoon I got an alarming email and phone call from a departmental sysadmin whose mail was all of a sudden being rejected by our outgoing relay. (This turned out to be one of the very few occasions where my special "accept all mail to postmaster regardless of ACLs" rule actually got used by someone to work around a problem!)

Looking at the logs I immediately saw that the rejected mail was arriving over IPv6 from a 6to4 address; our ACLs do not treat 6to4 addresses as being inside the University, so the mail was being rejected.

The quickest way to fix mail delivery was to add disable_ipv6 to the Exim configuration on the sender.

The breakage suddenly started happening at 15:24, which immediately made me suspect a bogus IPv6 router had appeared on the server's LAN at that time. Malc has a good description of how Windows Internet Connection Sharing breaks connectivity by issuing rogue router advertisements.

It is possible to identify the machine responsible for this kind of braindamage. The unwanted 6to4 address was of the form 2002:836f:... where 2002::/16 is the 6to4 prefix and the two following hexadectets are the IPv4 address of the tunnel endpoint. 836f:xxxx is hex for 131.111.ddd.ddd.

I trawled my logs for other 6to4 lossage and found a couple of colleges that had a few messages from their web servers rejected. One of them had at least four machines issuing rogue RAs.

It is going to be, um, interesting dealing with this in the future...

(6 comments | Leave a comment)

Tuesday 31st May 2011

The state of DNSSEC deployment

DNSSEC comprises a couple of fairly independent sets of protocol extensions:

  • Transaction authentication
  • Data authentication

Transaction authentication has a few varieties: TSIG, GSS-TSIG, and SIG(0). It is mainly used to authenticate zone transfers between master and slave authoritative servers, and to authenticate DNS UPDATE messages (in Microsoft Active Directory domains, for instance). It has been widely deployed for years now. It's fairly unglamorous because it doesn't require a massive worldwide campaign to get it deployed: instead a network of clients and servers can deploy it and benefit from improved security without liaising with anyone else.

Data authentication is where all the noise is at the moment. Unlike TSIG, this is a change to the DNS data model to add fine-grained cryptographic signatures to the data in the DNS, so that users can verify that it is authentic. The deployment campaign is well under way: many of the largest TLDs are signed and accepting secure delegations. There are still barriers to deployment: many registrars are not yet ready to handle secure delegations, and many DNS hosting services don't support DNSSEC. You can work around the former to some extent using the ISC's DLV service. You can work around the latter by doing it yourself, but the tools are still a bit rough so you have to enjoy being on the bleeding edge.

The counterpart to signing the zones is deploying validating resolvers that actually check the signatures. It's harder to tell how widespread this is yet. It's fairly clear that the .gov requirement to deploy DNSSEC was interpreted as a requirement to sign the zones, but not to deploy validation. Hence they seem to have been slow to become aware of broken signatures and to fix them. If you turn validation on, you have to be willing to deal with a little more operational pain than usual for the DNS. While it is common for smaller domains to screw up their DNS, DNSSEC has new exciting failure modes that have led to occasional breakage in more important domains. But in practice for us it doesn't seem to have noticeably increased the support load - but we don't deal with .gov much :-)

What's next? There are a couple of prongs developing in parallel:

  • Applications built on DNSSEC
  • End-user security

Once you can depend on the authenticity / validity of data obtained from the DNS, you can use it to bootstrap trust in higher-level protocols. For example, you can use SSHFP records to avoid manual host key validation or leap-of-faith authentication. Or you can use TLSA records to mitigate the weaknesses of the X.509 certificate authority system.

But in order for these applications to be secure, you have to push the secure area all the way out to the end user. Here the roadmap is a bit murky because some important parts are missing.

The traditional Unix stub resolver is based on the assumption that you can just do a simple request/response with your recursive resolver which does all the hard work for you. The recursive server is usually remote, and the stub resolver usually doesn't have DNS hardening features (port randomization, query ID randomization), and the link between the two is the most likely to be attacked. Given that, a validating remote resolver is not actually much use - it adds failure modes without meaningfully improving security.

A fix that is sometimes suggested is to deploy some kind of channel security between stubs and resolvers (something like TSIG). This isn't attractive for a couple of reasons. It probably requires new protocol development to specify the key management, so it'll be a long time before it is usable. Also the stubs continue to trust their resolvers, which is not desirable for mobile devices roaming on untrustworthy networks.

Really, what you want to do is run a validating cache on your local machine, which the stub resolvers can talk to over a private socket. A shared validator keeps the stub resolvers simple, and a cache avoids repeated fetches of the same signatures and keys.

BIND 9 can in fact be run in exactly this mode, called a "lightweight resolver daemon". I think this means "lightweight configuration" and "lightweight protocol" but not "lightweight implementation" since lwresd is in fact just an alias for the full BIND named. It was originally intended to offload the complexity of dealing with IPv6 bit-labels and A6 records, but it seems to have been neglected since they were obsoleted by the simpler AAAA records. It could really do with being resurrected for DNSSEC.

There are a few improvements that could be made to lwresd: it should reconfigure itself automatically when /etc/resolv.conf is updated; it should have a command-line option to turn on validation using the built-in trust anchors; it should try bypassing the forwarding server if it doesn't support DNSSEC; and (more speculatively) the lightweight resolver protocol could be extended to provide better DNSSEC error reporting - the DNS itself just gives an uninformative "server failure" reply if validation fails (or for any of several other errors).

The advantage of this setup is that it doesn't change the relationship between the end host and the network: you continue to use the resolvers you get from DHCP (lwresd forwards queries to them) so caches remain effective and strict firewalls don't break things (as they would if you ran a full resolver).

(7 comments | Leave a comment)

Wednesday 25th May 2011

Why I am not a fan of the locator / identifier split

A scalable system's size and complexity is not constrained by the capacity of its components.

By this definition the Internet is not scalable, since every router must have a large enough routing table to handle every network address prefix. Since the Internet has scaled impressively over the last 20 years, this might seem to be a theoretical problem, but I think it is a very practical problem and we have largely become inured to its consequences.

Consider your laptop or smartphone. It has three or four network interfaces - wifi, Ethernet, cellular, Bluetooth - but it can't roam from one uplink interface to another without resetting its network stack, nor balance the network load across interfaces, and its ability to provide connectivity to other devices is restricted. You can make these features work with special hackery (NAT, mobile IP, HIP) but they aren't supported by the architecture in the way that first-class multihoming is. The Internet cannot cope with a billion small and highly-mobile networks.

The cause of the scalability problem is the addressing scheme. Each packet contains a globally unique destination address that indicates where the packet should be delivered, without any information about how to get there. Therefore every router must maintain a map of the entire address space so it knows where to forward each packet. Routing tables must be very fast as well as very large - the packet arrival interval in 10Gbit Ethernet is as little as 75ns - so backbone routers require expensive custom silicon. Maintaining the routing tables requires all significant connectivity changes to be communicated to all routers over BGP, so communication overheads scale badly as well as routing table sizes.

The current strategy for controlling this problem relies on aggregation: the Internet registries try to allocate numerically adjacent address blocks to topologically adjacent networks, and hope that distant routers can use a single large routing table entry to span several blocks. But the need for multihoming and traffic engineering encourages de-aggregation despite the external costs this imposes. The regional Internet registries try to use their allocation policies to minimise the harm, but this has the side effect of slowing the growth of the Internet.

All network engineers are aware of this problem, and there has been a lot of work on developing better routing schemes. At the academic end there have been many "Future Internet" research programmes, which funded a lot of projects that have often ignored any need for backwards compatibility with interesting results. At the more practical end has been the work in the IETF and IRTF, especially the Routing Research Group. RFC 6115 has an overview of their work.

The most notable proposals are HIP and LISP, which both follow the locator / identifier split plan. The idea is to add a topology-independent overlay which endpoints use to identify each other. For example, identifiers are used instead of IP addresses in TCP connection tuples. Multi-homing etc. are handled by a mapping layer that translates between identifiers and locators. The locators can then be allocated much more strictly along topological lines than is currently reasonable. HIP requires upgraded endpoints with little support from the network, whereas LISP supports existing endpoints with an upgraded network. HIP puts an extra end-to-end header in packets whereas LISP is closer to the 8+8 scheme of GSE and ILNP.

I am not convinced that the ID/loc split will be a great success. It keeps the same unscalable globally unique address scheme, but duplicated, with a complicated mapping layer in between. It may, at great expense, make widespread site multihoming feasible, but I doubt that will ever be a feature of consumer connectivity let alone edge device connections.

I believe a properly scalable replacement for the Internet should be cheaper and have more features than the current Internet. However it looks like it takes at least 20 years to go from prototype system to a plausible replacement for the old network - see for example the replacement of traditional telephony with VOIP, and IPv4 with IPv6. So we are unlikely in the foreseeable future to see any anything come and sweep away the accumulating ugly complication of the Internet and replace it with an elegant clean slate.

(6 comments | Leave a comment)

Monday 23rd May 2011

IPv6 day

My previous post was incredibly boring and probably quite baffling without any context.

I have recently been preparing my mail servers for IPv6 day. My office mates have been doing similar prep work on their services. Cambridge University's network and DNS have supported IPv6 for a number of years now, and many of the desktops on the computing service staff network use it. Outside the CS the Institute of Astronomy has deployed it extensively, as have the student union and the student-run computing facility, and there is a smattering of it in the Computer Lab and Churchill College. Not much out of a few hundred institutions.

IPv6 day has provided a good push to get us to move our deployment further along. There is not actually much pressure to deploy it here: we have over two hundred thousand public IP addresses (though nearly half of them are dedicated to the Engineering department and the Computer Lab for more-or-less valid historical reasons) and we make extensive use of 172.16.0.0/12 private IP addresses. That should probably be enough for 55,000 people. So progress has been slow.

Our network manager has been looking more closely at rolling out IPv6 to more institutions in the University, and this required coming up with a more detailed addressing plan. It soon became clear that our current 2001:630:200::/48 allocation is uncomfortably small. We have too many institutions to sub-allocate on the /56 boundary, and many of them are too large to fit into a /60 range (only 16 subnets). The colleges have a lot student rooms, and for ease of management it may (eventually) make sense to give them individual /64 subnets, which will easily eat up 2^14 subnets. The federal nature of the University leads to a fair amount of fragmentation, especially since we prefer to allocate on multiple-of-four boundaries to make delegating reverse DNS easier.

After a lot of arguing with JANET and RIPE over their various bureaucratic allocation policies, we have (at last) got the OK for a new /44 allocation. [Since the IPv6 address space is practically infinite, it is obviously important to make sure that we use it efficiently or something. Just like A&A who allocate a /48 to each home user. Obviously there is enough IPv6 space for a one-size-fits-all allocation policy.] As you can see from the list in my previous post, JANET have been allocating /48 ranges to their subscriber institutions, mostly without space between them. This means we can't grow our current allocation because we share our parent /44 with eight other institutions. So we have the joy of a renumbering ahead of us. Good thing it will be small... You can also see from the list that we will be the second UK university to get a /44 after Oxford.

(16 comments | Leave a comment)

JANET IPv6 address allocations

A script to generate the list ) The current list of JANET IPv6 allocations )
(Leave a comment)

Wednesday 11th May 2011

DNS and system configuration vs. application data.

There's a rough distinction to be made between system configuration data and application managed data. The former is altered by the development / operations staff, and the latter is altered by application users. The former is typically maintained in a revision control repository and deployed to the live systems using configuration management tools such as rdist or puppet; the latter is maintained on the live systems by the application from which it is backed up. The former tends to be more ad hoc and the latter more standardized.

So for email the application data includes messages in flight (SMTP/LMTP) or in store (IMAP/POP), users' sieve filters (MANAGESIEVE), and the shared address book (LDAP). [I mention the protocols to emphasize the standardization.] System configuration data includes things like which domains are hosted where, system aliases, central filtering and access control rules, and suchlike. The distinction is rough because in an environment that is more ISPish than corporate, big chunks of system configuration get delegated to customers and sold as "virtual mail domains", with their own custom management applications in place of sysadmin tools, and SQL or LDAP database storage.

The DNS is more like a system configuration component than a user-facing application. Even so, it's a bit surprising that BIND has remained so firmly based on traditional flat files (at least, superficially). But in fact for a corporate environment (with one frequently-updated main zone and not a lot of churn in which zones are being served) the DNS has some really nice application data management features. In fact it had loads of NoSQL buzzwords before NoSQL was cool. It's a distributed replicated in-memory database with eventual consistency. The combination of UPDATE, NOTIFY, and IXFR mean that the lag between a user requesting a modification and it becoming visible at the slaves can be tiny. UPDATE itself has very nice transactional semantics which allow you to implement read-modify-write operations with optimistic concurrency. Swish.

It isn't actually too painful to bridge the gap between static zone files in revision control and live zones managed by BIND. Here's a little tool called nsdiff which does the job. If you are running on the master server you can just pipe the output into nsupdate -l. (This is not much harder than updating the live zone file in place and running rndc reload.) Note that if the diff is too large, nsdiff breaks it up into multiple update operations - perhaps it should have an option to fail if it can't use a single atomic transaction.

Whereas dynamic updates have been around for many years, BIND has only recently become able to incrementally add and delete zones from its configuration, as opposed to re-reading the entire thing to find out which zones it should serve. This is a bit surprising since BIND configuration updates have been a pain point for ISPs for a very long time. Other nameservers with more accessible database backends probably make provisioning rather less clunky than my little addzone-master addzone-slave delzone scripts.

It might be interesting to see what comes of the IETF DNSOP work on standardized name server management. So far there is just a requirements document but sadly not much sign of follow-up work.

(Leave a comment)

Wednesday 4th May 2011

DNSSEC with BIND 9.8.0

The latest version of BIND makes DNSSEC validation very easy to set up. Just put the following lines into the "options" section of your named.conf:

  dnssec-validation auto;
  dnssec-lookaside auto;

When you upgrade from an older version of BIND you need to delete the managed-keys.bind pseudo-zone - BIND will only add its built-in root and DLV trust anchors when it first creates the file.

That's it! Easy! Do it!

Publishing signed zones is getting easier too. If you are an old-skool DNS admin who is a dab hand at editing flat text master files, then the main thing that takes some getting used to is wrangling dynamic DNS instead. Signed zones need to be dynamic so that BIND can refresh the RRSIG records periodically, so you might as well use nsupdate to make changes too, and enjoy the shiny future.

To sign a zone, cd to named's working directory where you will create a set of keys for the zone. (You can tell BIND to look for keys elsewhere using a key-directory statement in each zone block or set it globally in the options section.) Then run these commands:

  dnssec-keygen -f KSK $zone
  dnssec-keygen $zone

This creates two key pairs with the default settings: a key signing key pair and a zone signing key pair. Ensure they are readable by the BIND user.

Then create an initial zone file. It has to have at least a SOA and an NS record. I start off with a copy of my local empty zone and change the SOA and NS later.

  $TTL 1h
  @ SOA localhost. root.localhost. 1 1h 1000 1w 1h
    NS  localhost.

Then add a zone statement to named.conf.

  zone "$zone" {
    type master;
    file "$zone";
    update-policy local;
    auto-dnssec maintain;
  };

The update-policy statement lets you run nsupdate -l on the same machine as the nameserver to make changes to the zone. The auto-dnssec statement tells named to handle re-signing automatically. (It will also handle key rollovers if you pre-generate keys and set their lifetimes.)

Then run rndc reconfig and you are all set!

A couple of other non-default settings are possibly worth noting. There is a new feature which makes adding and deleting zones marginally neater. If you put allow-new-zones yes in your options section then you can use the rndc addzone and delzone commands instead of editing named.conf. When adding a zone you still have to create the keys and zone file first. The other tweak is to set dnssec-dnskey-kskonly yes which reduces the size of the zone apex RRSIG RRset (which should probably be the default).

(6 comments | Leave a comment)

Tuesday 12th April 2011

Unicode and binary data

There is a particularly irritating requirement in the Unicode standard that a UTF-8 parser must either abort or corrupt its input if it encounters ill-formed UTF-8. By "corrupt" I mean that ill-formed subsequences all get converted to the U+FFFD replacement character, information about the values of the bytes in the ill-formed sequence is lost.

This is a problem for programs that want to treat data as UTF-8 but which cannot be sure that the data is actually conformant. For example, how do you delete a file or an entry in a database if its name is ill-formed and the API rejects ill-formed names?

UTF-16 has several advantages over UTF-8 in this area. The only syntax error in UTF-16 is the appearance of a surrogate that isn't part of a pair. If a parser ignores this error it is natural to convert the surrogate into the corresponding invalid UCS-4 character value between 0xD800 and 0xDFFF. The resulting (invalid) string can be written back out without loss of information as UTF-16. It can also be written out as UTF-8 and read back in (if you have a relaxed UTF-8 parser) without loss of information.

UTF-8 has more error conditions: as well as surrogates, there are over-long sequences, out-of-place continuation bytes, and some byte values that may not appear at all. In many cases there is no natural way to convert an ill-formed sequence into UCS-4. So if you want your code to be more relaxed than the standard there's no obvious way to do it, and you are unlikely to interoperate with other implementations.

Markus Kuhn proposed a way to deal with this problem which has been dubbed UTF-8b. He suggests using part of the surrogate range to represent each byte in an ill-formed sequence. However this suggestion conflicts with the use of surrogates to represent ill-formed UTF-16 - you can't tell whether to write out (for example) 0xDCBA as the single byte 0xBA (preserving ill-formed UTF-8) or as the sequence 0xED 0xB2 0xBA (preserving ill-formed UTF-16).

In a pure UTF-8 world it might make sense to represent ill-formed sequences using character values outside the Unicode range. For instance, all bytes in invalid sequences have their top bits set, so you could just sign-extend them to produce a negative invalid UCS-4 character value. But there are situations where you need to round-trip via UTF-16 and UTF-16 cannot represent negative character values.

So what we really need is a set of 128 code points allocated to represent raw byte values from ill-formed UTF-8 sequences, along the lines of UTF-16 surrogates, and to be used instead of the U+FFFD replacement character. This would allow a parser to read in UTF-8 without losing data, and to leave error handling to higher layers that can make more informed decisions. One odd wrinkle is that a UTF-8 parser that allows graceful round-trip handling of binary data should not treat surrogates as parse errors, so that it can preserve ill-formed UTF-16 data, but it must treat encoded raw byte values as parse errors to avoid ambiguity. A sensible place to allocate the raw byte value code points is in plane 14 which is reserved for special use, above the area reserved for ignorable characters, i.e. U+E1000 .. U+E10FF.

I propose the name UTF-8-relaxed or UTF-8r for this scheme.

(2 comments | Leave a comment)

Friday 18th February 2011

Programming languages in configuration files

Justin Mason posted an interesting article against the use of programming languages in configuration files. I think he is mostly right but his arguments could be improved.

His first argument is provability, that is, ensuring that a configuration does what it is supposed to do. The problem here is that it is very easy to make a language which is unexpectedly Turing-equivalent. In particular, as soon as you have a combination of regex-based rewriting and iteration, you have a Turing machine. There have been practical demonstrations of this fact for Sendmail and Exim but it must also be true for Postfix (since it has regex email redirection and redirection iterates) and Apache (via mod_rewrite of course).

But my counter-argument is mostly irrelevant, since in practice these kinds of rewrite engines have very strictly limited iteration counts - 50 or 100 is typical. And in practice you know when you are getting on to thin ice when regular expressions start to proliferate.

Justin's second argument is security. His example is SpamAssassin, where it is easy to accidentally get exponentially slow pattern matching time out of a poorly-written regex. Exim provides some much better examples, since its string expansion operators can do arbitrarily complicated things (such as running a shell command) and, even worse, it is not obvious how privileged each string expansion is from its context in the configuration.

His final argument is usability. I think this really is the crux. Provability - being able to understand a configuration by inspection - is really a special case of usability. And security is a special case of provability.

The usability problem is a consequence of a program's growth. As it gains features, there is an enormous pressure for the program's configuration language to get more complicated, and to expose those features to sysadmins without requiring them to learn how to write code that plugs in to an API. But if the program's developers indulge these demands they end up growing what started out as a simple configuration language into a fully-fledged programming language - except that it has really bizarre syntax and poorly-defined semantics.

However, before your program gets bloaty, usability leads you into this trap, since it is the reason you don't want to use a programming language for configuration. Imagine trying to configure your favourite complicated over-featured daemon using your most lightweight favourite programming language. The result is almost certainly unsatisfactory - except perhaps if you are a Lisp or Smalltalk hacker, or (as Justin suggests) if you have drunk the Ruby-without-brackets kool-aid. So programmers start with something ad-hoc rather than using an embeddable programming language.

Except if they know Tcl. I think Tcl is massively under-rated and should be used a lot more than it is. Its syntax is so unobtrusive that it is the perfect substrate for a configuration language. And, because it comes with ready-made semantics and proper programming constructs, it can cope when your configuration requirements become more demanding. And the techniques for sandboxing and subsetting it are well known, so it can be secure.

There are lots of other embeddable programming languages. I'm a huge fan of Lua, and Guile (the GNU embeddable Scheme interpreter) has been around for ages, but both of them have a lot of syntax which will intrude into simple configuration tasks. The other dynamic languages are much worse, with their bloat, horrible embedding APIs, and intrusive syntaxes.

I think Tcl suffers because people treat it as a programming language, and it is not a very good one. It is the most "stringly typed" language there is, so it is prone to type safety bugs that don't exist in other dynamic languages. Its variable and data structure semantics are downright odd.

But if you treat it as a configuration framework that can grow with your users' demands, it rules. Note that it provides a framework in three ways: as a library for your program to use for reading its configuration; as a set of design patterns to use for your program's configuration language; and as a way to plug in extra functionality (since Tcl can load modules dynamically).

I do not mean this to be a Tcl advocacy piece. One consequence of using something like Tcl or Lua as your configuration language is that the tail can end up wagging the dog. There's an enormous temptation to move functionality up into the dynamic language - this is partly why Tcl is branded as a programming language rather than a configuration library. This means you have to treat your configuration language more like an API.

Perhaps this is the way to escape from Tcl and to add a concluding argument to Justin's article. Instead of adding endless flexibility to your configuration language, grow a proper plugin API instead. Keep your configuration language simple for those who want to plough the well-worn furrow; assume that advanced users can program and encourage them to do that properly. Sendmail's milter API is a successful example (despite its flaws).

(5 comments | Leave a comment)

Monday 17th January 2011

Debit several clue points from Le Crédit Lyonnais

; <<>> DiG 9.6.2-P2 <<>> +dnssec mx carte-isic.lcl.fr.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54984
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
;; QUESTION SECTION:
;carte-isic.lcl.fr.             IN      MX

;; ANSWER SECTION:
carte-isic.lcl.fr.      407     IN      MX      5 82.118.192.159. ; *D'OH!*

;; AUTHORITY SECTION:
lcl.fr.                 407     IN      NS      ramses.credit-agricole.fr.
lcl.fr.                 407     IN      NS      ns0.creditlyonnais.fr.
lcl.fr.                 407     IN      NS      chenar.credit-agricole.fr.

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Jan 17 12:52:13 2011
;; MSG SIZE  rcvd: 167
(7 comments | Leave a comment)

Wednesday 5th January 2011

Do programming languages have terroir?

James Noble gave a talk at ECOOP 2009 called "the myths of object-orientation". When I read the paper version the following quote caught my eye:

When I was learning Smalltalk, [Brian Boutel] used to complain about “Californian” programming — no types, dynamic dispatch, a relaxed interactive programming environment — much warmer, and much less bracing, than Oregon or Glasgow that gave birth to his beloved Haskell. So I wonder if, like wine, do programming languages have terroir? What influence does the environment that nurtures a programming language, or a programming principle, or a myth, have on the result? Smalltalk is Californian, Dick Gabriel has described how Unix (and C) comes from the Bell Labs engineering culture, but what of the rest?

The talk as a whole is very much tongue-in-cheek, but the idea is intriguing even if it isn't to be taken too seriously. There's more discussion at Lambda the Ultimate.

(1 comment | Leave a comment)

Saturday 25th December 2010

55555.55555

I am informed by Tom Van Baak (the guy who helped his children measure general relativistic gravitational time dilation with a few atomic clocks) via the LEAPSECS mailing list that Christmas Day is modified Julian day number 55555, so at 13:20 the epoch will be MJD 55555.55555.
(Leave a comment)

Tuesday 7th December 2010

Amazon "Route 53" authoritative DNS service

What a disappointing offering.

To be fair, they are providing a cheap, high volume, globally distributed, authoritative DNS service. I expect they will do this as superbly as their other cloud hosting services.

But they are using obsolete unmaintained software for the DNS servers. This has two clearly bad consequences:

Firstly, they don't support DNSSEC.

Secondly, they have invented their own API for updating the DNS instead of using the standard dynamic update protocol.

ETA: Thirdly, they don't support AXFR, let alone NOTIFY and IXFR, which means that Route 53 cannot be used as a secondary DNS service for a primary you run, nor can you set up a local slave of your Route 53 zones.

I admit that they would have to have their own API for provisioning zones, since there is not yet a standard way to do that - and in fact BIND is only just getting it's own API for dynamic zone provisioning. But I doubt I'll see Amazon staff on the IETF lists working to help standardize a provisioning protocol if they don't even support DNSSEC or standard dynamic updates.
(19 comments | Leave a comment)

Monday 22nd November 2010

Abuse of mailing lists

On the 2nd November, one of our users had the bright idea of promoting an event by CC:ing their ad to a very large number of our internal mailing lists, almost all of which were thoroughly inappropriate for the subject of the message. This is, of course, not allowed, and in such cases the perpetrator is usually invited to visit the office of my colleague who specializes in abuse (as it were) and/or their college authorities are asked to deal with it as a disciplinary matter.

Happily this kind of abuse is very rare - it's the first time it has happened to us for about ten years. However, just like last time, the first event has led to a very annoying series of copy-cat offenses, as various other thoughtless idiots have used "reply to all" to distribute their own off-topic bumf. I'm lucky that I don't have to tell them off myself, but I understand that their usual reaction is to whine "I thought it was OK because someone else did it".

We're trying various things to put a stop to this. Just getting the senders deaned isn't enough.

I added a simple clause to the Exim configuration on our mailing list system to freeze any message with too many recipients. This turned out not to be enough: one of the miscreants is a GMail user, and GMail splits a message so that there are at most 20 recipients per copy (even though the standard says 100 should work fine).

  warn
    condition = ${if >{$recipients_count}{25} }
    control   = freeze

So I added a second clause that freezes messages that have too many addresses in the To: and CC: headers. This still has the weakness of missing BCC:ed messages. (Edited: I came up with a more cunning regex which only counts @lists addresses, since counting all To: and CC: addresses led to far too many false positives.)

  warn
    condition = ${if >{${strlen:${sg {$h_To:,$h_CC:} \
                                     {(?s)(?!@lists[.]cam[.]ac[.]uk).} {} \
                      }}} {25} }
    control   = freeze

Freezing the messages is a fairly benign way of dealing with awkward edge cases, since we can inspect the frozen messages and either let them pass if they are OK, or bounce them or delete them if not. Messages that are frozen incorrectly just get delayed for a little while. We have used this technique for a while to detect spam from compromised accounts, so I already had the necessary support scripts.

I have also updated the message of the day on the Hermes webmail front page, to get the word out that this kind of idiocy is not tolerated. Hopefully we will not have to escalate to public floggings.

(8 comments | Leave a comment)

Tuesday 16th November 2010

Interior iteration with less period pain

In a previous post I said a little bit about the field structure of the exterior of the Mandelbrot set.

I mentioned that field lines with rational angles (rational multiples of 2π radians) correspond to pinch points on the Mandelbrot set. The relationship is in fact much deeper than that. The angles of the field lines at a pinch point also tell you something about the interior components of the set on either side of the pinch.

As well as proving that the Mandelbrot set is "simply connected", Douady and Hubbard made the "MLC" conjecture that the Mandelbrot set is also "locally connected". This would mean that the Mandelbrot set can be formed by just pinching a disk and that it has no infinitely thin strings joining its components.

Points inside the Mandelbrot set either converge to a fixed point, or to a periodic cycle, or they iterate chaotically. Disk-like and cardioid-like components of the set whose points all converge to periodic cycles are called "hyperbolic components". The MLC conjecture implies that the set is made entirely of hyperbolic components, and therefore that points with chaotic evolutions are on the edge of the set.

Wikipedia has a nice picture of the main hyperbolic components of the Mandelbrot set labelled with their periodicities. Note that a bud's periodicity is a multiple of its parent bud's. You can also see that the periodicity of a bud off the main cardioid matches the number of antennae it has (counting the bud as well). The same periodicity also turns up in the angles of the field lines that land at the pinch point that joins the bulb to the cardioid.

Take for example the period-3 components. Their pinches have field lines whose angles are multiples of 1/7, i.e. 1 / (23 - 1). When represented as binary fractions, the digits after the binary point repeat with period 3. 1/7 = 0.001001...; 2/7 = 0.010010...; 5/7 = 0.101101...; 6/7 = 0.110110...

Field lines corresponding to an irrational angles land at points on the Mandelbrot set with chaotic evolutions - their binary expansions do not repeat.

You can watch John Hubbard himself giving a brilliant 16 minute lecture about all this, which is much better than my excessively terse witterings. (He starts with 3.5 minutes on the Julia set for z := z2 - 1 before moving on to the Mandelbrot set.)

There is a practical consequence to all this. When making pictures of the Mandelbrot set, if you can detect that a point has reached a cycle then you can stop iterating it early. You do not have to go all the way to your maximum iteration count so you can save time working on points in the interior of the set. Also, MLC suggests that whenever you have a significant interior area in your image, it will (within practical limits) be susceptible to cycle detection. (You can do more clever things with cycle detection like interior distance estimation, but I'm not worrying about that right now...)

Cycle detection can be pretty cheap - cheap enough to more than pay for itself in many Mandelbrot renderings. For example this image shows (in orange) which points were found to converge to cycles, and this saved about a third of the rendering time.

The way to implement it is to iterate two copies of z, a "tortoise" and a "hare", and keep two iteration counters. Start off with the tortoise's counter initialized to one and the hare's counter initialized to zero. Increment the hare's counter each time you iterate it, as usual. When the hare's iteration counter passes the tortoise's counter, iterate the tortoise once, and multiply the tortoise's counter by a number slightly greater than one. (I found a multiplier of 1.1 worked OK. Larger multipliers mean less work is done on the tortoise, but it requires more hare iterations to detect cycles.) Each time you iterate the tortoise check if it is the same as the hare, and if so you can immediately say the point is in the Mandelbrot set. (I cheated a bit and checked if the distance between tortoise and hare was less than 2-52. A larger epsilon detects cycles earlier but may have false positives.)

I seem to remember one of my schoolmates baked knowledge of the Mandelbrot's primary cardioid into his program. Cycle detection is much a more elegant and comprehensive way of saving time :-)

(4 comments | Leave a comment)

Saturday 13th November 2010

Iteration Intuition

After writing my previous entry, I noticed that the final equation is in fact just the negated log of the Douady-Hubbard potential (assuming n is practically infinite). (In the following I will use the binary logarithm throughout, lg x = log2 x.)

I didn't really understand this so I sought a more intuitive explanation.

Observe that when z is large, fc(z) is very close to f0(z). So in the following we can treat the crinkly component c as insignificant.

So the double log gives us a sort of measure of distance in terms of the number of iterations needed to get from z to zn.

Recall that

Hence the equations in z also apply to r = |z|,

When z is close to the set, r is slightly greater than one (plus crinkles), so lg r is a small positive number, and lg2 r is a large negative number. Hence the negation to turn it into the equivalent of an iteration count.

The equation above also relates back to the definition of the electrostatic potential and our simplifying approximation, that is,

Thinking about what happens when we finish iterating,

So lg2 rn lies between two numbers with a difference of one,

When lg2rn is subtracted from n, it contributes a fractional part and a fixed offset determined by the bail-out radius.

Finally, on a practical note I find that a better function for colouring pixels is to take the log again:

The extra log gives the value a more uniform gradient so it cycles through the colours evenly, and it gets much closer to the set before it flies off to infinity so less of the picture suffers from aliasing. Even better, it retains these properties as you zoom in.

(Leave a comment)

Wednesday 10th November 2010

Smooth colouring is the key to the Mandelbrot set

At Cambridge Geek Night 6 on Monday I gave a brief talk about what has been occupying some of my spare time in recent weeks. Since Benoît Mandelbrot died last month I have been playing around with making pretty pictures of Mandelbrot sets, culminating (so far) in a movie of a deep zoom targeted near z = -0.15 + 1.04i. The software and materials for my talk are at http://dotat.at/prog/mandelbrot/.

One of the nicest renderings I did when I was a teenager was a 300dpi laser-printed binary decomposition of the level sets of the exterior of the Mandelbrot set. It took a whole afternoon to generate using BBC BASIC on my Archimedes, which was the best tool for numeric code I had available. I wanted to be able to produce smoothly coloured renderings like the ones in the fractal art books by Heinz-Otto Peitgen and his collaborators, but a few things prevented me. Firstly the lack of a computer that could display true colour images :-) and secondly the difficulty of the mathematics. If I remember correctly, I just about grasped enough of the distance estimation method to understand that it requires significantly more work per iteration, which also made it unattractive on the slow computers I had to hand.

Revisiting the Mandelbrot set now, I have the luxury of computers that are thousands of times faster and thousands of times more colourful and millions of times more accurate with their arithmetic. The Internet also provides more accessible resources on the mathematics of the Mandelbrot set. A page by Linas Vepstas was particularly helpful with the smooth colouring problem. Even better, it turns out that his simple recipe arises from the mathematics that gives us our deepest understanding of the Mandelbrot set.

Here's how... )
(12 comments | Leave a comment)

Thursday 14th October 2010

Missing text in messages from Outlook

A user just reported a problem to us involving messages they received which appeared to have text missing. The sender's signature and the quote of the message that was being replied to appeared OK, but no additional text from the sender could be seen. The recipient was using Thunderbird. The messages were multipart/alternative messages sent by Outlook, with text/plain and text/html parts.

The HTML parts did in fact include the expected text, except that it was wrapped in an IE conditional comment, like the following, where the ... was the substantive content.

  <!--[if !mso]> ... <![endif]-->

Note that this is a comment so non-Microsoft HTML processors will not display its contents, so the reply appears to be missing!

Putting IE conditional comments in email is bad enough, but using them to hide the important part of the message is simply braindamaged. The bit that takes the cake is that Microsoft's own HTML to plain text converter also ignores IE conditional comments, so the text/plain alternative also omits the sender's reply!

Does anyone have a link to any Microsoft documentation about this problem? Or any other good descriptions, for that matter.

(3 comments | Leave a comment)

Sunday 3rd October 2010

Twitter echo chamber fail

"Lots of people are WRONG on the Internet!"

Twitter users have a propensity to retweet, echo and rebroadcast interesting facts and links without very much fact checking. For example,

@_468 This October has 5 Fridays, 5 Saturdays and 5 Sundays all in one month. It happens only once in 823 years. waw.
(700+ retweets according to search.twitter.com)
@bryanthatcher This month has 5 Fridays, 5 Saturdays and 5 Sundays. It happens only once in 823 years.
(300+ retweets according to search.twitter.com)

Those retweet counts do not include the vast numbers of copies that didn't use twitter's retweet mechanism.

If you think about it, this happens whenever the month has 31 days and the first day of the month is a Friday. Because the years cycle through the days of the weeks fairly evenly, you would expect this to happen for a particular month about 1/7th of the time - in fact if you average over the entire Gregorian 400 year cycle it happens 1 in 7.1429 Octobers. Each year has 7 months with 31 days, and if you average over all of them you expect about one month in each year to have this property, and in fact you get precisely this result if you average over the entire Gregorian cycle. There happen to have been two months this year with 5 Fridays, Saturdays, and Sundays because January also started with a Friday.

1 in 7 and 1 in 1 are both rather more frequent than 1 in 823.

(33 comments | Leave a comment)

Thursday 23rd September 2010

The University and College Union are spammers

The Universities Superannuation Scheme is currently in the process of changing its terms and conditions from a final salary pension scheme to a career average scheme. Unsurprisingly the UCU are campaigning against this.

The last two nights they have been spamming university staff with a campaign mailshot. (They tried to send us 3450 messages last night and 650 the previous night.) The mailing list they are using is VERY dodgy. At our site more than 9% of the addresses are invalid, and 1% of the addresses are role addresses that should not be receiving the kind of confidential personal information that a union might send to a member.

(15 comments | Leave a comment)

Tuesday 24th August 2010

Timezone display by MUAs

One of my colleagues is currently having problems with his new Exchange server. Messages delivered internally to the server have the correct timezone (+0100) in the Date: header, but messages that are delivered externally via SMTP have a Date: header timezone of +0000. All the Received: header timestamps have the correct +0100 timezone. We haven't worked out how to fix this yet.

In the course of debugging this problem, my colleague was discombobulated to find that different MUAs display the dates on messages differently. Mulberry, Alpine, and our webmail server display the date as set by the sender (+0000 in the case of the problem messages), whereas most other MUAs (Thunderbird, Mail.app, Outlook, Gmail, Hotmail) translate the date to the recipient's local timezone.

Given the choice between these two options, I prefer to see the sender's date, since it provides useful clues about long-distance correspondents (such as when it might not be reasonable to expect a prompt reply). However I can see why others would prefer to see times in a consistent timezone (e.g. it makes it easier to see how old messages are).

But really, as I argued last year, in this kind of situation (where the sender's and recipient's timezones differ) the MUA should display the date twice using both timezones. This makes it obvious what is going on (messages from local and long-distance correspondents are shown differently) and doesn't require users to do timezone conversions in their heads.

(4 comments | Leave a comment)

Sunday 22nd August 2010

Simple shell scripting for Twitter

I have a few scripts which manage my URL log, including posting a copy of the feed to my Twitter and del.icio.us accounts. Until recently the scripts have just used wget's HTTP Basic Auth support to authenticate to my accounts. This has to change because Twitter is switching to oauth. This switch has already been delayed but is now due to occur by the end of August.

Most oauth implementations are not designed for old school languages like the Unix shell, so I procrastinated because I didn't fancy dealing with the dependency hell of mainstream scripting languages. But I perked up when I noticed Jef Poskanzer mentioning his stand-alone oauth implementation on his twitter feed. I have liked Jef's approach to writing simple code since I worked on thttpd in support of Demon's homepages service.

Posting to Twitter with basic auth didn't require anything beyond a Twitter account. You could just POST to https://twitter.com/statuses/update.json - which is the essence of the security problem that the Twitter crew want to solve.

To use oauth you must register an application. Go to http://dev.twitter.com/ -> Your apps -> Register a new app. Fill in the form. Tell it the app is a client app and give it read+write permission.

When you have done that you will be presented with your application's settings page. The interesting parts are the "consumer key" and the "consumer secret" which are the half of your app's oauth credentials which authenticate the application to Twitter. The other half of the credentials prove that your app may do something to a particular person's Twitter account. In order to obtain the second half oauth normally requires a fairly complicated dance. Happily, for simple cases where you want to script your own account, Twitter provides a shortcut.

On your application's settings page, click the "My Access Token" link. This gives you a page containing your "access token" and "access token secret" which together with your "consumer key" and "consumer secret" allow your app to post to your Twitter account.

Now you need some software. Fetch Jef Poskanzer's oauth_sign and http_post packages. You can use oauth_sign with wget or curl, but http_post is more convenient since both it and oauth_sign encode the query parameters for you, whereas wget and curl do not. Typing make should be enough to build oauth_sign; for http_post you probably want make SSL_DEFS="-DUSE_SSL" SSL_LIBS="-lssl -lcrypto".

Now you have everything you need to write a simple shell twitter client. Something like:

    #!/bin/sh

    consumer_key="COPY-FROM-APP-SETTINGS-PAGE"
    consumer_secret="COPY-FROM-APP-SETTINGS-PAGE"
    access_token="COPY-FROM-MY-ACCESS-TOKEN-PAGE"
    access_secret="COPY-FROM-MY-ACCESS-TOKEN-PAGE"
    url="https://api.twitter.com/1/statuses/update.json"

    http_post -h Authorization "$(oauth_sign \
	$consumer_key $consumer_secret \
	$access_token $access_secret \
	POST "$url" status="$*")" \
	     "$url" status="$*"

That should be enough to get you going.

For completeness (and since I worked out how to do it I'm going to inflict it on you) here's how to use Jef's tools to do the full three party oauth procedure.

  • First obtain a request oauth token and secret. For this transaction your oauth token and secret are empty. The HTTP request is a POST with an empty body.
    http_post -h Authorization "$(oauth_sign \
        $consumer_key $consumer_secret "" "" \
        POST https://api.twitter.com/oauth/request_token)" \
             https://api.twitter.com/oauth/request_token
    
  • The response will contain oauth_token and oauth_token_secret parameters which are your request token and secret. You must then get your victim to visit the URL https://api.twitter.com/oauth/authorize?oauth_token=$request_token. They will be asked to give your app permission to diddle with their account. (They may have to log in first.)
  • When they have done this they will be redirected to your app's callback URL, with some extra parameters. The documentation says these will be a copy of the request oauth_token and an oauth_verifier but in my testing the verifier was missing.
  • If your app is a client app (which has no callback URL or the URL is "oob") then the user will be presented with a PIN which is the request verifier.
  • You can now obtain the access token and secret as follows.
    http_post -h Authorization "$(oauth_sign \
        $consumer_key $consumer_secret \
        $request_token $request_secret \
        POST https://api.twitter.com/oauth/access_token oauth_verifier="$verifier")" \
             https://api.twitter.com/oauth/access_token oauth_verifier="$verifier"
    
  • The response will contain oauth_token and oauth_token_secret parameters which are the access token and secret that you must use when your app wants to do something with your victim's account.

I hope this might be of use to someone else...

(19 comments | Leave a comment)

Friday 30th July 2010

Paul Vixie declares war on malicious domain names.

Paul Vixie has been more or less in charge of BIND development for over 20 years. He is also notable for creating the first anti-spam blacklist, the MAPS RBL, in 1997.

Although usenet and email were the first serious Internet battlegrounds, the main problem now is attacks on web browsers and their users. There have been some efforts at adding blacklists to browsers, for example Google's safe browsing list, and although modern browsers use them there are still a lot of old unsafe browsers out there.

So instead of relying on users to upgrade their browsers, another approach is to put the blacklist into the DNS resolver. This approach was pioneered by OpenDNS. Their resolvers deliberately lie to protect their users. The problem with lying resolvers is that they can also lie in unhelpful ways, the worst being typo squatting - directing the user to an ad-laden money-generating portal instead of simply saying a name doesn't exist. Even if the resolver only makes white lies, you might not want to entirely outsource your DNS resolvers just for an extra safety feature.

This week Vixie announced his solution to this problem at the Black Hat and DefCon security conferences. He has effectively specified a way that a recursive resolver can easily take a data feed of malicious domain names, for which it will either deny their existence or redirect the user to a "walled garden" web server that explains why they can't get to the web site they expected or which does more fine-grained filtering.

This is not just an attack on malicious domain names: it is also an attempt to build the foundations for a competitive market in DNS reputation providers, as there already is for email DNSBLs. Vixie also wants to allow resolvers to make "white lies" without also encouraging typo squatting and other grubby activity. He also has an eye on making this work with DNSSEC, which is designed to prevent any kind of lying.

I'm keen for this to work. I'm in favour of judicious measures that make it harder for criminals to exploit the Internet. However I'm not entirely sure that Vixie's proposal is quite right as currently specified.

"Response Policy Zones" encode two things: the list of domains that the RPZ provider says are malicious, and the kind of reply that the resolver should give when a user makes a query for one of those domains. Replies can be NXDOMAIN or NODATA resolution failures, or replacement answers that can be used to redirect the user to a walled garden.

This is different from email DNS blacklists. Existing DNSBLs are just lists of bad domains or IP addresses, possibly qualified with some information about why each item is listed. DNSBLs encode no policy: it's up to the postmaster whether to block a listed site, or give it a slightly spammier score, or anything else.

This built-in policy is probably OK if the RPZ just says the domains it lists should not exist. However in the case of a redirection, the subscriber to the RPZ is likely to want to run their own walled garden, or they may want to outsource it to someone other than the provider of the RPZ. But the RPZ fixes the target of the redirect. This niggle is acknowledged in section 4.7 of the spec but the suggested solutions aren't great.

RPZs are designed to use standard DNS distribution protocols, and these are designed to copy the data verbatim. Typical implementations do not have any hooks you can use to filter a replicated zone. The alternative of getting the provider to do the customization at their end will be expensive and clumsy.

There is another possibility, which is for redirections in the RPZ to take the form of CNAME records pointing at a magic private domain name. Each RPZ subscriber can create a local private zone containing this magic name and pointing it at whatever walled garden they wish to deploy. However it might not be possible to use this trick to turn a redirect policy into an NXDOMAIN or NODATA - the spec is unclear.

I would be happier if, like DNSBLs, the RPZ proposal left the choice of countermeasures to the subscriber, leaving RPZ providers to just list the zones they don't like.

(Leave a comment)

Keyboard shortcuts for positioning windows in Mac OS X

It took me a few false starts to work out how to create arbitrary keyboard shortcuts in Mac OS X 10.6 "Snow Leopard", so with any luck this article will make it easier for other people who want to do something similar...

The way to attach an arbitrary script to a keyboard shortcut is as follows:

  • Open Automator.app and create a new "service".
  • To the left of the Automator window is a library of actions, towards the bottom of which is the "Utilities" category. Select it.
  • In the list of actions immediately to the right are interesting things like "Run Shell Script" and "Run AppleScript". Since I wanted to script the Mac layer (not the Unix layer) I double-clicked on the latter.
  • A box appears into which you can type your script. More about that below.
  • When you save your script It will appear in the "services" submenu of every application.
  • The script is saved in ~/Library/Services/Something.workflow which is a kind of .app directory. The interesting bit is the XML inside .../Contents/document.wflow
  • Also in the Services submenu is an option to open Services Preferences. Select it.
  • This takes you to the Services list on the Keyboard Shortcuts pane of System Preferences which allows you to control what appears in the services submenu. Your new service should be at the bottom of the list. Bizarrely, you can't set a keyboard shortcut for it directly on this list, so just ensure its tickybox is turned on.
  • Click the "+" button to add an application shortcut.
  • Choose "All Applications" from the list.
  • Next to "Menu Title" type in the name of your service.
  • Give it a keyboard shortcut and click "Add".
  • Your service and its keyboard shortcut will now appear in the "Application Shortcuts" list.
  • Your keyboard shortcut should also appear next to your service in the services submenu in all applications.

I tried doing something similar using the AppleScript Editor and the script menu, but for some reason the script menu is not eligible for keyboard shortcuts.

I wanted shortcuts to move a window into the left or right halves of the screen or maximize it to full screen. This is slightly less straightforward than it should be. See the comments in the following for details: script )

(13 comments | Leave a comment)

Thursday 22nd July 2010

Obscure problem caused by bad DNS load balancer

Our colleagues have recently been having problems talking to the Universities and Colleges Admissions Service ODBC server. This was caused by DNS resolution failures and lengthy time-outs when trying to look up the IPv6 address for odbc.ucas.com. BIND complains in its log:

22-Jul-2010 19:29:17.827 resolver: notice:
  DNS format error from 62.189.0.250#53 resolving odbc.ucas.com/AAAA
  for client 127.0.0.1#52970: invalid response
22-Jul-2010 19:29:17.827 lame-servers: info:
  error (FORMERR) resolving 'odbc.ucas.com/AAAA/IN': 62.189.0.250#53

This is puzzling, because dig shows that the server is sending back what looks like a perfectly well-formatted NODATA response:

; <<>> DiG 9.7.1-P2 <<>> +norec +multi aaaa odbc.ucas.com @62.189.0.250
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59781
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;odbc.ucas.com.         IN AAAA

;; AUTHORITY SECTION:
ucas.com.               86400 IN SOA ucas.com. administrator.ucas.com. (
                                998545544  ; serial
                                28800      ; refresh (8 hours)
                                7200       ; retry (2 hours)
                                604800     ; expire (1 week)
                                86400      ; minimum (1 day)
                                )

;; Query time: 8 msec
;; SERVER: 62.189.0.250#53(62.189.0.250)
;; WHEN: Thu Jul 22 19:36:26 2010
;; MSG SIZE  rcvd: 105

However if you try tracing the resolution of odbc.ucas.com down from the root, you will see that it is delegated from its parent domain to a set of load-balancing DNS servers:

; <<>> DiG 9.7.1-P2 <<>> +norec aaaa odbc.ucas.com @ns0.netcentral.co.uk.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56307
;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 4

;; QUESTION SECTION:
;odbc.ucas.com.                 IN      AAAA

;; AUTHORITY SECTION:
odbc.ucas.com.          3600    IN      NS      ns-lp.ucas.com.

;; ADDITIONAL SECTION:
ns-lp.ucas.com.         3600    IN      A       62.189.0.250
ns-lp.ucas.com.         3600    IN      A       81.171.139.250
ns-lp.ucas.com.         3600    IN      A       194.80.160.250
ns-lp.ucas.com.         3600    IN      A       195.188.99.250

;; Query time: 13 msec
;; SERVER: 212.57.232.5#53(212.57.232.5)
;; WHEN: Thu Jul 22 19:39:56 2010
;; MSG SIZE  rcvd: 115

This means that the start of authority (the zone cut) for odbc.ucas.com is odbc.ucas.com - but the NODATA response claimed it was ucas.com. BIND keeps track of where the zone cuts are, and requires that resource records in the authority section are subdomains of the zone cut. When this is not the case, BIND's NODATA parsing logic ends up at these lines of code:

	/*
	 * The responder is insane.
	 */
	log_formerr(fctx, "invalid response");
 

These load balancers are severely broken in other ways. If you ask them for any RR type other than A or AAAA they do not send any reply at all, leaving you to hang. Exceedingly rude! They also do not listen on TCP as they should.

(6 comments | Leave a comment)
Previous 50
Powered by LiveJournal.com