<?xml version="1.0" encoding="utf-8"?>
<!-- If you are running a bot please visit this policy page outlining rules you must respect. http://www.livejournal.com/bots/ -->
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:lj="http://www.livejournal.com">
  <id>urn:lj:livejournal.com:atom1:fanf</id>
  <title>Tony Finch</title>
  <subtitle>Tony Finch</subtitle>
  <author>
    <email>dot@dotat.at</email>
    <name>Tony Finch</name>
  </author>
  <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/"/>
  <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom"/>
  <updated>2009-06-09T22:24:07Z</updated>
  <lj:journal userid="936728" username="fanf" type="personal"/>
  <link rel="service.feed" type="application/x.atom+xml" href="http://fanf.livejournal.com/data/atom" title="Tony Finch"/>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:100759</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/100759.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=100759"/>
    <title>Tempting Fate, and getting her unwanted attention</title>
    <published>2009-06-09T22:19:04Z</published>
    <updated>2009-06-09T22:24:07Z</updated>
    <content type="html">&lt;p&gt;Yesterday I dealt with a fairly routine question from a computer officer about sending email from a web server. It's always nice when someone asks for advice, even when they expect the answer won't contain any surprises, just in case it does. I gave our usual advice about the safest ways to set up web forms that send email. The bit that tempted fate was saying that web forms have caused spam problems for us "in the past" - and I had been reflecting recently that we haven't had a big web spam problem for quite a long time.&lt;/p&gt;

&lt;p&gt;Our recent problems have all been account compromises of one kind or another. One that will be familiar to other postmasters is caused by users sending their login credentials in reply to a phishing message. We seem to have a relatively clued-up userbase, judging by tales from other universities: very few users reply to most phishing spams, and most of those replies do not contain passwords - they are more or less skeptical and sometimes taunt the spammer. However, there's still the occasional idiot, and even a particularly special one who had to be re-educated &lt;i&gt;twice&lt;/i&gt;! If any passwords do escape and accounts get used to send spam, our strict rate limits for remote email users shut the unwanted flows down pretty quickly.&lt;/p&gt;

&lt;p&gt;More recently we have had problems with compromised accounts on email systems other than our own. These seem to have been due to infected PCs on networks with a Microsoft Windows domain and Exchange server - I don't think remote access (e.g. via Outlook Web Access) was to blame in these cases. We have been reasonably successful at applying rate limits to mail servers that relay outgoing mail through us. We set the limit for each server to be about 10% above its normal peak usage, so it should not trigger unless something suspicious is happening, and we have a monitoring script to warn us if a server strays into (or past!) the 10% headroom. The first time this mechanism caught a spam run, it stopped an 80,000-message flood about 2% of the way through, i.e. after roughly 1,600 messages.&lt;/p&gt;
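
&lt;p&gt;The limit and the monitoring check can be sketched in a few lines of Python. This is a minimal illustration, not our actual monitoring script: the function names, and the assumption that rates are counted as whole messages per fixed time window, are mine.&lt;/p&gt;

&lt;pre&gt;
```python
# Per-relay rate limits with about 10% headroom over each server's normal
# peak. Function names and the messages-per-window unit are assumptions.


def rate_limit(normal_peak: int) -> int:
    """Limit for a relay: its normal peak plus 10%, rounded up.

    Integer arithmetic avoids float rounding (1000 * 1.1 is not exactly 1100).
    """
    return (normal_peak * 11 + 9) // 10


def check(server: str, normal_peak: int, observed: int) -> str:
    """Classify an observed rate the way a monitoring script might."""
    limit = rate_limit(normal_peak)
    if observed > limit:
        return f"{server}: over the limit ({observed} > {limit})"
    if observed > normal_peak:
        return f"{server}: in the headroom ({observed} of {limit})"
    return f"{server}: ok"
```
&lt;/pre&gt;

&lt;p&gt;Because each limit is derived from that server's own history, a busy relay and a quiet one both trip only when their traffic is out of character.&lt;/p&gt;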

&lt;p&gt;So Fate observed my complacency about web form compromises and handed us a big plate full of crap today. Shortly before I got into work, one of our departments started spewing email and fairly swiftly whacked into their rate limits. I was foolishly trusting of their competence, and aware of their occasional need to send large mailshots, so I tried to let the mail through - but it quickly became clear that tens of thousands of messages were way beyond normal, and that an arrival rate of hundreds per second was going to cause us problems even if the mail was legit. Silly me for not looking at a sample message. I got in touch with the department's techies, it soon became apparent that there was a compromise, and we went into lock-down and clean-up mode. Between us we had to delete about 100,000 queued messages, with the added joy of avoiding the couple of dozen legitimate messages that had the same sender address as the spam. Sadly my error meant that 20,000 messages escaped (plus 6,000 with invalid recipient addresses), whereas it should have been only two or three thousand :-(&lt;/p&gt;

&lt;p&gt;As I said yesterday, there are two ways to ensure that a web form sends email safely. The first is to fix the recipient address, for example in "send feedback to the webmaster" forms, or mail to registered users of your site. The second is to fix the contents of the message, for example in signup or purchase confirmation messages. Either of these is enough to make the form useless to spammers.&lt;/p&gt;
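
&lt;p&gt;To make the two styles concrete, here is a minimal Python sketch using the standard &lt;tt&gt;email.message&lt;/tt&gt; module. The addresses, the story table and the function names are hypothetical, and actually handing the message to a mail server is left out.&lt;/p&gt;

&lt;pre&gt;
```python
# Two safe styles of mail-sending web form. Addresses, names and the story
# table are hypothetical placeholders; submission to an MTA is not shown.
from email.message import EmailMessage

WEBMASTER = "webmaster@example.ac.uk"
STORIES = {"42": "Text of story 42 as published on the site."}


def feedback_message(visitor_text: str) -> EmailMessage:
    """Style one: free text is allowed, but the recipient is fixed."""
    msg = EmailMessage()
    msg["To"] = WEBMASTER
    msg["From"] = WEBMASTER
    msg["Subject"] = "Website feedback"
    msg.set_content(visitor_text)
    return msg


def story_message(recipient: str, story_id: str) -> EmailMessage:
    """Style two: any recipient, but the contents come only from our data."""
    if story_id not in STORIES:
        raise ValueError("unknown story")
    if "\n" in recipient or "\r" in recipient or recipient.count("@") != 1:
        raise ValueError("bad recipient address")
    msg = EmailMessage()
    msg["To"] = recipient
    msg["From"] = WEBMASTER
    msg["Subject"] = "A story from our web site"
    # No visitor-supplied text goes into this message: adding a free-form
    # comment field here would make the form useful to spammers again.
    msg.set_content(STORIES[story_id])
    return msg
```
&lt;/pre&gt;

&lt;p&gt;The point of the second function is what it leaves out: once a visitor can put text of their own into a message that goes to an address of their choosing, the form becomes a spam relay.&lt;/p&gt;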

&lt;p&gt;Today's excitement was caused by a "mail a copy of this story" form, which should have been implemented safely in the second style. However the form allowed remote users to add their own comments above the story, and gave them plenty of space to do so - enough for a typically lengthy 419 scam.&lt;/p&gt;

&lt;p&gt;It's fair to say this has been a learning experience, more painful for some than for others :-) I should remember that technical cockups are one thing, but even if you are technically competent, political requirements can still screw the pooch.&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:100365</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/100365.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=100365"/>
    <title>CRSIDs and email addresses</title>
    <published>2009-06-02T13:11:56Z</published>
    <updated>2009-06-02T13:11:56Z</updated>
    <content type="html">&lt;p&gt;I've recently written a three page memo about the advantages of our username scheme compared to "friendly name" email addresses in large domains. I have put &lt;a href="http://www-uxsup.csx.cam.ac.uk/~fanf2/hermes/doc/misc/crsids.pdf"&gt;a copy of the PDF on the web&lt;/a&gt; which you can read if you like.&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:100245</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/100245.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=100245"/>
    <title>Use your bonce</title>
    <published>2009-05-15T08:55:18Z</published>
    <updated>2009-05-15T08:55:18Z</updated>
    <content type="html">&lt;p&gt;&lt;a href="http://www.vestasys.org/"&gt;Vesta&lt;/a&gt; includes a purely functional programming language for specifying build rules. It has &lt;a href="http://www.vestasys.org/doc/pubs/pldi-00-04-20.pdf"&gt;an interesting execution model&lt;/a&gt; which avoids unnecessary rebuilds. Unlike &lt;tt&gt;make&lt;/tt&gt;, it automatically works out dependencies in a way that is independent of your programming language or tools - no manually maintained dependencies or parsing source for &lt;tt&gt;#include&lt;/tt&gt; etc. Also unlike make, it doesn't use timestamps to decide if dependencies are still valid, but instead uses a hash of their contents; it can do this efficiently because of its underlying version control repository. Vesta assumes that build tools are essentially purely functional, i.e. that their output files depend only on their input files, and that any differences (e.g. embedded timestamps) don't affect the functioning of the output.&lt;/p&gt;

&lt;p&gt;I've been wondering whether Vesta's various parts can be unpicked. It occurred to me this morning that its build-once functionality could make quite a nice stand-alone tool. So here's an outline of a program called &lt;tt&gt;bonce&lt;/tt&gt; that I don't have time to write.&lt;/p&gt;

&lt;p&gt;&lt;tt&gt;bonce&lt;/tt&gt; is an adverbial command, i.e. you use it like &lt;tt&gt;bonce gcc -c foo.c&lt;/tt&gt;. It checks whether the command has already been run, and if so it gets the results from its build results cache. It uses Vesta's dependency cache logic to decide whether a command has been run. In the terminology of &lt;a href="http://www.vestasys.org/doc/pubs/pldi-00-04-20.pdf"&gt;the paper&lt;/a&gt;, the primary key for the cache is a hash of the command line, and the secondary keys are all the command's dependencies as recorded in the cache. If there is a cache miss, the command is run in dependency-recording mode. (Vesta does this using its magic NFS server, which is the main interface to its repository.) This can be done using an &lt;tt&gt;LD_PRELOAD&lt;/tt&gt; hack that intercepts the C library's system call wrappers: e.g. &lt;tt&gt;open(O_RDONLY)&lt;/tt&gt; is a dependency, &lt;tt&gt;open(O_WRONLY)&lt;/tt&gt; is probably an output file, and &lt;tt&gt;exec()&lt;/tt&gt; is modified to invoke &lt;tt&gt;bonce&lt;/tt&gt; recursively. When the command completes, its dependencies and outputs are recorded in the cache.&lt;/p&gt;
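
&lt;p&gt;The cache lookup might look something like the following Python sketch. It is only an illustration under stated assumptions: the dependency-recording step (the &lt;tt&gt;LD_PRELOAD&lt;/tt&gt; part) is not shown, so the caller passes in the files the command read and wrote, and the cache lives in memory rather than on disk. All names are hypothetical.&lt;/p&gt;

&lt;pre&gt;
```python
# bonce-style cache: primary key = hash of the command line, secondary keys
# = content hashes of the files the command read. All names hypothetical.
import hashlib
import os
from typing import Optional


def file_hash(path: str) -> str:
    """Content hash of a file, or a marker if the file does not exist."""
    if not os.path.exists(path):
        return "absent"
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def command_key(argv: list[str]) -> str:
    """Primary key: a hash of the command line."""
    return hashlib.sha256("\0".join(argv).encode()).hexdigest()


class BonceCache:
    def __init__(self) -> None:
        # primary key -> list of (dependency hashes, output file contents)
        self.entries: dict[str, list[tuple[dict[str, str], dict[str, bytes]]]] = {}

    def record(self, argv: list[str], reads: list[str], writes: list[str]) -> None:
        """After a cache miss: remember what the command read and wrote."""
        deps = {p: file_hash(p) for p in reads}
        outputs = {}
        for p in writes:
            with open(p, "rb") as f:
                outputs[p] = f.read()
        self.entries.setdefault(command_key(argv), []).append((deps, outputs))

    def lookup(self, argv: list[str]) -> Optional[dict[str, bytes]]:
        """Return cached outputs if every recorded dependency is unchanged."""
        for deps, outputs in self.entries.get(command_key(argv), []):
            if all(file_hash(p) == h for p, h in deps.items()):
                return outputs
        return None
```
&lt;/pre&gt;

&lt;p&gt;Recording files that were looked for but absent matters too: if a compiler probed for a header that didn't exist, the cached result goes stale as soon as that header appears, which is why the sketch hashes a missing file to a marker value.&lt;/p&gt;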

&lt;p&gt;&lt;tt&gt;bonce&lt;/tt&gt; is likely to need some heuristic cleverness. For example, Vesta has some logic that simplifies the dependencies of higher-level build functions so that the dependency checking work for a top-level build invocation scales less than linearly with the size of the project. It could also be useful to look into git repositories to get SHA-1 hashes and avoid computing them.&lt;/p&gt;
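
&lt;p&gt;Reusing git's hashes works because a blob's object id depends only on the file's contents. Git computes it as SHA-1 over a short header (the word &lt;tt&gt;blob&lt;/tt&gt;, a space, the length in decimal and a NUL byte) followed by the contents. A minimal sketch with a hypothetical function name:&lt;/p&gt;

&lt;pre&gt;
```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the object id git would assign to a file with these contents.

    The hashed header is b"blob", a space, the length in decimal, and a NUL
    byte; the raw contents follow.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()
```
&lt;/pre&gt;

&lt;p&gt;The result should match what &lt;tt&gt;git hash-object&lt;/tt&gt; prints for the same file, provided no line-ending conversion or clean filter applies. A tool could therefore take these ids from the index (&lt;tt&gt;git ls-files -s&lt;/tt&gt;) for unmodified files instead of rehashing them.&lt;/p&gt;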

&lt;p&gt;It should then be reasonable to write very naive build scripts or makefiles, with simplified over-broad dependencies that would normally cause excessive rebuilds - e.g. every object file in a module depends on every source file - which &lt;tt&gt;bonce&lt;/tt&gt; can reduce to the exact dependencies and thereby eliminate redundant work. No need for a special build language and no need to rewrite build scripts.&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:100022</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/100022.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=100022"/>
    <title>Define SCM</title>
    <published>2009-05-14T11:32:33Z</published>
    <updated>2009-05-14T11:32:33Z</updated>
    <content type="html">&lt;p&gt;Here's an abbreviation to avoid, because its various meanings overlap so heavily.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Source Code Manager: synonym for version control system, e.g. as used by the Linux crowd in the early days of git.&lt;/li&gt;
&lt;li&gt;Software Configuration Manager: usually incorporates version control, build system, archival of build products, and CASE methodology baggage, e.g. ClearCASE and Vesta.&lt;/li&gt;
&lt;li&gt;System Configuration Manager: for system administrators rather than developers, e.g. cfengine and Puppet.&lt;/li&gt;
&lt;/ul&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:99700</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/99700.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=99700"/>
    <title>Never delete anything</title>
    <published>2009-05-13T22:13:25Z</published>
    <updated>2009-05-13T22:13:25Z</updated>
    <content type="html">&lt;p&gt;How long will it be before it becomes normal to archive everything? It's already normal in some situations, and I think that's increasing. It's been the norm in software development for a long time. There's an increase in append-mostly storage systems (i.e. append-only with garbage collection) which become never-delete systems if you replace the GC with an archiver. Maybe the last hold-outs for proper deletion will be high data volume servers...&lt;/p&gt;

&lt;p&gt;Anyway, I feel like listing some interesting append-only and append-mostly systems. A tangent that I'm not going to follow is the rise of functional programming and immutability outside the field of storage. Many of these systems rely on cryptographic hashes to identify stuff they have already stored and avoid storing it again, making append-only much more practical.&lt;/p&gt;
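
&lt;p&gt;The core trick can be shown with a toy content-addressed store in Python: blocks are named by a hash of their contents and appended to a log only if that hash is new. This is a sketch, not Venti's design. Venti itself uses SHA-1; SHA-256 and all the names here are my own choices.&lt;/p&gt;

&lt;pre&gt;
```python
# Toy append-only, content-addressed store: storing the same data twice
# costs nothing. Class and method names are hypothetical.
import hashlib


class ContentStore:
    def __init__(self) -> None:
        self.log = bytearray()                       # append-only data log
        self.index: dict[str, tuple[int, int]] = {}  # hash -> (offset, length)

    def put(self, data: bytes) -> str:
        """Store data (if new) and return its content hash."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self.index:  # only previously unseen content is appended
            self.index[key] = (len(self.log), len(data))
            self.log += data
        return key

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        return bytes(self.log[offset:offset + length])
```
&lt;/pre&gt;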

&lt;ul&gt;
&lt;li&gt;All version control systems, and software configuration management systems even more so. The former archive source code whereas the latter archive build tools and build products as well. &lt;a href="http://www.vestasys.org/"&gt;DEC's Vesta SCM&lt;/a&gt; is particularly interesting, being based on a purely functional build language designed to maximize memoization - i.e. minimize unnecessary rebuilds. It's sort of &lt;a href="http://ccache.samba.org/"&gt;ccache&lt;/a&gt; on steroids since it caches the results of entire module builds, not just individual source file compiles.&lt;/li&gt;

&lt;li&gt;&lt;a href="http://nixos.org/"&gt;Nix&lt;/a&gt; is a purely functional package manager. Unlike most packaging systems like dpkg or rpm, Nix packages do not conflict with each other: you upgrade by installing new packages alongside your existing ones, then you stop running the old ones and start running the new ones.&lt;/li&gt;

&lt;li&gt;Archival / backup systems, like &lt;a href="http://doc.cat-v.org/plan_9/4th_edition/papers/venti/"&gt;Venti&lt;/a&gt;, Plan 9's append-only archival block store. Apple's &lt;a href="http://www.apple.com/macosx/features/timemachine.html"&gt;Time Machine&lt;/a&gt; isn't nearly as clever.&lt;/li&gt;

&lt;li&gt;Most filesystems don't use hash-based uniquification, but append-mostly filesystems often provide cool undelete features like snapshots, e.g. &lt;a href="http://media.netapp.com/documents/wp_3002.pdf"&gt;NetApp's WAFL&lt;/a&gt; or &lt;a href="http://www.sun.com/software/solaris/zfs.jsp"&gt;Sun's ZFS&lt;/a&gt;. Early filesystems of this kind, e.g. &lt;a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.1317"&gt;BSD LFS&lt;/a&gt;, tried to avoid wasting space, so did not make old data available as snapshots, and sacrificed performance to eager garbage collection. More recently, &lt;a href="http://www.dragonflybsd.org/hammer/"&gt;DragonFly BSD's Hammer filesystem&lt;/a&gt; doesn't even have an in-kernel garbage collector: pruning old data is done from userland, and is entirely optional.&lt;/li&gt;

&lt;li&gt;Email archives: Gmail's ever-increasing quotas, Cyrus &lt;a href="http://www-uxsup.csx.cam.ac.uk/~dpc22/cyrus/two_phase_expunge.html"&gt;delayed expunge&lt;/a&gt;.&lt;/li&gt;

&lt;/ul&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:99349</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/99349.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=99349"/>
    <title>Some thoughts about git</title>
    <published>2009-04-23T20:01:02Z</published>
    <updated>2009-04-23T20:01:02Z</updated>
    <category term="hg"/>
    <category term="git"/>
    <category term="bzr"/>
    <category term="version control"/>
    <category term="dvcs"/>
    <content type="html">&lt;p&gt;I was originally planning to witter about distributed version control vs. centralized version control, especially the oft-neglected problem of breaking up a large &lt;a href="http://www.nongnu.org/cvs/"&gt;cvs&lt;/a&gt; / &lt;a href="http://subversion.tigris.org/"&gt;svn&lt;/a&gt; / &lt;a href="http://www.perforce.com/"&gt;p4&lt;/a&gt; repository. This was partly triggered by &lt;a href="http://www.youtube.com/watch?v=4XpnKHJAok8"&gt;Linus's talk about git at Google&lt;/a&gt; in which he didn't really address a couple of questions about how to migrate a corporate source repository to distributed version control. But in the end I don't think I have any point other than the fairly well-known one that distributed version control systems work best when your systems are split into reasonably modestly-sized and self-contained modules, one per repository. Most systems are modular, even if all the modules are in one huge central repository, but the build and system integration parts can often get tightly coupled to the repository layout making it much harder to decentralize.&lt;/p&gt;

&lt;p&gt;Instead I'm going to wave my hands a bit about the ways in which &lt;a href="http://git-scm.com/"&gt;git&lt;/a&gt; has unusual approaches to distributed version control, and how &lt;a href="http://bazaar-vcs.org/"&gt;bzr&lt;/a&gt; in particular seems to take diametrically opposing attitudes. I'm not saying one is objectively better than the other, because most of these issues are fairly philosophical and for practical purposes they are dominated by things like quality of implementation and documentation and support.&lt;/p&gt;

&lt;h3&gt;Bottom-up&lt;/h3&gt;

&lt;p&gt;Git's design is very bottom-up. Linus started by designing a repository structure that he thought would support his goals of performance, semantics, and features, and worked upwards from there. The upper levels, especially the user interface, were thought to be of secondary importance and something that could be worked on and improved further down the line. As a result it has a reputation for being very unfriendly to use, but that problem is pretty much gone now.&lt;/p&gt;

&lt;p&gt;Other VCSs take a similar approach, for example &lt;a href="http://www.selenic.com/mercurial/"&gt;hg&lt;/a&gt; is based on its &lt;a href="http://www.selenic.com/mercurial/wiki/index.cgi/Presentations?action=AttachFile&amp;amp;do=get&amp;amp;target=ols-mercurial-paper.pdf"&gt;revlog&lt;/a&gt; data structure, and &lt;a href="http://darcs.net/"&gt;darcs&lt;/a&gt; has its &lt;a href="http://darcs.net/manual/node9.html"&gt;patch algebra&lt;/a&gt;. However bzr seems to be designed from the top down, starting with a user interface and a set of supported workflows, and viewing its repository format and performance characteristics as of secondary importance and something that can be improved further down the line. As a result it has a reputation for being very slow.&lt;/p&gt;

&lt;h3&gt;Amortization&lt;/h3&gt;

&lt;p&gt;Most VCSs have a fairly intricate repository format, and every operation that writes to the repository eagerly keeps it in the canonical efficient form. Git is unusual because its write operations add data to the repository in an unpacked form, which makes writing cheaper but makes reading from the repository gradually less and less efficient - until you repack the repo in a separate heavy-weight operation to make reads faster again. (Git will do this automatically for you every so often.) The advantage of this is that the packed repository format isn't constrained by any need for incremental updates, so it can be optimised for read performance at the cost of more complex pack writing, since that doesn't slow down common write operations. Bzr, being the opposite of git, seems to do a lot more up-front work when writing to its repository than other VCSs, e.g. to make annotation faster.&lt;/p&gt;
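
&lt;p&gt;A toy Python model may make the trade-off clearer. Writes just append a loose object, so they stay cheap. Reads scan the loose objects, so they slow down as loose objects pile up. An occasional repack rebuilds a sorted, binary-searchable index. This illustrates the amortization only: it is not git's on-disk format, and the class name and threshold are hypothetical.&lt;/p&gt;

&lt;pre&gt;
```python
import bisect


class ObjectStore:
    """Toy model: cheap loose writes, occasional repack into a sorted pack."""

    def __init__(self, repack_threshold: int = 1000) -> None:
        # Loose objects: an unsorted list, so lookups get slower as it grows
        # (standing in for git's one-file-per-object loose storage).
        self.loose: list[tuple[str, bytes]] = []
        # Packed objects: keys kept sorted so lookups can binary-search.
        self.pack_keys: list[str] = []
        self.pack_vals: list[bytes] = []
        self.repack_threshold = repack_threshold

    def write(self, key: str, data: bytes) -> None:
        """Cheap: just append, with no reorganisation of existing data."""
        self.loose.append((key, data))

    def read(self, key: str) -> bytes:
        i = bisect.bisect_left(self.pack_keys, key)
        if i != len(self.pack_keys) and self.pack_keys[i] == key:
            return self.pack_vals[i]
        for k, v in self.loose:  # linear scan of loose objects
            if k == key:
                return v
        raise KeyError(key)

    def repack(self) -> None:
        """Expensive: rebuild the packed form from everything stored."""
        merged = dict(zip(self.pack_keys, self.pack_vals))
        merged.update(self.loose)
        self.pack_keys = sorted(merged)
        self.pack_vals = [merged[k] for k in self.pack_keys]
        self.loose = []

    def maybe_repack(self) -> None:
        """Rough analogue of git gc --auto: repack once loose objects pile up."""
        if len(self.loose) >= self.repack_threshold:
            self.repack()
```
&lt;/pre&gt;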

&lt;p&gt;Thus git has two parallel repository formats, loose and packed. Other VCSs may have multiple repository formats, but only one at a time, and new formats are introduced to satisfy feature or performance requirements. Repository format changes are a pain and happily git's stabilized very early on - unlike bzr's.&lt;/p&gt;

&lt;h3&gt;Laziness&lt;/h3&gt;

&lt;p&gt;As well as being slack about &lt;i&gt;how&lt;/i&gt; it writes to its repository, git is also slack about &lt;i&gt;what&lt;/i&gt; it writes. There has been an inclination in recent VCSs towards richer kinds of changeset, with support for file copies and renames or even things like token renames in darcs. The bzr developers think this is &lt;a href="http://www.markshuttleworth.com/archives/123"&gt;vital&lt;/a&gt;. Git, on the other hand, doesn't bother storing that kind of information at all, and instead lazily calculates it when necessary. There are &lt;a href="http://permalink.gmane.org/gmane.comp.version-control.git/217"&gt;some good reasons&lt;/a&gt; for this, in particular that developers will often not bother to be explicit about rich change information, or the information might be lost when transmitting a patch, or the change might have come from a different VCS that doesn't encode the information. This implies that even VCSs that can represent renames &lt;a href="https://lists.canonical.com/archives/bazaar/2009q1/052948.html"&gt;still need to be able to infer them&lt;/a&gt; in some situations.&lt;/p&gt;

&lt;p&gt;Git's data structure helps to make this efficient: it identifies files and directories by a hash of their contents, so if two hashes are the same it doesn't need to look any closer to find differences, because there aren't any. When the same hash turns up under a different path, that implies a copy or rename. This means that you should not rename or copy a file and modify it in the same commit, because git then has to fall back to comparing contents for similarity, which makes its rename inference slower and less reliable. Similarly, if you rename a directory, don't modify any of its contents (including renames and permissions changes) in the same commit.&lt;/p&gt;
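
&lt;p&gt;The exact-match part of rename inference is simple enough to sketch. Here each tree is a Python dict from path to content hash, much as a git tree object gives you. A path whose hash appeared elsewhere in the old tree is a copy if the old path still exists and a rename if not. Real git also scores similar-but-not-identical files, which this skips, and the function name is hypothetical.&lt;/p&gt;

&lt;pre&gt;
```python
def exact_renames(old: dict[str, str], new: dict[str, str]):
    """Return (renames, copies) as lists of (old_path, new_path) pairs."""
    by_hash: dict[str, list[str]] = {}
    for path, h in old.items():
        by_hash.setdefault(h, []).append(path)
    renames, copies = [], []
    for path, h in sorted(new.items()):
        if old.get(path) == h:
            continue  # unchanged in place
        sources = sorted(by_hash.get(h, []))
        if not sources:
            continue  # new or modified content: no exact match to infer from
        if sources[0] in new:
            copies.append((sources[0], path))  # old path still exists
        else:
            renames.append((sources[0], path))  # old path has gone
    return renames, copies
```
&lt;/pre&gt;

&lt;p&gt;A file that is renamed and edited in the same commit gets a new hash, so it finds no exact match here; that is the case where the expensive similarity search has to take over.&lt;/p&gt;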

&lt;p&gt;Mercurial also uses hashes to identify things, but they aren't pure content hashes: they include historical information, so they can't be used to identify files with the same contents but different histories. Thus efficiency forces hg to represent copies explicitly.&lt;/p&gt;
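
&lt;p&gt;The difference can be seen in a few lines of Python. To the best of my knowledge, Mercurial's revlog node id is the SHA-1 of the two parent ids (smaller first) followed by the revision text. This sketch ignores the copy metadata that filelogs may prepend to the text. Git's blob id, by contrast, hashes only a length header and the contents. Function names are hypothetical.&lt;/p&gt;

&lt;pre&gt;
```python
import hashlib

NULLID = b"\0" * 20  # Mercurial's id for "no parent"


def hg_nodeid(text: bytes, p1: bytes = NULLID, p2: bytes = NULLID) -> str:
    """Node id for a revision: depends on the parents as well as the text."""
    a, b = sorted((p1, p2))
    return hashlib.sha1(a + b + text).hexdigest()


def git_blob_id(text: bytes) -> str:
    """Blob id for the same text: depends on the contents alone."""
    header = b"blob " + str(len(text)).encode("ascii") + b"\0"
    return hashlib.sha1(header + text).hexdigest()
```
&lt;/pre&gt;

&lt;p&gt;So two files with identical contents but different histories get different node ids in hg, but the same blob id in git.&lt;/p&gt;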

&lt;h3&gt;Any more?&lt;/h3&gt;

&lt;p&gt;I should say that I know very little about bzr, and nothing about &lt;a href="http://www.gnu.org/software/gnu-arch/"&gt;tla&lt;/a&gt;, &lt;a href="http://monotone.ca/"&gt;mtn&lt;/a&gt;, or &lt;a href="http://www.bitkeeper.com/"&gt;bk&lt;/a&gt;, so if any of the above is off the mark or over-states git's weirdness, then please correct me in a comment!&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:99168</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/99168.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=99168"/>
    <title>LISTSERV crapness</title>
    <published>2009-04-08T13:18:01Z</published>
    <updated>2009-04-08T13:18:01Z</updated>
    <content type="html">&lt;p&gt;&lt;a href="http://www.lsoft.com/manuals/lsv-faq.stm#_Toc196887308"&gt;LISTSERV uses a null return path (RFC821 MAIL FROM:&amp;lt;&amp;gt;) on its administrative mail, and some mail hosts reject this.&lt;/a&gt; I discard message with a null return path that do not match a few simple heuristics, so I lose things like subscription confirmations from services like JISCfail. This makes me cross.&lt;/p&gt;

&lt;p&gt;L-Soft claim that LISTSERV is following the specifications, and they cite a couple of paragraphs from RFC 821 (published in 1982) and RFC 1123 (1989). However they fail to cite text from RFC 2821 (2001) which explicitly forbids what they are doing.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;There are several types of notification messages which are required by existing and proposed standards to be sent with a null reverse path, namely non-delivery notifications, other kinds of Delivery Status Notifications (DSNs), and also Message Disposition Notifications (MDNs). [...]&lt;/p&gt;
&lt;p&gt;All other types of messages (i.e., any message which is not required by a standards-track RFC to have a null reverse-path) SHOULD be sent with with a valid, non-null reverse-path.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The only other permitted use of null return paths that I know of is vacation notifications, described in RFC 3834 (published in 2004).&lt;/p&gt;
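
&lt;p&gt;A filter along these lines can be sketched with Python's standard &lt;tt&gt;email&lt;/tt&gt; package. It accepts a null-sender message only if it looks like a DSN or MDN (a &lt;tt&gt;multipart/report&lt;/tt&gt; with the right &lt;tt&gt;report-type&lt;/tt&gt;) or an RFC 3834 auto-reply (&lt;tt&gt;Auto-Submitted: auto-replied&lt;/tt&gt;). These are illustrative heuristics, not the exact checks I use, and the function name is hypothetical.&lt;/p&gt;

&lt;pre&gt;
```python
# Accept a null-return-path message only if it is one of the kinds the
# standards allow. Illustrative heuristics; the function name is hypothetical.
from email import message_from_string

REPORT_TYPES = ("delivery-status", "disposition-notification")


def null_sender_acceptable(raw: str) -> bool:
    msg = message_from_string(raw)
    # DSNs (bounces, delay notices) and MDNs are sent as multipart/report.
    report = str(msg.get_param("report-type", "")).lower()
    if msg.get_content_type() == "multipart/report" and report in REPORT_TYPES:
        return True
    # RFC 3834 automatic responses (e.g. vacation) are marked auto-replied.
    auto = (msg.get("Auto-Submitted") or "").split(";")[0].strip().lower()
    return auto == "auto-replied"
```
&lt;/pre&gt;

&lt;p&gt;In practice this is too strict on its own: plenty of older MTAs send bounces that are not &lt;tt&gt;multipart/report&lt;/tt&gt;, so a real filter needs extra leniency there. LISTSERV's subscription confirmations fail every test, which is the point.&lt;/p&gt;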

&lt;p&gt;L-Soft needs to get a grip, read some RFCs, and fix their software.&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:98862</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/98862.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=98862"/>
    <title>Configuration management</title>
    <published>2009-04-02T16:15:40Z</published>
    <updated>2009-04-02T16:15:40Z</updated>
    <content type="html">&lt;p&gt;We've been discussing configuration management in the office this week. None of us are happy with the way we're doing things...&lt;/p&gt;

&lt;p&gt;On Hermes, we do most configuration management using rdist. On our management server we have a number of file trees containing files that differ from the default - a smattering of stuff in /etc, the contents of /home and /opt, and a few other bits scattered around. These trees are cloned and hacked for each flavour of machine (ppswitch, cyrus, lists, webmail, etc.) and each version of the underlying OS. These trees are mostly kept under revision control.&lt;/p&gt;

&lt;p&gt;This setup has the advantage of simplicity, and it's easy to push out small changes. One key feature is rdist's testing mode, which makes it easy for us to ensure that the servers are in sync with the configuration tree without changing anything. We often run a test across a cluster of 10 or 16 machines in parallel. It's easy to selectively push a change to an idle machine for testing before rolling it out to the rest of the cluster. For more tricky changes I do a phased rollout so I can check for unwanted changes in behaviour without breaking the entire service at once.&lt;/p&gt;
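
&lt;p&gt;The heart of a testing mode like rdist's is a comparison that reports differences without changing anything. Here is a minimal Python sketch. It assumes both trees are reachable as local directories (rdist itself works over the network), and the function name is hypothetical.&lt;/p&gt;

&lt;pre&gt;
```python
# Dry-run comparison: report files on the target that are missing or differ
# from the master tree in contents or permissions. Nothing is modified.
import hashlib
import os
import stat


def _digest(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def _mode(path: str) -> int:
    return stat.S_IMODE(os.stat(path).st_mode)


def out_of_sync(master: str, target: str) -> list[str]:
    report = []
    for dirpath, _dirs, files in os.walk(master):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, master)
            dst = os.path.join(target, rel)
            if not os.path.exists(dst):
                report.append(f"missing: {rel}")
            elif _digest(src) != _digest(dst):
                report.append(f"contents differ: {rel}")
            elif _mode(src) != _mode(dst):
                report.append(f"mode differs: {rel}")
    return sorted(report)
```
&lt;/pre&gt;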

&lt;p&gt;Of course it has some serious disadvantages. We have to be root on the management host to be able to push changes out to the servers. We can't easily keep file ownership and permissions under revision control. This mechanism misses significant parts of the configuration, such as installed base OS packages and which rc scripts are enabled. There's also a lot of scope for improving our initial OS install scripts to reduce the amount that they get out of sync with the rdist configuration.&lt;/p&gt;

&lt;p&gt;So we'd like something better. Our colleagues have different system management setups with different problems, and they are also looking for something better.&lt;/p&gt;

&lt;p&gt;A couple of my colleagues have looked at &lt;a href="http://reductivelabs.com/products/puppet/"&gt;Puppet&lt;/a&gt; but weren't happy with it. I dislike its basic design. Managed servers pull their configurations from the master, which means you must never screw up a change on the master, and it's harder to test changes - you have to explicitly set up test server profiles. Yes, you can run Puppet in push mode but it's often a bad idea to work against a program's basic architecture. Puppet also has its own security mechanisms, whereas I'd prefer to avoid multiplying channels of trust. Finally, I &lt;b&gt;really hate&lt;/b&gt; writing configuration files for programs that write configuration files. It's a waste of brain cells to understand this superfluous abstraction layer: I just want to write the underlying configuration file directly.&lt;/p&gt;

&lt;p&gt;&lt;font size="1"&gt;Aside: this is why I &lt;b&gt;really really hate&lt;/b&gt; autoyast, which uses an XML configuration file to control a program that writes configuration files that control rc scripts that manipulate the underlying configuration files for the programs you actually care about. It takes hours to work out how to make it produce the correct results.&lt;/font&gt;&lt;/p&gt;

&lt;p&gt;I spent some time working on a program called pstx, which was to be a souped-up replacement for rdist. It was going to be an sftp client (so it would require nothing special on the target servers) with a simple configuration file specifying which directory trees to copy where, including ownerships and permissions (bringing them under revision control and removing the need to be root on the master), and possibly with added bells and whistles like remote diff and a reverse (pull) mode. I also intended to make it easier to combine collections of files and thereby share common files. Sadly it's only about a third written and likely won't get much further.&lt;/p&gt;

&lt;p&gt;One of the conversations this week was about how to reduce the chance of mistakes when rolling out changes, in which we discussed the use of revision control to help with testing and rollback. At the moment Hermes mostly uses CVS and other unixy bits of the CS share a Subversion repository, so the conversation very much assumed a central repository model - which fits in well with a master configuration server. Recently I have also been investigating git more seriously, so I thought it might be part of a solution.&lt;/p&gt;

&lt;p&gt;The key things that git provides are a flexible and efficient network protocol for moving files around (flexible in that it can use ssh and http as well as git's native protocol, and efficient in that it's incremental and compressed), the ability to tell us what differs between a directory tree and the state of the repository, and the ability to push and pull changes. Its distributed version control functionality is way better at all this than pstx was ever going to be.&lt;/p&gt;

&lt;p&gt;The big missing part is the ability to track ownerships and permissions: git only supports normal development checkouts which should be done using the developer's uid and umask. There are also some low-level problems with the way git performs checkouts: it removes the destination file before writing the updated version in its place. I would like a special deployment command which checks all the changed files out under temporary names, fixes their ownerships and permissions, then renames them into place. The first stage almost exists in the form of &lt;tt&gt;git-checkout-index --temp d&lt;/tt&gt;, though it does weird things with symlinks. It doesn't look like much work to add the other stages.&lt;/p&gt;
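
&lt;p&gt;The per-file part of such a deployment command is the classic write-then-rename pattern. Here is a minimal Python sketch. It assumes the new contents are already in hand (in the real thing they would come from the repository), and the function name is hypothetical; this is not an existing git command.&lt;/p&gt;

&lt;pre&gt;
```python
# Deploy one file: write under a temporary name in the same directory, fix
# mode and (when root) ownership, then rename into place. Hypothetical name.
import os
import tempfile
from typing import Optional


def deploy_file(path: str, data: bytes, mode: int,
                uid: Optional[int] = None, gid: Optional[int] = None) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".deploy-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.chmod(tmp, mode)
        if uid is not None or gid is not None:
            # -1 leaves that id unchanged; changing owner normally needs root
            os.chown(tmp, -1 if uid is None else uid, -1 if gid is None else gid)
        os.replace(tmp, path)  # atomic when tmp and path share a filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```
&lt;/pre&gt;

&lt;p&gt;Because the rename replaces the destination atomically, a daemon re-reading its configuration sees either the old file or the new one, never a gap where the file has been removed.&lt;/p&gt;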

&lt;p&gt;This could provide some really nice workflows. A configuration change is created on a small topic branch, then pushed to the configuration repositories on all the machines - which changes none of the live configuration. You can then switch a test machine to the new branch with a single &lt;tt&gt;git deploy &lt;i&gt;branch&lt;/i&gt;&lt;/tt&gt;, or roll back by re-deploying the master branch. You can find out the current configuration of a machine in summary using &lt;tt&gt;git status&lt;/tt&gt; or &lt;tt&gt;git diff&lt;/tt&gt; to make sure the summary isn't a lie.&lt;/p&gt;

&lt;p&gt;One interesting possibility might be to tame the clone-and-hack problem by using a shared repository for all the different classes of machines. This should make it easier to see the differences between each kind of machine and spot spurious ones.&lt;/p&gt;

&lt;p&gt;One thing that is very manual in our current setup is hupping daemons to make them load an updated configuration. It might be feasible to use a post-deploy hook to do this, provided the hook script is given enough information that it can avoid unnecessary restarts.&lt;/p&gt;
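
&lt;p&gt;A post-deploy hook along these lines could map the list of changed files to the daemons that need reloading, so each affected daemon is hupped once and untouched ones are left alone. The hook would then send each returned service a SIGHUP or run its reload command. The path patterns and service names below are hypothetical examples, not our real layout.&lt;/p&gt;

&lt;pre&gt;
```python
# Decide which daemons to reload from the files a deployment changed.
# Patterns and service names are hypothetical examples.
from fnmatch import fnmatch

RELOAD_RULES = [
    ("/etc/exim/*", "exim"),
    ("/etc/cyrus.conf", "cyrus"),
    ("/etc/imapd.conf", "cyrus"),
    ("/etc/apache2/*", "apache2"),
]


def services_to_reload(changed_paths: list[str]) -> list[str]:
    """Return each affected service once, in rule order."""
    wanted: list[str] = []
    for pattern, service in RELOAD_RULES:
        if service in wanted:
            continue
        if any(fnmatch(p, pattern) for p in changed_paths):
            wanted.append(service)
    return wanted
```
&lt;/pre&gt;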

&lt;p&gt;Now I just need to find some time to turn the idea into code...&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:98711</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/98711.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=98711"/>
    <title>Ada Lovelace Day</title>
    <published>2009-03-25T12:26:31Z</published>
    <updated>2009-03-25T12:26:31Z</updated>
    <content type="html">&lt;p&gt;This is a bit late, but I've been inspired to post after reading about the women other people admire. Here are a few of mine.&lt;/p&gt;
&lt;ul&gt;

&lt;li&gt;&lt;a href="http://www.eecs.harvard.edu/~margo/"&gt;Margo Seltzer&lt;/a&gt;: developer of the BSD log-structured filesystem and Berkeley DB. An open source entrepreneur - founder and CTO of Sleepycat Software - who also has a successful academic career.&lt;/li&gt;

&lt;li&gt;Dina Katabi invented &lt;a href="http://www.ana.lcs.mit.edu/dina/XCP/"&gt;XCP, the eXplicit Congestion control Protocol&lt;/a&gt;, which is a brilliant way for routers to make each individual flow adjust to the level of congestion without the need for any per-flow state. I think it was one of the first papers I read about advanced transport protocol research, and I have continued to follow the subject since then.&lt;/li&gt;

&lt;li&gt;&lt;a href="http://nih.blogspot.com/"&gt;Lisa Dusseault&lt;/a&gt; is a WebDAV expert and IETFer, currently one of the Applications Area Directors and therefore a member of the IESG. She was one of the funner people I met at the Paris IETF meeting a few years ago. I like the new ideas she's brought to the IESG, such as monthly updates on her work as Apps AD.&lt;/li&gt;

&lt;li&gt;And of course &lt;span class='ljuser' lj:user='rmc28' style='white-space: nowrap;'&gt;&lt;a href='http://rmc28.livejournal.com/profile'&gt;&lt;img src='http://l-stat.livejournal.com/img/userinfo.gif' alt='[info]' width='17' height='17' style='vertical-align: bottom; border: 0; padding-right: 1px;' /&gt;&lt;/a&gt;&lt;a href='http://rmc28.livejournal.com/'&gt;&lt;b&gt;rmc28&lt;/b&gt;&lt;/a&gt;&lt;/span&gt;, who got the University Card working properly and now slaves away on the student information system.&lt;/li&gt;

&lt;/ul&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:98545</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/98545.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=98545"/>
    <title>John Taylor talks about the Corpus clock</title>
    <published>2009-03-15T19:57:40Z</published>
    <updated>2009-03-16T11:52:51Z</updated>
    <content type="html">&lt;p&gt;I posted a couple of articles (&lt;a href="http://fanf.livejournal.com/94043.html"&gt;one&lt;/a&gt;, &lt;a href="http://fanf.livejournal.com/94411.html"&gt;two&lt;/a&gt;) about &lt;a href="http://en.wikipedia.org/wiki/Corpus_Clock"&gt;the Corpus clock&lt;/a&gt; soon after it was unveiled, and I optained a copy of &lt;a href="http://www.chronophage.co.uk/shop.htm"&gt;the Chronophage book&lt;/a&gt; soon after it became available. The book is a bit lacking in technical detail, so I was pleased to find out that John Taylor would be talking about it for &lt;a href="http://www.admin.cam.ac.uk/sciencefestival/events.shtml?id=410&amp;amp;template=/sciencefestival/events/item.template"&gt;the Cambridge Science Festival&lt;/a&gt; and I attended his talk yesterday. The lecture theatre was packed.&lt;/p&gt;  &lt;p&gt;The talk was in three parts: a bit of autobiography, then a quick overview of the clock and its development, then questions. The talk revealed some interesting facts which do not appear in the book.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/John_C._Taylor_(inventor)"&gt;Taylor's story&lt;/a&gt; of how he came to make a fortune in kettle thermostats is relevant to the clock not just because that's how he made the money: his horological hero, &lt;a href="http://en.wikipedia.org/wiki/John_Harrison"&gt;John Harrison&lt;/a&gt;, invented the &lt;a href="http://en.wikipedia.org/wiki/Grasshopper_escapement"&gt;grasshopper escapement&lt;/a&gt; that inspired the Chronophage, and also &lt;a href="http://en.wikipedia.org/wiki/Bimetallic_strip"&gt;invented the bimetal&lt;/a&gt; on which Taylor's thermostats rely. He explained his snap-action bimetallic disc, which has a C cut out to form a tongue in the middle; this tongue moves further than the centre of a disc without the cut-out does, which loosens the tolerances required to manufacture a thermostat. The calligraphic flourish engraved below his name on the clock's pendulum bob forms the shape of his bimetallic disc. (Sorry, I can't find a picture on line though it's visible in the book.)&lt;/p&gt;

&lt;p&gt;&lt;b&gt;ETA:&lt;/b&gt; I have found &lt;a href="http://www.flickr.com/photos/36320341@N02/3351359967/sizes/o/"&gt;a large picture of the pendulum bob&lt;/a&gt; in which you can see the shape of Taylor's bimetallic disc under the date. It's overdrawn by some of the flourish so it's a bit obscure unless you know what you are looking for.&lt;/p&gt;

&lt;p&gt;There were a couple of hints at the clock's workings. It is usual for clocks with chiming mechanisms to have a second set of drive springs or weights for the chimes. The Corpus clock has only one spring, so its hourly death-rattle is driven in an unusual way. This is why the clock ticks back and forth while rattling the chain in the coffin. There are two main mechanisms inside the Chronophage itself. A mechanical pseudo-random mechanism controls the blinking and biting. The sting, however, is regular: it slowly rises then jabs down on each quarter hour.&lt;/p&gt;  &lt;p&gt;He also mentioned that the clock &amp;quot;has fun&amp;quot; four times a year: on 24th March, which is the date of Harrison's birth and also his death; on 25th November, &lt;a href="http://www.cambridge-news.co.uk/cn_news_home/displayarticle.asp?id=368991"&gt;Taylor's birthday&lt;/a&gt;; on new year's eve and new year's day; and on Corpus Christi day, the Church's festival that occurs on the Thursday after Trinity Sunday, the 8th Sunday after Easter. The clock is away for maintenance this week, so I hope it'll be back in time to muck around on Tuesday week. The fun seems to involve even more erratic ticking than usual, with more colourful lights.&lt;/p&gt;
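&lt;p&gt;The Corpus Christi rule is easy to turn into arithmetic. Trinity Sunday is 56 days after Easter, so the Thursday after it is always 60 days after Easter. The Python sketch below uses the well-known anonymous Gregorian (Meeus/Jones/Butcher) algorithm for Easter. The function names are illustrative. For 2009 it gives 11 June.&lt;/p&gt;
&lt;pre&gt;
import datetime

def easter(year):
    """Gregorian Easter Sunday (anonymous Gregorian / Meeus-Jones-Butcher algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return datetime.date(year, month, day + 1)

def corpus_christi(year):
    """Thursday after Trinity Sunday, which is itself 8 weeks after Easter."""
    return easter(year) + datetime.timedelta(days=60)

print(corpus_christi(2009))  # 2009-06-11
&lt;/pre&gt;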

&lt;p&gt;The clock's spring has to be particularly strong in order to overcome the momentum of the large escape wheel that goes around the outside of the clock. When prototyping the clock this caused a serious problem: the force of the spring is transmitted through the escapement to the pendulum, making it swing higher and higher and eventually breaking the clock. Taylor overcame this problem by adding a regulator, which also serves two other functions: it produces the clock's erratic behaviour that plays tricks with observers, and it listens to &lt;a href="http://en.wikipedia.org/wiki/MSF_time_signal"&gt;the MSF time signal&lt;/a&gt; to synchronize the clock every five minutes. (Taylor confirmed to me after the talk that these are all functions of the same mechanism. He also said that the hollow pendulum bob is not in fact &amp;quot;massive and weighty&amp;quot; as the book says, which answered my question about conservation of momentum.) I'm not sure how this is consistent with his assertions that the clock is purely mechanical - would it work if the computer regulator broke? &lt;a href="http://www.cambridge-news.co.uk/cn_news_cambridge/displayarticle.asp?id=357677"&gt;News reports&lt;/a&gt; about the clock's teething problems suggest that it's pretty vital. I wonder if the maintenance periods this week and back in January are for software patches rather than mechanical adjustment?&lt;/p&gt;

&lt;p&gt;I asked Taylor if he would publish more technical details about the clock, but he thinks that the mystery makes it fun. I disagree :-/&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:98078</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/98078.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=98078"/>
    <title>Job Ad</title>
    <published>2009-03-05T14:26:10Z</published>
    <updated>2009-03-05T14:26:10Z</updated>
    <content type="html">My colleagues over in MISD (the University's administrative computing unit, as opposed to the CS which is the University's ISP and IT training department) are looking for another DBA to help look after the Oracle running under exciting services like the financial system and student information system.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.admin.cam.ac.uk/offices/hr/jobs/vacancies.cgi?job=4781"&gt;http://www.admin.cam.ac.uk/offices/hr/jobs/vacancies.cgi?job=4781&lt;/a&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:97845</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/97845.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=97845"/>
    <title>Microblogging</title>
    <published>2009-02-21T23:46:32Z</published>
    <updated>2009-02-21T23:47:00Z</updated>
    <content type="html">&lt;p&gt;Since April 2002 I have been keeping a &lt;a href="http://dotat.at/:/"&gt;link log&lt;/a&gt;: a web page where I record links to other pages that are notable in some way, with a brief comment about each one. It's gatewayed to &lt;a href="http://delicious.com/fanf"&gt;Delicious&lt;/a&gt;, &lt;a href="http://dotaturls.livejournal.com/"&gt;LiveJournal&lt;/a&gt;, and &lt;a href="http://twitter.com/fanf"&gt;Twitter&lt;/a&gt;. The Twitter feed is new.&lt;/p&gt;

&lt;p&gt;I've been on Twitter since March 2007, when it made a big splash at the SXSW festival. I joined to see what the fuss was about but soon stopped using it. None of my social circle were using it much, and there wasn't a sensible way of finding people in the more distant reaches of my social network. For 18 months my last tweet said "&lt;a href="http://twitter.com/fanf/status/85418462"&gt; Grumbling about the difficulty of finding out people's twitter usernames&lt;/a&gt;".&lt;/p&gt;

&lt;p&gt;Recently Twitter has taken off in the UK, since a fairly large contingent of journalists and celebrities have started using it and talking about it. (Reminds me in a small way of the rise of the web in 1993/4.) Coincidentally at about the same time I decided to get back into using Facebook and Twitter a bit more, to keep in better touch with my family's activities. (They're on Facebook but I linked my Twitter and Facebook statuses together for convenience.)&lt;/p&gt;

&lt;p&gt;The celebrity thing helped me to get more out of Twitter in a significant way. I had thought of it as more like Facebook or LiveJournal: a way of keeping up with the activities of people I already know. But in fact it's more like the wider blogging culture where it's normal to follow someone's writing just because they are interesting. (Yes, LJ blurs into this kind of blogging too.) Even the Twitter guys didn't realise this at first: until June 2007 Twitter sent me email saying "You are foo's newest friend!" but the following month it started sending me email saying "foo is now following you on Twitter!". I now feel free to follow well-known people just because they are interesting, and I found more interesting people by looking through the lists of people they follow.&lt;/p&gt;

&lt;p&gt;Part of the Twitter culture is heavy use of short URL services when posting interesting links. I decided that my link log ought to be gatewayed to Twitter to join in the fun, so I embarked on a bit of hacking. Previously it had been just a static HTML page, with a command-line script that I used to insert a new link. There was another script to turn the HTML into an RSS feed for LJ, and some code for uploading the results to &lt;a href="http://www.chiark.greenend.org.uk/"&gt;chiark&lt;/a&gt;. I had a rudimentary click tracker so that I could get some idea of what links other people followed, but it did not shorten URLs. (That meant it was an open redirector until I found out what evil that made it vulnerable to.)&lt;/p&gt;

&lt;p&gt;I replaced all the old static HTML and RSS gubbins with a CGI script which can format my link log in HTML or Atom, as well as providing short URLs for all the links. I decided to host my own short URLs partly to continue tracking what people find interesting, and partly because I have a vanity domain that's just right for this job :-) The command line script for adding new links remained basically the same, apart from new code to create short tags and to post the link to Twitter.&lt;/p&gt;
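&lt;p&gt;The post doesn't say how the short tags are generated. One simple scheme, assumed here purely for illustration, is to number links sequentially and write the number in base 62, using digits and upper- and lower-case letters. The alphabet order and function names below are hypothetical.&lt;/p&gt;
&lt;pre&gt;
import string

# 62 URL-safe symbols; the choice and order are arbitrary assumptions.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_tag(n):
    """Turn a non-negative link number into a short base-62 tag."""
    if n != abs(n):
        raise ValueError("link numbers must be non-negative")
    tag = ""
    while True:
        n, r = divmod(n, len(ALPHABET))
        tag = ALPHABET[r] + tag
        if n == 0:
            return tag

def decode_tag(tag):
    """Recover the link number from a tag produced by encode_tag."""
    n = 0
    for ch in tag:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n

print(encode_tag(12345))  # 3d7
&lt;/pre&gt;
&lt;p&gt;A sequential counter keeps tags as short as possible: three characters cover 238,328 links. The trade-off is that tags are guessable, which is harmless when all the links are public anyway.&lt;/p&gt;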

&lt;p&gt;After a couple of weeks of this new regime I decided I needed to be able to save links found when reading Twitter or LiveJournal on my iPod Touch. Its UI makes it easy to email links but not much else, so I set up a gateway from email to my link log. I send links to a specially tagged email address which is filtered to a special mailbox. A script polls this mailbox over IMAP every few minutes, looking for messages that pass a number of paranoid checks. From each such message it extracts a URL and a comment, which it passes on to the old command-line script. (I still use the latter when I'm on a computer with a keyboard.)&lt;/p&gt;
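&lt;p&gt;The post doesn't spell out its paranoid checks, so the ones below are assumptions. The message must come from one hypothetical sender address, be addressed to the tagged address, and contain exactly one URL plus some comment text. This sketch uses only Python's standard &lt;tt&gt;email&lt;/tt&gt; package and leaves out the IMAP polling. From headers are trivially forged, so header checks like these are only a first line of defence.&lt;/p&gt;
&lt;pre&gt;
import email
import email.utils
import re

# Hypothetical addresses: the post does not give the real ones.
OWNER = "me@example.org"
LINK_ADDRESS = "me+linklog@example.org"
URL_RE = re.compile(r"https?://\S+")

def extract_link(raw):
    """Return (url, comment) from a raw message if it passes the checks, else None."""
    msg = email.message_from_string(raw)
    sender = email.utils.parseaddr(msg.get("From", ""))[1]
    recipients = [addr for _, addr in email.utils.getaddresses(msg.get_all("To", []))]
    if sender != OWNER or LINK_ADDRESS not in recipients:
        return None
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            body = payload.decode(part.get_content_charset() or "utf-8", "replace")
            break
    else:
        return None  # no plain-text part at all
    urls = URL_RE.findall(body)
    if len(urls) != 1:  # refuse anything ambiguous
        return None
    comment = " ".join(body.replace(urls[0], " ").split())
    return (urls[0], comment) if comment else None
&lt;/pre&gt;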

&lt;p&gt;The end result is pretty neat: I can easily save interesting links that I see in places where that was not previously practical. It's probably a good thing I don't have an iPhone because then I'd be able to spod &lt;i&gt;everywhere&lt;/i&gt;.&lt;/p&gt;

&lt;p&gt;PS. a brief hate: web sites that redirect deep links to the mobile version of their front page. Utter cWAP. They'll get no Google Juice from me. (In fact the iPhone browser is good enough that mobile sites are usually NOT an improvement.)&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:97572</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/97572.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=97572"/>
    <title>The joy of lpeg</title>
    <published>2009-02-21T00:29:11Z</published>
    <updated>2009-02-21T00:29:37Z</updated>
    <content type="html">&lt;p&gt;I've recently started playing with &lt;tt&gt;lpeg&lt;/tt&gt;, a parsing library for &lt;a href="http://www.lua.org"&gt;Lua&lt;/a&gt;. It is based on "&lt;a href="http://pdos.csail.mit.edu/~baford/packrat/"&gt;Parsing Expression Grammars&lt;/a&gt;", which were recently popularized by the prolific &lt;a href="http://www.brynosaurus.com/"&gt;Bryan Ford&lt;/a&gt;. PEGs have some nice properties: they're suitable for unified parsers that handle both the low-level lexical syntax as well as higher-level hierarchial syntax; they have much simpler operational semantics than traditional extended regexes or context-free grammars; and as well as familiar regex-like and CFG-like operators they have nice features for controlling lookahead and backtracking. PEGs were originally developed alongside &lt;a href="http://www.brynosaurus.com/pub/lang/packrat-icfp02.pdf"&gt;a cute algorithm for linear-time parsing&lt;/a&gt; which unfortunately also requires space linear in the input size with a fairly large multiplier. Lpeg instead uses a simple parsing machine, implemented somewhat like a bytecode interpreter. Its performance is quite competitive: &lt;a href="http://www.inf.puc-rio.br/~roberto/docs/peg.pdf"&gt;the long paper&lt;/a&gt;  says it has similar performance to &lt;tt&gt;pcre&lt;/tt&gt; and &lt;tt&gt;glibc&lt;/tt&gt;'s POSIX regex implementation, and &lt;a href="http://www.inf.puc-rio.br/~roberto/docs/ry08-4.pdf"&gt;the short paper&lt;/a&gt; says it has similar performance to &lt;tt&gt;lex&lt;/tt&gt; and &lt;tt&gt;yacc&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;Lpeg actually consists of two modules. &lt;a href="http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html"&gt;The core lpeg module&lt;/a&gt; is written in C and allows you to compose parser objects using operator overloading, building them up from primitives returned from tersely named constructor functions. The resulting syntax is rather eccentric. On top of that is &lt;a href="http://www.inf.puc-rio.br/~roberto/lpeg/re.html"&gt;the &lt;tt&gt;re&lt;/tt&gt; module&lt;/a&gt; which provides a more normal PEG syntax for parsers, which despite the name of the module are rather different from regular expressions. This module is written in Lua, using an lpeg parser to parse PEGs and construct lpeg parsers from them. The PEG syntax is extended so that you can define "captures". Captures are the really nice thing about lpeg. You can use them like captures in Perl regexes to just extract substrings of the subject, but you can often do better. Lpeg captures are more like the semantic actions that you can attach to rules in parser generators like &lt;tt&gt;yacc&lt;/tt&gt;. So, where in Perl you would do the match then fiddle around with &lt;tt&gt;$1&lt;/tt&gt;, &lt;tt&gt;$2&lt;/tt&gt;, etc, with lpeg the match can incorporate the fiddling in a nice tidy way. (In fact, probably the closest comparison is with Perl 6 rules, but they're not yet practically usable.)&lt;/p&gt;

&lt;p&gt;The program I was writing with lpeg was to process some logs. I needed to convert the timestamps from ISO 8601 format into POSIX &lt;tt&gt;time_t&lt;/tt&gt; which implied converting 8 fields from strings to numbers. Rather than having to convert each capture individually, or loop over the captures, I could write a single grammar rule to match a pair of digits and convert it to a number, then refer to that rule elsewhere in the grammar. (In fact Lua will coerce strings to numbers implicitly in most - but not all - circumstances. I happened to be tripped up trying to compare a number with a string, which doesn't coerce.) In the end it's nicest to let the parser drive all the program's activity through its semantic actions.&lt;/p&gt;

&lt;a name="cutid1"&gt;&lt;/a&gt;
&lt;p&gt;In the following, [[...]] delimits a "long string" containing the PEG grammar. {...} in the grammar denotes a capture, whereas (...) is for non-capturing groups; &lt;tt&gt;&amp;lt;name&amp;gt;&lt;/tt&gt; refers to another rule, and &lt;tt&gt;!.&lt;/tt&gt; succeeds only at the end of the input. &lt;tt&gt;pat -&amp;gt; fun&lt;/tt&gt; passes the captures of a pattern to a function. The second argument to &lt;tt&gt;compile&lt;/tt&gt; is a table that maps the function names used in the grammar to Lua functions. The main entry point to the rest of the program is &lt;tt&gt;process&lt;/tt&gt;, whose definition I have left out.&lt;/p&gt;
&lt;pre&gt;
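    local re = require "re"

    -- integer part of a non-negative number (Lua 5.1 has no integer division)
    local function int(x) return math.floor(x) end

    local parser = re.compile([[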
        line &amp;lt;- ( &amp;lt;stamp&amp;gt; ' ' {%a*} ' ' &amp;lt;count&amp;gt; !. ) -&amp;gt; process
        stamp &amp;lt;- ( &amp;lt;date&amp;gt; ' ' &amp;lt;time&amp;gt; ' ' &amp;lt;zone&amp;gt;   ) -&amp;gt; tostamp
        date  &amp;lt;- ( &amp;lt;num&amp;gt;&amp;lt;num&amp;gt; '-' &amp;lt;num&amp;gt; '-' &amp;lt;num&amp;gt; ) -&amp;gt; todate
        time  &amp;lt;- (      &amp;lt;num&amp;gt; ':' &amp;lt;num&amp;gt; ':' &amp;lt;num&amp;gt; ) -&amp;gt; totime
        zone  &amp;lt;- ( {[+-]} &amp;lt;num&amp;gt; &amp;lt;num&amp;gt;             ) -&amp;gt; tozone
        count &amp;lt;- {'@'*} -&amp;gt; tocount
        num   &amp;lt;- {%d^2} -&amp;gt; tonumber
    ]], {
        process = process,
        tonumber = tonumber,
        tocount = function (s) return #s end,
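        -- days since 1970-01-01, counting years from March so that any
        -- leap day comes last: January and February belong to the previous year
        todate = function (c,y,m,d)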
            if m &amp;gt; 2 then m = m + 1;  y = c*100 + y
                     else m = m + 13; y = c*100 + y - 1 end
            return int(y*1461/4) - int(y/100) + int(y/400)
                 + int(m*153/5) + d - 719591
        end,
        totime = function (H,M,S)
            return H*3600 + M*60 + S
        end,
        tozone = function (zs,zh,zm)
            local z = zh*3600 + zm*60
            if zs == "-" then return -z
                         else return  z  end
        end,
        tostamp = function (date,time,zone)
            return date*86400 + time - zone
        end
    })

    for line in io.stdin:lines() do
        if not parser:match(line) then
            io.stdout:write("skipping "..line.."\n")
        end
    end
&lt;/pre&gt;
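&lt;p&gt;Date arithmetic like &lt;tt&gt;todate&lt;/tt&gt; is easy to get subtly wrong, so it is worth checking. Below, the same day-count formula is transliterated into Python and compared with the standard library's &lt;tt&gt;calendar.timegm&lt;/tt&gt; for every day from 1900 to 2100. This is a sketch; the function names are hypothetical. Python's &lt;tt&gt;//&lt;/tt&gt; floors, which agrees with the truncating &lt;tt&gt;int()&lt;/tt&gt; in the Lua code because every quantity it is applied to is positive.&lt;/p&gt;
&lt;pre&gt;
import calendar
import datetime

def days_since_epoch(y, m, d):
    """Days from 1970-01-01 to a Gregorian date, using the same formula as todate."""
    # Count from March so that any leap day falls at the end of the year:
    # January and February become months 14 and 15 of the previous year.
    if m in (1, 2):
        y, m = y - 1, m + 13
    else:
        m = m + 1
    return y * 1461 // 4 - y // 100 + y // 400 + m * 153 // 5 + d - 719591

def to_time_t(y, mo, d, hh, mi, ss, sign, zh, zm):
    """POSIX time for a local timestamp with a +hhmm or -hhmm zone offset."""
    zone = zh * 3600 + zm * 60
    if sign == "-":
        zone = -zone
    return days_since_epoch(y, mo, d) * 86400 + hh * 3600 + mi * 60 + ss - zone

# Cross-check every day over two centuries against the standard library.
day = datetime.date(1900, 1, 1)
while day.year != 2101:
    expected = calendar.timegm(day.timetuple())
    assert days_since_epoch(day.year, day.month, day.day) * 86400 == expected
    day += datetime.timedelta(days=1)
print("day-count formula agrees with calendar.timegm")
&lt;/pre&gt;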
</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:97203</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/97203.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=97203"/>
    <title>Impressive display of security clue from the Student Loans Company</title>
    <published>2009-02-05T13:52:04Z</published>
    <updated>2009-02-05T13:52:04Z</updated>
    <content type="html">&lt;p&gt;&lt;a href="http://www.slc.co.uk/pdf/FOI%20Exec%20Mgmt%20Board%20Minutes%20-%2018th%20January%202008%20(RSJ).pdf"&gt;The Student Loan Company executive management board minutes from a meeting just over a year ago&lt;/a&gt; says the following in section 6, "update on data security processes":&lt;/p&gt;

&lt;blockquote&gt;RSJ provided an update on Data Security and advised that information
which was being received from external sources confirmed that the
transfer of data on removable media devices was now unacceptable. He
stated that there was a need to consult with HEI’s as to the method of
transferring Attendance Confirmation Reports as SLC now had PGP
encryption software available which could replace the previous method of
transferring the data via CD’s. He also stated that the PGP software
which SLC were using should be checked to ensure that it was on the US
Government list of standard encryption as HEI’s are only permitted to
use PGP software from this list.&lt;/blockquote&gt;

&lt;p&gt;Not shipping media is good. Using end-to-end encryption is good. (This is unlike banks, which seem to like SMTP over TLS even though it provides no additional security for inter-domain communication.) I wonder why they chose PGP instead of S/MIME - I believe that PGP usually requires an add-on whereas S/MIME is often built into MUAs. Perhaps they've been nobbled by a vendor.&lt;/p&gt;
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:96858</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/96858.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=96858"/>
    <title>Meanies</title>
    <published>2009-01-29T16:39:54Z</published>
    <updated>2009-01-29T16:39:54Z</updated>
    <content type="html">&lt;p&gt;In &lt;a href="http://fanf.livejournal.com/96585.html"&gt;my previous post&lt;/a&gt;
  I said I was using the standard formula for the exponentially
  weighted moving average. This is not entirely true because the
  standard formula uses a complementary smoothing constant
  (&amp;alpha; = 1 &amp;minus; a) which allows you to write the formula as an
  incremental adjustment to the previous value.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;img src="http://upload.wikimedia.org/math/5/9/b/59bee0ae64f264ab6de7e592da94898e.png" /&gt;
&lt;/blockquote&gt;

&lt;p&gt;It's also possible to express the standard unweighted mean in a
  similar manner.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;img src="http://upload.wikimedia.org/math/4/b/e/4be6295d04e01d66b115bbbddc365af0.png" /&gt;
&lt;/blockquote&gt;

&lt;p&gt;I was quite pleased when I found out about this analogy :-)&lt;/p&gt;
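&lt;p&gt;Written as code, the two incremental updates have exactly the same shape. Each moves the running value part of the way towards the new sample: by a fixed fraction &amp;alpha; for the weighted average, or by 1/&lt;i&gt;n&lt;/i&gt; for the plain mean. This is a small Python sketch; the function names are illustrative.&lt;/p&gt;
&lt;pre&gt;
def ewma_step(avg, sample, alpha):
    """Exponentially weighted moving average: move a fixed fraction alpha towards the sample."""
    return avg + alpha * (sample - avg)

def mean_step(avg, sample, n):
    """Running unweighted mean of n samples: move 1/n of the way towards the sample."""
    return avg + (sample - avg) / n

samples = [3.0, 5.0, 10.0, 2.0]
mean = 0.0
for n, x in enumerate(samples, start=1):
    mean = mean_step(mean, x, n)
print(mean)  # 5.0, the same as sum(samples) / len(samples)

# With alpha = 1 - a this matches the (1 - a) * new + a * old form of the previous post.
a = 0.75
print(ewma_step(8.0, 4.0, 1 - a), (1 - a) * 4.0 + a * 8.0)
&lt;/pre&gt;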

&lt;p&gt;&lt;small&gt;(Thanks to wikipedia for rendering the maths.)&lt;/small&gt;&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:96585</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/96585.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=96585"/>
    <title>Luck or judgment?</title>
    <published>2009-01-27T21:30:10Z</published>
    <updated>2009-01-27T22:51:10Z</updated>
    <content type="html">&lt;p&gt;In 2005 I developed a mathematical model for measuring average event rates which became the core of a new rate limiting feature for Exim. The model has a particularly useful property which I did not expect it to have and which I did not (until recently) fully understand.&lt;/p&gt;

&lt;h3&gt;The model&lt;/h3&gt;

&lt;p&gt;The central formula is the standard exponentially weighted moving average:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;
    r&lt;sub&gt;new&lt;/sub&gt; = (1 &amp;minus; a) &amp;lowast; r&lt;sub&gt;inst&lt;/sub&gt; + a &amp;lowast; r&lt;sub&gt;old&lt;/sub&gt;
&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;We use this to calculate the new average rate from the instantaneous rate and the result of the previous average calculation. These rates are all measured in events per some configurable time period. The smoothing factor &lt;i&gt;a&lt;/i&gt; is a value between 0 and 1 which determines how slowly the model forgets about past behaviour.&lt;/p&gt;

&lt;p&gt;We calculate &lt;i&gt;r&lt;sub&gt;inst&lt;/sub&gt;&lt;/i&gt; and &lt;i&gt;a&lt;/i&gt; from the raw inputs to the formula, which are &lt;i&gt;p&lt;/i&gt;, a time period configured by the postmaster, and &lt;i&gt;i&lt;/i&gt;, the time interval between the previous event and this event. By dividing them we get&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;
    r&lt;sub&gt;inst&lt;/sub&gt; = p / i
&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;In this formula, &lt;i&gt;p&lt;/i&gt; determines the time unit used for measuring rates, e.g. events per hour or events per day.&lt;/p&gt;

&lt;p&gt;The exponentially weighted moving average is usually used to smooth samples that are measured at fixed intervals, in which case the smoothing factor, &lt;i&gt;a&lt;/i&gt;, is also fixed. In our situation events occur at varying intervals, so the smoothing factor needs to be varied accordingly.&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;
    a = e&lt;sup&gt;&amp;minus;i / p&lt;/sup&gt;
&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;In this formula, &lt;i&gt;p&lt;/i&gt; determines the smoothing period, i.e. the length of time it takes to forget 63% of past behaviour.&lt;/p&gt;
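&lt;p&gt;Putting the three formulas together, one update step can be sketched in Python as follows. This illustrates the model as described here; it is not Exim's actual C implementation. The function name is hypothetical, and the sketch ignores two events arriving at the same instant (&lt;i&gt;i&lt;/i&gt; = 0), which a real implementation has to handle. Feeding it events at a steady interval shows the average settling at the instantaneous rate &lt;i&gt;p&lt;/i&gt; / &lt;i&gt;i&lt;/i&gt;.&lt;/p&gt;
&lt;pre&gt;
import math

def update_rate(r_old, i, p):
    """New average rate after an event arriving i seconds after the previous one.

    p is the configured period: both the time unit for rates and the smoothing period.
    Assumes i is positive; simultaneous events (i == 0) need special handling.
    """
    a = math.exp(-i / p)    # smoothing factor for this interval
    r_inst = p / i          # instantaneous rate, in events per period
    return (1 - a) * r_inst + a * r_old

# Hypothetical sender: one message every 6 minutes, rates measured per hour.
rate = 0.0
for _ in range(200):
    rate = update_rate(rate, 360.0, 3600.0)
print(round(rate, 6))  # settles at p / i = 10 messages per hour
&lt;/pre&gt;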

&lt;h3&gt;A useful property&lt;/h3&gt;

&lt;p&gt;When developing the model, I needed to understand how it reacts to changes in the rate of events. It's fairly simple to see that if the rate drops, the average decays exponentially towards the new rate. It's less clear what happens when the rate increases. A particularly practical question is: what happens if there's a sudden burst of messages? How much of the burst gets through before the average rate passes some configured maximum?&lt;/p&gt;

&lt;p&gt;I did some trial computations and I found that when the interval is very small (i.e. the rate is high) the average rate increases by nearly one for each event (the smaller the interval, the closer to one). This means that the maximum rate is also the maximum burst size. How wonderfully simple! The postmaster configures the model with just two numbers, a maximum number of events and a time period, which between them directly specify the units of measurement, the smoothing period, and the maximum burst size.&lt;/p&gt;

&lt;p&gt;(The maximum burst size is larger for slower senders, increasing to infinity for those sending below the maximum rate. However you don't have to be much above the maximum for your average to hit the limit within one period.)&lt;/p&gt;
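&lt;p&gt;The burst behaviour is easy to reproduce numerically. This Python sketch feeds a hypothetical burst of 100 messages, 10 ms apart, into the update formula with a one-hour period. It checks that each message raises the average by very nearly one.&lt;/p&gt;
&lt;pre&gt;
import math

# Hypothetical burst: 100 messages 10 ms apart, with rates measured per hour.
p = 3600.0              # configured period: rate unit and smoothing period
i = 0.01                # seconds between messages in the burst
a = math.exp(-i / p)    # smoothing factor for this interval
r_inst = p / i          # instantaneous rate: 360000 messages per hour

rate = 0.0
for n in range(1, 101):
    previous = rate
    rate = (1 - a) * r_inst + a * rate
    # Each message raises the average by very nearly one.
    assert math.isclose(rate - previous, 1.0, abs_tol=1e-3)

# Closed form: starting from zero, n events give (p/i) * (1 - e^(-n*i/p)).
assert math.isclose(rate, r_inst * (1 - math.exp(-100 * i / p)))
print(round(rate, 2))  # about 99.99: the average has counted the burst
&lt;/pre&gt;
&lt;p&gt;Starting from zero, the average after &lt;i&gt;n&lt;/i&gt; events at interval &lt;i&gt;i&lt;/i&gt; works out to (&lt;i&gt;p&lt;/i&gt;/&lt;i&gt;i&lt;/i&gt;)(1 &amp;minus; e&lt;sup&gt;&amp;minus;&lt;i&gt;ni&lt;/i&gt;/&lt;i&gt;p&lt;/i&gt;&lt;/sup&gt;). This is very close to &lt;i&gt;n&lt;/i&gt; while &lt;i&gt;ni&lt;/i&gt; is much smaller than &lt;i&gt;p&lt;/i&gt;. It also never exceeds &lt;i&gt;p&lt;/i&gt;/&lt;i&gt;i&lt;/i&gt;, so a sender whose actual rate is below the maximum never reaches the limit.&lt;/p&gt;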

&lt;p&gt;This property produces a simple user interface, but I did not understand how it works. It's obvious that when &lt;i&gt;i&lt;/i&gt; is small, &lt;i&gt;a &amp;asymp; 1&lt;/i&gt;, but it is not so clear why      
&lt;i&gt;(1 &amp;minus; a) &amp;lowast; r&lt;sub&gt;inst&lt;/sub&gt; &amp;asymp; 1&lt;/i&gt;. It seems I landed directly on a mathematical sweet spot without the fumbling around that might have led me to understand the situation better.&lt;/p&gt;

&lt;h3&gt;Arbitrary choices&lt;/h3&gt;

&lt;p&gt;I made two choices when developing the model which at the time seemed arbitrary, but which both must be right to get the &lt;i&gt;max rate = burst size&lt;/i&gt; property.&lt;/p&gt;

&lt;p&gt;Firstly, I decided to re-use the configured period for two purposes: as the smoothing period and as the time unit for rates. I could instead have made them independently configurable, but this seemed to give no benefits that compensated for the extra complexity. Alternatively, I could have used a fixed value instead of &lt;i&gt;p&lt;/i&gt; in the formula for &lt;i&gt;r&lt;sub&gt;inst&lt;/sub&gt;&lt;/i&gt;, but that seemed liable to be confusing or awkward, and to require more mental arithmetic to configure.&lt;/p&gt;

&lt;p&gt;Secondly, I decided to use &lt;i&gt;e&lt;/i&gt; as the base of the exponent. I could have used 2, in which case the smoothing period would have been a half life, or 10, so that 90% of past behaviour would be forgotten after one period. There seemed to be no clear way to choose, so I split the difference and went with &lt;i&gt;e&lt;/i&gt; on the basis of mathematical superstition and because &lt;tt&gt;exp()&lt;/tt&gt; has fewer arguments than &lt;tt&gt;pow()&lt;/tt&gt;.&lt;/p&gt;
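&lt;p&gt;A quick numerical check shows why both choices matter. This Python sketch generalises the model with two hypothetical knobs that the real model does not have. One is an arbitrary base &lt;i&gt;b&lt;/i&gt; for the exponent. The other is a rate unit &lt;i&gt;q&lt;/i&gt; separate from the smoothing period &lt;i&gt;p&lt;/i&gt;. For a burst with a tiny interval, each event then raises the average by about (&lt;i&gt;q&lt;/i&gt;/&lt;i&gt;p&lt;/i&gt;) ln &lt;i&gt;b&lt;/i&gt;, which is exactly 1 only when &lt;i&gt;b&lt;/i&gt; = &lt;i&gt;e&lt;/i&gt; and &lt;i&gt;q&lt;/i&gt; = &lt;i&gt;p&lt;/i&gt;.&lt;/p&gt;
&lt;pre&gt;
import math

def burst_increment(base, p, q, i=1e-3):
    """Rise in the average per event during a burst with a tiny interval i.

    base: base of the exponent, so the smoothing factor is a = base ** (-i / p)
    p:    smoothing period in seconds
    q:    time unit for rates in seconds, so the instantaneous rate is q / i
    """
    a = base ** (-i / p)
    return (1 - a) * (q / i)

hour = 3600.0
for name, base in (("2", 2.0), ("e", math.e), ("10", 10.0)):
    print("base", name, round(burst_increment(base, hour, hour), 4))
# Only base e gives 1; base 2 gives ln 2 and base 10 gives ln 10.
# Measuring rates per minute while smoothing over an hour scales it by q / p:
print(round(burst_increment(math.e, hour, 60.0), 4))
&lt;/pre&gt;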

&lt;h3&gt;How it works&lt;/h3&gt;

&lt;p&gt;Looking back over my old notes this week, I had a revelation that the &lt;i&gt;max rate = burst size&lt;/i&gt; property comes from the fact that the gradient of &lt;i&gt;e&lt;sup&gt;x&lt;/sup&gt;&lt;/i&gt; is 1 when &lt;i&gt;x&lt;/i&gt; is 0. To show why this is so, I first need to define a bit of notation:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;
    y &amp;equiv; f(x) &amp;equiv; e&lt;sup&gt;x&lt;/sup&gt;
&lt;/p&gt;&lt;p&gt;
    &amp;delta;y &amp;equiv; f(x + &amp;delta;x) &amp;minus; f(x)
&lt;/p&gt;&lt;p&gt;
    &amp;delta;x &amp;equiv; &amp;minus;i / p
&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;This allows us to say, when &lt;i&gt;x&lt;/i&gt; is zero and &lt;i&gt;&amp;delta;x&lt;/i&gt; is small,&lt;/p&gt;

&lt;blockquote&gt;&lt;table&gt;&lt;tr&gt;
    &lt;td&gt;&lt;/td&gt;&lt;td&gt;(1 &amp;minus; a) &amp;lowast; r&lt;sub&gt;inst&lt;/sub&gt;&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;
    &lt;td&gt; = &lt;/td&gt;&lt;td&gt;(1 &amp;minus; e&lt;sup&gt;&amp;minus;i/p&lt;/sup&gt;) &amp;lowast; p/i&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;
    &lt;td&gt; = &lt;/td&gt;&lt;td&gt;(1 &amp;minus; e&lt;sup&gt;&amp;delta;x&lt;/sup&gt;) / &amp;minus;&amp;delta;x&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;
    &lt;td&gt; = &lt;/td&gt;&lt;td&gt;(e&lt;sup&gt;&amp;delta;x&lt;/sup&gt; &amp;minus; 1) / &amp;delta;x&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;
    &lt;td&gt; = &lt;/td&gt;&lt;td&gt;(e&lt;sup&gt;0 + &amp;delta;x&lt;/sup&gt; &amp;minus; e&lt;sup&gt;0&lt;/sup&gt;) / &amp;delta;x&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;
    &lt;td&gt; = &lt;/td&gt;&lt;td&gt;(&amp;thinsp;f(0 + &amp;delta;x) &amp;minus; f(0)&amp;thinsp;) / &amp;delta;x&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;
    &lt;td&gt; = &lt;/td&gt;&lt;td&gt;&amp;delta;y / &amp;delta;x&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;
    &lt;td&gt; &amp;asymp; &lt;/td&gt;&lt;td&gt;dy / dx&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;
    &lt;td&gt; = &lt;/td&gt;&lt;td&gt;e&lt;sup&gt;x&lt;/sup&gt;&lt;/td&gt;
&lt;/tr&gt;&lt;tr&gt;
    &lt;td&gt; = &lt;/td&gt;&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;&lt;/table&gt;&lt;/blockquote&gt;

&lt;p&gt;Sweet.&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:96376</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/96376.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=96376"/>
    <title>The MacBook saga</title>
    <published>2009-01-21T23:43:28Z</published>
    <updated>2009-01-21T23:52:05Z</updated>
    <content type="html">&lt;p&gt;Three months ago, Apple released their new "unibody" aluminium MacBook laptop, and just over a month before that they released the second-generation iPod touch. I was in the market for a new Mac laptop, and at some point I planned to upgrade my iPod when a 64GB Touch or iPhone became available. However when I found out that Apple were offering £95 "back to school" rebates to those who bought an iPod Touch and a MacBook, I couldn't wait any longer - this was the 30th October and the offer ceased at the end of the month.&lt;/p&gt;

&lt;p&gt;Unfortunately I made the order with a rather crusty old version of Firefox which handled Apple's customization JavaScript incorrectly - it discarded all the options I chose, and because I wasn't familiar with the order process I didn't know that the final description screen should have included more details than it did. This wasn't a problem for the bits and pieces that I could also buy at the local &lt;strike&gt;greengrocer&lt;/strike&gt; Apple Store, but one important option I chose was the US keyboard layout. I wanted it partly because I normally use a &lt;a href="http://www.pfusystems.com/hhkeyboard/images/lite2_us_sl.jpg"&gt;Happy Hacking keyboard&lt;/a&gt; which has a similar layout, and partly because &lt;a href="http://lowendmac.com/mail/mb07/art/macbook_keyboard.jpg"&gt;the ISO layout&lt;/a&gt; has a thin return key that is awkward to press as well as being ugly.&lt;/p&gt;

&lt;p&gt;When I discovered this upon unboxing the new Shiny! my heart sank. You can't change the keyboard on the current MacBooks without replacing the entire machine. But I phoned Apple to see what I could do about it, and I was pleasantly surprised to find out that they would replace the machine with the correct model for free, no questions asked.&lt;/p&gt;

&lt;p&gt;I discovered Apple's FAIL when the second MacBook arrived. I had been sent the &lt;a href="http://km.support.apple.com/library/APPLE/APPLECARE_ALLGEOS/HT2841/304933_08.gif"&gt;"International English"&lt;/a&gt; model, which is identical to the &lt;a href="http://km.support.apple.com/library/APPLE/APPLECARE_ALLGEOS/HT2841/304933_02.gif"&gt;"British"&lt;/a&gt; layout apart from a couple of currency symbols. They also sent me a continental power plug instead of a BS 1363 plug. When I phoned Apple they were suitably contrite, even offering me £70 compensation for the error!&lt;/p&gt;

&lt;p&gt;Until this point (two weeks after the original order) I was very happy with Apple's support - no dumb scripts, reasonably competent staff. Sadly this didn't last. While the phone staff continued to be helpful and informative about what was going on, I had to keep phoning to get an update because the supervisor who was supposed to be dealing with the problem was too busy doing other stuff. The cause seemed to be that the department which handled replacements didn't believe that US keyboards were available in the UK (despite what the online store said) and the support people were incapable of fixing or working around this failure.&lt;/p&gt;

&lt;p&gt;What was worse was that it took another three weeks to come to this conclusion. (I was away in France for one of them, during which I didn't chase Apple so nothing happened.) I couldn't use my new iPod because I didn't want to have the pain of re-pairing and re-populating it with music etc. This all made me extremely cross.&lt;/p&gt;

&lt;p&gt;In the end I had to return the second MacBook for a refund and buy a third one with the correct configuration, which finally arrived six weeks after the original order. Fortunately this didn't affect my eligibility for the rebate - in fact they had already sent me the £95, though I hadn't sent in the proof-of-purchase paperwork they said they needed. (Um, I didn't notice the extra money until this month!) Unfortunately they cancelled the compensation they promised me! They fixed that after I made an irate phone call, but sheesh, I still get angry when remembering it.&lt;/p&gt;

&lt;p&gt;But I suppose it's all OK in the end. The MacBook and iPod Touch are things of astounding beauty and utility. The iPod utterly blows away the Nokia 770 I previously used for spodding, and the MacBook replaces both my elderly PC laptop and my Mac Mini (though the latter will probably become a backup server). It's really nice to be able to hack away on the comfy sofa, and I'm playing choonz via the Airport Express a lot more than I did.&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:96172</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/96172.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=96172"/>
    <title>RJ45 is too fat</title>
    <published>2009-01-16T17:11:15Z</published>
    <updated>2009-01-16T17:12:00Z</updated>
    <content type="html">&lt;p&gt;One notable feature of &lt;a href="http://images.apple.com/macbook/images/specs_connections20081014.jpg"&gt;the ports on the new MacBook&lt;/a&gt; is that the RJ45 Ethernet socket only just fits - it wouldn't if the machine were much thinner. In fact &lt;a href="http://images.apple.com/macbookair/images/specs_peripheral20081014.png"&gt;the MacBook Air&lt;/a&gt; notoriously doesn't have any wired ethernet connection at all, relying on IEEE 802.11n WiFi for connectivity.&lt;/p&gt;

&lt;p&gt;Of course other manufacturers are making expensive thin laptops. The Dell Adamo neatly solves the RJ45 thickness problem by putting &lt;a href="http://www.engadget.com/photos/more-adamo-images/"&gt;the port behind the display hinge&lt;/a&gt;, where the machine is as thick as the body plus the display.&lt;/p&gt;

&lt;p&gt;The Voodoo Envy's solution is to &lt;a href="http://www.engadget.com/2008/10/03/voodoo-envy-133-unboxing-and-impressions/"&gt;put the Ethernet socket in the power supply&lt;/a&gt;, which doubles as a WiFi hot spot for the laptop. This is cute, but it means you only get 54Mb/s because the base station doesn't support 802.11n.&lt;/p&gt;

&lt;p&gt;A cheaper and faster alternative would be to pass a gigabit Ethernet connection through from an RJ45 socket on the power supply to a combined power + network port on the laptop. The connector could be much thinner than RJ45 without sacrificing compatibility.&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:95831</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/95831.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=95831"/>
    <title>Weird network bug</title>
    <published>2008-12-01T20:13:50Z</published>
    <updated>2008-12-01T21:18:53Z</updated>
    <content type="html">I just spent a couple of hours debugging a strange network problem, which I haven't seen before.&lt;br /&gt;&lt;br /&gt;The initial description was that email relayed from one departmental email server to another via my email server sometimes failed with a timeout error. This only occurred with messages with multiple recipients.&lt;br /&gt;&lt;br /&gt;My initial suspicion was that it might be related to an old (now fixed) Exim callout verification bug, which had caused us some problems in the past. If a callout's target server was slow, and there were many recipients needing callout verification, and the client used pipelining, the total time for the callout delays could add up to more than the client's timeout, so it wouldn't see Exim's pipelined replies. Exim now flushes its replies before doing a callout to avoid this problem.&lt;br /&gt;&lt;br /&gt;However, some testing showed that the relevant servers were quite swift, which cast some doubt on this diagnosis.&lt;br /&gt;&lt;br /&gt;The problem report included some good details which were unfortunately too old for me to be able to correlate with our logs. I had a look to see if there was anything more recent that I could examine. It turned out that there were a couple of messages on our queues that had been delayed by this problem. I ran a test delivery of one of them and it did in fact time out between us sending the message envelope and them sending the reply. Consistent with my guess, but not consistent with other testing.&lt;br /&gt;&lt;br /&gt;I tried a delivery with full debugging from Exim, but there was no enlightenment. I tried to send a copy of the problem message's envelope manually. This worked. Um, WTF? What was the difference between what Exim did and what I did?&lt;br /&gt;&lt;br /&gt;I ran tcpdump to analyze the two connections. 
It turned out that Exim was sending the whole envelope (MAIL, RCPT, RCPT, ..., DATA) in one packet, whereas my cut-and-paste into telnet split the envelope into a packet for the MAIL command and one for the rest. So I created a file containing the envelope and used a little shell script to drive telnet. Bingo. I had reproduced the problem.&lt;br /&gt;&lt;br /&gt;At this point I thought it might be an MTU-related problem, e.g. blocking ICMP "Fragmentation Needed" messages. I tried pinning down the packet size at which the problem started occurring, with a guess of something related to 512 bytes. After a couple of goes I noticed that the problem message had an envelope of 513 bytes, and an envelope of 512 bytes worked OK.&lt;br /&gt;&lt;br /&gt;Then I tried a larger envelope, and - boggle - it worked. 512 or less worked, 513 failed, 514 or greater worked. This also explained why the problem affected message envelopes but not message data. The usual symptom of MTU firewall problems is timeouts at the message data stage, not during the envelope, and in this case message data was getting through fine. (The overhead of message headers makes it very unlikely that a message will have as little as 513 bytes of data.)&lt;br /&gt;&lt;br /&gt;A couple of friends suggested testing 1025 as well, and it demonstrated similar but weirder problems. Again 1024 worked, 1025 failed, and 1026 worked. However in the failure case, the timeout occurred after I received the destination's replies to the first half of the envelope - enough commands to fit in 512 bytes. A closer look at both the 513 and 1025 cases revealed that I was getting TCP ACKs for all the data I sent, but something was going wrong after that.&lt;br /&gt;&lt;br /&gt;I guess the problem is a firewall that's doing some kind of TCP interception and re-segmentation, and getting it wrong. The ACKs would therefore have been generated by the firewall and not by the actual destination. 
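&lt;br /&gt;&lt;br /&gt;The size tests above could also be scripted in Python rather than by driving telnet from a shell script. What follows is a rough sketch, not the script actually used: build_envelope() and send_envelope() are hypothetical helpers, and padding the last recipient's local part with "x" characters is just one assumed way of hitting an exact envelope size. It ignores the 64-octet limit on local parts, so a strict server may reject the padded address even when the network path is fine; for this test what matters is whether any reply arrives at all.
&lt;pre&gt;
```python
import socket

LT, GT = chr(60), chr(62)  # SMTP wraps each address in angle brackets


def render(sender, recipients):
    """Serialise MAIL, one RCPT per recipient, and DATA as one byte string."""
    lines = ["MAIL FROM:" + LT + sender + GT]
    lines += ["RCPT TO:" + LT + rcpt + GT for rcpt in recipients]
    lines.append("DATA")
    return "".join(line + "\r\n" for line in lines).encode("ascii")


def build_envelope(sender, recipients, size):
    """Hypothetical helper: pad the last recipient's local part with "x"
    so that the pipelined envelope is exactly `size` bytes long."""
    base = render(sender, recipients)
    if len(base) > size:
        raise ValueError("envelope is already %d bytes" % len(base))
    local, _, domain = recipients[-1].rpartition("@")
    padded = local + "x" * (size - len(base)) + "@" + domain
    return render(sender, list(recipients[:-1]) + [padded])


def send_envelope(host, envelope, port=25, timeout=60):
    """Write the whole envelope with one sendall() call and return the
    first chunk of replies. Crude: it does not parse multi-line SMTP
    responses, and it assumes the server advertises PIPELINING."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.recv(4096)                           # 220 greeting
        sock.sendall(b"EHLO client.example\r\n")
        sock.recv(4096)                           # EHLO response
        sock.sendall(envelope)
        return sock.recv(4096)  # raises socket.timeout if nothing comes back
```
&lt;/pre&gt;
Running send_envelope() against the affected destination with envelopes of 512, 513 and 514 bytes is the comparison described above. On a fresh connection a single sendall() of that size usually goes out as one TCP segment, much like Exim's pipelined envelope, whereas a cut-and-paste into telnet may not.&lt;br /&gt;&lt;br /&gt;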
Someone needs to be given a good whipping with a copy of &lt;a href="http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf"&gt;end-to-end arguments in system design&lt;/a&gt; before having their firewall programming licence revoked.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:95681</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/95681.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=95681"/>
    <title>Licence revoked</title>
    <published>2008-11-18T12:26:57Z</published>
    <updated>2008-11-18T12:27:58Z</updated>
    <content type="html">&lt;p&gt;I'm happy to say that the problem I ranted about in &lt;a href="http://fanf.livejournal.com/95404.html"&gt;my previous entry&lt;/a&gt; has been fixed. The general &lt;a href="http://www.admin.cam.ac.uk/committee/isss/otherguidelines/bulkemail.html"&gt;guidelines for bulk email&lt;/a&gt; have been restored, and the new &lt;a href="http://www.admin.cam.ac.uk/committee/isss/otherguidelines/internalbulkemail.html"&gt;guidelines for &lt;strike&gt;intraspam&lt;/strike&gt; large internal mailing lists&lt;/a&gt; have been published on their own page.&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:95404</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/95404.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=95404"/>
    <title>Licence to spam</title>
    <published>2008-11-06T19:30:48Z</published>
    <updated>2008-11-07T08:56:52Z</updated>
    <content type="html">&lt;p&gt;Over a year ago we had some discussions inside the Computing Service about better provision for large mailing lists within the University. At the moment we rely too much on paper spam, and electronic spam is distributed to staff by departmental administrators who manually forward irrelevant utterances from the Old Schools. My hope was that officially sanctioned mailing lists for internal communications would improve this situation because, as well as being more efficient, there would be better moderation and recipients would be able to unsubscribe. We approved a document which I thought was quite good, and which I expected would live alongside our existing, very general, bulk email guidelines. This then went to the IT Syndicate for approval.&lt;/p&gt;

&lt;p&gt;Unfortunately the IT Syndicate was disbanded at about this time to be replaced by an Information Services and Strategy Syndicate which has a broader remit. The large list policy got dropped for several months as the committee rebooted. When the work item was picked up again it somehow got corrupted. It was rewritten with absolutely no input from us, and it replaced the bulk email policy instead of being adopted alongside it. As a result there is now almost no policy against spam intentionally sent by University members to recipients inside &lt;i&gt;and outside&lt;/i&gt; the University. I am extremely annoyed.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://www.admin.cam.ac.uk/committee/isss/otherguidelines/bulkemail.html"&gt;The new policy is on the ISSS web site&lt;/a&gt; and I'll put a copy of the old policy behind a cut to preserve it after Google's cache expires.&lt;/p&gt;

&lt;a name="cutid1"&gt;&lt;/a&gt;

&lt;h2&gt;Guidelines on Use of Bulk Email in Cambridge&lt;/h2&gt;

&lt;p&gt;Although the use of bulk email can on occasion be in the interests of the University, it can nevertheless present real problems and dangers. Guidance on the use of bulk email is often sought but it is very hard to be precise. Nevertheless, it is felt that an attempt should be made and the IT Syndicate has approved the following guidelines.&lt;/p&gt;

&lt;h3&gt;Terminology&lt;/h3&gt;

&lt;p&gt;Bulk email is identical email sent out to groups of individuals, irrespective of whether this is done by repeated sendings or single sendings, via mailing list addresses or any other means.&lt;/p&gt;

&lt;p&gt;Unsolicited email is email of a category that the individual did not request, irrespective of whether it is welcome.&lt;/p&gt;

&lt;h3&gt;Under what circumstances is bulk email a bad idea?&lt;/h3&gt;

&lt;p&gt;It is unacceptable to do anything likely to invite external retaliation (such as black-listing) against the University or any part thereof. It is therefore never acceptable to send bulk unsolicited email out of the CUDN.&lt;/p&gt;

&lt;p&gt;Even within the University it is normally unacceptable to send bulk unsolicited email unless there is reason to suppose that a substantial proportion of the recipients will be interested or need to know.&lt;/p&gt;

&lt;p&gt;Although an individual may have assented to certain bulk email implicitly by being on some particular list, or indeed by virtue of his or her position, say as an employee or student, caution is needed in any such presumption. Assent to mail inappropriate to a list is never implied, and mere appearance in some email directory does not imply assent either.&lt;/p&gt;

&lt;p&gt;It is unacceptable to send email in such bulk as would overwhelm the underlying mechanisms, be this on account of the number of ultimate targets, volume of data transmitted, volume of data consequentially stored or anything else.&lt;/p&gt;

&lt;p&gt;Electronic bulletin boards are frequently more appropriate than email and should be preferred wherever possible and reasonable.&lt;/p&gt;

&lt;p&gt;Bulk unsolicited email is frequently counter-productive and can generate considerable annoyance. The utmost caution and reluctance should precede any emission of bulk email.&lt;/p&gt;

&lt;p&gt;Rules and guidelines that apply to ordinary individual or personal email apply at least as strongly to bulk email, strictures being the more significant on account of the size and generality of the audience. See the section on mail etiquette at &lt;a href="http://www.cam.ac.uk/cs/email/etiquette.html"&gt;http://www.cam.ac.uk/cs/email/etiquette.html&lt;/a&gt;, and the IT Syndicate guidelines on the use and misuse of computing facilities at &lt;a href="http://www.cam.ac.uk/cs/itsyndicate/rules/guidelines.html"&gt;http://www.cam.ac.uk/cs/itsyndicate/rules/guidelines.html&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;Points to consider in bulk email&lt;/h3&gt;

&lt;p&gt;If it is accepted that some particular instance of bulk unsolicited email is appropriate, surviving the guidelines above, these further guidelines should be applied to the composition and transmission, for essentially the same reasons.&lt;/p&gt;

&lt;p&gt;The email should be legible on the most basic of equipment, not requiring anything that recipients might reasonably not have. In particular, it should be wholly in plain text, not encoded in any way, and certainly not include any part in proprietary format.&lt;/p&gt;

&lt;p&gt;The message should be short, perhaps a page at most.&lt;/p&gt;

&lt;p&gt;If the object is to draw attention to bulkier material or material inherently in other formats, then references to this (e.g. URLs) can be included, which interested recipients can pursue. There is still no warrant to contravene the preceding two paragraphs.&lt;/p&gt;

&lt;p&gt;The message as a whole should make it plain that it is indeed a circular, and should make plain the general constituency of its recipients.&lt;/p&gt;

&lt;p&gt;The headers should be such as to prevent any replies accidentally going to the whole constituency as well. Expert advice should be sought as to specific methods.&lt;/p&gt;

&lt;p&gt;If the constituency is large it may well be appropriate to take special steps to minimize the logistical impact. It is always the responsibility of the sender to verify that the underlying systems can cope with what he or she wants to do. Expert advice should be sought.&lt;/p&gt;

&lt;p&gt;The address list used should be appropriate for the message. In particular, see the posting policy in &lt;a href="http://www.cam.ac.uk/cs/docs/leaflets/G90/"&gt;the Computing Service leaflet on mailing lists, G90&lt;/a&gt;.&lt;/p&gt;

</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:95018</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/95018.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=95018"/>
    <title>REST FAIL</title>
    <published>2008-11-05T19:07:12Z</published>
    <updated>2009-04-30T10:39:43Z</updated>
    <content type="html">&lt;p&gt;This afternoon my colleague Andy gave a talk about Microsoft Exchange 2007, which I attended since I need to know what the competition has to offer and I have to provide support for its SMTP interfaces. One thing he mentioned which perked up my interest was "autodiscovery" for Outlook 2007. &lt;a href="http://fanf.livejournal.com/39428.html"&gt;The unnecessary difficulty of configuring email software&lt;/a&gt; has irritated me for a long time, so after the talk I immediately went to find out more. It turns out to be a mixture of good stuff and utter FAIL.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://technet.microsoft.com/en-gb/library/cc511507.aspx"&gt;The documentation&lt;/a&gt; describes three ways that Outlook 2007 can configure user accounts automatically - server names, security requirements, etc. If you are logged into a Windows Domain then it first tries querying the Active Directory. If that succeeds then it can find &lt;em&gt;everything&lt;/em&gt; out by itself. Nice. If not, it falls back to an open protocol, which any standards-based mail server can implement, that configures the server settings automatically given an email address. More about this below.&lt;/p&gt;

&lt;p&gt;If server-supported autodiscovery doesn't work, Outlook tries to guess the settings by attempting various combinations of host name, port number, TLS or not, POP or IMAP, etc. and stopping when it finds something that works. I think this is a great idea, so it's a damn shame that it prefers to use the lame POP not the studly IMAP if both are available. By itself that is enough to make it worth our while to implement the autodiscover protocol; but because our primary mail domain is &lt;tt&gt;cam.ac.uk&lt;/tt&gt; but our service name is &lt;tt&gt;hermes.cam.ac.uk&lt;/tt&gt;, Outlook's guesses will fail. This has the advantage of preventing users walking blindly into bad configurations, but the disadvantage that they're less likely to be able to configure it at all.&lt;/p&gt;

&lt;p&gt;And so to the autodiscovery protocol itself. How can something so simple be fucked up in so many ways? It looks to me like it was bodged together by a clueless intern over two or three summer breaks, each time getting worse as the requirements evolved. The result is a case study in what &lt;a href="http://en.wikipedia.org/wiki/Representational_State_Transfer"&gt;REST&lt;/a&gt;ful protocol design is not.&lt;/p&gt;

&lt;p&gt;The essence is simple: the client sends a request containing the user's email address, and the server sends back an XML document containing the configuration settings. The request is sent using HTTP-over-SSL to a URL derived from the domain part of the user's email address. (In fact it tries &lt;tt&gt;https://&lt;i&gt;domain&lt;/i&gt;/autodiscover/autodiscover.xml&lt;/tt&gt; first, then tries &lt;tt&gt;https://autodiscover.&lt;i&gt;domain&lt;/i&gt;/autodiscover/autodiscover.xml&lt;/tt&gt;.) So far, so good.&lt;/p&gt;

&lt;p&gt;The first mistake is that the request is a POST with the email address contained in another XML document in the request body. It would have been more RESTful to use a GET with the email address encoded in the URL. This mistake turns into FAIL when you realise that most email services have the same settings for all users, in which case &lt;tt&gt;autodiscover.xml&lt;/tt&gt; can be a static file, but many web servers do not allow POST requests to static files. If they had done it right, using a GET request with the email address in the query string would have Just Worked for both CGI scripts and static files.&lt;/p&gt;
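
&lt;p&gt;To make the GET alternative concrete, here is a small Python sketch; it illustrates the idea and is not anything Microsoft specifies. The address travels in the query string, a CGI script can read it back, and a server handing out a static &lt;tt&gt;autodiscover.xml&lt;/tt&gt; would simply ignore the query. The &lt;tt&gt;emailaddress&lt;/tt&gt; parameter name and both function names are hypothetical.&lt;/p&gt;
&lt;pre&gt;
```python
from urllib.parse import parse_qs, urlencode, urlsplit


def restful_autodiscover_url(email):
    """Hypothetical GET-style request: the address goes in the query string.
    The parameter name "emailaddress" is made up for this sketch."""
    domain = email.rpartition("@")[2].lower()
    query = urlencode({"emailaddress": email})
    return "https://" + domain + "/autodiscover/autodiscover.xml?" + query


def requested_address(url):
    """What a CGI script would read back; a static file just ignores it."""
    return parse_qs(urlsplit(url).query)["emailaddress"][0]
```
&lt;/pre&gt;
&lt;p&gt;&lt;tt&gt;urlencode&lt;/tt&gt; percent-encodes the &lt;tt&gt;@&lt;/tt&gt; and any &lt;tt&gt;+&lt;/tt&gt; in the address, so sub-addressed addresses survive the round trip.&lt;/p&gt;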

&lt;p&gt;It gets worse when they bodge around this mistake. A POST to a file commonly results in a "405 Method Not Allowed" error. Microsoft specify that if this is the case for your web server, you should configure it to send the &lt;tt&gt;autodiscover.xml&lt;/tt&gt; file in the error reply, as if the request was successful. A foul perversion of the protocol.&lt;/p&gt;

&lt;p&gt;It gets worse when they add support for virtual domains. The requirement seems to be that the service provider wants to host the &lt;tt&gt;autodiscover.xml&lt;/tt&gt; script centrally, and doesn't want to fork out for an SSL certificate for every virtual domain. So if both of the &lt;tt&gt;https&lt;/tt&gt; URLs fail, Outlook tries a GET request to the &lt;tt&gt;autodiscover.&lt;i&gt;domain&lt;/i&gt;&lt;/tt&gt; URL, which can redirect to the central &lt;tt&gt;autodiscover.xml&lt;/tt&gt;. However that's not secure enough so there must also be an &lt;tt&gt;_autodiscover._tcp.&lt;i&gt;domain&lt;/i&gt;&lt;/tt&gt; SRV record in the DNS pointing to the same central web server. However that's still not secure enough so Outlook also admonishes the user to click OK in a popup dialogue box.&lt;/p&gt;
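
&lt;p&gt;The whole lookup sequence can be summarised as a list of candidate requests. This is a sketch of the order as described in this post, not Microsoft's implementation: it leaves out the Active Directory query, the function name and tuple layout are invented, and using plain &lt;tt&gt;http&lt;/tt&gt; for the GET redirect check is an assumption, consistent with the aim of avoiding an SSL certificate per virtual domain.&lt;/p&gt;
&lt;pre&gt;
```python
def autodiscover_plan(email):
    """List the lookups made for `email`, in the order described above:
    POST to two https URLs, then a GET that may redirect, plus the SRV
    record that must point at the same central server."""
    local, at, domain = email.rpartition("@")
    if not (local and at and domain):
        raise ValueError("not an email address: " + email)
    domain = domain.lower()
    path = "/autodiscover/autodiscover.xml"
    return [
        ("POST", "https://" + domain + path),
        ("POST", "https://autodiscover." + domain + path),
        ("GET", "http://autodiscover." + domain + path),  # assumed plain http
        ("SRV", "_autodiscover._tcp." + domain),
    ]
```
&lt;/pre&gt;
&lt;p&gt;A client would stop at the first step that yields a usable &lt;tt&gt;autodiscover.xml&lt;/tt&gt;; the redirect case additionally needs the SRV record and the user's click on OK.&lt;/p&gt;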

&lt;p&gt;It gets worse when you look at the &lt;tt&gt;autodiscover.xml&lt;/tt&gt; "schema". It isn't a schema, it's a commented skeleton fill-in-the-blanks &lt;tt&gt;autodiscover.xml&lt;/tt&gt; file. That's just everyday lameness; the real FAIL is to be found in the various elements described therein. The redirect I mentioned above isn't an HTTP redirect, it's an &lt;tt&gt;autodiscover.xml&lt;/tt&gt; document containing various redirection settings. There are also settings for controlling the expiry time of the document, because of course configuring these in a &lt;tt&gt;.htaccess&lt;/tt&gt; file is too hard.&lt;/p&gt;

&lt;p&gt;Ugh. I think that just about exhausts the rantage this stuff has triggered. I am most of the way to implementing it; I just need to get the necessary SSL certificates. It turns out to be easier than I expected because newish Apache does not get upset by POST requests aimed at static files: it just throws away the request body and returns the file with a "200 OK". No need to fart around with &lt;tt&gt;ErrorDocument 405 /autodiscover/autodiscover.xml&lt;/tt&gt;. With any luck it'll make Outlook &lt;a href="http://www.cam.ac.uk/cs/email/muasettings/outlook.html"&gt;trivial to get to work with Hermes&lt;/a&gt;.&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:94505</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/94505.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=94505"/>
    <title>LOLauditors</title>
    <published>2008-10-08T12:35:45Z</published>
    <updated>2008-10-08T12:36:36Z</updated>
    <content type="html">&lt;p&gt;A colleague tells me that our auditors are quizzing our system administrators about our backup schedules. A wag asked them if they were interested in restores too. The auditors replied, "No, just backups."&lt;/p&gt;

&lt;p&gt;&lt;i&gt;PS.&lt;/i&gt; I bought a copy of the book about the Chronophage from Corpus Christi plodge for &amp;pound;8. It is a glossy A4 booklet of 44 pages with large type and lots of colourful pictures.&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:94411</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/94411.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=94411"/>
    <title>More on the Corpus Christi Chronophage clock</title>
    <published>2008-09-30T10:34:10Z</published>
    <updated>2008-09-30T10:34:37Z</updated>
    <content type="html">&lt;ul&gt;
&lt;li&gt;I said the clock has three circles of 60 slits. In fact it has two circles of 60 and one of 48 - the hour circle has a slit for every quarter hour.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.chronophage.co.uk/"&gt;A website about the clock&lt;/a&gt; is going live soon.&lt;/li&gt;
&lt;li&gt;A book about the clock is being published, and it will be available from the Corpus Christi porters' lodge amongst other places.&lt;/li&gt;
&lt;/ul&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:fanf:94043</id>
    <link rel="alternate" type="text/html" href="http://fanf.livejournal.com/94043.html"/>
    <link rel="self" type="text/xml" href="http://fanf.livejournal.com/data/atom/?itemid=94043"/>
    <title>The Corpus Christi Chronophage Clock</title>
    <published>2008-09-22T22:55:48Z</published>
    <updated>2008-09-22T22:55:48Z</updated>
    <content type="html">&lt;p&gt;I have been admiring the new clock on &lt;a href="http://maps.google.com/?ll=52.203705,0.117624&amp;amp;z=21"&gt;the corner of Trumpington St and Bene't St&lt;/a&gt;. Instead of having hands, the clock has three concentric rings hidden behind its face. The outer ring marks seconds, the middle ring marks minutes, and the inner ring marks hours. Each ring is driven by the next outer ring via a gearing mechanism that makes the rings tick from one mark to the next. It is fairly normal for the second hand's motion to be quantized in this manner, but more unusual for the minute and hour hands to tick like this too.&lt;/p&gt;

&lt;p&gt;Each ring sits in front of a circle of 60 blue LEDs, and the face has three circles of 60 slots which would allow the LEDs to shine through if the rings weren't in the way. The position of each ring is indicated by a slot which lets one LED shine through both the ring and the face, so that viewers can see it.&lt;/p&gt;

&lt;p&gt;In fact, each ring has 61 slots, one of which is usually aligned with a slot in the face to indicate the time, and 60 of which are normally out of alignment. When a ring ticks from one position to the next, its 60 normally-unaligned slots line up with the face in turn before the normally-aligned slot lines up with the next slot in the face. This has the effect of making it look as if the ring has flown around 366&amp;deg;, whereas it has only moved by 6&amp;deg;.&lt;/p&gt;

&lt;p&gt;I have made &lt;a href="http://dotat.at/random/clock.html"&gt;a little animation&lt;/a&gt; which shows how this works. It ticks once every 4 seconds and it only has 12 slots on the face and 13 on the disk, so that you can clearly see the mechanism behind the effect. I have drawn the face's slots in the outer circle and the ring's slots in the inner circle. It uses the new-fangled &amp;lt;canvas&amp;gt; HTML element, so you may need to upgrade your browser to view it.&lt;/p&gt;
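
&lt;p&gt;The alignment arithmetic can be checked with a short simulation. This is a sketch under its own simplified model, not the clock's actual mechanism: it assumes evenly spaced slots, a face with n slots and a ring with n + 1, and measures angles in integer units of 1/(n(n + 1)) of a turn so that the alignment test is exact. The function names are made up.&lt;/p&gt;
&lt;pre&gt;
```python
def lit_face_slots(n):
    """Return (face_slot, ring_slot) pairs that line up at each small step
    while the ring advances by one face slot (360/n degrees).

    Units: a full turn is n * (n + 1), face slot k sits at k * (n + 1),
    and ring slot j sits at offset + j * n.
    """
    turn = n * (n + 1)
    lit = []
    for offset in range(1, n + 2):  # the ring moves one unit per step
        for k in range(n):
            for j in range(n + 1):
                if (offset + j * n - k * (n + 1)) % turn == 0:
                    lit.append((k, j))
    return lit


def apparent_sweep_degrees(n):
    """The light visits every face slot and ends one slot on: n + 1 face
    steps of 360/n degrees, while the ring itself turns only 360/n."""
    return (n + 1) * 360 / n
```
&lt;/pre&gt;
&lt;p&gt;For n = 60, ring slots 1 to 60 light face slots 1, 2, ..., 59, 0 in turn, and then ring slot 0 arrives at face slot 1: an apparent sweep of 366&amp;deg; while the ring turns only 6&amp;deg;. With n = 12 it gives the 12/13 arrangement of the animation and an apparent sweep of 390&amp;deg;.&lt;/p&gt;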

&lt;p&gt;There is &lt;a href="http://www.admin.cam.ac.uk/offices/communications/1522.html"&gt;a video of the clock&lt;/a&gt; where some of its features are described by the designer, John Taylor, who made his fortune from his invention of the kettle thermostat.&lt;/p&gt;</content>
  </entry>
</feed>
