fanf

Configuration management

2nd Apr 2009 | 17:10

We've been discussing configuration management in the office this week. None of us are happy with the way we're doing things...

On Hermes, we do most configuration management using rdist. On our management server we have a number of file trees containing files that differ from the default - a smattering of stuff in /etc, the contents of /home and /opt, and a few other bits scattered around. These trees are cloned and hacked for each flavour of machine (ppswitch, cyrus, lists, webmail, etc.) and each version of the underlying OS. These trees are mostly kept under revision control.

This setup has the advantage of simplicity, and it's easy to push out small changes. One key feature is rdist's testing mode, which makes it easy for us to ensure that the servers are in sync with the configuration tree without changing anything. We often run a test across a cluster of 10 or 16 machines in parallel. It's easy to selectively push a change to an idle machine for testing before rolling it out to the rest of the cluster. For more tricky changes I do a phased rollout so I can check for unwanted changes in behaviour without breaking the entire service at once.
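The idea behind a test mode can be sketched in a few lines of Python. This is only an illustration of a read-only sync check, not how rdist works internally; the function name `out_of_sync` and the local `target_root` directory (standing in for a remote server) are assumptions made for the example.

```python
import filecmp
import os


def out_of_sync(config_tree, target_root):
    """Return the relative paths whose target copy is missing or differs.

    Read-only: nothing under target_root is modified.
    """
    stale = []
    for dirpath, _dirnames, filenames in os.walk(config_tree):
        for name in filenames:
            source = os.path.join(dirpath, name)
            relative = os.path.relpath(source, config_tree)
            target = os.path.join(target_root, relative)
            # shallow=False forces a byte-for-byte comparison instead of
            # trusting size and modification time.
            if not os.path.isfile(target) or not filecmp.cmp(
                source, target, shallow=False
            ):
                stale.append(relative)
    return sorted(stale)
```

An empty result means the target matches the configuration tree; anything else is a list of files a real push would change.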

Of course it has some serious disadvantages. We have to be root on the management host to be able to push changes out to the servers. We can't easily keep file ownership and permissions under revision control. This mechanism misses significant parts of the configuration, such as installed base OS packages and which rc scripts are enabled. There's also a lot of scope for improving our initial OS install scripts to reduce the amount that they get out of sync with the rdist configuration.

So we'd like something better. Our colleagues have different system management setups with different problems, and they are also looking for something better.

A couple of my colleagues have looked at Puppet but weren't happy with it. I dislike its basic design. Managed servers pull their configurations from the master, which means you must never screw up a change on the master, and it's harder to test changes - you have to explicitly set up test server profiles. Yes, you can run Puppet in push mode but it's often a bad idea to work against a program's basic architecture. Puppet also has its own security mechanisms, whereas I'd prefer to avoid multiplying channels of trust. Finally, I really hate writing configuration files for programs that write configuration files. It's a waste of brain cells to understand this superfluous abstraction layer: I just want to write the underlying configuration file directly.

Aside: this is why I really really hate autoyast, which uses an XML configuration file to control a program that writes configuration files that control rc scripts that manipulate the underlying configuration files for the programs you actually care about. It takes hours to work out how to make it produce the correct results.

I spent some time working on a program called pstx which was to be a souped-up replacement for rdist. It was going to be an sftp client (so it would require nothing special on the target servers) that had a simple configuration file to specify which directory trees to copy where, including ownerships and permissions (bringing them under revision control and removing the need to be root on the master), and possibly with added bells and whistles like remote diff, and a reverse (pull) mode. I also intended to make it easier to combine collections of files and thereby share common files. Sadly it's only about a third written and likely won't get much further.

One of the conversations this week was about how to reduce the chance of mistakes when rolling out changes, in which we discussed the use of revision control to help with testing and rollback. At the moment Hermes mostly uses CVS and other unixy bits of the CS share a Subversion repository, so the conversation very much assumed a central repository model - which fits in well with a master configuration server. Recently I have also been investigating git more seriously, so I thought it might be part of a solution.

The key things that git provides include a flexible and efficient network protocol for moving files around (flexible in that it can use ssh and http as well as git's native protocol, and efficient in that it's incremental and compressed), the ability to tell us what differs between a directory tree and the state of the repository, and the ability to push and pull changes. The distributed version control functionality is way better at all this than pstx was ever going to be.

The big missing part is the ability to track ownerships and permissions: git only supports normal development checkouts which should be done using the developer's uid and umask. There are also some low-level problems with the way git performs checkouts: it removes the destination file before writing the updated version in its place. I would like a special deployment command which checks all the changed files out under temporary names, fixes their ownerships and permissions, then renames them into place. The first stage almost exists in the form of git-checkout-index --temp d, though it does weird things with symlinks. It doesn't look like much work to add the other stages.
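The write, fix, rename sequence can be sketched in Python as follows. This is a minimal illustration for a single file, not part of git; the name `install_file` and its parameters are hypothetical, and changing ownership normally requires root.

```python
import os
import tempfile


def install_file(data, destination, mode, uid=-1, gid=-1):
    """Install data at destination without ever exposing a partial file."""
    directory = os.path.dirname(os.path.abspath(destination))
    # The temporary file must be in the destination's directory (and so
    # on the same filesystem) for the final rename to be atomic.
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".deploy-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # Fix permissions and ownership before the file becomes visible
        # under its real name.
        os.chmod(temp_path, mode)
        if uid != -1 or gid != -1:
            os.chown(temp_path, uid, gid)  # usually needs root
        # Atomically swap the new version in; the old file stays intact
        # until this point.
        os.replace(temp_path, destination)
    except BaseException:
        # On any failure, clean up and leave the old file untouched.
        try:
            os.unlink(temp_path)
        except FileNotFoundError:
            pass
        raise
```

A daemon that opens the destination at any moment sees either the complete old version or the complete new one, never a missing or half-written file.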

This could provide some really nice workflows. A configuration change is created on a small topic branch, then pushed to the configuration repositories on all the machines - which changes none of the live configuration. You can then switch a test machine to the new branch with a single git deploy branch, or roll back by re-deploying the master branch. You can get a summary of a machine's current configuration using git status, or use git diff to make sure the summary isn't a lie.

One interesting possibility might be to tame the clone-and-hack problem by using a shared repository for all the different classes of machines. This should make it easier to see the differences between each kind of machine and spot spurious ones.

One thing that is very manual in our current setup is hupping daemons to make them load an updated configuration. It might be feasible to use a post-deploy hook to do this, provided the hook script is given enough information that it can avoid unnecessary restarts.
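As a rough sketch of the decision such a hook would have to make, here is some Python that maps changed paths to the daemons that need a reload. The rule table, the paths in it, and the function name `daemons_to_reload` are all invented for illustration; a real hook would need its own mapping and a way to signal each daemon.

```python
import fnmatch

# Hypothetical mapping from path patterns to the daemon that reads them.
# Note that fnmatch's "*" also matches "/" characters.
RELOAD_RULES = [
    ("etc/exim/*", "exim"),
    ("etc/imapd.conf", "cyrus"),
    ("etc/cyrus.conf", "cyrus"),
]


def daemons_to_reload(changed_paths, rules=RELOAD_RULES):
    """Return the sorted names of daemons whose configuration changed."""
    daemons = set()
    for path in changed_paths:
        for pattern, daemon in rules:
            if fnmatch.fnmatchcase(path, pattern):
                # A set ensures each daemon is reloaded at most once,
                # however many of its files changed.
                daemons.add(daemon)
    return sorted(daemons)
```

The list of changed paths could come from something like git diff --name-only between the previously deployed commit and the new one, assuming the hook is told both.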

Now I just need to find some time to turn the idea into code...


Comments {11}

yes!

from: jmason
date: 2nd Apr 2009 22:01 (UTC)

definitely git.


Thorfinn

from: thorfinn
date: 3rd Apr 2009 03:08 (UTC)

Are you stuck with free beer software, or can your organisation afford to pay?

Perforce is extremely awesome, and as far as non-free VCS goes, it's pretty much not only the best in terms of functionality, but in terms of price. They have a free two client demo that you can use to try it out.


Tony Finch

from: fanf
date: 3rd Apr 2009 11:31 (UTC)

I'm aware of p4, mainly through FreeBSD. It fails for this application because it isn't distributed. As far as I can tell git is winning the VCS competition so I don't plan to spend time on anything else. Its open architecture means it should be straightforward to add the functionality I need without having to hack on its innards.


Thorfinn

from: thorfinn
date: 4th Apr 2009 07:34 (UTC)

When you say, "distributed", I assume you mean "must be able to do commits whilst offline"?

p4 certainly is able to do work whilst offline (since your client views are entirely stored on local disk), it just isn't able to do checkout/commit type stuff.

But yes, if you want a free VCS, git is probably the best thing out there.


Tony Finch

from: fanf
date: 5th Apr 2009 19:13 (UTC)

No, "distributed" means that there's no central repository. I suppose that isn't actually necessary for VCS-based configuration deployment, especially if your setup is big enough to include a host dedicated to configuration management. For smaller setups (a pair of servers, say) I expect that a masterless configuration system would give you a useful amount of extra flexibility in how you can manage things.


Shae Erisson

darcs?

from: shae
date: 15th Apr 2009 18:49 (UTC)

Darcs can be masterless/distributed/etc and does http and/or ssh as well as email or whatever else, but does not preserve ownership and permissions better than git. On the other hand, darcs isn't nearly as cross-arch as git, so...

Much of this sounds suspiciously like what apt does in Debian, but I can't think of a good way to steal that functionality for a non-Linux Unix.

Anyway, I look forward to hearing your conclusions.


Tony Finch

Re: darcs?

from: fanf
date: 18th Apr 2009 10:17 (UTC)

Darcs is interesting but lacking in mindshare. I suspect that its patch-based design makes it harder to do what I want compared to git's directory-tree-based design.

I don't want to use a packaging system for several reasons. Firstly, they aren't standardized across different unixes. Using one would add a lot of extra complexity to the build system. Package managers are designed to solve the software distribution and selection problems, not the configuration management problem. My systems are small enough (as in the number of locally maintained packages) that I don't have a dependency tracking problem.

I don't need anything that apt does. What I want is dpkg's installer with the package gubbins replaced by a DVCS, but that part of dpkg is too small and too welded to the package gubbins to be worth re-using.


alsuren

etckeeper

from: alsuren
date: 3rd Apr 2009 14:36 (UTC)

The issue you describe of losing permissions is discussed at length in https://bugs.launchpad.net/bzr/+bug/67589

The key ideas can be summarised as "bzr is for source code, not configs"
"arch can preserve permissions... but is also rubbish"
"etckeeper has a cool hack of storing a shell script called .etckeeper which can be sourced as root at the end of an update in order to fix permissions"
"associating permissions/meta-data with filenames is fragile: you probably want a plugin that can associate metadata with the VCS' internal file id"

Do check out etckeeper though, and tell me how it goes.


Tony Finch

Re: etckeeper

from: fanf
date: 3rd Apr 2009 15:04 (UTC)

Thanks for the pointers. Yes, I'm aware of etckeeper; there's also a hook script for saving and restoring permissions that comes with git in its contrib directory. Both of these only solve half of the problem: the other half is installing new versions of files in a manner that's as safe as possible on a live system, i.e. renaming the new file into place instead of overwriting the old one.


from: anonymous
date: 7th Apr 2009 20:07 (UTC)

We used Slack (http://code.google.com/p/slack/) at a previous gig to handle this sort of problem. It might be worth taking a quick peek at it.


Tony Finch

from: fanf
date: 8th Apr 2009 11:19 (UTC)

Thanks for the pointer.
