Never delete anything

13th May 2009 | 22:57

How long will it be before it becomes normal to archive everything? It's already normal in some situations, and I think that's increasing. It's been the norm in software development for a long time. There's an increase in append-mostly storage systems (i.e. append-only with garbage collection) which become never-delete systems if you replace the GC with an archiver. Maybe the last hold-outs for proper deletion will be high data volume servers...
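Here is a minimal Python sketch of a toy append-mostly key-value log that shows the GC-versus-archiver distinction. The names (`AppendMostlyLog`, `put`, `get`, `compact`) are hypothetical and not taken from any real system. Compaction keeps the newest record for each key and then either drops the superseded records (garbage collection) or moves them aside (archiving), and in the archiving case nothing is ever deleted.

```python
from dataclasses import dataclass, field


@dataclass
class AppendMostlyLog:
    """Toy key-value log (hypothetical): writes only ever append to `records`."""
    records: list = field(default_factory=list)  # (key, value) pairs, oldest first
    archive: list = field(default_factory=list)  # superseded records, if archiving

    def put(self, key, value):
        # Never overwrite in place; a newer record simply shadows older ones.
        self.records.append((key, value))

    def get(self, key):
        # The newest record for a key wins, so scan from the end.
        for k, v in reversed(self.records):
            if k == key:
                return v
        raise KeyError(key)

    def compact(self, archive=False):
        """Keep only the newest record per key.

        archive=False is garbage collection: superseded records are dropped.
        archive=True moves them to self.archive instead, so nothing is lost.
        """
        latest = {}
        for i, (k, _) in enumerate(self.records):
            latest[k] = i  # index of the newest record for each key
        live, dead = [], []
        for i, rec in enumerate(self.records):
            (live if latest[rec[0]] == i else dead).append(rec)
        if archive:
            self.archive.extend(dead)
        self.records = live
```

Swapping `compact()` for `compact(archive=True)` is the whole change from an append-mostly system to a never-delete one. Reads of the live data behave the same either way.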

Anyway, I feel like listing some interesting append-only and append-mostly systems. A tangent that I'm not going to follow is the rise of functional programming and immutability outside the field of storage. Many of these systems rely on cryptographic hashes to identify stuff they have already stored and avoid storing it again, making append-only much more practical.
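Hash-based deduplication can be sketched as a tiny content-addressed store in Python. The class name `ContentStore` is hypothetical. The sketch uses SHA-256 rather than the SHA-1 used by Venti and Git. Because a blob's key is derived from its contents, storing the same bytes twice costs nothing extra, which is what makes never deleting affordable.

```python
import hashlib


class ContentStore:
    """Append-only blob store (hypothetical), keyed by SHA-256 of the contents."""

    def __init__(self):
        self._blobs = {}  # hex digest -> bytes; entries are never removed

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same key, so a repeat put is a no-op.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self):
        return len(self._blobs)
```

Putting `b"hello"` twice returns the same key both times and leaves only one stored blob.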

  • All version control systems, and software configuration management systems even more so. The former archive source code, whereas the latter also archive build tools and build products. DEC's Vesta SCM is particularly interesting, being based on a purely functional build language designed to maximize memoization, i.e. to minimize unnecessary rebuilds. It's sort of ccache on steroids, since it caches the results of entire module builds, not just individual source file compiles.
  • Nix is a purely functional package manager. Unlike most packaging systems, such as dpkg or rpm, Nix packages do not conflict with each other: you upgrade by installing new packages alongside your existing ones, then you stop running the old ones and start running the new ones.
  • Archival / backup systems, like Venti, Plan 9's append-only archival block store. Apple's Time Machine isn't nearly as clever.
  • Most filesystems don't use hash-based uniquification. Append-mostly filesystems often provide cool undelete features like snapshots, e.g. NetApp's WAFL or Sun's ZFS. Early filesystems of this kind, e.g. BSD LFS, tried to avoid wasting space, so they didn't make old data available as snapshots, and they sacrificed performance to eager garbage collection. More recently, DragonFly BSD's Hammer filesystem doesn't even have an in-kernel garbage collector, and running it is entirely optional.
  • Email archives: Gmail's ever-increasing quotas, Cyrus IMAP's delayed expunge.
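The memoization idea behind Vesta and ccache can be sketched in Python as a build cache keyed by a hash of everything that feeds into a build step. This is not Vesta's design or API. `BuildCache` and its methods are hypothetical, and inputs are assumed to be JSON-serializable. The cache is only sound if each build function is pure, meaning its output depends on nothing but its declared inputs. A purely functional build language is designed to guarantee exactly that.

```python
import hashlib
import json


class BuildCache:
    """Memoize build steps (hypothetical) by a hash of their declared inputs."""

    def __init__(self):
        self._results = {}  # input hash -> build product; never evicted here
        self.runs = 0       # how many times a build function actually ran

    @staticmethod
    def _key(step_name, inputs):
        # Canonical JSON (sorted keys) so equal inputs always hash identically.
        blob = json.dumps([step_name, inputs], sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def build(self, step_name, build_fn, **inputs):
        key = self._key(step_name, inputs)
        if key not in self._results:
            # Cache miss: actually run the (assumed pure) build function.
            self.runs += 1
            self._results[key] = build_fn(**inputs)
        return self._results[key]
```

Building the same module with the same flags twice runs the build function once. Changing any input, such as the compiler flags, produces a new key and a real rebuild. Caching whole module builds rather than single compiles is just a matter of what you choose to call a "step".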

Comments {6}


from: ewx
date: 14th May 2009 08:18 (UTC)

Some version control systems do have a means of removing historical data (e.g. p4 obliterate), for the practical reason that some information is illegal or tortious to possess.

Simon Tatham

from: simont
date: 14th May 2009 09:21 (UTC)

Or just tactically unwise. If you were to (say) accidentally check your password file in to your publicly visible svn repository, you'd definitely want a way to get rid of it permanently.

(In svn this can be done by making a text-file dump of the repository, hand-editing it, and then reloading it to a fresh actual repository, which is doable but an utter pain.)


from: ewx
date: 14th May 2009 09:24 (UTC)

In that case I'd regard all the passwords as compromised and change them, rather than attempt to lock the stable door post-bolt.

Tony Finch

from: fanf
date: 14th May 2009 11:06 (UTC)

Yes. Git makes it quite easy to throw away young revisions, though selectively getting rid of older stuff is harder.

I didn't talk about the underlying implementations much, I think because whether they use a functional data structure, and whether they expose functional semantics, are basically orthogonal questions. Mercurial's implementation and semantics are functional. Git has functional semantics, but not a strictly functional implementation. AIUI BitKeeper is based on SCCS files, which are append-only, whereas Perforce uses RCS files, which must be rewritten on each update.
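Why young revisions are easy to drop but old ones are hard can be shown with a simplified hash-linked history in Python, where each revision's ID is a hash of its content plus its parent's ID. This is only the shape of the idea. Real Git hashes commit objects that also record trees, authors and messages, and `commit_id` and `history_ids` here are hypothetical helpers.

```python
import hashlib


def commit_id(parent_id, content):
    """Identify a revision by hashing its parent's ID together with its content."""
    h = hashlib.sha256()
    h.update((parent_id or "").encode())
    h.update(b"\0")  # separator so parent and content bytes can't run together
    h.update(content.encode())
    return h.hexdigest()


def history_ids(contents):
    """Return the chain of revision IDs for a linear history, oldest first."""
    ids, parent = [], None
    for content in contents:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids


original = history_ids(["v1", "v2 with password", "v3", "v4"])

# Throwing away the newest revision leaves every earlier ID unchanged...
truncated = history_ids(["v1", "v2 with password", "v3"])
assert truncated == original[:3]

# ...but scrubbing an old revision changes its ID and every descendant's.
scrubbed = history_ids(["v1", "v2 redacted", "v3", "v4"])
assert scrubbed[0] == original[0]
assert all(a != b for a, b in zip(scrubbed[1:], original[1:]))
```

Anyone who already has the old IDs will see the scrubbed history as a different history, which is part of why selectively rewriting older history is so awkward.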

Ben Harris

from: bjh21
date: 14th May 2009 11:03 (UTC)

As far as I know, the 4.4BSD LFS garbage collector (known as the "cleaner") has always been a user process and has always been optional. I certainly ran an LFS without one for a long while when I had lots of spare space and didn't want lfs_cleanerd holding up user I/O.

While LFS doesn't expose snapshots through the filesystem, dump_lfs constructs one internally.

Unfortunately, BSD LFS is proving unmaintainable for NetBSD, so we'll probably lose it soon.

Tony Finch

from: fanf
date: 14th May 2009 11:10 (UTC)

Oh right, I should have re-read the paper :-)

FreeBSD dropped LFS back in the dawn of time when its VM system got rewritten in ways that broke some of the more arcane semantics. The filesystem layering stuff (nullfs, unionfs) was another victim, and NFS suffered too :-/
