fanf

Use your bonce

15th May 2009 | 09:44

Vesta includes a purely functional programming language for specifying build rules. It has an interesting execution model which avoids unnecessary rebuilds. Unlike make, it automatically works out dependencies in a way that is independent of your programming language or tools - no manually maintained dependencies or parsing source for #include etc. Also unlike make, it doesn't use timestamps to decide if dependencies are still valid, but instead uses a hash of their contents; it can do this efficiently because of its underlying version control repository. Vesta assumes that build tools are essentially purely functional, i.e. that their output files depend only on their input files, and that any differences (e.g. embedded timestamps) don't affect the functioning of the output.

I've been wondering if Vesta's various parts can be unpicked. It occurred to me this morning that its build-once functionality could make a quite nice stand-alone tool. So here's an outline of a program called bonce that I don't have time to write.

bonce is an adverbial command, i.e. you use it like bonce gcc -c foo.c. It checks if the command has already been run, and if so it gets the results from its build results cache. It uses Vesta's dependency cache logic to decide if a command has been run. In the terminology of the paper, the primary key for the cache is a hash of the command line, and the secondary keys are all the command's dependencies as recorded in the cache. If there is a cache miss, the command is run in dependency-recording mode. (Vesta does this using its magic NFS server, which is the main interface to its repository.) bonce could do it with an LD_PRELOAD hack that intercepts the C library's system call wrappers: a file opened with open(O_RDONLY) is a dependency, a file opened with open(O_WRONLY) is probably an output file, and exec() is modified to invoke bonce recursively. When the command completes, its dependencies and outputs are recorded in the cache.
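To make the two-level lookup concrete, here is a minimal Python sketch. It is not Vesta's actual cache logic: the BuildCache class, its method names, the in-memory dictionary and the choice of SHA-256 are all assumptions made for illustration, and the dependency list is passed in by hand rather than recorded by an LD_PRELOAD shim.

```python
import hashlib


def file_hash(path):
    """Return the SHA-256 of a file's contents, or None if it does not exist."""
    try:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    except FileNotFoundError:
        return None


def primary_key(argv):
    """Hash the command line; NUL-separate the arguments so ['a b'] != ['a', 'b']."""
    return hashlib.sha256("\0".join(argv).encode()).hexdigest()


class BuildCache:
    """Hypothetical in-memory cache: primary key -> list of recorded runs."""

    def __init__(self):
        self.entries = {}

    def record(self, argv, deps, outputs):
        """Store one run: the content hash of each dependency, plus the outputs."""
        dep_hashes = {path: file_hash(path) for path in deps}
        self.entries.setdefault(primary_key(argv), []).append((dep_hashes, outputs))

    def lookup(self, argv):
        """Return cached outputs if some recorded run's dependencies all still match."""
        for dep_hashes, outputs in self.entries.get(primary_key(argv), []):
            if all(file_hash(path) == h for path, h in dep_hashes.items()):
                return outputs
        return None  # cache miss: the caller would run the command and record it
```

A dependency that did not exist when the command ran is stored as None, so creating that file later also invalidates the entry. A real tool would need a persistent store and would have to hash the environment and working directory too.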

bonce is likely to need some heuristic cleverness. For example, Vesta has some logic that simplifies the dependencies of higher-level build functions so that the dependency checking work for a top-level build invocation scales less than linearly with the size of the project. It could also be useful to look into git repositories to get SHA-1 hashes and avoid computing them.
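One wrinkle with borrowing hashes from git: a git blob ID is not the plain SHA-1 of the file contents, because git hashes a short header followed by the contents. The sketch below shows the calculation. The function name git_blob_id is made up for this example, and it assumes a repository using git's SHA-1 object format.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Compute the ID git assigns to a blob with these contents."""
    # git hashes the header "blob <length>\0" followed by the raw contents
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()
```

So if bonce wanted to reuse the hashes git has already stored, it would have to key its cache on git-style blob IDs rather than on plain content hashes.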

It should then be reasonable to write very naive build scripts or makefiles, with simplified over-broad dependencies that would normally cause excessive rebuilds - e.g. every object file in a module depends on every source file - which bonce can reduce to the exact dependencies and thereby eliminate redundant work. No need for a special build language and no need to rewrite build scripts.

Comments {15}

from: ptc24
date: 15th May 2009 09:20 (UTC)

Interesting how the "o" changes pronunciation when you contract (presumably) "build once" to "bonce".

Simon Tatham

from: simont
date: 15th May 2009 09:25 (UTC)

In 1997 I had a summer job working with some code stored in ClearCase, and that did a very similar set of things: interface to source code repository via magic filesystem instead of checkouts in ordinary user filespace, special make utility automatically tracking dependencies, caching of build results so it could (in its terminology) "wink in" files that had already been built to the identical specification by somebody else.

Unfortunately it didn't speed matters up much in my case, since the back end of ClearCase is a big database file which, brilliantly, somebody thought it would be a good idea to mount via NFS from the other side of the Atlantic. A wink-in from that database took almost as long as a local compile would have!

I too have often thought that it would be neat to construct a make utility that automatically tracked dependencies via LD_PRELOAD, but never quite had the energy to try it.

Tony Finch

from: fanf
date: 15th May 2009 10:26 (UTC)

Vesta has the advantage of a distributed replicated repository, so it should be local enough to be fast enough.

from: hsenag
date: 15th May 2009 10:53 (UTC)

Clearcase has that too (you might have to pay more for it) but it's still slow as hell.

from: ingulf
date: 15th May 2009 18:46 (UTC)

There is such a thing here: http://www.op59.net/yabs/readme.html

Tony Finch

from: fanf
date: 15th May 2009 18:52 (UTC)

Yay lazyweb :-)

Pete

from: pjc50
date: 15th May 2009 11:01 (UTC)

We have now developed our own build scheduler and caching system at Azuro, after finding that make+ccache+distcc is starting to show serious strain on a 1.1 million line C++ monolith. It took over 30 seconds simply for make to parse its .d files...

Simon Tatham

from: simont
date: 15th May 2009 11:09 (UTC)

Mmm. Yes, I have that problem at work (not nearly so many lines of code, but since I maintain a C library, that means a huge number of very tiny objects and hence no end of .d files).

I'm inclined to think that if I were writing a dependency-tracking make utility of this nature, I'd do the same trick as the IDE that came with BeOS: every compile generated a .d, but the IDE then proactively read it and incorporated its contents into the project file. I'm sure it would be possible to set up some sort of database file alongside the makefile such that you could get away without having to think about all the dependencies every time: as soon as you know which files have changed, a dependency database with the right data structure should be able to tell you what recompiles need doing in O(number of changes) time rather than O(size of entire project).
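As a rough illustration of the kind of data structure this would need, here is a Python sketch of a reverse dependency index. The DepDatabase class and its method names are hypothetical, and it keeps everything in memory rather than in a database file alongside the makefile.

```python
from collections import defaultdict


class DepDatabase:
    """Hypothetical dependency database indexed in both directions."""

    def __init__(self):
        self.deps_of = {}                  # target -> set of files it was built from
        self.users_of = defaultdict(set)   # file -> set of targets built from it

    def record(self, target, sources):
        """Replace the recorded dependencies of a target after it is rebuilt."""
        for old in self.deps_of.get(target, ()):
            self.users_of[old].discard(target)
        self.deps_of[target] = set(sources)
        for src in sources:
            self.users_of[src].add(target)

    def stale(self, changed):
        """Return every target that depends, directly or indirectly, on a changed file."""
        pending = list(changed)
        result = set()
        while pending:
            path = pending.pop()
            for target in self.users_of.get(path, ()):
                if target not in result:
                    result.add(target)
                    pending.append(target)
        return result
```

The stale() walk only touches the changed files and the targets reachable from them, so its cost tracks the size of the change and its dependents, not the size of the whole project.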

Tony Finch

from: fanf
date: 15th May 2009 13:33 (UTC)

If bonce is done right it should be able to do what your last sentence says - Vesta supposedly can.

Pete

from: pjc50
date: 15th May 2009 14:17 (UTC)

Related problem: we believe the dependencies are excessive (e.g. a header file is included but nothing from it is actually referenced). The current solution to this is a tool someone here wrote called rince that can exhaustively try #include removal. However, that doesn't identify opportunities where a slight functional change could produce good decoupling benefits.

Ultimately we're going to have to make some architectural moats that are difficult to cross and have more than one compile unit, but that's a huge amount of work.

Tony Finch

from: fanf
date: 15th May 2009 14:34 (UTC)

One good example of excessive dependencies is all-encompassing header files, e.g. exim.h...

Simon Tatham

from: simont
date: 15th May 2009 17:21 (UTC)

A few months ago, in a different area at work, I found I wanted to do a similar sort of analysis for the purpose of finding out whether any of a substantial code base was using a particular cranny of a big sprawling API, so that I'd know how feasible it was to redesign that bit. I thought for a while about how it should be done and came to the conclusion that what I really wanted was a command-line option in gcc that would output diagnostic information every time it matched up a source-language definition to some use of that definition (whether it was the use of a type, the expansion of a macro, calling of a function or C++ method, reference to a structure field, you name it), including the file name and line number of both the reference and the definition. Given that diagnostic data from the entire code base, I could easily have grepped through it to find out what I needed.

(This would have been easy enough in pure C, where every function would have had a distinct name and I could have got close enough just by grepping for all the function names. In C++ it's much more of a pain, because ninety-seven classes all have similarly named methods and only an actual compiler can disambiguate which occurrences of DoFoo() in the source code are references to which of them...)

Turned out that gcc was not designed helpfully for such an option – its front end has nothing remotely resembling a centralised point at which refs and defs get matched up. So I resorted to the low-tech approach of renaming everything in the affected region of the header file to put a z on the front and then seeing what failed to compile, which was just about feasible given the volumes involved in my case.

But it now occurs to me that that sort of data would also be about right for trimming makefile dependencies as you suggest. Hmm. Perhaps someone ought to at least stick it on gcc's BTS, if it has a suitable one.

Edited at 2009-05-15 05:22 pm (UTC)

(Deleted comment)

Tony Finch

from: fanf
date: 15th May 2009 16:15 (UTC)

Yay lazyweb :-)

ccache

from: anonymous
date: 17th Feb 2010 03:07 (UTC)

It's called ccache.

http://ccache.samba.org/

Tony Finch

Re: ccache

from: fanf
date: 17th Feb 2010 14:58 (UTC)

ccache doesn't do dependency analysis. It operates on a per-translation-unit basis after the source has been preprocessed, so the preprocessor limits the amount of time it can save you. It also cannot skip build commands that aren't C compiler invocations, whereas bonce could.
