19th Sep 2005 | 11:23

Last week, a discussion started on exim-users about how Exim's excessive number of little languages could be rationalized (link). I have thought about this problem to some extent, so I wrote the following...

There are two sets of languages that are relevant to Exim: configuration languages (of which I count a generous handful) and extension languages (currently 2: Perl via ${perl and C via local_scan() and ${dlfunc). Configuration languages are important because they are the user interface of the program, and everyone has to live with them. Exim's problem is that it has too many sub-languages: two filter languages (Exim's own, plus Sieve); the ACL language; the driver language for routers etc; the list match language; the string expansion language; and regular expressions. This count is rather inflated: it's a bit of a cheat to count regexes separately, because nowadays they're part of every decent language, and list matching and string expansion function as the expression syntax to the other languages' statement syntax. But there's a lot of overlap and non-orthogonality, so plenty of room for improvement.

Time for a bit of terminology. Configuration languages are a subset of "domain-specific languages". The scope of the term is quite broad, and is fairly well illustrated by the "little languages" of the traditional Unix tools: typesetting languages like troff, tbl, pic; compiler generation tools like lex and yacc; text-processing languages like sed, awk; command languages like make and the shell; configuration languages like crontab, inetd.conf, printcap, termcap; etc. These may or may not be usable as general-purpose languages; the point is that they are targeted at a specific domain (i.e. purpose).

There is an observation that DSLs, especially for complicated pieces of software, either need to be programmable, or they become programmable as they accumulate features. The latter has happened to Exim twice (one, two). This leads to the argument that programmability should be designed in from the start; furthermore, if you base the DSL on an existing programming language then you don't have to do the language implementation yourself and can concentrate on the domain-specific code. Hence the idea of "embedded domain-specific languages": DSLs that are implemented within the framework of a programming language.

We were speculating about replacing Exim's configuration language with a DSL designed for programmability, and I suggested making it an EDSL. Then we got into an argument about which language should be the host for the embedding.

So what makes a good host language? I think the most important thing is extensible flow control operators. The reason for this is that Exim's declarative configuration style hides quite a lot of flow complexity: many decisions are four-way (accept/reject/defer/pass) or more, and there is implicit short-cutting and iteration over addresses. The EDSL configuration should preserve this hiding of complexity, which means that configuration keywords like drop/deny/defer/accept have to be able to affect the control flow without requiring boilerplate from the user. This is even more important in the routers, where instead of dropping back into Exim's core, you usually want to skip to the next router. It's better to make the whole chain of routers a single routine (rather than one per router) because then the postmaster can code complicated routing decisions beyond the usual sequencing, but this in turn makes difficult demands of the host language.
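To make the four-way decision and the "skip to the next router" behaviour concrete, here is a minimal sketch in Haskell. Everything in it is hypothetical and invented for illustration (the Outcome type, the orElse and chain combinators, the example routers and domain names); none of it is Exim's actual API or configuration syntax.

```haskell
-- Hypothetical four-way outcome of a routing decision (not Exim's real types).
data Outcome
  = Accept String   -- address handled; payload names an assumed transport
  | Reject String   -- permanent failure, with a reason
  | Defer String    -- temporary failure, with a reason
  | Pass            -- no opinion: let the next router try
  deriving (Eq, Show)

type Address = String
type Router  = Address -> Outcome

-- Run the first router; only Pass falls through to the second.
-- Accept, Reject and Defer short-cut the rest of the chain.
orElse :: Router -> Router -> Router
orElse r1 r2 addr = case r1 addr of
  Pass  -> r2 addr
  other -> other

-- A whole chain of routers is itself just a router.
chain :: [Router] -> Router
chain = foldr orElse (const Pass)

-- Everything after the first '@' (empty if there is none).
domainOf :: Address -> String
domainOf = drop 1 . dropWhile (/= '@')

-- Three illustrative routers; the domain names are made up.
blockedRouter :: Router
blockedRouter addr
  | domainOf addr == "blocked.example" = Reject "domain is blocked"
  | otherwise                          = Pass

localRouter :: Router
localRouter addr
  | domainOf addr == "example.org" = Accept "local_delivery"
  | otherwise                      = Pass

remoteRouter :: Router
remoteRouter addr
  | null (domainOf addr) = Pass
  | otherwise            = Accept "remote_smtp"

routeAll :: Router
routeAll = chain [blockedRouter, localRouter, remoteRouter]
```

Loaded into an interpreter, routeAll "user@example.org" gives Accept "local_delivery" and routeAll "x@blocked.example" gives Reject "domain is blocked". Because chain returns an ordinary Router, a postmaster could nest chains or write a router that picks between sub-chains, which is the kind of routing beyond plain sequencing that the paragraph above has in mind.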

Tcl is of course famously designed to be a host for EDSLs (such as expect); it isn't a particularly nice language in itself with its clumsy variable assignment and expression evaluation commands, but this is less of a problem if your common commands have rich semantics and it's compensated by Tcl's brilliance at non-standard flow control. This is mainly because it's easy to quote blocks of code for evaluation now or later, so unlike many languages it's trivial to define your own if command because order of evaluation is not rigid. In if [test] {then} {else} the [] specifies evaluation now and the {} specifies evaluation later, so the if command's implementation can just look at the value of its first argument and evaluate its second or third accordingly.

Lisp is another big EDSL host - in fact this is part of the culture of Lisp: when writing a Lisp program you first design an EDSL then you code the solution in your new language. Lisp is less nice than Tcl as the basis for a configuration language, though, because of the irritating superfluous parentheses. Still Emacs makes a plausible existence proof.

However, both these languages suffer from a lack of static checking (in the case of Tcl, even at the level of basic syntax), which imposes on the postmaster a burden of testing that in an ideal world would be performed automatically. That is why (apart from personal aesthetic preference) I suggest Haskell as the host language. Like Lisp, it has a culture of EDSLs. However, these tend to focus on sets of "combinators" that are used to tie bits of code together - exactly the kind of do-it-yourself flow control we want to be able to do. In addition, the "monad" concept is brilliant for tucking away the implicit state manipulation and the short-cutting flow control that Exim does all the time, without cluttering the syntax seen by the user.
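As a rough illustration of how a monad could hide both the implicit message state and the short-cutting, here is a small sketch. It is an assumption-laden toy, not a design for the real thing: the Acl type, the Message fields, and the helpers asks, finish, acceptIf and denyIf are all names made up for this example.

```haskell
import Control.Monad (when)

-- Hypothetical final verdicts of an access control list.
data Verdict = Accepted | Denied String | Deferred String
  deriving (Eq, Show)

-- Hypothetical per-message state that every condition can consult.
data Message = Message
  { senderHost :: String
  , recipient  :: String
  , sizeBytes  :: Int
  }

-- An ACL step reads the message implicitly and either stops with a
-- verdict (Left) or carries on with a value (Right).
newtype Acl a = Acl { runAcl :: Message -> Either Verdict a }

instance Functor Acl where
  fmap f (Acl g) = Acl (fmap f . g)

instance Applicative Acl where
  pure x = Acl (const (Right x))
  Acl f <*> Acl g = Acl (\m -> f m <*> g m)

instance Monad Acl where
  Acl g >>= k = Acl $ \m -> case g m of
    Left v  -> Left v            -- a verdict short-cuts everything after it
    Right x -> runAcl (k x) m    -- otherwise continue with the same message

-- Look at some part of the message.
asks :: (Message -> a) -> Acl a
asks f = Acl (Right . f)

-- Stop processing with the given verdict.
finish :: Verdict -> Acl a
finish v = Acl (const (Left v))

acceptIf :: Bool -> Acl ()
acceptIf cond = when cond (finish Accepted)

denyIf :: String -> Bool -> Acl ()
denyIf why cond = when cond (finish (Denied why))

-- What the postmaster would write: no explicit state, no explicit returns.
rcptAcl :: Acl ()
rcptAcl = do
  host <- asks senderHost
  acceptIf (host == "127.0.0.1")
  size <- asks sizeBytes
  denyIf "message too large" (size > 1000000)
  rcpt <- asks recipient
  acceptIf (rcpt == "postmaster@example.org")
  finish (Denied "relay not permitted")

-- Run an ACL; falling off the end without a verdict is treated as a denial.
check :: Acl () -> Message -> Verdict
check acl msg = either id (const (Denied "no verdict reached")) (runAcl acl msg)
```

The point of the sketch is rcptAcl: the message is never passed around by hand, and acceptIf and denyIf end the whole routine as soon as they fire, yet the type checker still verifies every line before the configuration is ever run.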

Functional programming has a reputation for taking "hair shirt" purity too far, to the extent of being useless for practical purposes. However, at least one plausible Internet server application has been written in Haskell, and Pugs is showing that it isn't a completely indigestible language for Perl hackers.

But at the moment this is just idle speculation - though I do have a cute name for the idea ("Elegant Configuration using Haskell for Internet Mail") - but it's unlikely to actually happen until I find some extra tuits...


Comments {9}

from: anonymous
date: 19th Sep 2005 13:53 (UTC)

It's certainly possible to do "compile-time" checking in a Lisp-embedded DSL. As a proof of concept, I could hang out my "define a state machine for a game object ``AI''" that statically checks that all states transitioned to are defined in the relevant FSM. It doesn't, at the moment, check that all targets (bar possibly the first) have a possible transition to them, mostly because I felt that was less of an issue (an unused state is at worst code-clutter, a transition to a non-existing state means "no further processing" for that FSM instance). Don't let that dictate what you do or not do, though (if you want, I can post the definer macro and the one (so far) defined AI).

As for parentheses, they are a potential problem, though having a thin syntactic front-end may alleviate that.


Just a random swede

from: vatine
date: 19th Sep 2005 16:44 (UTC)

And of course, my browser's managed to log me out without me noticing. Ah well...



from: filecoreinuse
date: 19th Sep 2005 13:55 (UTC)

You could always go down the XSL route and abuse XML.

Remember the Microsoft mantra: It must be better, it is XML.


Steven J. Murdoch


from: sjmurdoch
date: 19th Sep 2005 15:59 (UTC)

Have you looked at Lua? It is designed as an extension language and is powerful but has a lightweight and very well designed C API. It does this by only having one compound type - the table, which replaces arrays, hashtables and objects. It does do syntax checking (as it compiles to bytecode), but not very much static type checking. I used it to script a fairly simple model checker written in Ada95 and found it to be very pleasant.


Tony Finch

Re: Lua

from: fanf
date: 19th Sep 2005 16:05 (UTC)

The extension API of the language is much less important than the attributes of the language itself, because only the developers have to deal with the API.


from: hsenag
date: 19th Sep 2005 23:47 (UTC)

You'd probably want to start by embedding Hugs. Since it's written in C and is an interpreter, it should be relatively straightforward - and it does have an embedding interface. I've played around with doing this a bit before, but didn't get much past the stage of being able to get it to interpret arbitrary strings due to lack of time (to make a decent embedding, you'd obviously want to be able to pass in structured data and stuff).


embedded DSL

from: anonymous
date: 20th Sep 2005 14:16 (UTC)

Please use Ruby or Scheme.



from: nonameyet
date: 27th Sep 2005 06:38 (UTC)

I don't use any of the languages suggested so far (OK, I do have some XML). As you say, Exim has a lot of languages, but they are small. The net result is that I doubt that Echsim would mean I'd have to learn less than I had to in order to write my current config file.

Backward compatibility with the current config languages is so important, and I suspect so fiddly to test, that I'm not comfortable with the idea of attempting to implement a converter from Exim to Echsim. I can imagine myself wanting to run Exim for, say, five years after Echsim becomes standard.


Tony Finch

from: fanf
date: 27th Sep 2005 11:09 (UTC)

Well this is unlikely to ever actually happen, and even if it does it's more likely to be a proof-of-concept rather than a complete replacement.

One of the big advantages of Exim for non-programmers is that it's shallow - the concepts tend to be fairly simple and self-contained. The speculation about a replacement is thinking along the lines of an axiomatic orthogonal design, which is nice for programmers but may not work so well for sysadmins.
