fanf

Clustered hints databases for Exim?

25th Jul 2005 | 17:21

Shortly before my wedding there was a discussion on the Exim-users mailing list about Exim's handling of its hints databases, which cache information about the retry status of remote hosts and messages in the queue, callout results, ratelimit state, etc. At the moment Exim just uses a standard Unix DB library for this, e.g. GDBM, with whole-file locking to protect against concurrent access from multiple Exim processes.

There are two disadvantages with this. Firstly, performance isn't great: Exim tends to serialize on the locks, and the DB lock/open/fsync/close/unlock cycle takes a while, which limits the throughput of the whole system. Secondly, if you have a cluster of mail servers the information isn't shared between them, so each machine has to populate its own database, which means poorer information (e.g. an incomplete view of clients' sending rates) and duplicated effort (e.g. repeated callouts and wasted retries).
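
To make the bottleneck concrete, here's a minimal Python sketch of that pattern (not Exim's actual code, which is C, and with made-up file names): every writer has to queue on the same whole-file lock before it can touch the database.

    import dbm
    import fcntl
    import os

    def update_hint(dbdir, key, value):
        # One hint update, serialized by a whole-file lock.
        lockname = os.path.join(dbdir, "hints.lockfile")   # hypothetical name
        with open(lockname, "a") as lockfile:
            fcntl.flock(lockfile, fcntl.LOCK_EX)   # every writer queues here
            db = dbm.open(os.path.join(dbdir, "hints"), "c")
            db[key] = value
            db.close()   # flush to disk; on real storage this is the slow part
            fcntl.flock(lockfile, fcntl.LOCK_UN)

Under load, that exclusive lock plus flush per update is exactly where the throughput goes.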

The first problem is a bit silly, because the databases are just caches and can be safely deleted, so they don't need to be on persistent storage. In fact some admins mount a ram disk on Exim's hints db directory, which avoids the fsync cost and thereby raises the maximum throughput. If you go a step further, you can take the view that Exim is, to some extent, using the DB as an IPC mechanism.

The traditional solution to the second problem is to slap a standard SQL DB on the back end, but then the SQL DB becomes a single point of failure. This is bad for a system like ppswitch, which is a cluster of identical machines relaying email and currently has no SPOF. It also compounds the excessive-persistence silliness.

It occurs to me that what I want is something like Splash, a distributed masterless database which uses the Spread toolkit to reliably multicast messages around the cluster. Wonderful! The hard work has already been done for me, so all I need to do is overhaul Exim's hints DB layer for the first time in 10 years - oh, and get a load of other stuff off the top of my to-do list first.
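
To illustrate the shape of the idea rather than Splash's actual API: each host keeps a local copy of the hints and multicasts every update to its peers. The sketch below uses raw UDP multicast in Python purely as a stand-in for Spread, which adds the reliable, ordered group messaging that plain UDP lacks; the group address and message format are invented for the example.

    import json
    import socket
    import struct

    MCAST_GRP, MCAST_PORT = "224.0.0.251", 5007   # invented group and port
    hints = {}                                    # this host's copy of the hints

    def open_sockets():
        # one socket to send updates, one joined to the group to receive them
        tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        tx.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        rx.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        rx.bind(("", MCAST_PORT))
        membership = struct.pack("4sl", socket.inet_aton(MCAST_GRP), socket.INADDR_ANY)
        rx.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership)
        return tx, rx

    def publish(tx, key, value):
        # update the local copy and tell every peer about it
        hints[key] = value
        tx.sendto(json.dumps({"k": key, "v": value}).encode(), (MCAST_GRP, MCAST_PORT))

    def apply_next(rx):
        # apply one update announced by a peer
        data, _peer = rx.recvfrom(65536)
        update = json.loads(data)
        hints[update["k"]] = update["v"]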

If it's done properly it should greatly improve the ratelimit feature on clustered systems, and make it much easier to write a high-quality greylisting implementation. (A BALGE implementation is liable to cause too many operational problems to be acceptable to us.) It should also be good for Exim even in single-host setups, by avoiding the hints lock bottleneck. The overhaul of the hints DB layer will also allow Exim to make use of other more sophisticated databases as well as Splash, e.g. standard SQL databases or Unix-style libraries that support multiple-reader / single-writer locking.
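
As an aside, the core of a greylisting check on top of a shared hints store is tiny. A hedged sketch in Python, where the hints dict stands in for whatever clustered store ends up underneath, and the key format and delay are made up for illustration:

    import time

    GREYLIST_DELAY = 300   # seconds a new sender must wait (illustrative)
    hints = {}             # stand-in for the shared, expiring hints store

    def greylisted(client_ip, sender, recipient):
        # Return True if this delivery attempt should get a temporary reject.
        key = "grey:%s:%s:%s" % (client_ip, sender, recipient)
        first_seen = hints.get(key)
        if first_seen is None:
            hints[key] = time.time()   # first attempt: remember it and defer
            return True
        return time.time() - first_seen < GREYLIST_DELAY

The operational pain comes from sharing and expiring that state across the cluster, which is exactly what the hints DB overhaul is meant to provide.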


Comments (8)

The Lusercop

from: lusercop
date: 25th Jul 2005 18:14 (UTC)

Splash/spread has some interesting failure modes that Ben has yet to fix; in particular, if you end up with a high data rate going into spread (such as happens when a splash machine reappears in the spread group, but also under other circumstances) then spread falls over, and all of them crash. Splash needs some work as a result to make it do the rate-limiting itself (we saw this happen with splashcache at the bunker). You might also be interested to know that sepulchre (the machine which runs anoncvs.aldigital.co.uk) is probably still running FreeBSD 2.2.6... :-)

Check with Ben for the exact failure modes, but I know a fair amount of work was done on trying to establish what the problem was (though not fixing them, as far as I know).


Tony Finch

from: fanf
date: 26th Jul 2005 08:58 (UTC)

Hmm, do you know why Splash uses more traffic when a machine comes back? Is it because of data replication or just because of the Spread ring reconfiguring? We don't need reliable data replication because the hints data is just a cache - it can be replicated on demand or even not at all - so perhaps Splash is overkill.


The Lusercop

from: lusercop
date: 26th Jul 2005 09:49 (UTC)

OK, as I understand it, when a machine comes back, the machine's splash db gets populated from other things on the spread network, so it's the data replication you mention above. I think there is also some overhead from the spread ring reconfiguring, but as far as I recall, from what Ben said, the problem is the rate of data you inject into spread. I don't know how reliable the data replication is, but you might as well keep the hints data around if you can. Certainly it would be nice if splash got fixed, because it would mean that splashcache actually became useful for SSL session-key caching in clusters again, and I think Ben might possibly be doing his "it's open source, someone else can fix it" kind of attitude :-/


from: kaet
date: 25th Jul 2005 21:58 (UTC)

You could get a bit of the way with memcached, backed by the per-client databases.

http://www.danga.com/memcached/


Tony Finch

from: fanf
date: 26th Jul 2005 08:56 (UTC)

Actually, memcached by itself is probably sufficient - we don't need reliable storage because the data is just hints that can be reconstructed if it is lost.

Thanks for pointing that out.
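
For what it's worth, a hedged sketch of the "memcached by itself" approach, assuming the python-memcached client and an invented key naming scheme; the TTL means the cache cleans up after itself:

    import memcache   # assuming the python-memcached client

    mc = memcache.Client(["127.0.0.1:11211"])   # hypothetical cache host

    def cache_callout(address, result, ttl=3600):
        # share a callout verification result with the whole cluster;
        # memcached drops it after ttl seconds, so no tidying is needed
        mc.set("callout:" + address, result, time=ttl)

    def cached_callout(address):
        return mc.get("callout:" + address)   # None on a miss: redo the callout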


Linz

from: k425
date: 26th Jul 2005 12:34 (UTC)

I have no idea about Exim, just wanted to say I've seen some of the wedding pics and you both looked fantastic. Congrats and best wishes!


memcache/splash

from: anonymous
date: 7th Aug 2005 07:25 (UTC)

So we need a nice abstraction layer which lets you choose db, splash, memcache etc :-)

I hadn't thought about Splash, but it makes sense for rate-limiting, where maybe you *do* care about data reliability. Also, being able to send logs via Spread to spreadlogd would be excellent in a cluster environment.

memcache has the advantage of being lightweight, and simple enough to deploy even in a single-node mailserver. Furthermore, memcached will expire records automatically, saving the need for a periodic exim_tidydb run. --BrianCandler
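
A hedged sketch of what such an abstraction layer could look like, in Python rather than Exim's C, with made-up class names: the single-host backend keeps the current DBM-plus-exim_tidydb behaviour, while the cluster backend gets expiry for free.

    import dbm

    class DBMHints:
        # single-host backend: a local DBM file, expiry left to exim_tidydb
        def __init__(self, path):
            self.db = dbm.open(path, "c")
        def get(self, key):
            try:
                return self.db[key]
            except KeyError:
                return None
        def set(self, key, value, ttl=None):
            self.db[key] = value   # ttl ignored; tidy periodically

    class MemcacheHints:
        # cluster backend: shared cache, records expire by themselves
        def __init__(self, servers):
            import memcache        # assuming the python-memcached client
            self.mc = memcache.Client(servers)
        def get(self, key):
            return self.mc.get(key)
        def set(self, key, value, ttl=None):
            self.mc.set(key, value, time=ttl or 0)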


I fixed Splash!

from: anonymous
date: 16th Sep 2005 14:23 (UTC)

See http://www.links.org/?p=8
