iCalendar is wrong

9th Dec 2009 | 23:20

(This article is a much-expanded version of a comment I wrote months ago on mathew's blog.)

There's a programmers' rule of thumb that timestamps should always be stored in a form that's unambiguously inter-convertible with UTC, or some reasonable approximation such as POSIX time_t. In particular, you should never store local time without also storing its timezone, and you should represent timezones as UTC offsets instead of using a familiar but ambiguous abbreviation. For textual representations the right answer is usually ISO 8601 / RFC 3339.
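For past events this is easy to get right. Here is a minimal Python sketch, using this post's own timestamp as sample data: an aware datetime in UTC round-trips through an RFC 3339-style string with its offset intact.

```python
from datetime import datetime, timezone

# A past event (e.g. a log entry) stored unambiguously relative to UTC.
logged = datetime(2009, 12, 9, 23, 20, tzinfo=timezone.utc)

# isoformat() writes the offset explicitly, so the string converts back
# to the same instant without any guessing about timezones.
text = logged.isoformat()
print(text)  # 2009-12-09T23:20:00+00:00

assert datetime.fromisoformat(text) == logged
```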

This rule of thumb is good if you are storing the times of events that happened in the past, such as in logs or in message headers. However, it isn't good for events that will happen in the future, when those events have any bearing on time as used by people in the outside world. The reason is that time zones are unstable.

Bad solutions to timezone problems

The problem is particularly clear for repeating events. If you specify an event's time of day using a fixed offset from UTC, then it will be an hour wrong for half of the year, because the time zone offset is different in winter and summer. This is why Unix's cron scheduler works in local time.
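To make the failure concrete, here is a rough Python sketch (the meeting is hypothetical) of a daily 09:00 London meeting stored with the winter offset, UTC+0, compared with the same wall-clock time stored against the zone itself.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")
fixed = timezone(timedelta(hours=0))  # the winter (GMT) offset, baked in

def wall_clock(year, month, day):
    """London wall-clock time of a 09:00 meeting stored with a fixed offset."""
    stored = datetime(year, month, day, 9, 0, tzinfo=fixed)
    return stored.astimezone(london).strftime("%H:%M")

print(wall_clock(2009, 12, 9))  # 09:00 in winter
print(wall_clock(2010, 7, 7))   # 10:00 in summer: an hour wrong

# Storing local time against the zone keeps it at 09:00 all year;
# it is the UTC instant that moves, as it should.
summer = datetime(2010, 7, 7, 9, 0, tzinfo=london)
print(summer.astimezone(timezone.utc).strftime("%H:%M"))  # 08:00
```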

The solution chosen for iCalendar is to store the complete timezone data (summer and winter offsets and the changeover schedule) alongside the event. This has a number of problems. The first is bloat, though iCalendar reduces it by allowing multiple timestamps to refer to the same timezone data. The second is that it isn't robust against changes to a timezone's DST schedule.

An outstanding example of failure caused by storing timestamps in the wrong format was provided by the US DST schedule change in 2007. People running Microsoft Exchange had to run a special tool that scanned the entire database to find timestamps that needed adjusting. If the data model had been designed properly this would not have been necessary.

The underlying error is to do the timezone data lookup too early. As we learned from David Wheeler, the fix is to add a layer of indirection so that the lookup can be delayed. Instead of storing the numerical offset, store a reference to the timezone, e.g. its name from the Olson tz database.
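The difference between early and late binding can be shown with a toy model. In this Python sketch the `rules` table is a hypothetical stand-in for the Olson database that only knows when DST starts in New York in 2007, and it ignores the autumn changeover. The old US rule started DST on the first Sunday in April; from 2007 it starts on the second Sunday in March.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical, deliberately tiny "timezone database": zone name -> DST start
# date in 2007. Real code would consult the Olson tz data instead.
rules = {"America/New_York": date(2007, 4, 1)}  # old rule: first Sunday in April

def utc_offset(zone, day):
    """Toy rule: EDT (UTC-4) on or after the DST start date, else EST (UTC-5)."""
    return timedelta(hours=-4) if day >= rules[zone] else timedelta(hours=-5)

def to_utc(zone, local):
    """Resolve a naive wall-clock time in `zone` to an aware UTC datetime."""
    return (local - utc_offset(zone, local.date())).replace(tzinfo=timezone.utc)

def to_local(zone, instant):
    """Approximate inverse of to_utc (uses the UTC date; fine away from changeovers)."""
    naive = instant.replace(tzinfo=None)
    return naive + utc_offset(zone, naive.date())

zone = "America/New_York"
meeting = datetime(2007, 3, 20, 9, 0)  # 09:00 local time

early = to_utc(zone, meeting)  # early binding: store only the UTC instant (14:00)
late = (zone, meeting)         # late binding: store the zone name and wall time

rules[zone] = date(2007, 3, 11)  # the new rule: second Sunday in March

print(to_local(zone, early).time())          # 10:00:00, the stored instant is now wrong
print(to_local(zone, to_utc(*late)).time())  # 09:00:00, re-resolved with the new rules
```

The early-bound record needs a bulk fix-up tool after the rule change; the late-bound one just needs updated timezone data.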

iCalendar TZID values are typically Olson timezone names, or something very similar, but the specification does not require this. There is still no interoperable standard for timezone names, so iCalendar objects have to include the complete VTIMEZONE data, not just the name. There are plans to fix this, but it's unclear whether the standard timezone name registry will be based on the Unicode Common Locale Data Repository, or perhaps the Olson tz database (depending on how its management changes around Arthur David Olson's retirement), or something else.

Unfortunately timezone names are still not a complete solution. As well as DST schedule changes, there are often timezone boundary changes. If an event is to happen in a place that is affected by a boundary change, and its time is recorded with respect to the place's old timezone, then this time will be wrong after the change. Indiana has provided many instances of this problem: it straddles a timezone boundary, each county in the state chooses its timezone independently, and every so often some counties change their minds about whether to follow Central Time or Eastern Time (or even both, depending on the time of year). The solution to unpredictable timezone boundary changes is, of course, another layer of indirection.

My solution

The time of an event in the future should be recorded in local time coupled with the event's location. The location is used to look up the timezone, and the timezone data determines the UTC offset. (I should probably clarify that Olson tz names are not locations even though they are derived from locations. It's nonsense to say that the Edinburgh Tattoo will occur in Europe/London.)
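A minimal Python sketch of this model, assuming a hypothetical `location_zone` table (the county name is made up) that maps locations to Olson zone names. Events never store a zone, so when a boundary moves only the table changes: the event stays at 10:00 local time and its UTC instant follows.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical location database: location -> Olson zone name.
location_zone = {"Example County, Indiana": "America/Indiana/Indianapolis"}

@dataclass(frozen=True)
class Event:
    local_time: datetime  # naive wall-clock time at the event's location
    location: str         # a key into location_zone, not a zone name

def utc_instant(event):
    """Late binding: location -> zone -> UTC offset, looked up only when needed."""
    zone = ZoneInfo(location_zone[event.location])
    return event.local_time.replace(tzinfo=zone).astimezone(timezone.utc)

meeting = Event(datetime(2010, 6, 1, 10, 0), "Example County, Indiana")
print(utc_instant(meeting))  # 2010-06-01 14:00:00+00:00 (Eastern daylight time)

# The county switches to Central time: update the location, not the events.
location_zone["Example County, Indiana"] = "America/Chicago"
print(utc_instant(meeting))  # 2010-06-01 15:00:00+00:00, still 10:00 local
```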

Recording the location of an event instead of its timezone makes all sorts of problems simpler, not just problems resulting from timezone mutations. A lot of the benefit comes from just making the data aware of locations and the effect they have on scheduling. Also, perhaps unexpectedly, it allows extremely simple platforms that are unaware of timezones!

Often all that is required of a PDA calendar is to keep a single person's appointments, and the times only need to be meaningful wherever that person is going to be when the event occurs. In this simple case, if all the times are stored in local time at the appointment's location, the PDA does not need to do any timezone translation in order to display them in a useful way: the stored time is good enough. In this scenario, the only timezone manipulation that occurs is the user manually resetting the PDA's clock when a timezone offset change happens (because of travel or because of DST).

It's more usual to want to share calendar events, in which case you soon encounter situations where it's useful to know when events in other timezones will occur according to your own local time. If the software knows your current location, it's a straightforward matter to translate times from place to place. This should not be done significantly earlier than when displaying the time. For example, in a calendaring app based on early binding of events to timezones, the programmer might be tempted to translate an event's time to the user's local timezone when importing the event. This optimization is clearly bogus in a location-based app, because it amounts to moving the event to a location where it is not occurring!

One case where it seems not to make sense to fix an event in a location is when it occurs in more than one place: telephone calls or (worse) conference calls. The thing to do in this situation is to decide on a primary location, such as the location of the organizer, and list the other locations as supplementary. This allows the software to display all the relevant times, so it's immediately apparent what the timing is for each participant and whether it happens to be inconvenient for any of them. If politicians happen to muck around with any of the timezones, the organizer is naturally responsible for any adjustments that may be necessary, so it makes sense to keep their view of the event as straightforward as possible.
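Here is a rough Python sketch of a call with a primary location and supplementary ones. The `ZONES` table and the place names are assumptions for illustration. Note that translation happens only when the times are displayed; nothing is converted at import.

```python
from dataclasses import dataclass
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical location database mapping places to Olson zone names.
ZONES = {
    "Cambridge, England": "Europe/London",
    "Boston, MA": "America/New_York",
    "Hyderabad, India": "Asia/Kolkata",
}

@dataclass
class ConferenceCall:
    local_time: datetime       # naive wall-clock time at the primary location
    primary: str               # the organizer's location
    supplementary: tuple = ()  # other participants' locations

    def times(self):
        """Each location's wall-clock time for the call, computed at display time."""
        start = self.local_time.replace(tzinfo=ZoneInfo(ZONES[self.primary]))
        return {
            place: start.astimezone(ZoneInfo(ZONES[place])).strftime("%H:%M")
            for place in (self.primary, *self.supplementary)
        }

call = ConferenceCall(datetime(2010, 1, 15, 15, 0), "Cambridge, England",
                      ("Boston, MA", "Hyderabad, India"))
for place, hhmm in call.times().items():
    print(f"{place}: {hhmm}")
# Cambridge, England: 15:00
# Boston, MA: 10:00
# Hyderabad, India: 20:30
```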

An interesting case is travel between timezones. It's usual for flight bookings to give departure and arrival times in the local time of the origin and destination locations, which I always find confusing. However if a computer has this information, it can easily display both times in both timezones and work out the total travel time. It would be even nicer if your PDA could use this information to automatically update its idea of your location, and therefore its idea of local time. If it can use this method to work out where you will be in the future it could also display future events with all three relevant times: their native time, the time according to your current timezone, and according to the timezone of your location when the event occurs.
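With the departure and arrival times attached to their airports' zones, the arithmetic is trivial. A minimal Python sketch for a hypothetical London to New York flight:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

origin, dest = ZoneInfo("Europe/London"), ZoneInfo("America/New_York")

# A hypothetical booking, in each airport's local time as the ticket shows it.
depart = datetime(2009, 12, 20, 10, 0, tzinfo=origin)
arrive = datetime(2009, 12, 20, 13, 20, tzinfo=dest)

# Aware datetimes with different zones subtract via UTC: the real duration.
print(arrive - depart)                              # 8:20:00
print(depart.astimezone(dest).strftime("%H:%M"))    # 05:00, departure in New York time
print(arrive.astimezone(origin).strftime("%H:%M"))  # 18:20, arrival in London time
```

One caveat specific to Python: if both datetimes share the same tzinfo, subtraction ignores the offsets and uses wall-clock time, so same-zone intervals that cross a DST change should be converted to UTC first.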

iCalendar has the concept of a "floating" timestamp, which represents the time in whatever is your current timezone. Floating timestamps cannot be communicated reliably to another person, because the time they represent will be interpreted according to the recipient's location, not yours. One way to make them reliable would be to add another layer of indirection: attach an event to a person and provide a way of looking up the person's location. This is absurdly complicated and an invasion of privacy, and I think it shows that the concept of floating events (occurring wherever you are at the time) is unwise. They do make sense for purely personal events, such as wake-up alarms or medication reminders - you don't want your PDA to tell you to wake up in Cambridge as usual when you are currently in New York. But if an event involves more than one person and its location is in doubt, it's better to give it a provisional location so that changes have to be communicated explicitly.
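A floating time is just a naive wall-clock time that gets pinned to whatever zone the device is in when it is used. This Python sketch (the alarm date and the `resolve_floating` helper are illustrative) shows why that suits a wake-up alarm but not a shared event: the same stored value means different instants in different places.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A floating event: a naive wall-clock time with no location or zone attached.
wake_up = datetime(2009, 12, 14, 7, 0)

def resolve_floating(naive, current_zone):
    """Pin a floating time to the zone the device is currently in."""
    return naive.replace(tzinfo=ZoneInfo(current_zone)).astimezone(timezone.utc)

print(resolve_floating(wake_up, "Europe/London"))     # 2009-12-14 07:00:00+00:00
print(resolve_floating(wake_up, "America/New_York"))  # 2009-12-14 12:00:00+00:00
```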

With a local time plus location model, if a timezone does change, the only events that are affected by the change from the point of view of the software are also affected from the point of view of the human world. For instance, a conference call that spans multiple timezones may need to be rescheduled because its local time may change in some of the participants' locations, and this may lead to scheduling clashes that were not there when the call was originally organized. Events at a single location that occur near the old and new clock changes may need to be rescheduled to cope with inserted or omitted hours - but it's rare to schedule events for the small hours of Sunday morning. The majority of events that fall between the old clock change and the new clock change are not affected: no special bulk data fix-up tools are required.
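Under this model, the "fix-up" after a timezone change is a query for events that need a human decision, not a rewrite of stored data. A rough Python sketch, with a hypothetical zone table and events (the local times are kept as plain strings for brevity):

```python
from dataclasses import dataclass

# Hypothetical location -> zone table.
ZONES = {"Cambridge": "Europe/London", "Boston": "America/New_York",
         "Hyderabad": "Asia/Kolkata"}

@dataclass
class Event:
    name: str
    local_time: str   # wall-clock time at the primary location, never rewritten
    locations: tuple  # primary location first, then supplementary ones

def needs_review(events, changed_zone):
    """Yield events that span several zones, one of which has just changed.

    Single-location events keep their local time and need no fix-up."""
    for ev in events:
        zones = {ZONES[place] for place in ev.locations}
        if changed_zone in zones and len(zones) > 1:
            yield ev.name

events = [
    Event("team lunch", "2010-04-01T12:30", ("Boston",)),
    Event("weekly call", "2010-04-01T15:00", ("Cambridge", "Boston")),
    Event("review", "2010-04-01T10:00", ("Cambridge", "Hyderabad")),
]
print(list(needs_review(events, "America/New_York")))  # ['weekly call']
```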


The first complication is that the local time plus location model, as I have described it so far, is not quite sufficient. If an event is scheduled near the time the clocks go back, the local time by itself is not enough to tell whether it occurs in the hour before or the hour after the change. The way to fix this is to add a disambiguation flag. However, once again the usual way this is done is wrong. POSIX struct tm, for example, has a tm_isdst flag, which states whether the broken-down time is expressed in summer time or not. The problem here is that this flag can disagree with the timezone data: it's nonsense for the flag to be zero for a time in the middle of summer. It also means correct timestamps get turned into nonsense when politicians mess around with timezones.

The correct solution is for the flag to apply only when the time is ambiguous. At other times the flag must be ignored and should be omitted when generating timestamps. In effect the semantics of the flag are "prefer the earlier/later time if there's more than one". When phrased like this, the flag also works in weird cases. William Willett's original proposal was to phase DST in and out by skipping or repeating 20 minutes on four successive Sundays in April and September. My silly "sunrise time" idea involves changing the clock by a minute or so most nights. The isdst flag doesn't have enough bits to identify which version of local time a timestamp belongs to when there are more than two, but the disambiguation flag never needs to distinguish between more than two.
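Python's datetime `fold` attribute (PEP 495) is close to this design: 0 means "the earlier" and 1 means "the later" of two readings of the same wall-clock time. It only has an effect at ambiguous times (and, in Python's case, at skipped times), so unlike tm_isdst it cannot contradict the timezone data. A small sketch around the end of BST in 2009:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

london = ZoneInfo("Europe/London")

# 01:30 on 25 October 2009 happened twice in London: once in BST, once in GMT.
first = datetime(2009, 10, 25, 1, 30, tzinfo=london, fold=0)   # prefer the earlier
second = datetime(2009, 10, 25, 1, 30, tzinfo=london, fold=1)  # prefer the later
print(first.astimezone(timezone.utc).time())   # 00:30:00
print(second.astimezone(timezone.utc).time())  # 01:30:00

# For a time that occurs exactly once, the flag makes no difference.
midsummer = datetime(2009, 7, 1, 12, 0, tzinfo=london)
assert midsummer.utcoffset() == midsummer.replace(fold=1).utcoffset()
```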

The second complication is those odd locations that do not have a single agreed idea of local time. Decades ago in the USA, arguments over DST sometimes meant that different parts of government (federal/state/local) would have different ideas of local time; see David Prerau's book "Saving the Daylight" for examples. At present the most well-known instance of this problem is Xinjiang, the Uighur Autonomous Region of China. Officially, the whole of China is on Beijing time, UTC+8. This is a bit uncomfortable in Xinjiang in the far west of the country, so the independent-minded Uighurs use their own time, UTC+6, even though their Han neighbours use the national time. (See the LA Times for a report on this subject.)

I think the way to accommodate places like Xinjiang is to treat locations as a geo-political concept rather than a purely geographical one. So you might have "Xinjiang (Han)" and "Xinjiang (Uighur)" in your location database. Xinjiang also breaks the Olson/Eggert tz naming scheme, so I think there's unlikely to be any particularly elegant way to handle it.

The third complication is how to specify locations. A significant problem for many calendaring applications is that we lack a database of which locations are in which timezone. This is usually viewed as a user-friendliness problem, but for my proposal it is more fundamental. Furthermore there's an incompatibility of scale between the kind of location that makes sense for a timezone database (e.g. centred around large cities) and the kind of location that makes sense for a meeting (e.g. room C304). I think it's reasonable to make people enter enough detail about locations to fill the gap between the room-level resolution and the city-level resolution. Only one person should ever have to enter the details of a particular location into a system (or set of connected systems) after which everyone else can re-use the data, so the burden should be small.
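One way to bridge room-level and city-level resolution is to make locations hierarchical, with the timezone attached at whichever level makes sense and inherited by everything inside it. The Python sketch below is illustrative only: the `Location` class is hypothetical, "Europe/Paris" stands in for Central European Time, and "Etc/GMT-6" is tzdata's fixed UTC+6 zone (the sign is inverted, POSIX-style). It also models Xinjiang as two geo-political locations.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Location:
    name: str
    parent: Optional["Location"] = None
    zone: Optional[str] = None  # set only at the level where it makes sense

    def zone_name(self):
        """Walk outwards until some enclosing location names a zone."""
        place = self
        while place is not None:
            if place.zone is not None:
                return place.zone
            place = place.parent
        raise LookupError(f"no timezone known for {self.name}")

england = Location("England", zone="Europe/London")
cambridge = Location("Cambridge", england)
site = Location("New Museums Site", cambridge)
room = Location("Cockcroft building, room C304", site)

# One region, two communities keeping different local times.
china = Location("China", zone="Asia/Shanghai")
xj_han = Location("Xinjiang (Han)", china)
xj_uighur = Location("Xinjiang (Uighur)", china, zone="Etc/GMT-6")

print(room.zone_name())  # Europe/London
england.zone = "Europe/Paris"  # one update moves every room inside England
print(room.zone_name())  # Europe/Paris
print(xj_han.zone_name(), xj_uighur.zone_name())  # Asia/Shanghai Etc/GMT-6
```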

This leads to another problem with iCalendar: its idea of a location is both too weak and too complicated. You can specify latitude and longitude, which isn't very practical for software that lacks a built-in map, nor can it be translated into a timezone in Xinjiang. You can also (as well as or instead) specify a human-friendly location as a free text string with an optional URL pointing to a more computer-friendly representation. The latter can be anything, though you hope it is something sensible like a vCard containing a postal address. A vCard address has a fixed format which I suppose can be stretched a bit to cover meeting rooms and other ad-hoc locations in such a way that they can be tied to a timezone, but it isn't designed for the purpose.


Sadly, it seems that the world is stuck with iCalendar, and when timezone-related problems occur, calendar programmers blame politics or DST rather than their inadequate data model. Worse, it seems very unlikely that a properly designed calendar program could interoperate with iCalendar data without loads of ad-hockery and lossage, because of the mismatch between the data models.

How annoying.

Comments

Gerald the cuddly duck

from: gerald_duck
date: 10th Dec 2009 01:35 (UTC)

How does this handle scheduling conference calls? That's a major headache for us at work when the call might be convened by either the UK or New England office, inside or outside local DST, for a time inside or outside local DST, which might be inside or outside DST for any given participant.

Of course, by the time the conference call takes place, the laptop from which it was convened might be in Hyderabad.

Also, what about scheduling journeys? They frequently begin and end in different timezones.

I think a proper solution would allow users to say how timezone mapping was to occur: relative to here, home, the server or the appointment location, converted according to the timezone now, the timezone currently expected to be in force when the appointment occurs or the timezone actually prevailing when it occurs. I agree that the timezone actually prevailing at the appointed time in the appointment location is a reasonable default, though.

(One of my friends once had immense trouble with his finance department when he tried filing an expenses claim for ten hotel nights in nine days. They didn't understand his explanation that he'd always traveled East and thus gone full circle.)

Tony Finch

from: fanf
date: 10th Dec 2009 09:08 (UTC)

Conference calls: see the paragraph about calls that occur in more than one place under "my solution". The location of the laptop it is convened from is irrelevant: when you schedule an event you specify its location(s), though for convenience the software might try to guess for you. Travel: see the paragraph about travel between timezones under "my solution".

Timezone mapping: relative to places is just specifying the location of the event. Relative to the current timezone or the predicted timezone is likely to mean the time gets screwed up, which is why I say it should always be the timezone actually in force at the time of the event.

Gerald the cuddly duck

from: gerald_duck
date: 10th Dec 2009 10:11 (UTC)

Sorry — were those paragraphs there last night? If so, I was clearly half asleep when I read your posting. )-8

Being able to set an appointment according to the currently-prevailing timezone is useful if thinking in terms of "eight hours hence". Yes, you could just say "eight hours hence", but frequently users do some mental arithmetic based on the current time before they start talking to the computer.

Being able to set an appointment according to the timezone it is currently anticipated will prevail when the appointment occurs has the desirable property that changes in the local timezone rules won't result in the appointment's time suddenly changing for people in different locales. This is likely useful for recurring conference calls.

…or maybe software just needs to notice both these cases and warn the user. Certainly, it needs to handle the case of shifting timezone rules meaning an appointment has to be rescheduled for some, but not all, participants.

To be clearer about the relativity of appointments, say I want to set a reminder to take some medicine at 9pm daily. If I fly to California, I want that appointment to shift timezone. Similar questions arise if we move our 3pm meeting from Atlanta, GA to Birmingham, AL: is it now at 3pm CST or 2pm CST (3pm EST)?

Again, maybe software should notice the problem and only prompt the user if it arises, rather than making people think about it all the time. Hmm.

Tony Finch

from: fanf
date: 10th Dec 2009 15:06 (UTC)

"Eight hours hence" might be just a matter of user interface, in which case it need not affect the data model. Alternatively you might actually want an event to occur eight hours after another event, regardless of intervening timezone shenanigans. This requires an extension to my model, about which I shall write more in another post.

Regarding conference calls, my model is that you choose the primary location so that it covers the people who least like to be inconvenienced. You record the other locations in the event so that the software can easily find events that span timezones when they change so that they can be rescheduled if necessary. This is really the most the software can do about it since it's a real-world problem. (Compare the Exchange TZ adjustment tool which had to bulk-reschedule every event occurring in the affected weeks, instead of just those that spanned timezones.)

Medicine is an interesting case. Drugs that have fairly flexible schedules (so it doesn't matter if you take them earlier or later than normal because of travel) can just be scheduled as floating personal events. These are recorded without any location or timezone information and implicitly occur in whatever local time is in effect. If the drugs are picky about being taken regularly then you want a repeating fixed interval between events which (since it is a fixed interval) is unaffected by timezone changes.
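A fixed-interval schedule is easiest to reason about in UTC. In this Python sketch (the `doses` helper and the dates are illustrative) doses every 12 hours straddle the start of BST on 28 March 2010: the displayed London wall-clock hour moves, but the gap between doses stays exactly 12 hours.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def doses(first_utc, interval, count):
    """A repeating fixed-interval event, computed in UTC so that timezone
    and DST changes cannot stretch or shrink the gap between occurrences."""
    return [first_utc + i * interval for i in range(count)]

london = ZoneInfo("Europe/London")
schedule = doses(datetime(2010, 3, 27, 20, 0, tzinfo=timezone.utc),
                 timedelta(hours=12), 3)

print([d.astimezone(london).strftime("%d %H:%M") for d in schedule])
# ['27 20:00', '28 09:00', '28 21:00']
```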

Moving the location of an event is independent of changing its time. Its time remains 15:00 local time at the event's location however much you change the location.

from: ext_44906
date: 10th Dec 2009 08:27 (UTC)

This is partly a sequel to http://lpar.ath0.com/2009/03/16/chronological-pitfalls/ as well, right?

Tony Finch

from: fanf
date: 10th Dec 2009 09:09 (UTC)

Er, yes, that's sort of what I meant by the link in the little preamble :-)

from: anonymous
date: 10th Dec 2009 09:53 (UTC)

Oops. I'll try and pay more attention next time. ;)

from: cartesiandaemon
date: 10th Dec 2009 10:40 (UTC)

It's nonsense to say that the Edinburgh Tattoo will occur in Europe/London

I remember this coming up in the pub. At the time, I said something like "not even Ryanair would tout Edinburgh as 'Edinburgh - London'". But come to think of it, if they built a really good maglev track, Edinburgh would be as close to London as many other places people fly to!

from: anonymous
date: 11th Dec 2009 12:06 (UTC)

Why do we need timezones anyway? Currently in my location, it gets dark at 4pm and I have to turn on the light. But in summer there is light until 11pm. Similarly, I wake up at 7am, but depending on the season the sun has been up for hours, or it's still dark. So there is no relationship between time values and the amount of light outdoors. Let's get rid of timezones and daylight savings time altogether!

Tony Finch

from: fanf
date: 11th Dec 2009 13:39 (UTC)

An admirable sentiment which has not the slightest chance of being implemented :-)

from: sla29970
date: 11th Dec 2009 21:01 (UTC)

I leave my digital camera set to UT always, everywhere. I really dislike the notion of resetting my PDA manually. Unfortunately it seems that the predictions of leapsecond.com webmaster Tom van Baak are likely to come true -- that it will become illegal for individuals to decide what time it is. I suspect that future PDAs may act like the iPhone, in which the local cell phone towers decide what time it is.

Alas, the folks in Brazil have already seen what happens when the cell phone company decides what time it is. That will leave us all at the mercy of our service provider and the implementors of our particular iCalendar client.

Tony Finch

from: fanf
date: 17th Dec 2009 15:15 (UTC)

I have in the past run my timepieces on UTC, but it's not something that most people would want to do.

I doubt it will ever become illegal to set your clocks however you like. I'm actually quite happy that more devices are setting their clocks automatically. Yes that makes them more vulnerable to incompetent systems management, but Brazil is such a basket case wrt timezones that they have more serious problems than just configuring their cell towers correctly.

Location precision is an issue

from: anonymous
date: 9th Apr 2010 20:22 (UTC)

The problem with your idea of using locations is that it is never possible to determine whether a given location specification is precise enough. For example, if the location is a major city, perhaps some day that city will be split into different timezones. This could occur, for example, when a city spans multiple states or nations. But really it could happen anywhere: it's not possible to know where a new timezone divider line will be drawn by some politician tomorrow.

Tony Finch

Re: Location precision is an issue

from: fanf
date: 9th Apr 2010 23:47 (UTC)

True. (Most sane places try to avoid timezone boundaries that cross population centres - see the history of timezones in Indiana for an example. Xinjiang is the obvious counterexample!)

What I imagine is that you'll typically enter locations that are meaningful to the event. For example, a meeting in room C304 in the Cockcroft building in the New Museums Site in Cambridge in England. You'll only have to enter the full details of a place at most once - in most cases someone else will have already done so for you, e.g. for a previous event. Locations have enough structure that the timezone can be attached at whatever level makes most sense. For example, in Europe you'd probably attach the timezone to the country.

You have to allow the details of a location to change, without updating all the events that refer to the location. If higher levels of locations are shared then when timezones change you only need to make a tiny number of changes, e.g. update England to be in CET/CEST instead of GMT/BST, and all inner locations follow.

