?

Log in

fanf

Counting the days

« previous entry | next entry »
10th Sep 2008 | 00:12

Nearly 10 years ago I developed a neat function for converting a Gregorian date into a count of days since some epoch. There was nothing particularly new about it; I just tweaked the numbers until the formula was as simple as possible. I've re-used this code a fair number of times since then without re-examining it. This evening I worked out that it can be distilled even more.

The function adds together some expressions that count the number of days indicated by the three components of the date, plus a constant to set the epoch.

The number of normal days before the year y (full four digit year) is just y*365.

The number of leap days up to and including the year y is y/4-y/100+y/400. This is the usual "add a leap day in every fourth year, but not in every hundredth year, but do in every 400th year" rule. I'm using integer division that discards any remainder, i.e. rounds down.

The previous two paragraphs are inconsistent about including or excluding the current year. This is because leap days occur inside a leap year, not at the end, so their contribution to the day count does not depend solely on the year. We can fix this by re-assigning January and February to the previous year using a small administrative fiddle, as follows (where m counts months starting from 1). Then a leap day falls just before the start of years that are multiples of four (etc.) and the expression in the previous paragraph magically does the right thing. The fiddle also requires an adjustment to the epoch constant.

        if (m > 2) m -= 2; else m += 10, y -= 1;

Months are the part that I spent most time fiddling with. Section 1.12 of Calendrical Calculations develops various equations for calendars (such as the Julian calendar) that spread their leap years evenly, like Bresenham's line drawing algorithm spreads out raster steps. The equations also work for month lengths in the Julian and Gregorian calendars, if you treat 30-day months as normal years and 31 day months as leap years, and ignore February. We want to calculate the number of days in a year before the start of a given month, which is calculated by equation 1.73:

        (L*y - L + D*L % C) / C + (y - 1) * N

where L is the number of leap years in a cycle, C is the total number of years in a cycle, D is the offset of the start of the cycle, N is the number of days in a normal year, and years are actually months. Yuck. It turns out that if we perform the right administrative fiddling, we can arrange that D=0, and we can substitute m=y-1. This gives us the greatly simplified:

        m*L/C + m*N = m * (N*C+L) / C

For Julian and Gregorian months, N=30, C=12, and L=7, giving m*367/12. This pretends that February has 30 days, but the pretence doesn't matter because we have moved February to the end of the year so we never count days after February 28th or 29th . The complete function is then as follows (with the epoch set so that 1st January, 1 A.D. is day 1).

    int A(int y, int m, int d) {
        if (m > 2) m -= 2; else m += 10, y -= 1;
        return y*365 + y/4 - y/100 + y/400 + m*367/12 + d - 336;
    }

In fact you don't have to treat a whole year as a cycle for the purpose of the month calculation. The most basic pattern is a five month cycle containing three long months, as in March-July and August-December. (January and February are the start of a third truncated cycle.) However when I tried to make m*153/5 work in 1999 I could not eliminate the D offset to keep the equations simple. My problem was that I only allowed myself to count months starting from zero or one, and I was flailing around without the sage guidance of messrs. Dershowitz and Reingold. (I didn't use their equation 1.73 when I was developing this code, but it's a good example of what I was trying to avoid.)

It turns out that we can play around with the administrative fiddle to adjust D so that the cycle falls in the right place, so long as we compensate by adjusting the epoch constant. For the Julian and Gregorian five-month cycle D=4, i.e. we need to shift March (the start of the cycle) to month number 4, which gives us the following code.

    int B(int y, int m, int d) {
        if (m > 2) m += 1; else m += 13, y -= 1;
        return y*365 + y/4 - y/100 + y/400 + m*153/5 + d - 428;
    }

I mentioned above that equation 1.73 works for Julian leap years, so we can use it for years as well as months. This combines first two terms of the return expression leaving the Gregorian correction separate. No additional fiddling is necessary. We can't merge in the Gregorian correction as well, because evenly spreading the leap years gives intervals of 4 or 5, not 4 or 8.

    int C(int y, int m, int d) {
        if (m > 2) m += 1; else m += 13, y -= 1;
        return y*1461/4 - y/100 + y/400 + m*153/5 + d - 428;
    }

There's a related function for calculating the number of days in a month. It uses the remainders from the divisions to find the length of the current month, instead of using the results to count the preceding days. My old code re-used the same administrative fiddle:

    int D(int y, int m) {
        if (m > 2) m -= 2; else m += 10;
        return m*367%12 > 4 ? 31
             : m != 12     ? 30
             : y % 4      ? 28
             : y % 100   ? 29
             : y % 400  ? 28
                       : 29;
    }

In fact you don't need a fiddle in this case, because there's no need to move February to the end of the year. You can just pick values of C and L which make the pattern fall in the right place.

    int E(int y, int m) {
        return m*275%9 > 3 ? 31
             : m != 2     ? 30
             : y % 4     ? 28
             : y % 100  ? 29
             : y % 400 ? 28
                      : 29;
    }

| Leave a comment | Share

Comments {9}

Simon Tatham

from: simont
date: 10th Sep 2008 11:58 (UTC)

if (m > 2) m += 1; else m += 13, y -= 1;
This bit is bothering me. I'm sure it ought to be possible to do it in a nicer way. For instance, you could write
y -= m < 3, m = (m + 9) % 12 + 4;
but really that's only moving the ickiness about, not removing it: instead of having the mod-12 offset applied to m encoded in two places, I've now got the wrap-round point encoded (obscurely) in two places.

If only C had multiple return values and the library div function were re-prototyped to use them, you could do something like

y, m = div(12*y+m - 3, 12); m += 4;
So I suppose the best I can do is
m += y*12 - 3; y = m/12; m = m%12 + 4;
which is barely any shorter than your version anyway, and probably slower too unless division is cheap compared to conditional branching.

Edited at 2008-09-10 12:01 pm (UTC)

Reply | Thread

Tony Finch

from: fanf
date: 10th Sep 2008 12:47 (UTC)

Yes, I've played around with various versions of that code and in the end the most straight-forward version seems best. The test can turn into OKish straight-line code on CPUs that have conditional move instructions. Sadly I have not found a way to make the five-month cycle fit with March=3 which would eliminate a branch of the condition.

Reply | Parent | Thread