Tony Finch (fanf) wrote,
Tony Finch

Counting the days

Nearly 10 years ago I developed a neat function for converting a Gregorian date into a count of days since some epoch. There was nothing particularly new about it; I just tweaked the numbers until the formula was as simple as possible. I've re-used this code a fair number of times since then without re-examining it. This evening I worked out that it can be distilled even more.

The function adds together some expressions that count the number of days indicated by the three components of the date, plus a constant to set the epoch.

The number of normal days before the year y (full four digit year) is just y*365.

The number of leap days up to and including the year y is y/4-y/100+y/400. This is the usual "add a leap day in every fourth year, but not in every hundredth year, but do in every 400th year" rule. I'm using integer division that discards any remainder, i.e. rounds down.

The previous two paragraphs are inconsistent about including or excluding the current year. This is because leap days occur inside a leap year, not at the end, so their contribution to the day count does not depend solely on the year. We can fix this by re-assigning January and February to the previous year using a small administrative fiddle, as follows (where m counts months starting from 1). Then a leap day falls just before the start of years that are multiples of four (etc.) and the expression in the previous paragraph magically does the right thing. The fiddle also requires an adjustment to the epoch constant.

        if (m > 2) m -= 2; else m += 10, y -= 1;

Months are the part that I spent most time fiddling with. Section 1.12 of Calendrical Calculations develops various equations for calendars (such as the Julian calendar) that spread their leap years evenly, like Bresenham's line drawing algorithm spreads out raster steps. The equations also work for month lengths in the Julian and Gregorian calendars, if you treat 30-day months as normal years and 31 day months as leap years, and ignore February. We want to calculate the number of days in a year before the start of a given month, which is calculated by equation 1.73:

        (L*y - L + D*L % C) / C + (y - 1) * N

where L is the number of leap years in a cycle, C is the total number of years in a cycle, D is the offset of the start of the cycle, N is the number of days in a normal year, and years are actually months. Yuck. It turns out that if we perform the right administrative fiddling, we can arrange that D=0, and we can substitute m=y-1. This gives us the greatly simplified:

        m*L/C + m*N = m * (N*C+L) / C

For Julian and Gregorian months, N=30, C=12, and L=7, giving m*367/12. This pretends that February has 30 days, but the pretence doesn't matter because we have moved February to the end of the year so we never count days after February 28th or 29th . The complete function is then as follows (with the epoch set so that 1st January, 1 A.D. is day 1).

    int A(int y, int m, int d) {
        if (m > 2) m -= 2; else m += 10, y -= 1;
        return y*365 + y/4 - y/100 + y/400 + m*367/12 + d - 336;

In fact you don't have to treat a whole year as a cycle for the purpose of the month calculation. The most basic pattern is a five month cycle containing three long months, as in March-July and August-December. (January and February are the start of a third truncated cycle.) However when I tried to make m*153/5 work in 1999 I could not eliminate the D offset to keep the equations simple. My problem was that I only allowed myself to count months starting from zero or one, and I was flailing around without the sage guidance of messrs. Dershowitz and Reingold. (I didn't use their equation 1.73 when I was developing this code, but it's a good example of what I was trying to avoid.)

It turns out that we can play around with the administrative fiddle to adjust D so that the cycle falls in the right place, so long as we compensate by adjusting the epoch constant. For the Julian and Gregorian five-month cycle D=4, i.e. we need to shift March (the start of the cycle) to month number 4, which gives us the following code.

    int B(int y, int m, int d) {
        if (m > 2) m += 1; else m += 13, y -= 1;
        return y*365 + y/4 - y/100 + y/400 + m*153/5 + d - 428;

I mentioned above that equation 1.73 works for Julian leap years, so we can use it for years as well as months. This combines first two terms of the return expression leaving the Gregorian correction separate. No additional fiddling is necessary. We can't merge in the Gregorian correction as well, because evenly spreading the leap years gives intervals of 4 or 5, not 4 or 8.

    int C(int y, int m, int d) {
        if (m > 2) m += 1; else m += 13, y -= 1;
        return y*1461/4 - y/100 + y/400 + m*153/5 + d - 428;

There's a related function for calculating the number of days in a month. It uses the remainders from the divisions to find the length of the current month, instead of using the results to count the preceding days. My old code re-used the same administrative fiddle:

    int D(int y, int m) {
        if (m > 2) m -= 2; else m += 10;
        return m*367%12 > 4 ? 31
             : m != 12     ? 30
             : y % 4      ? 28
             : y % 100   ? 29
             : y % 400  ? 28
                       : 29;

In fact you don't need a fiddle in this case, because there's no need to move February to the end of the year. You can just pick values of C and L which make the pattern fall in the right place.

    int E(int y, int m) {
        return m*275%9 > 3 ? 31
             : m != 2     ? 30
             : y % 4     ? 28
             : y % 100  ? 29
             : y % 400 ? 28
                      : 29;
  • Post a new comment


    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.
Around this time last year I was fiddling about with a piece of Perl that needed to determine the day of the week for a given y,m,d, and I ended up with this:
$date =~ /^(\d+)-(\d+)-(\d+)$/; $mm=($1*12+$2)-22803;
$day = qw/Wed Thu Fri Sat Sun Mon Tue/[$d%7];
That assumes leap years are simply one-every-four-years, so it only works between March 1900 and February 2100, which was enough for my purposes. (Indeed, my constant 22803 is arranged in order to place the epoch at 1 March 1900.)

I derived my magic constant 30.6 in a rather different way from you. I took a much more rough-and-ready approach: I wanted a linear function y=mx+c to which I could feed months-since-March and get back something which would round to the correct value for days-since-start-of-March in all cases, and I arranged this by simply writing down 24 inequalities for m and c, displaying them in gnuplot, and picking a nice-looking point somewhere in the resulting region. I'm entertained to find that my value of 30.6, picked arbitrarily in that way, is exactly equal to the rigorously derived 153/5 in your B() :-)
The book has a bit about values of m that make the linear equation round correctly, with a nice graph :-)

Back in my youth I hacked up my own implementation of Bresenham's algorithm, though I can't remember the details now. (Oddly enough I think I can remember the similar circle algorithm more clearly.) I wonder if his paper has some nice maths I can compare with the calendrical calculations.
What was wrong with
use POSIX;
$day = strftime("%a", 0, 0, 0, $1, $2-1, $3-1900);


September 10 2008, 10:47:29 UTC 13 years ago Edited:  September 10 2008, 10:48:53 UTC

Less fun!

fanf could have done the same in C, equally. The point is not just to get the job done; the point is intellectual curiosity about how little code it's possible to get the job done in – counting the contents of any standard library functions towards the total quantity of code.
I was hoping for a story of how you were running Perl on an embedded device that didn't have enough memory for the POSIX module so had to roll your own...

(I recently implemented my own calendar computation, for an application that needs to be portable to devices without the standard C library. I used the 153/5 trick too, which I first came across in Wikipedia's Julian day article.)
%a has localization braindamage. Indexing an array using %w is a better choice.

Also, what Simon said :-)
LOL. That's amazing. I'll restrain the impulse to lobby for an opcode for the next chip that does (L*y - L + D*L % C) / C + (y - 1) * N atomically :)


September 10 2008, 11:58:23 UTC 13 years ago Edited:  September 10 2008, 12:01:55 UTC

if (m > 2) m += 1; else m += 13, y -= 1;
This bit is bothering me. I'm sure it ought to be possible to do it in a nicer way. For instance, you could write
y -= m < 3, m = (m + 9) % 12 + 4;
but really that's only moving the ickiness about, not removing it: instead of having the mod-12 offset applied to m encoded in two places, I've now got the wrap-round point encoded (obscurely) in two places.

If only C had multiple return values and the library div function were re-prototyped to use them, you could do something like

y, m = div(12*y+m - 3, 12); m += 4;
So I suppose the best I can do is
m += y*12 - 3; y = m/12; m = m%12 + 4;
which is barely any shorter than your version anyway, and probably slower too unless division is cheap compared to conditional branching.
Yes, I've played around with various versions of that code and in the end the most straight-forward version seems best. The test can turn into OKish straight-line code on CPUs that have conditional move instructions. Sadly I have not found a way to make the five-month cycle fit with March=3 which would eliminate a branch of the condition.