how many 1sts of any month were sundays - since 1901-01-01?

2007-12-27 11:57:56 CET | Tags: , ,

nixternal wrote about boost library for c++.

with it he was able to find the answer to title question in miliseconds (he didn’t specify how many, but let’s assume that is was less than 10 ms).

so i decided to check how fast can i do it in postgresql …

first, i will need list of all days from given date range (1901-01-01 to 2000-12-31):

select
'1901-01-01'::date + i as date
FROM
generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
;

now i need to check which of these days are 1st days of month:

select
'1901-01-01'::date + i as date
FROM
generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
WHERE
extract(day FROM '1901-01-01'::date + i) = 1
;

and the “sunday” question:

select
'1901-01-01'::date + i as date
FROM
generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
WHERE
extract(day FROM '1901-01-01'::date + i) = 1 and extract(dow FROM '1901-01-01'::date + i) = 0
;

great, the code works ok, so now, let’s get count, and time the query:

# SELECT
=# count(*)
=# FROM
=# generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
=# WHERE
=# extract(day FROM '1901-01-01'::date + i) = 1 AND extract(dow FROM '1901-01-01'::date + i) = 0
=# ;
count
-------
171
(1 row)
Time: 59.698 ms

not bad, but i’d like to get the time a bit lower.

perhaps i should go with single comparison with “to_char” ?

# SELECT
=# count(*)
=# FROM
=# generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
=# WHERE
=# to_char('1901-01-01'::date + i, 'D DD') = '1 01'
=# ;
count
-------
171
(1 row)
Time: 113.314 ms

oops. single to_char() is slower than 2 separate extract()s.

what else can i do?

i know. let’s use the fact that i can add intervals, and not only days. theoretically if i’ll add ‘1 month’ i should stay on 1st day of the month. quick test:

# select '1901-01-01'::date + '1 month'::interval * i from generate_series(1,10) i;
?column?
---------------------
1901-02-01 00:00:00
1901-03-01 00:00:00
1901-04-01 00:00:00
1901-05-01 00:00:00
1901-06-01 00:00:00
1901-07-01 00:00:00
1901-08-01 00:00:00
1901-09-01 00:00:00
1901-10-01 00:00:00
1901-11-01 00:00:00
(10 rows)

looks cool.

so, how to use it?

basic approach:

# SELECT
count(*)
FROM
generate_series(0, ('2000-12-31'::date - '1901-01-01'::date)/28) as i
WHERE
extract(day FROM '1901-01-01'::date + i * '1 month'::INTERVAL) = 1 AND extract(dow FROM '1901-01-01'::date + i * '1 month'::INTERVAL) = 0
AND '1901-01-01'::date + i * '1 month'::INTERVAL < '2000-12-31'::date
;
count
-------
171
(1 row)
Time: 6.083 ms

shows success :)

now, it would be cool to de-uglify the query, and perhaps make it so it will need to have dates specified only once:

# select count(*)
from (
select
cast(start.date + '1 month'::INTERVAL * generate_series(0, (finish.date - start.date)/28) as date) as date,
finish.date as "finish"
from
(select '2000-12-31'::date as date) as finish, (select '1901-01-01'::date as date) as start
) x
where 1 = extract(day from x.date) and 0 = extract(dow from x.date) AND x.date < x.finish
;
count
-------
171
(1 row)
Time: 4.905 ms

it’s a bit faster this some as date + interval additions are done only once for each month.

in general - i was able to write a 5ms query to calculate “number of sundays since 1901″, in interpreted language. and the language/engine is not even calendar centric (designed to work mostly with dates).

it’s just one of the things why i love postgresql.

5 Responses to “how many 1sts of any month were sundays - since 1901-01-01?”

  1. David Fetter Says:

    I think you can do this more efficiently (and more correctly) like this. The 1200 comes from the fact that you’re looking at 100 years, each of which has 12 months.


    SELECT
    count(*)
    FROM
    generate_series(0,1200) i
    CROSS JOIN
    (VALUES ('1900-01-01'::date)) AS d
    WHERE
    extract(
    'dow' FROM d.column1 + i * '1 month'::interval
    ) = 0
    AND
    extract(
    'day' FROM d.column1 + i * '1 month'::interval
    ) = 1;

  2. depesz Says:

    the difference is that i didn’t want to hardcode numbers like this (1200) - i wanted the query to calculate it by its own. that’s why i have this:

    generate_series(0, (finish.date - start.date)/28)

    and later:

    where … AND x.date < x.finish

  3. David Fetter Says:

    Fair enough. I think this one’s faster and just as easy to parameterize as yours :)

    SELECT count(*)
    FROM (
    SELECT start + generate_series(0,
    12 * date_part(’year’, finish)::integer +
    date_part(’month’, finish)::integer -
    12 * date_part(’year’, start)::integer -
    date_part(’month’, start)::integer
    ) * INTERVAL ‘1 month’ AS i
    FROM
    (VALUES(
    ‘1901-01-01′::date,
    ‘2000-01-01′::date
    )) AS t(”start”, “finish”)
    ) AS j
    WHERE
    extract(’dow’ FROM i)=0
    AND
    extract(’day’ FROM i)=1
    ;

  4. David Fetter Says:

    oops. formatted this time:


    SELECT count(*)
    FROM (
    SELECT start + generate_series(0,
    12 * date_part('year', finish)::integer +
    date_part('month', finish)::integer -
    12 * date_part('year', start)::integer -
    date_part('month', start)::integer
    ) * INTERVAL '1 month' AS i
    FROM
    (VALUES(
    '1901-01-01'::date,
    '2000-01-01'::date
    )) AS t("start", "finish")
    ) AS j
    WHERE
    extract('dow' FROM i)=0
    AND
    extract('day' FROM i)=1
    ;

  5. depesz Says:

    just re-tested my approach (last sql from post) and davids.
    it is tested on different machine, so times will be different, but i wanted to check how much faster davids approach will be.

    i ran both queries 5 times, and got results:

    (depesz way):
    Time: 9.272 ms
    Time: 9.316 ms
    Time: 9.225 ms
    Time: 9.422 ms
    Time: 9.224 ms

    (david way):
    Time: 7.736 ms
    Time: 6.622 ms
    Time: 3.124 ms
    Time: 2.591 ms
    Time: 2.600 ms

    whoa. that’s some nice speedup in last 3 cases :) and it’s repeatable!

    david, kudos to you :)

Leave a Reply