nixternal wrote about boost library for c++.
with it he was able to find the answer to title question in miliseconds (he didn’t specify how many, but let’s assume that is was less than 10 ms).
so i decided to check how fast can i do it in postgresql …
first, i will need list of all days from given date range (1901-01-01 to 2000-12-31):
select
'1901-01-01'::date + i as date
FROM
generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
;
now i need to check which of these days are 1st days of month:
select
'1901-01-01'::date + i as date
FROM
generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
WHERE
extract(day FROM '1901-01-01'::date + i) = 1
;
and the “sunday” question:
select
'1901-01-01'::date + i as date
FROM
generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
WHERE
extract(day FROM '1901-01-01'::date + i) = 1 and extract(dow FROM '1901-01-01'::date + i) = 0
;
great, the code works ok, so now, let’s get count, and time the query:
# SELECT
=# count(*)
=# FROM
=# generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
=# WHERE
=# extract(day FROM '1901-01-01'::date + i) = 1 AND extract(dow FROM '1901-01-01'::date + i) = 0
=# ;
count
-------
171
(1 row)
Time: 59.698 ms
not bad, but i’d like to get the time a bit lower.
perhaps i should go with single comparison with “to_char” ?
# SELECT
=# count(*)
=# FROM
=# generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
=# WHERE
=# to_char('1901-01-01'::date + i, 'D DD') = '1 01'
=# ;
count
-------
171
(1 row)
Time: 113.314 ms
oops. single to_char() is slower than 2 separate extract()s.
what else can i do?
i know. let’s use the fact that i can add intervals, and not only days. theoretically if i’ll add ’1 month’ i should stay on 1st day of the month. quick test:
# select '1901-01-01'::date + '1 month'::interval * i from generate_series(1,10) i;
?column?
---------------------
1901-02-01 00:00:00
1901-03-01 00:00:00
1901-04-01 00:00:00
1901-05-01 00:00:00
1901-06-01 00:00:00
1901-07-01 00:00:00
1901-08-01 00:00:00
1901-09-01 00:00:00
1901-10-01 00:00:00
1901-11-01 00:00:00
(10 rows)
looks cool.
so, how to use it?
basic approach:
# SELECT
count(*)
FROM
generate_series(0, ('2000-12-31'::date - '1901-01-01'::date)/28) as i
WHERE
extract(day FROM '1901-01-01'::date + i * '1 month'::INTERVAL) = 1 AND extract(dow FROM '1901-01-01'::date + i * '1 month'::INTERVAL) = 0
AND '1901-01-01'::date + i * '1 month'::INTERVAL < '2000-12-31'::date
;
count
-------
171
(1 row)
Time: 6.083 ms
shows success
now, it would be cool to de-uglify the query, and perhaps make it so it will need to have dates specified only once:
# select count(*)
from (
select
cast(start.date + '1 month'::INTERVAL * generate_series(0, (finish.date - start.date)/28) as date) as date,
finish.date as "finish"
from
(select '2000-12-31'::date as date) as finish, (select '1901-01-01'::date as date) as start
) x
where 1 = extract(day from x.date) and 0 = extract(dow from x.date) AND x.date < x.finish
;
count
-------
171
(1 row)
Time: 4.905 ms
it’s a bit faster this some as date + interval additions are done only once for each month.
in general – i was able to write a 5ms query to calculate “number of sundays since 1901″, in interpreted language. and the language/engine is not even calendar centric (designed to work mostly with dates).
it’s just one of the things why i love postgresql.

5 comments
I think you can do this more efficiently (and more correctly) like this. The 1200 comes from the fact that you’re looking at 100 years, each of which has 12 months.
SELECT
count(*)
FROM
generate_series(0,1200) i
CROSS JOIN
(VALUES ('1900-01-01'::date)) AS d
WHERE
extract(
'dow' FROM d.column1 + i * '1 month'::interval
) = 0
AND
extract(
'day' FROM d.column1 + i * '1 month'::interval
) = 1;
the difference is that i didn’t want to hardcode numbers like this (1200) – i wanted the query to calculate it by its own. that’s why i have this:
generate_series(0, (finish.date – start.date)/28)
and later:
where … AND x.date < x.finish
Fair enough. I think this one’s faster and just as easy to parameterize as yours
SELECT count(*)
FROM (
SELECT start + generate_series(0,
12 * date_part(‘year’, finish)::integer +
date_part(‘month’, finish)::integer -
12 * date_part(‘year’, start)::integer -
date_part(‘month’, start)::integer
) * INTERVAL ’1 month’ AS i
FROM
(VALUES(
’1901-01-01′::date,
’2000-01-01′::date
)) AS t(“start”, “finish”)
) AS j
WHERE
extract(‘dow’ FROM i)=0
AND
extract(‘day’ FROM i)=1
;
oops. formatted this time:
SELECT count(*)
FROM (
SELECT start + generate_series(0,
12 * date_part('year', finish)::integer +
date_part('month', finish)::integer -
12 * date_part('year', start)::integer -
date_part('month', start)::integer
) * INTERVAL '1 month' AS i
FROM
(VALUES(
'1901-01-01'::date,
'2000-01-01'::date
)) AS t("start", "finish")
) AS j
WHERE
extract('dow' FROM i)=0
AND
extract('day' FROM i)=1
;
just re-tested my approach (last sql from post) and davids.
it is tested on different machine, so times will be different, but i wanted to check how much faster davids approach will be.
i ran both queries 5 times, and got results:
(depesz way):
Time: 9.272 ms
Time: 9.316 ms
Time: 9.225 ms
Time: 9.422 ms
Time: 9.224 ms
(david way):
Time: 7.736 ms
Time: 6.622 ms
Time: 3.124 ms
Time: 2.591 ms
Time: 2.600 ms
whoa. that’s some nice speedup in last 3 cases
and it’s repeatable!
david, kudos to you