December 27th, 2007 by depesz | Tags: | 11 comments »
Did it help? If yes - maybe you can help me?

nixternal wrote about boost library for c++.

with it he was able to find the answer to title question in miliseconds (he didn't specify how many, but let's assume that is was less than 10 ms).

so i decided to check how fast can i do it in postgresql …

first, i will need list of all days from given date range (1901-01-01 to 2000-12-31):

select
    '1901-01-01'::date + i as date
FROM
    generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
;

now i need to check which of these days are 1st days of month:

select
    '1901-01-01'::date + i as date
FROM
    generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
WHERE
    extract(day FROM '1901-01-01'::date + i) = 1
;

and the “sunday" question:

select
    '1901-01-01'::date + i as date
FROM
    generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
WHERE
    extract(day FROM '1901-01-01'::date + i) = 1 and extract(dow FROM '1901-01-01'::date + i) = 0
;

great, the code works ok, so now, let's get count, and time the query:

# SELECT
=#     count(*)
=# FROM
=#     generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
=# WHERE
=#     extract(day FROM '1901-01-01'::date + i) = 1 AND extract(dow FROM '1901-01-01'::date + i) = 0
=# ;
 count
-------
   171
(1 row)
Time: 59.698 ms

not bad, but i'd like to get the time a bit lower.

perhaps i should go with single comparison with “to_char" ?

# SELECT
=#     count(*)
=# FROM
=#     generate_series(0, '2000-12-31'::date - '1901-01-01'::date) as i
=# WHERE
=#     to_char('1901-01-01'::date + i, 'D DD') = '1 01'
=# ;
 count
-------
   171
(1 row)
Time: 113.314 ms

oops. single to_char() is slower than 2 separate extract()s.

what else can i do?

i know. let's use the fact that i can add intervals, and not only days. theoretically if i'll add ‘1 month' i should stay on 1st day of the month. quick test:

# select '1901-01-01'::date + '1 month'::interval * i from generate_series(1,10) i;
      ?column?
---------------------
 1901-02-01 00:00:00
 1901-03-01 00:00:00
 1901-04-01 00:00:00
 1901-05-01 00:00:00
 1901-06-01 00:00:00
 1901-07-01 00:00:00
 1901-08-01 00:00:00
 1901-09-01 00:00:00
 1901-10-01 00:00:00
 1901-11-01 00:00:00
(10 rows)

looks cool.

so, how to use it?

basic approach:

# SELECT
    count(*)
FROM
    generate_series(0, ('2000-12-31'::date - '1901-01-01'::date)/28) as i
WHERE
    extract(day FROM '1901-01-01'::date + i * '1 month'::INTERVAL) = 1 AND extract(dow FROM '1901-01-01'::date + i * '1 month'::INTERVAL) = 0
    AND '1901-01-01'::date + i * '1 month'::INTERVAL < '2000-12-31'::date
;
 count
-------
   171
(1 row)
Time: 6.083 ms

shows success 🙂

now, it would be cool to de-uglify the query, and perhaps make it so it will need to have dates specified only once:

# select count(*)
from (
select
    cast(start.date + '1 month'::INTERVAL * generate_series(0, (finish.date - start.date)/28) as date) as date,
    finish.date as "finish"
from
    (select '2000-12-31'::date as date) as finish, (select '1901-01-01'::date as date) as start
) x
where 1 = extract(day from x.date) and 0 = extract(dow from x.date) AND x.date < x.finish
;
 count
-------
   171
(1 row)
Time: 4.905 ms

it's a bit faster this some as date + interval additions are done only once for each month.

in general – i was able to write a 5ms query to calculate “number of sundays since 1901", in interpreted language. and the language/engine is not even calendar centric (designed to work mostly with dates).

it's just one of the things why i love postgresql.

  1. 11 comments

  2. Dec 28, 2007

    I think you can do this more efficiently (and more correctly) like this. The 1200 comes from the fact that you’re looking at 100 years, each of which has 12 months.

    SELECT
        count(*)
    FROM
        generate_series(0,1200) i
        CROSS JOIN
        (VALUES ('1900-01-01'::date)) AS d
    WHERE
        extract(
            'dow' FROM d.column1 + i * '1 month'::interval
        ) = 0
    AND
        extract(
            'day' FROM d.column1 + i * '1 month'::interval
        ) = 1;
  3. Dec 28, 2007

    the difference is that i didn’t want to hardcode numbers like this (1200) – i wanted the query to calculate it by its own. that’s why i have this:

    generate_series(0, (finish.date – start.date)/28)

    and later:

    where … AND x.date < x.finish

  4. Dec 29, 2007

    Fair enough. I think this one’s faster and just as easy to parameterize as yours 🙂

    SELECT count(*)
    FROM (
    SELECT start + generate_series(0,
    12 * date_part(‘year’, finish)::integer +
    date_part(‘month’, finish)::integer –
    12 * date_part(‘year’, start)::integer –
    date_part(‘month’, start)::integer
    ) * INTERVAL ‘1 month’ AS i
    FROM
    (VALUES(
    ‘1901-01-01’::date,
    ‘2000-01-01’::date
    )) AS t(“start”, “finish”)
    ) AS j
    WHERE
    extract(‘dow’ FROM i)=0
    AND
    extract(‘day’ FROM i)=1
    ;

  5. Dec 29, 2007

    oops. formatted this time:

    SELECT count(*)
    FROM (
        SELECT start + generate_series(0,
                    12 * date_part('year', finish)::integer +
                    date_part('month', finish)::integer  -
                    12 * date_part('year', start)::integer  -
                    date_part('month', start)::integer 
                ) * INTERVAL '1 month' AS i
        FROM
            (VALUES(
                '1901-01-01'::date,
                '2000-01-01'::date
            )) AS t("start", "finish")
    ) AS j
    WHERE
        extract('dow' FROM i)=0
    AND
        extract('day' FROM i)=1
    ;
  6. Jan 5, 2008

    just re-tested my approach (last sql from post) and davids.
    it is tested on different machine, so times will be different, but i wanted to check how much faster davids approach will be.

    i ran both queries 5 times, and got results:

    (depesz way):
    Time: 9.272 ms
    Time: 9.316 ms
    Time: 9.225 ms
    Time: 9.422 ms
    Time: 9.224 ms

    (david way):
    Time: 7.736 ms
    Time: 6.622 ms
    Time: 3.124 ms
    Time: 2.591 ms
    Time: 2.600 ms

    whoa. that’s some nice speedup in last 3 cases 🙂 and it’s repeatable!

    david, kudos to you 🙂

  7. # M.Le_maudit
    Aug 1, 2011

    These are very interesting queries. Thanks for the post.

    But what about finding the lasts Sundays and not the 1sts ones ?

    Wouldn’t this be more challenging as the last Sunday in a month may be the 4th or the 5th one ?

    Have fun 🙂

  8. Aug 2, 2011

    @M.Le_maudit:

    Not sure I understand.

    What exactly do you want to find?

  9. # M.Le_maudit
    Aug 4, 2011

    Your right, it’s not exactly the same kind of challenge…

    Well, I’m looking for the dates of all last Sundays in a month during a period of time and the rank of these Sundays in the month.

    for example :
    – in August 2011, last Sunday will be August 28 and it will be the 4th Sunday in the month.
    – in October 2011, last Sunday will be October 30 and it will be the 5th Sunday in the month.

    So why not something that would give this kind of result for the year 2011 :

    Year | Month | Day | Rank
    —–+———-+—–+—–
    2011 | January | 30 | 5
    2011 | February | 27 | 4
    2011 | March | 27 | 4
    2011 | April | 24 | 4
    2011 | May | 29 | 5

    2011 | August | 28 | 4

    2011 | October | 30 | 5

    2011 |December | 25 | 4

    I’m not sure this will be so challenging for you, but it is for me… yes, I have to confess that I’m not a SQL pro. 🙂
    Anyway, I need to manage recurring events that occur every first and every last Sundays in a month, so I need to identify the dates of these days.

  10. Aug 4, 2011

    @M.Le_maudit:

    select
    extract(year from i) as year,
    to_char(i, 'Month') as month,
    max(extract(day from i)) as day,
    count(*) as rank
    from
    generate_series('2011-01-01'::date, '2011-12-31'::date, '1 day'::interval) i
    where
    0 = extract(dow from i)
    group by year, month, extract(month from i)
    order by extract(month from i);
  11. Oct 23, 2011

    @depesz
    Hi, this is my first post on your blog and I must say all kudos to you and commentators.
    Truly great readings for postres average like me.
    However, while I was reading this page and fiddling with code I did spot few minor errors.
    Depesz, in your final ‘select’ dividing with 28 generates unnecessary rows. IMHO it’s more correct to
    borrow from DAVID FETTER example; resulting in:

    select count(*)
    from (
    select
    	cast(start.date + '1 month'::INTERVAL * generate_series(0,
    		12 * date_part('year', finish.date)::integer +
    		date_part('month', finish.date)::integer -
    		(12 * date_part('year', start.date)::integer +
    		date_part('month', start.date)::integer)) as date) as date,
    	finish.date as "finish"
    from
    	(select '2000-12-31'::date as date) as finish, (select '1901-01-01'::date as date) as start
    ) x
    where 1 = extract(day from x.date) and 0 = extract(dow from x.date) AND x.date < x.finish
    ;

    @DAVID FETTER
    In your final ‘select’ IMHO it’s more correct to change this part

    12 * date_part('year', finish)::integer +
    date_part('month', finish)::integer -
    12 * date_part('year', start)::integer -
    date_part('month', start)::integer

    to

    12 * date_part('year', finish)::integer +
    date_part('month', finish)::integer -
    (12 * date_part('year', start)::integer +
    date_part('month', start)::integer)

    Simplifying even more this could be expressed as:

    SELECT
    	count(*)
    FROM (
    	SELECT generate_series(start,
    		finish, '1 month') AS i
    	FROM
    		(VALUES(
    			'1901-01-01'::date,
    			'2000-01-01'::date
    	)) AS t("start", "finish")
    ) AS j
    WHERE
    	extract('dow' FROM i)=0
    AND
    	extract('day' FROM i)=1;
  1. 1 Trackback(s)

  2. Feb 11, 2013: Log Buffer #77: A Carnival of the Vanities for DBAs

Sorry, comments for this post are disabled.