October 26th, 2007 by depesz | Tags: | 10 comments »
Did it help? If yes - maybe you can help me? Donate BTC to 19zPa5diT2LZqGtTi8f8bfApLn8rw9zBHx

so, there you have a users table, with a very basic structure:

Table "public.users"
Column | Type | Modifiers
-----------+---------+-----------
id | integer | not null
birthdate | date |
Indexes:
"x_pkey" PRIMARY KEY, btree (id)

then, you have a task: find a query that will return all users which have birthday tomorrow. how will you do it?

there are couple of problems with it.

first – let's assume that you want the search using:

select * from users where to_char(birthdate, 'MM-DD') = to_char(now() + '1 day'::interval, 'MM-DD');

this is simple, yet it seq-scans whole table:

# EXPLAIN ANALYZE select * from users where to_char(birthdate, 'MM-DD') = to_char(now() + '1 day'::interval, 'MM-DD');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Seq Scan on users (cost=0.00..43674.95 rows=7458 width=8) (actual time=0.036..5537.293 rows=2553 loops=1)
Filter: (to_char((birthdate)::timestamp with time zone, 'MM-DD'::text) = to_char((now() + '1 day'::interval), 'MM-DD'::text))
Total runtime: 5538.686 ms
(3 rows)

now, you surely can index it with functional index. right? wrong!:

# create index q on users (to_char(birthdate, 'MM-DD'));
ERROR: functions in index expression must be marked IMMUTABLE

ouch.

to_char() is “stable", but not “immutable" – if you don't know what it means – consult the manual.

of course i can write my own wrapper around to_char:

CREATE OR REPLACE FUNCTION indexable_month_day(date) RETURNS TEXT as $BODY$
SELECT to_char($1, 'MM-DD');
$BODY$ language 'sql' IMMUTABLE STRICT;

ok, now i can:

# create index q on users ( indexable_month_day(birthdate) );
CREATE INDEX
# explain analyze select * from users where indexable_month_day(birthdate) = indexable_month_day( (now() + '1 day'::interval)::date);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on users (cost=138.71..10719.75 rows=7458 width=8) (actual time=1.693..5.091 rows=2553 loops=1)
Recheck Cond: (indexable_month_day(birthdate) = indexable_month_day(((now() + '1 day'::interval))::date))
-> Bitmap Index Scan on q (cost=0.00..136.85 rows=7458 width=0) (actual time=1.104..1.104 rows=2553 loops=1)
Index Cond: (indexable_month_day(birthdate) = indexable_month_day(((now() + '1 day'::interval))::date))
Total runtime: 6.103 ms

now bad. but is it correct?

let's assume that we have a person that was born on 29th of february, 1980. and let's assume today is 28th of february of 2007.

now() + ‘1 day' will give me 1st of march, and i will not find the guy from 29th of february.

of course – this is technically right, but anyway – it would be cool to wish the guy “happy birthday".

also – if the guy was born on 29th of february – should be greet him on 28th of february or on 1st of march?

of course – if current year is also leap year, the problem doesn't exist. but – the odds are that it is not leap year.

to make the story short, i wrote a function which returns list of dates that match some “now() + x days" criteria:

CREATE OR REPLACE FUNCTION birthdates_soon(in_today date, in_after_days INT4) RETURNS TEXT[] as $BODY$
DECLARE
work_date date := in_today + in_after_days * '1 day'::INTERVAL;
output TEXT[] := ARRAY[ indexable_month_day( work_date ) ];
BEGIN
IF NOT is_leap_year(in_today) THEN
IF output[1] in ('02-28', '03-01') THEN
output := output || '02-29'::TEXT;
END IF;
END IF;
RETURN output;
END;
$BODY$ language plpgsql IMMUTABLE STRICT;

this function needs this small function:

CREATE OR REPLACE FUNCTION is_leap_year(date) RETURNS bool as $BODY$
DECLARE
use_year INT4 := date_part('year', $1)::INT4;
BEGIN
IF 0 < use_year % 4 THEN
RETURN false;
ELSIF 0 < use_year % 100 THEN
RETURN true;
ELSIF 0 < use_year % 400 THEN
RETURN false;
END IF;
RETURN true;
END;
$BODY$ language plpgsql IMMUTABLE strict;

what it does? let's see:

# select
x.date as today,
birthdates_soon(x.date, 1) as tomorrow
from (
values
('2007-02-28'::date),
('2007-03-01'::date),
('2007-02-27'),
('2000-02-27'),
('2000-02-28'),
('2000-02-29'),
(now()::date)
) as x (date);
today | tomorrow
------------+---------------
2007-02-28 | {03-01,02-29}
2007-03-01 | {03-02}
2007-02-27 | {02-28,02-29}
2000-02-27 | {02-28}
2000-02-28 | {02-29}
2000-02-29 | {03-01}
2007-10-26 | {10-27}
(7 rows)

usually we will be calling the function with first argument set to now()::date, so - let's make a wrapper:

CREATE OR REPLACE FUNCTION birthdates_soon(INT4) RETURNS TEXT[] as $BODY$
SELECT birthdates_soon(now()::date, $1);
$BODY$ language sql IMMUTABLE STRICT;

this will allow us to write: ... birthdates_soon(1) or ... birthdates_soon( '2006-01-01'::date, 1).

and now, i can use the function in our "user-finding" query:

# explain analyze select * from users where indexable_month_day(birthdate) = ANY ( birthdates_soon(1) );
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on users (cost=138.45..8789.73 rows=7458 width=8) (actual time=1.442..4.913 rows=2553 loops=1)
Recheck Cond: (indexable_month_day(birthdate) = ANY ('{10-27}'::text[]))
-> Bitmap Index Scan on q (cost=0.00..136.59 rows=7458 width=0) (actual time=0.870..0.870 rows=2553 loops=1)
Index Cond: (indexable_month_day(birthdate) = ANY ('{10-27}'::text[]))
Total runtime: 5.899 ms
(5 rows)

as you can see we have to use = ANY() construct because our function returns array. if it was returning "setof text", i would have to use ... in ().

so, that would be all. hope you'll find it useful.

  1. 10 comments

  2. # Neil
    Oct 26, 2007

    Minor quibble: using current_date seems simpler and clearer than using now()::date.

  3. # Michael Glaesemann
    Oct 27, 2007

    Thank, Hubert, for presenting this. The birthday problem is something I’ve wanted to take a stab at and your post prompted me to look at it now.

    It’s a personal pet peeve, but I really don’t like casting a datetime value to text. You can get away with it here because you’re relying on the fact that the collation of text representation is going to be the same as the ordering of the dates. As far as I know, you have to do something like this because there isn’t a type to represent just the month and day portion of a datetime value.

    To avoid using to_char, I chose to convert the month and day to an integer representation:

    create function month_and_day_integer(p_date date)
    returns integer
    immutable strict
    language sql as $func$
    select (cast(extract(month from $1) as integer) * 100 +
    cast(extract(day from $1) as integer))
    $func$;
    comment on function month_and_day_integer(date) is
    'Create an immutable (and therefore indexable) integer representation of month '
    'and day.';

    It’s the same in spirit as what you’re doing with to_char, but I think it’s a bit clearer what is happening: you’re no longer dealing with something that looks like a date. I haven’t profiled it to see how the two pairs of extract and cast and a little arithmetic compare to the single to_char call: I suspect the former may be slower.

    I also took advantage of the fact that today always follows yesterday: rather than determine whether or not a leap year is involved and conditionally add February 29, I just compare the date in question to see if it’s after yesterday: during a leap year, yesterday relative to March 1 (301) is February 29 (229) and otherwise it’s February 28 (228). In any year, the month_and_day_integer representation of February 29 is going to fall between yesterday and today on March 1.

    Here’s the rest of the code:
    create function month_and_day_integer(p_timestamp timestamp without time zone)
    returns integer
    immutable strict
    language sql as $func$
    select month_and_day_integer(cast($1 as date))
    $func$;
    comment on function month_and_day_integer(timestamp without time zone) is
    'Wrapper for month_and_day_integer(date) to cast parameter from timestamp to date.';

    create function is_birthday(p_birthdate date, p_date date)
    returns boolean
    immutable strict
    language sql as $func$
    select month_and_day_integer($1) > month_and_day_integer($2 - 1)
    and month_and_day_integer($1) month_and_day_integer($3 - 1)
    and month_and_day_integer($1)

    Not having a huge amount of data on hand, I generated a bit using:

    create table persons (person text primary key, birthdate date not null);
    insert into persons (person, birthdate)
    select 'person-' || x, cast('1970-01-01' as date) + x
    from generate_series(1,100000) as foo(x);
    insert into persons (person, birthdate)
    select 'person-2-' || x, cast('1970-01-01' as date) - x
    from generate_series(1,100000) as foo(x);

    create index person_birthday_idx on persons (month_and_day_integer(birthdate));

    Query plans are similar (though they look messier due to the inlined SQL):

    -- birthday within one day (today or tomorrow)
    explain analyze
    select persons.*
    from persons
    where is_birthday_in(birthdate, interval '1 day');

    http://explain-analyze.info/query_plans/1227-birthday-within-one-day

    I was a bit surprised that the is_birthday call wasn't inlined and didn't take advantage of person_birthday_idx, rather using a sequential scan with very unfortunate results:

    explain analyze
    select persons.*
    from persons
    where is_birthday(birthdate, 'tomorrow'::date);

    http://explain-analyze.info/query_plans/1228-birthday-in-tomorrow-seq-scan

    Inlining the function body in the query worked well, however:

    -- inlining function body
    explain analyze
    select *
    from persons
    where month_and_day_integer(birthdate) > month_and_day_integer('tomorrow'::date - 1)
    and month_and_day_integer(birthdate)

    http://explain-analyze.info/query_plans/1229-birthday-in-manually-inlined

    The results are all with default statistics and an analyzed table.

    Just for comparison, I used the functions you provided on the generated data:

    explain analyze
    select * from
    persons
    where indexable_month_day(birthdate) = ANY ( birthdates_soon(1) );

    http://explain-analyze.info/query_plans/1230-birthdays-soon

    explain analyze
    select *
    from persons
    where indexable_month_day(birthdate) = ANY ( birthdates_soon(7) );

    http://explain-analyze.info/query_plans/1231-birthdays-soon-7-days

    explain analyze
    select *
    from persons
    where is_birthday_in(birthdate, interval '1 week');

    http://explain-analyze.info/query_plans/1232-birthdays-in-one-week

    Granted, my little analysis isn't very rigorous, but it looks like your array code is a bit faster than the comparisons I'm using.

  4. Oct 28, 2007

    aargh. andrewsn ranted, and grzm showed me that i made something really bad – used to_char() to get integer value of part of date.

    function (is_leap_year()) is now fixed to properly use date_part().

  5. # Michael Glaesemann
    Oct 28, 2007

    Also per AndrewSN, redefining is_birthday to not be strict allows it to be properly inlined.

  6. # Hussein
    Oct 31, 2007

    I’m confused as to why you would’t just:

    create index idx_bday on users( date_part(‘doy’,birthday));

    then query as say:

    select * from users where date_part(‘doy’,birthday) = date_part(‘doy’,’feb 28, 2008′::date + 1);

  7. # Hussein
    Oct 31, 2007

    Oops, my previous post is obviously wrong. Pardon the noise.

  8. # Ravshan
    Mar 28, 2008

    Hello! I am from Tojikiston. Your blog is very intrestin, but it very difficult to read it with on-line translator. Can you make it in my own Tojikiston language?

  9. Mar 28, 2008

    @Ravshan:
    sorry,. but i dont know your language, and i think that writing in it will make it very difficult to read for others.

  10. # zboszor
    May 4, 2008

    There’s no need to use all these function magic to make it work for 02-29.

    select * from public.users
    where
    indexable_month_day(birthdate) > indexable_month_day(now()::date) and
    indexable_month_day(birthdate) <= indexable_month_day((now() + ‘1 days'::interval)::date);

  11. # John Pierce
    Nov 22, 2011

    hmmm, isn’t this simpler?

    SELECT … WHERE birthdate+date_trunc (‘years’, age(current_date, birthdate))::date +interval ‘1 year’ BETWEEN CURRENT_DATE AND CURRENT_DATE + N * interval ‘1 day';

    to find if someone’s birthday is in the next N days?

    that gnarly expression between the WHERE and BETWEEN calculates the date of their next birthday…

Leave a comment