who has birthday tomorrow?

so, there you have a users table, with a very basic structure:

      Table "public.users"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 id        | integer | not null
 birthdate | date    |
Indexes:
    "x_pkey" PRIMARY KEY, btree (id)

then, you have a task: find a query that will return all users which have birthday tomorrow. how will you do it?

there are couple of problems with it.

first – let's assume that you want the search using:

SELECT * FROM users WHERE to_char(birthdate, 'MM-DD') = to_char(now() + '1 day'::INTERVAL, 'MM-DD');

this is simple, yet it seq-scans whole table:

# EXPLAIN ANALYZE SELECT * FROM users WHERE to_char(birthdate, 'MM-DD') = to_char(now() + '1 day'::INTERVAL, 'MM-DD');
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Seq Scan ON users  (cost=0.00..43674.95 ROWS=7458 width=8) (actual TIME=0.036..5537.293 ROWS=2553 loops=1)
   FILTER: (to_char((birthdate)::TIMESTAMP WITH TIME zone, 'MM-DD'::text) = to_char((now() + '1 day'::INTERVAL), 'MM-DD'::text))
 Total runtime: 5538.686 ms
(3 ROWS)

now, you surely can index it with functional index. right? wrong!:

# CREATE INDEX q ON users (to_char(birthdate, 'MM-DD'));
ERROR:  functions IN INDEX expression must be marked IMMUTABLE

ouch.

to_char() is “stable", but not “immutable" – if you don't know what it means – consult the manual.

of course i can write my own wrapper around to_char:

CREATE OR REPLACE FUNCTION indexable_month_day(DATE) RETURNS TEXT AS $BODY$
    SELECT to_char($1, 'MM-DD');
$BODY$ LANGUAGE 'sql' IMMUTABLE STRICT;

ok, now i can:

# CREATE INDEX q ON users ( indexable_month_day(birthdate) );
CREATE INDEX
# EXPLAIN analyze SELECT * FROM users WHERE indexable_month_day(birthdate) = indexable_month_day( (now() + '1 day'::INTERVAL)::DATE);
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan ON users  (cost=138.71..10719.75 ROWS=7458 width=8) (actual TIME=1.693..5.091 ROWS=2553 loops=1)
   Recheck Cond: (indexable_month_day(birthdate) = indexable_month_day(((now() + '1 day'::INTERVAL))::DATE))
   ->  Bitmap INDEX Scan ON q  (cost=0.00..136.85 ROWS=7458 width=0) (actual TIME=1.104..1.104 ROWS=2553 loops=1)
         INDEX Cond: (indexable_month_day(birthdate) = indexable_month_day(((now() + '1 day'::INTERVAL))::DATE))
 Total runtime: 6.103 ms

now bad. but is it correct?

let's assume that we have a person that was born on 29th of february, 1980. and let's assume today is 28th of february of 2007.

now() + ‘1 day' will give me 1st of march, and i will not find the guy from 29th of february.

of course – this is technically right, but anyway – it would be cool to wish the guy “happy birthday".

also – if the guy was born on 29th of february – should be greet him on 28th of february or on 1st of march?

of course – if current year is also leap year, the problem doesn't exist. but – the odds are that it is not leap year.

to make the story short, i wrote a function which returns list of dates that match some “now() + x days" criteria:

CREATE OR REPLACE FUNCTION birthdates_soon(in_today DATE, in_after_days INT4) RETURNS TEXT[] AS $BODY$
DECLARE
    work_date DATE := in_today + in_after_days * '1 day'::INTERVAL;
    output TEXT[] := ARRAY[ indexable_month_day( work_date ) ];
BEGIN
    IF NOT is_leap_year(in_today) THEN
        IF output[1] IN ('02-28', '03-01') THEN
            output := output || '02-29'::TEXT;
        END IF;
    END IF;
    RETURN output;
END;
$BODY$ LANGUAGE plpgsql IMMUTABLE STRICT;

this function needs this small function:

CREATE OR REPLACE FUNCTION is_leap_year(DATE) RETURNS bool AS $BODY$
DECLARE
    use_year INT4 := date_part('year', $1)::INT4;
BEGIN
    IF 0 < use_year % 4 THEN
        RETURN FALSE;
    ELSIF 0 < use_year % 100 THEN
        RETURN TRUE;
    ELSIF 0 < use_year % 400 THEN
        RETURN FALSE;
    END IF;
    RETURN TRUE;
END;
$BODY$ LANGUAGE plpgsql IMMUTABLE strict;

what it does? let's see:

# SELECT
    x.date AS today,
    birthdates_soon(x.date, 1) AS tomorrow
FROM (
    VALUES
        ('2007-02-28'::DATE),
        ('2007-03-01'::DATE),
        ('2007-02-27'),
        ('2000-02-27'),
        ('2000-02-28'),
        ('2000-02-29'),
        (now()::DATE)
    ) AS x (DATE);
   today    |   tomorrow
------------+---------------
 2007-02-28 | {03-01,02-29}
 2007-03-01 | {03-02}
 2007-02-27 | {02-28,02-29}
 2000-02-27 | {02-28}
 2000-02-28 | {02-29}
 2000-02-29 | {03-01}
 2007-10-26 | {10-27}
(7 ROWS)

usually we will be calling the function with first argument set to now()::date, so – let's make a wrapper:

CREATE OR REPLACE FUNCTION birthdates_soon(INT4) RETURNS TEXT[] AS $BODY$
    SELECT birthdates_soon(now()::DATE, $1);
$BODY$ LANGUAGE SQL IMMUTABLE STRICT;

this will allow us to write: … birthdates_soon(1) or … birthdates_soon( ‘2006-01-01'::date, 1).

and now, i can use the function in our “user-finding" query:

# EXPLAIN analyze SELECT * FROM users WHERE indexable_month_day(birthdate) = ANY ( birthdates_soon(1) );
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan ON users  (cost=138.45..8789.73 ROWS=7458 width=8) (actual TIME=1.442..4.913 ROWS=2553 loops=1)
   Recheck Cond: (indexable_month_day(birthdate) = ANY ('{10-27}'::text[]))
   ->  Bitmap INDEX Scan ON q  (cost=0.00..136.59 ROWS=7458 width=0) (actual TIME=0.870..0.870 ROWS=2553 loops=1)
         INDEX Cond: (indexable_month_day(birthdate) = ANY ('{10-27}'::text[]))
 Total runtime: 5.899 ms
(5 ROWS)

as you can see we have to use = ANY() construct because our function returns array. if it was returning “setof text", i would have to use … in ().

so, that would be all. hope you'll find it useful.

10 thoughts on “who has birthday tomorrow?”

  1. Minor quibble: using current_date seems simpler and clearer than using now()::date.

  2. Thank, Hubert, for presenting this. The birthday problem is something I’ve wanted to take a stab at and your post prompted me to look at it now.

    It’s a personal pet peeve, but I really don’t like casting a datetime value to text. You can get away with it here because you’re relying on the fact that the collation of text representation is going to be the same as the ordering of the dates. As far as I know, you have to do something like this because there isn’t a type to represent just the month and day portion of a datetime value.

    To avoid using to_char, I chose to convert the month and day to an integer representation:

    CREATE FUNCTION month_and_day_integer(p_date DATE)
      RETURNS INTEGER
      immutable strict
      LANGUAGE SQL AS $func$
      SELECT (CAST(EXTRACT(MONTH FROM $1) AS INTEGER) * 100 + 
               CAST(EXTRACT(DAY FROM $1) AS INTEGER))
      $func$;
    comment ON FUNCTION month_and_day_integer(DATE) IS
    'Create an immutable (and therefore indexable) integer representation of month '
    'and day.';

    It’s the same in spirit as what you’re doing with to_char, but I think it’s a bit clearer what is happening: you’re no longer dealing with something that looks like a date. I haven’t profiled it to see how the two pairs of

    extract

    and

    cast

    and a little arithmetic compare to the single

    to_char

    call: I suspect the former may be slower.

    I also took advantage of the fact that today always follows yesterday: rather than determine whether or not a leap year is involved and conditionally add February 29, I just compare the date in question to see if it’s after yesterday: during a leap year, yesterday relative to March 1 (301) is February 29 (229) and otherwise it’s February 28 (228). In any year, the

    month_and_day_integer

    representation of February 29 is going to fall between yesterday and today on March 1.

    Here’s the rest of the code:

    CREATE FUNCTION month_and_day_integer(p_timestamp TIMESTAMP WITHOUT TIME zone)
      RETURNS INTEGER
      immutable strict
      LANGUAGE SQL AS $func$
      SELECT month_and_day_integer(CAST($1 AS DATE))
      $func$;
    comment ON FUNCTION month_and_day_integer(TIMESTAMP WITHOUT TIME zone) IS
    'Wrapper for month_and_day_integer(date) to cast parameter from timestamp to date.';
     
    CREATE FUNCTION is_birthday(p_birthdate DATE, p_date DATE) 
      RETURNS BOOLEAN
      immutable strict
      LANGUAGE SQL AS $func$
      SELECT month_and_day_integer($1) > month_and_day_integer($2 - 1) 
               AND month_and_day_integer($1)  month_and_day_integer($3 - 1)
               AND month_and_day_integer($1) 
     
    NOT HAVING a huge amount OF DATA ON hand, I generated a bit USING:
     
    <code>CREATE TABLE persons (person text PRIMARY KEY, birthdate DATE NOT NULL);
    INSERT INTO persons (person, birthdate)
    SELECT 'person-' || x, CAST('1970-01-01' AS DATE) + x
    FROM generate_series(1,100000) AS foo(x);
    INSERT INTO persons (person, birthdate)
    SELECT 'person-2-' || x, CAST('1970-01-01' AS DATE) - x
    FROM generate_series(1,100000) AS foo(x);
     
    CREATE INDEX person_birthday_idx ON persons (month_and_day_integer(birthdate));

    Query plans are similar (though they look messier due to the inlined SQL):

    -- birthday within one day (today or tomorrow)
    EXPLAIN analyze 
    SELECT persons.*
    FROM persons
    WHERE is_birthday_in(birthdate, INTERVAL '1 day');

    http://explain-analyze.info/query_plans/1227-birthday-within-one-day

    I was a bit surprised that the

    is_birthday

    call wasn’t inlined and didn’t take advantage of

    person_birthday_idx

    , rather using a sequential scan with very unfortunate results:

    EXPLAIN analyze
    SELECT persons.*
    FROM persons
    WHERE is_birthday(birthdate, 'tomorrow'::DATE);

    http://explain-analyze.info/query_plans/1228-birthday-in-tomorrow-seq-scan

    Inlining the function body in the query worked well, however:

    -- inlining function body
    EXPLAIN analyze
    SELECT *
    FROM persons
    WHERE month_and_day_integer(birthdate) > month_and_day_integer('tomorrow'::DATE - 1)
             AND month_and_day_integer(birthdate) 
     
    http://explain-analyze.info/query_plans/1229-birthday-in-manually-inlined
     
    The results are ALL WITH DEFAULT statistics AND an analyzed TABLE.
     
    Just FOR comparison, I used the functions you provided ON the generated DATA:
     
    <code>EXPLAIN analyze 
    SELECT * FROM 
    persons 
    WHERE indexable_month_day(birthdate) = ANY ( birthdates_soon(1) );

    http://explain-analyze.info/query_plans/1230-birthdays-soon

    EXPLAIN analyze 
    SELECT * 
    FROM persons 
    WHERE indexable_month_day(birthdate) = ANY ( birthdates_soon(7) );

    http://explain-analyze.info/query_plans/1231-birthdays-soon-7-days

    EXPLAIN analyze 
    SELECT * 
    FROM persons 
    WHERE is_birthday_in(birthdate, INTERVAL '1 week');

    http://explain-analyze.info/query_plans/1232-birthdays-in-one-week

    Granted, my little analysis isn’t very rigorous, but it looks like your array code is a bit faster than the comparisons I’m using.

  3. aargh. andrewsn ranted, and grzm showed me that i made something really bad – used to_char() to get integer value of part of date.

    function (is_leap_year()) is now fixed to properly use date_part().

  4. Also per AndrewSN, redefining is_birthday to not be strict allows it to be properly inlined.

  5. I’m confused as to why you would’t just:

    create index idx_bday on users( date_part(‘doy’,birthday));

    then query as say:

    select * from users where date_part(‘doy’,birthday) = date_part(‘doy’,’feb 28, 2008′::date + 1);

  6. Oops, my previous post is obviously wrong. Pardon the noise.

  7. Hello! I am from Tojikiston. Your blog is very intrestin, but it very difficult to read it with on-line translator. Can you make it in my own Tojikiston language?

  8. @Ravshan:
    sorry,. but i dont know your language, and i think that writing in it will make it very difficult to read for others.

  9. There’s no need to use all these function magic to make it work for 02-29.

    select * from public.users
    where
    indexable_month_day(birthdate) > indexable_month_day(now()::date) and
    indexable_month_day(birthdate) <= indexable_month_day((now() + ‘1 days’::interval)::date);

  10. hmmm, isn’t this simpler?

    SELECT … WHERE birthdate+date_trunc (‘years’, age(current_date, birthdate))::date +interval ‘1 year’ BETWEEN CURRENT_DATE AND CURRENT_DATE + N * interval ‘1 day’;

    to find if someone’s birthday is in the next N days?

    that gnarly expression between the WHERE and BETWEEN calculates the date of their next birthday…

Comments are closed.