Waiting for 8.5 – exclusion constraints

On 7th of December, Tom Lane committed patch by Jeff Davis that adds general exclusion constraints:

Log Message:
-----------
Add exclusion constraints, which generalize the concept of uniqueness to
support any indexable commutative operator, not just equality.  Two rows
violate the exclusion constraint if "row1.col OP row2.col" is TRUE for
each of the columns in the constraint.
 
Jeff Davis, reviewed by Robert Haas

Now, this sounds, useful, but after checking the docs, I have only slight idea on how to actually use this feature.

I know what it is for – it's a way to specify that such 2 rows cannot exist together. Simplest possible way of exclusion is unique index – 2 rows with the same (NOT NULL) value for given row, cannot exist at the same time in the same table.

So, knowing that it's the simplest way, let's try to use this new capability to emulate UNIQUE CONSTRAINT (side note: as docs say – it's possible, but not suggested, standard UNIQUE will be always faster).

So, test table, will have only 1 column, and this EXCLUSION CONSTRAINT on it. Here's the definition:

CREATE TABLE test (
    i INT4,
    EXCLUDE (i WITH =)
);

What does it mean? It's simple – for every row, there is a violation, is there is any other row with the same ( operator “=" ) value in column i.

Let's test it. First, let's see how the table looks via \d:

     Table "public.test"
 Column |  Type   | Modifiers
--------+---------+-----------
 i      | integer |
Indexes:
    "test_i_exclusion" btree (i)
Exclusion constraints:
    "test_i_exclusion" EXCLUDE USING btree (i WITH =)

2 simple sanity-checks by standard, correct inserts:

INSERT INTO test (i) VALUES (1);
INSERT INTO test (i) VALUES (2);

And then INSERT that should break:

# INSERT INTO test (i) VALUES (1);
psql:z.sql:9: ERROR:  conflicting KEY VALUE violates exclusion CONSTRAINT "test_i_exclusion"
DETAIL:  KEY (i)=(1) conflicts WITH existing KEY (i)=(1).

Nice. Now for something more complex.

Let's try to use this capability to check for overlapping time ranges. This could be used for example for making sure that given room is only reserved by 1 user at any give point in time.

There is a problem with it, though, that I can't seem to find a way to define this exclusion using plain timestamptz datatypes – I have to resort to use geometric types for comparing ranges.

In case you're not familiar with the idea, here we go with some overview:

If you have a range, that you specify with 2 endpoints (1-10, or from 1st to 15th of November 2009), you can change it to geometric type (rectangle for example). For example – range <1,10> can be stored as rectangle (1,1)-(10,10). What is the purpose?

That's simple – searching for ranges in this way is usually faster using GiST indexes for geometric data, than using standard BTree indexes on base columns.

So, knowing this, let's create test table with time ranges, and exclusion constraint based on them:

CREATE TABLE test (
    from_ts TIMESTAMPTZ,
    to_ts   TIMESTAMPTZ,
    CHECK ( from_ts < to_ts ),
    CONSTRAINT overlapping_times EXCLUDE USING GIST (
        box(
            point( EXTRACT(epoch FROM from_ts at TIME zone 'UTC'), EXTRACT(epoch FROM from_ts at TIME zone 'UTC') ),
            point( EXTRACT(epoch FROM to_ts at TIME zone 'UTC')  , EXTRACT(epoch FROM to_ts at TIME zone 'UTC') )
        ) WITH &&
    )
);

PLEASE CHECK COMMENTS BEFORE IMPLEMENTING IT

Scared with the example? Just check this:

# \d test
              Table "public.test"
 Column  |           Type           | Modifiers 
---------+--------------------------+-----------
 from_ts | timestamp with time zone | 
 to_ts   | timestamp with time zone | 
Indexes:
    "overlapping_times" gist (box(point(date_part('epoch'::text, timezone('UTC'::text, from_ts)), date_part('epoch'::text, timezone('UTC'::text, from_ts))), point(date_part('epoch'::text, timezone('UTC'::text, to_ts)), date_part('epoch'::text, timezone('UTC'::text, to_ts)))))
Check constraints:
    "test_check" CHECK (from_ts < to_ts)
Exclusion constraints:
    "overlapping_times" EXCLUDE USING gist (box(point(date_part('epoch'::text, timezone('UTC'::text, from_ts)), date_part('epoch'::text, timezone('UTC'::text, from_ts))), point(date_part('epoch'::text, timezone('UTC'::text, to_ts)), date_part('epoch'::text, timezone('UTC'::text, to_ts)))) WITH &&)

I'll get back to it (what it is, and how it works) in a minute, but first let's see if it really works. First, standard sanity check:

INSERT INTO test ( from_ts, to_ts ) VALUES ( '2009-01-01 01:23:45 EST', '2009-01-10 23:45:12 EST' );
INSERT INTO test ( from_ts, to_ts ) VALUES ( '2009-02-01 01:23:45 EST', '2009-02-10 23:45:12 EST' );

Works fine, ranges do not overlap. But when I'll try to insert row with invalid (overlapping with some previous) range:

# INSERT INTO test ( from_ts, to_ts ) VALUES ( '2009-01-08 00:00:00 EST', '2009-01-15 23:59:59 EST' );
psql:z.sql:18: ERROR:  conflicting KEY VALUE violates exclusion CONSTRAINT "overlapping_times"
DETAIL:  KEY (box(point(date_part('epoch'::text, timezone('UTC'::text, from_ts)), date_part('epoch'::text, timezone('UTC'::text, from_ts))), point(date_part('epoch'::text, timezone('UTC'::text, to_ts)), date_part('epoch'::text, timezone('UTC'::text, to_ts)))))=((1232078399,1232078399),(1231387200,1231387200)) conflicts WITH existing KEY (box(point(date_part('epoch'::text, timezone('UTC'::text, from_ts)), date_part('epoch'::text, timezone('UTC'::text, from_ts))), point(date_part('epoch'::text, timezone('UTC'::text, to_ts)), date_part('epoch'::text, timezone('UTC'::text, to_ts)))))=((1231645512,1231645512),(1230787425,1230787425)).

So, we know it works. But how and why?

Let's bring back table definition for discussion:

CREATE TABLE test (
    from_ts TIMESTAMPTZ,
    to_ts   TIMESTAMPTZ,
    CHECK ( from_ts < to_ts ),
    CONSTRAINT overlapping_times EXCLUDE USING GIST (
        box(
            point( EXTRACT(epoch FROM from_ts at TIME zone 'UTC'), EXTRACT(epoch FROM from_ts at TIME zone 'UTC') ),
            point( EXTRACT(epoch FROM to_ts at TIME zone 'UTC')  , EXTRACT(epoch FROM to_ts at TIME zone 'UTC') )
        ) WITH &&
    )
);

First line is obvious, so are next 2 lines – they simply define columns with timestamptz datatype (timestamp with time zone).

Next line is check – I added it so that I will be sure that from_ts is always before to_ts – to avoid application/test errors that would provide from_ts after to_ts. This could have been also done with trigger, with optionally switching values in case of bad order in them.

And now the constraint. First of all – this time I used named constraint. Without “CONSTRAINT overlapping_times" it would work just as fine, but on violation, it would report constraint check as “test_box_exclusion" which is not really usable.

Now, this exclude has been defined using GIST – it means that it will create GiST index – just like docs say – you will practically always use GiST for your constraint exclusions.

Within the exclusion, we create box(), and use && operator, which (as you can check) finds whether 2 polygons overlap.

So, we have to create the polygons. Choosen polygons are boxes (rectangles), which are defined using 2 points. Each point gets its coordinates from from_ts and to_ts respectively, but since coordinates are assumed to be float values, and we have timestampts, we have to convert them to floats, using extract() function.

One more notice – extract() on timestamptz is not immutable – which is required for indexes – so we couldn't use it directly – I had first to convert timestatmptz to timestamp, by requesting timestamp at specific timezone – in this situation I choose UTC which seems to be pretty sane – stable, doesn't have daylight savings, just works.

So, let's do it step by step, what are the actual values that our index (overlapping_times) will store, for row with (from_ts => ‘2009-01-01 01:23:45 EST', to_ts => ‘2009-01-10 23:45:12 EST').

First, let's convert the values to UTC:

# SELECT '2009-01-01 01:23:45 EST'::timestamptz at TIME zone 'UTC', '2009-01-10 23:45:12 EST'::timestamptz at TIME zone 'UTC';
      timezone       |      timezone
---------------------+---------------------
 2009-01-01 06:23:45 | 2009-01-11 04:45:12
(1 ROW)

Now, let's extract epochs:

SELECT
    EXTRACT( epoch FROM '2009-01-01 01:23:45 EST'::timestamptz at TIME zone 'UTC' ),
    EXTRACT( epoch FROM '2009-01-10 23:45:12 EST'::timestamptz at TIME zone 'UTC' );
 date_part  | date_part
------------+------------
 1230787425 | 1231645512
(1 ROW)

Now, let's convert it to box:

<code>SELECT
    box(
        point(
            EXTRACT( epoch FROM '2009-01-01 01:23:45 EST'::timestamptz at TIME zone 'UTC' ),
            EXTRACT( epoch FROM '2009-01-01 01:23:45 EST'::timestamptz at TIME zone 'UTC' )
        ),
        point(
            EXTRACT( epoch FROM '2009-01-10 23:45:12 EST'::timestamptz at TIME zone 'UTC' ),
            EXTRACT( epoch FROM '2009-01-10 23:45:12 EST'::timestamptz at TIME zone 'UTC' )
        )
    );</code>
                       box
-------------------------------------------------
 (1231645512,1231645512),(1230787425,1230787425)
(1 ROW)

Sweet. It looks pretty convoluted, but once you “get it" – it's all fine and simple.

What's more – readability of the code can be dramatically improved by providing a wrapper around calculations, like this:

CREATE OR REPLACE FUNCTION box_from_ts_range(in_first timestamptz, in_second timestamptz) RETURNS box AS $$
DECLARE
    first_float  float8 := EXTRACT(epoch FROM in_first  AT TIME ZONE 'UTC');
    second_float float8 := EXTRACT(epoch FROM in_second AT TIME ZONE 'UTC');
BEGIN
    RETURN box( point ( first_float, first_float), point( second_float, second_float ) );
END;
$$ LANGUAGE plpgsql IMMUTABLE;

And then, making the table like this:

# CREATE TABLE test (
    from_ts TIMESTAMPTZ,
    to_ts   TIMESTAMPTZ,
    CHECK ( from_ts < to_ts ),
    CONSTRAINT overlapping_times EXCLUDE USING GIST ( box_from_ts_range( from_ts, to_ts ) WITH && )
);
\d test
              TABLE "public.test"
 COLUMN  |           TYPE           | Modifiers 
---------+--------------------------+-----------
 from_ts | TIMESTAMP WITH TIME zone | 
 to_ts   | TIMESTAMP WITH TIME zone | 
Indexes:
    "overlapping_times" gist (box_from_ts_range(from_ts, to_ts))
CHECK constraints:
    "test_check" CHECK (from_ts < to_ts)
Exclusion constraints:
    "overlapping_times" EXCLUDE USING gist (box_from_ts_range(from_ts, to_ts) WITH &&)

Actuall error message now is also more readable (although only slightly):

# INSERT INTO test ( from_ts, to_ts ) VALUES ( '2009-01-08 00:00:00 EST', '2009-01-15 23:59:59 EST' );
psql:z.sql:21: ERROR:  conflicting KEY VALUE violates exclusion CONSTRAINT "overlapping_times"
DETAIL:  KEY (box_from_ts_range(from_ts, to_ts))=((1232078399,1232078399),(1231387200,1231387200)) conflicts WITH existing KEY (box_from_ts_range(from_ts, to_ts))=((1231645512,1231645512),(1230787425,1230787425)).

Now, let's try to use the EXCLUDE for something more realistic – room reservations for hotel.

Let's assume following table structure:

CREATE TABLE reservations (
    id          SERIAL PRIMARY KEY,
    room_number INT4 NOT NULL,
    from_ts     DATE NOT NULL,
    to_ts       DATE NOT NULL,
    guest_name  TEXT NOT NULL,
    CHECK       ( from_ts <= to_ts ),
    CHECK       ( room_number >= 101 AND room_number <= 700 AND room_number % 100 >= 1 AND room_number % 100 <= 25 )
);
\d reservations
                            TABLE "public.reservations"
   COLUMN    |  TYPE   |                         Modifiers
-------------+---------+-----------------------------------------------------------
 id          | INTEGER | NOT NULL DEFAULT NEXTVAL('reservations_id_seq'::regclass)
 room_number | INTEGER | NOT NULL
 from_ts     | DATE    | NOT NULL
 to_ts       | DATE    | NOT NULL
 guest_name  | text    | NOT NULL
Indexes:
    "reservations_pkey" PRIMARY KEY, btree (id)
CHECK constraints:
    "reservations_check" CHECK (from_ts <= to_ts)
    "reservations_room_number_check" CHECK (room_number >= 101 AND room_number <= 700 AND (room_number % 100) >= 1 AND (room_number % 100) <= 25)

Small explanation – hotel works in “days" – you can't have reservation from 8am to 1pm – it is assumed that you can check-in not sooner than 3pm, and you have to check-out by 11am. And as for days – we store to_ts as the last day that customer reserverd “till midnight".

For example – if somebody makes a reservation from 1st of November, till 7th of November, – i.e. his room will be reserved for him from ‘2009-11-01 15:00:00' to ‘2009-11-07 11:00:00', but in reservations table we will store it as:

  • from_ts – 2009-11-01
  • to_ts – 2009-11-06 (not 7th, but 6th – as this is the last midnight for this reservation)

This means that single-day reservations will have the same day in from_ts and to_ts columns. A bit strange perhaps, but workable.

Now. We need/want to add constraint that will make sure that the same room cannot be reserved 2 times.

Since we are creating the constraint using boxes, we actually have 2 axes – one can be used for time, and the other for room_number, so we can:

ALTER TABLE public.reservations ADD CONSTRAINT check_overlapping_reservations EXCLUDE USING GIST (
    box (
        point(
            from_ts - '2000-01-01'::DATE,
            room_number
        ),
        point(
            ( to_ts - '2000-01-01'::DATE ) + 0.5,
            room_number + 0.5
        )
    )
    WITH &&
);

You might wonder why I add 0.5 – it's simple – to avoid creation of boxes that have 0 width or height. Since base numbers (number of days since 2000-01-01, and room_number) are always integers, adding 0.5 there is safe – it will not get mixed with another day/room.

And now, let's test. First, insert simple reservation:

# INSERT INTO reservations (room_number, from_ts, to_ts, guest_name) VALUES (101, '2010-01-05', '2010-01-23', 'test1');
INSERT 0 1

Now, let's insert reservation that is in the same time, but for different room:

# INSERT INTO reservations (room_number, from_ts, to_ts, guest_name) VALUES (102, '2010-01-07', '2010-01-21', 'test2');
INSERT 0 1

And now, same room, but different time:

# INSERT INTO reservations (room_number, from_ts, to_ts, guest_name) VALUES (101, '2010-01-25', '2010-01-30', 'test3');
INSERT 0 1

All works fine. And now – let's add real conflict:

# INSERT INTO reservations (room_number, from_ts, to_ts, guest_name) VALUES (101, '2010-01-07', '2010-01-08', 'test4');
ERROR:  conflicting KEY VALUE violates exclusion CONSTRAINT "check_overlapping_reservations"
DETAIL:  KEY (box(point((from_ts - '2000-01-01'::DATE)::DOUBLE PRECISION, room_number::DOUBLE PRECISION), point(((to_ts - '2000-01-01'::DATE)::NUMERIC + 0.5)::DOUBLE PRECISION, (room_number::NUMERIC + 0.5)::DOUBLE PRECISION)))=((3660.5,101.5),(3659,101)) conflicts WITH existing KEY (box(point((from_ts - '2000-01-01'::DATE)::DOUBLE PRECISION, room_number::DOUBLE PRECISION), point(((to_ts - '2000-01-01'::DATE)::NUMERIC + 0.5)::DOUBLE PRECISION, (room_number::NUMERIC + 0.5)::DOUBLE PRECISION)))=((3675.5,101.5),(3657,101)).

Error message is (as usual) nearly unreadable (the DETAIL: line), but that's not the point. The point is that the conflict was nicely caught.

What's more – since this is constraint – we can add it to table that already has bad data in it, and it will correctly detect problem:

# SELECT * FROM reservations;
 id | room_number |  from_ts   |   to_ts    | guest_name
----+-------------+------------+------------+------------
  1 |         101 | 2010-01-05 | 2010-01-23 | test1
  2 |         102 | 2010-01-07 | 2010-01-21 | test2
  3 |         101 | 2010-01-25 | 2010-01-30 | test3
  4 |         101 | 2010-01-07 | 2010-01-08 | test4
(4 ROWS)
 
# ALTER TABLE public.reservations ADD CONSTRAINT check_overlapping_reservations EXCLUDE USING GIST (
>>     box (
>>         point(
>>             from_ts - '2000-01-01'::DATE,
>>             room_number
>>         ),
>>         point(
>>             ( to_ts - '2000-01-01'::DATE ) + 0.5,
>>             room_number + 0.5
>>         )
>>     )
>>     WITH &&
>> );
NOTICE:  ALTER TABLE / ADD EXCLUDE will CREATE implicit INDEX "check_overlapping_reservations" FOR TABLE "reservations"
ERROR:  could NOT CREATE exclusion CONSTRAINT "check_overlapping_reservations"
DETAIL:  KEY (box(point((from_ts - '2000-01-01'::DATE)::DOUBLE PRECISION, room_number::DOUBLE PRECISION), point(((to_ts - '2000-01-01'::DATE)::NUMERIC + 0.5)::DOUBLE PRECISION, (room_number::NUMERIC + 0.5)::DOUBLE PRECISION)))=((3675.5,101.5),(3657,101)) conflicts WITH KEY (box(point((from_ts - '2000-01-01'::DATE)::DOUBLE PRECISION, room_number::DOUBLE PRECISION), point(((to_ts - '2000-01-01'::DATE)::NUMERIC + 0.5)::DOUBLE PRECISION, (room_number::NUMERIC + 0.5)::DOUBLE PRECISION)))=((3660.5,101.5),(3659,101)).

This is simply amazing. Brilliant feature. Even if the end-user-interface (the way actual constraints are defined) is a bit convoluted.

9 thoughts on “Waiting for 8.5 – exclusion constraints”

  1. I suppose they could have added naming feature. So that in app you could catch the exception by name. There’s still time to do it before 8.5 is out.

  2. Thanks for putting this together.

    Before postgresql 9, I would prevent overlapping stuff by putting in triggers.

    Are exclusion constraints faster?

  3. I’m putting it in comment, as I don’t like changing existing posts (aside from adding a note).

    Implementing it by doing “extract(epoch from field at time zone ‘UTC’)” is unfortunately flawed.

    The correct way would be to use function like:

    create function epoch(timestamptz) returns double precision language sql immutable strict as $f$ select extract(epoch from $1); $f$

    instead of the epoch(field) instead of extract(epoch from field at …).

    Thanks for correction go to RhodiumToad @IRC.

  4. Really really nice and helpful.

    But there are few things i don’t really get.

    – Why is it flawed to use extract(epoch from field at time zone ‘UTC’)? Cos i’m implementing it and it works perfectly

    – Why are you substracting ‘2000-01-01’::date to the timestamp?

    – And also for comprehension is there a reason why you put the room number in the box and not outside as in EXCLUDE USING GIST (room_number WITH =, box ( … ) WITH &&);

  5. @yokoloko:
    1. epoch – try changing timezone to very different, and retry.
    2. 2000-01-01 – to get smaller numbers. it doesn’t matter for pg, but makes for simpler reading
    3. you definitely could do it, but it would require installation of gist_btree extension.

Comments are closed.