Waiting for PostgreSQL 15 – Add UNIQUE null treatment option

On 3rd of February 2022, Peter Eisentraut committed patch:

Add UNIQUE null treatment option 
 
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.
 
This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.
 
The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.
 
I named all the internal flags, catalog columns, etc. in the negative
("nulls not distinct") so that the default PostgreSQL behavior is the
default if the flag is false.
 
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bde78@enterprisedb.com

This is interesting. Not a problem I encountered myself, but I've seen people on IRC or Slack complain about a version of it.

So, let's see the issue.

Consider this table:

=$ CREATE TABLE test (
    id serial PRIMARY KEY,
    codename text,
    UNIQUE (codename)
);

If I'll try to insert the same codename twice it will fail:

=$ INSERT INTO test (codename) VALUES ('depesz');
INSERT 0 1
 
=$ INSERT INTO test (codename) VALUES ('depesz');
ERROR:  duplicate KEY VALUE violates UNIQUE CONSTRAINT "test_codename_key"
DETAIL:  KEY (codename)=(depesz) already EXISTS.

This is perfectly as expected. What some people find problematic is that we can insert many rows with codename being NULL:

=$ INSERT INTO test (codename) VALUES (NULL) returning *;
 id | codename 
----+----------
  3 | 
(1 ROW)
INSERT 0 1
 
=$ INSERT INTO test (codename) VALUES (NULL) returning *;
 id | codename 
----+----------
  4 | 
(1 ROW)
INSERT 0 1
 
=$ INSERT INTO test (codename) VALUES (NULL) returning *;
 id | codename 
----+----------
  5 | 
(1 ROW)
INSERT 0 1

The rationale is simple – we can't realistically compare NULLs. One NULL value is not equal to another NULL value. It's also not “inequal". The comparison with NULL value returns NULL result – basically meaning “no idea".

Now, with this new change, we can, though, make it so that we can have at most one NULL value:

=$ CREATE TABLE test2 (
    id serial PRIMARY KEY,
    codename text,
    UNIQUE NULLS NOT DISTINCT (codename)
);

With this, if I'd try to insert more than one NULL codename:

=$ INSERT INTO test2 (codename) VALUES (NULL) returning *;
 id | codename 
----+----------
  1 | 
(1 ROW)
INSERT 0 1
 
=$ INSERT INTO test2 (codename) VALUES (NULL) returning *;
ERROR:  duplicate KEY VALUE violates UNIQUE CONSTRAINT "test2_codename_key"
DETAIL:  KEY (codename)=(NULL) already EXISTS.

Currently, if you'd just say “UNIQUE" then original behavior is default, but you can always force it using:

UNIQUE NULLS DISTINCT (codename)

When tables are created you can see the difference in \doutput:

=$ \d test
                             TABLE "public.test"
  COLUMN  |  TYPE   | Collation | NULLABLE |             DEFAULT              
----------+---------+-----------+----------+----------------------------------
 id       | INTEGER |           | NOT NULL | NEXTVAL('test_id_seq'::regclass)
 codename | text    |           |          | 
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)
    "test_codename_key" UNIQUE CONSTRAINT, btree (codename)
 
=$ \d test2
                             TABLE "public.test2"
  COLUMN  |  TYPE   | Collation | NULLABLE |              DEFAULT              
----------+---------+-----------+----------+-----------------------------------
 id       | INTEGER |           | NOT NULL | NEXTVAL('test2_id_seq'::regclass)
 codename | text    |           |          | 
Indexes:
    "test2_pkey" PRIMARY KEY, btree (id)
    "test2_codename_key" UNIQUE CONSTRAINT, btree (codename) NULLS NOT DISTINCT

Same way you can add NULLS DISTINCT or NULLS NOT DISTINCT to create unique index:

=$ CREATE UNIQUE INDEX t1 ON test2 (codename) NULLS DISTINCT;
 
=$ CREATE UNIQUE INDEX t2 ON test2 (codename) NULLS NOT DISTINCT;

That is interesting addition. Thanks to all involved.