Waiting for PostgreSQL 10 – Identity columns

On 6th of April 2017, Peter Eisentraut committed patch:

Identity columns
 
This is the SQL standard-conforming variant of PostgreSQL's serial
columns.  It fixes a few usability issues that serial columns have:
 
- CREATE TABLE / LIKE copies default but refers to same sequence
- cannot add/drop serialness with ALTER TABLE
- dropping default does not drop sequence
- need to grant separate privileges to sequence
- other slight weirdnesses because serial is some kind of special macro

Description doesn't look all that different from normal sequences/serials. Let's see what we can find out about it.

First, sanity check, let's create some table, with normal sequence-based pkey:

$ CREATE TABLE test_old (
    id serial PRIMARY KEY,
    payload text
);
 
$ \d test_old
                             TABLE "public.test_old"
 COLUMN  |  TYPE   | Collation | NULLABLE |               DEFAULT                
---------+---------+-----------+----------+--------------------------------------
 id      | INTEGER |           | NOT NULL | NEXTVAL('test_old_id_seq'::regclass)
 payload | text    |           |          | 
Indexes:
    "test_old_pkey" PRIMARY KEY, btree (id)

OK. Now, let's compare it with the new hotness:

$ CREATE TABLE test_new (
    id INT generated BY DEFAULT AS IDENTITY PRIMARY KEY,
    payload text
);
 
$ \d test_new
                           TABLE "public.test_new"
 COLUMN  |  TYPE   | Collation | NULLABLE |             DEFAULT              
---------+---------+-----------+----------+----------------------------------
 id      | INTEGER |           | NOT NULL | generated BY DEFAULT AS IDENTITY
 payload | text    |           |          | 
Indexes:
    "test_new_pkey" PRIMARY KEY, btree (id)

OK, it looks different, but how it behaves?

$ INSERT INTO test_old (payload) VALUES ('a'), ('b'), ('c') returning *;
 id | payload 
----+---------
  1 | a
  2 | b
  3 | c
(3 ROWS)
 
$ INSERT INTO test_new (payload) VALUES ('a'), ('b'), ('c') returning *;
 id | payload 
----+---------
  1 | a
  2 | b
  3 | c
(3 ROWS)

Inserts seem to be the same.

$ UPDATE test_old SET id = 4 WHERE id = 3 returning *;
 id | payload 
----+---------
  4 | c
(1 ROW)
 
$ UPDATE test_new SET id = 4 WHERE id = 3 returning *;
 id | payload 
----+---------
  4 | c
(1 ROW)

Update worked, but now inserts should/can fail:

$ INSERT INTO test_old (payload) VALUES ('e') returning *;
ERROR:  duplicate KEY VALUE violates UNIQUE CONSTRAINT "test_old_pkey"
DETAIL:  KEY (id)=(4) already EXISTS.
 
$ INSERT INTO test_new (payload) VALUES ('e') returning *;
ERROR:  duplicate KEY VALUE violates UNIQUE CONSTRAINT "test_new_pkey"
DETAIL:  KEY (id)=(4) already EXISTS.

No surprises here. What about delete, and re-insert?

$ DELETE FROM test_old WHERE id = 4 returning *;
 id | payload 
----+---------
  4 | c
(1 ROW)
 
$ DELETE FROM test_new WHERE id = 4 returning *;
 id | payload 
----+---------
  4 | c
(1 ROW)
 
$ INSERT INTO test_old (payload) VALUES ('e') returning *;
 id | payload 
----+---------
  5 | e
(1 ROW)
 
$ INSERT INTO test_new (payload) VALUES ('e') returning *;
 id | payload 
----+---------
  5 | e
(1 ROW)

Well no difference, there even is sequence for the new way:

$ \ds
              List OF relations
 Schema |      Name       |   TYPE   | Owner  
--------+-----------------+----------+--------
 public | test_new_id_seq | SEQUENCE | depesz
 public | test_old_id_seq | SEQUENCE | depesz
(2 ROWS)

Well. Couple of things. For starters this new syntax is (from what I understand) more standard. As in: it is in one of SQL-* specifications.

Second. Take a look at this:

$ DROP SEQUENCE test_old_id_seq cascade;
NOTICE:  DROP cascades TO DEFAULT FOR TABLE test_old COLUMN id
 
\d test_old
              TABLE "public.test_old"
 COLUMN  |  TYPE   | Collation | NULLABLE | DEFAULT 
---------+---------+-----------+----------+---------
 id      | INTEGER |           | NOT NULL | 
 payload | text    |           |          | 
Indexes:
    "test_old_pkey" PRIMARY KEY, btree (id)

and in case of new sequence:

$ DROP SEQUENCE test_new_id_seq cascade;
ERROR:  cannot DROP SEQUENCE test_new_id_seq because TABLE test_new COLUMN id requires it
HINT:  You can DROP TABLE test_new COLUMN id instead.

It can't be dropped, even with cascade – you have first to remove the column that needs it.

There is one more thing. In the create table for identify I used generated by default AS IDENTITY, But I could have done also:

$ CREATE TABLE test_new_2 (
    id INT generated always AS IDENTITY PRIMARY KEY,
    payload text
);
 
$ \d test_new_2
                        TABLE "public.test_new_2"
 COLUMN  |  TYPE   | Collation | NULLABLE |           DEFAULT            
---------+---------+-----------+----------+------------------------------
 id      | INTEGER |           | NOT NULL | generated always AS IDENTITY
 payload | text    |           |          | 
Indexes:
    "test_new_2_pkey" PRIMARY KEY, btree (id)

The difference? Check this:

$ INSERT INTO test_new_2 (payload) VALUES ('a'), ('b'), ('c') returning *;
 id | payload 
----+---------
  1 | a
  2 | b
  3 | c
(3 ROWS)
 
$ UPDATE test_new_2 SET id = 4 WHERE id = 3 returning *;
ERROR:  COLUMN "id" can ONLY be updated TO DEFAULT
DETAIL:  COLUMN "id" IS an IDENTITY COLUMN defined AS GENERATED ALWAYS.

This also works with inserts:

$ INSERT INTO test_old (id, payload) VALUES (99, 'x') returning *;
 id | payload 
----+---------
 99 | x
(1 ROW)
 
$ INSERT INTO test_new (id, payload) VALUES (99, 'x') returning *;
 id | payload 
----+---------
 99 | x
(1 ROW)
 
$ INSERT INTO test_new_2 (id, payload) VALUES (99, 'x') returning *;
ERROR:  cannot INSERT INTO COLUMN "id"
DETAIL:  COLUMN "id" IS an IDENTITY COLUMN defined AS GENERATED ALWAYS.
HINT:  USE OVERRIDING SYSTEM VALUE TO override.
 
$ INSERT INTO test_new_2 (id, payload)
    OVERRIDING SYSTEM VALUE
    VALUES (99, 'x')
    returning *;
 id | payload 
----+---------
 99 | x
(1 ROW)

So, it can be forced, but by default it will disable any meddling with identity columns.

For COPY command the protection is disabled. So you can:

$ COPY test_new_2 (id, payload) FROM STDIN WITH ( DELIMITER ',' );
90,a
91,b
\.
 
$ SELECT * FROM test_new_2 WHERE id IN (90, 91);
 id | payload 
----+---------
 90 | a
 91 | b
(2 ROWS)

All in all, it's really cool. If I have identity generated always, it would definitely make my life easier.

Thanks to all involved.

8 thoughts on “Waiting for PostgreSQL 10 – Identity columns”

  1. How does “generated always” affect backups and restores? Also “COPY” statements. One would think the IDs are important and cannot be changed.

  2. Indeed, a big step forward. Much “cleaner” than “sequence” primary keys.

  3. I am curious about the behavior in your update/delete example. I can see the logic behind not creating another id of ‘4’. But why did the id increment to ‘5’ after the delete of ‘4’?

  4. @Harold:
    insert has to generate id before trying to insert – as such, it gets next value from sequence. This happens also in case of failed inserts.

  5. using one sequence in mutiple table like below, how will this work with new “Identity”

    create table test1 (
    id int primary key default nextval(‘test_old_id_seq’::regclass),
    payload text
    );

    create table test2 (
    id int primary key default nextval(‘test_old_id_seq’::regclass),
    payload text
    );

    create table test3 (
    id int primary key default nextval(‘test_old_id_seq’::regclass),
    payload text
    );

  6. > “It can be dropped, even with cascade”
    probably meant
    “It can’t be dropped, even with cascade”
    ?..
    Nice overview – thank you for your articles

    also maybe mention, that you can drop identity without dropping column with `alter table test_new alter column id DROP IDENTITY;`?..
    Thanks again!

  7. @VaoTsun:

    Fixed the can/can’t typo. Thanks.

    As for dropping identity – i guess your comment is sufficient 🙂

Comments are closed.