April 10th, 2017 by depesz | Tags: , , , , , | 4 comments »
Did it help? If yes - maybe you can help me?

On 6th of April 2017, Peter Eisentraut committed patch:

Identity columns
 
This is the SQL standard-conforming variant of PostgreSQL's serial
columns.  It fixes a few usability issues that serial columns have:
 
- CREATE TABLE / LIKE copies default but refers to same sequence
- cannot add/drop serialness with ALTER TABLE
- dropping default does not drop sequence
- need to grant separate privileges to sequence
- other slight weirdnesses because serial is some kind of special macro

Description doesn't look all that different from normal sequences/serials. Let's see what we can find out about it.

First, sanity check, let's create some table, with normal sequence-based pkey:

$ create table test_old (
    id serial primary key,
    payload text
);
 
$ \d test_old
                             Table "public.test_old"
 Column  |  Type   | Collation | Nullable |               Default                
---------+---------+-----------+----------+--------------------------------------
 id      | integer |           | not null | nextval('test_old_id_seq'::regclass)
 payload | text    |           |          | 
Indexes:
    "test_old_pkey" PRIMARY KEY, btree (id)

OK. Now, let's compare it with the new hotness:

$ CREATE table test_new (
    id int generated by default AS IDENTITY primary key,
    payload text
);
 
$ \d test_new
                           Table "public.test_new"
 Column  |  Type   | Collation | Nullable |             Default              
---------+---------+-----------+----------+----------------------------------
 id      | integer |           | not null | generated by default as identity
 payload | text    |           |          | 
Indexes:
    "test_new_pkey" PRIMARY KEY, btree (id)

OK, it looks different, but how it behaves?

$ insert into test_old (payload) values ('a'), ('b'), ('c') returning *;
 id | payload 
----+---------
  1 | a
  2 | b
  3 | c
(3 rows)
 
$ insert into test_new (payload) values ('a'), ('b'), ('c') returning *;
 id | payload 
----+---------
  1 | a
  2 | b
  3 | c
(3 rows)

Inserts seem to be the same.

$ update test_old set id = 4 where id = 3 returning *;
 id | payload 
----+---------
  4 | c
(1 row)
 
$ update test_new set id = 4 where id = 3 returning *;
 id | payload 
----+---------
  4 | c
(1 row)

Update worked, but now inserts should/can fail:

$ insert into test_old (payload) values ('e') returning *;
ERROR:  duplicate key value violates unique constraint "test_old_pkey"
DETAIL:  Key (id)=(4) already exists.
 
$ insert into test_new (payload) values ('e') returning *;
ERROR:  duplicate key value violates unique constraint "test_new_pkey"
DETAIL:  Key (id)=(4) already exists.

No surprises here. What about delete, and re-insert?

$ delete from test_old where id = 4 returning *;
 id | payload 
----+---------
  4 | c
(1 row)
 
$ delete from test_new where id = 4 returning *;
 id | payload 
----+---------
  4 | c
(1 row)
 
$ insert into test_old (payload) values ('e') returning *;
 id | payload 
----+---------
  5 | e
(1 row)
 
$ insert into test_new (payload) values ('e') returning *;
 id | payload 
----+---------
  5 | e
(1 row)

Well no difference, there even is sequence for the new way:

$ \ds
              List of relations
 Schema |      Name       |   Type   | Owner  
--------+-----------------+----------+--------
 public | test_new_id_seq | sequence | depesz
 public | test_old_id_seq | sequence | depesz
(2 rows)

Well. Couple of things. For starters this new syntax is (from what I understand) more standard. As in: it is in one of SQL-* specifications.

Second. Take a look at this:

$ drop sequence test_old_id_seq cascade;
NOTICE:  drop cascades to default for table test_old column id
 
\d test_old
              Table "public.test_old"
 Column  |  Type   | Collation | Nullable | Default 
---------+---------+-----------+----------+---------
 id      | integer |           | not null | 
 payload | text    |           |          | 
Indexes:
    "test_old_pkey" PRIMARY KEY, btree (id)

and in case of new sequence:

$ drop sequence test_new_id_seq cascade;
ERROR:  cannot drop sequence test_new_id_seq because table test_new column id requires it
HINT:  You can drop table test_new column id instead.

It can be dropped, even with cascade – you have first to remove the column that needs it.

There is one more thing. In the create table for identify I used generated by default AS IDENTITY, But I could have done also:

$ CREATE table test_new_2 (
    id int generated always AS IDENTITY primary key,
    payload text
);
 
$ \d test_new_2
                        Table "public.test_new_2"
 Column  |  Type   | Collation | Nullable |           Default            
---------+---------+-----------+----------+------------------------------
 id      | integer |           | not null | generated always as identity
 payload | text    |           |          | 
Indexes:
    "test_new_2_pkey" PRIMARY KEY, btree (id)

The difference? Check this:

$ insert into test_new_2 (payload) values ('a'), ('b'), ('c') returning *;
 id | payload 
----+---------
  1 | a
  2 | b
  3 | c
(3 rows)
 
$ update test_new_2 set id = 4 where id = 3 returning *;
ERROR:  column "id" can only be updated to DEFAULT
DETAIL:  Column "id" is an identity column defined as GENERATED ALWAYS.

This also works with inserts:

$ insert into test_old (id, payload) values (99, 'x') returning *;
 id | payload 
----+---------
 99 | x
(1 row)
 
$ insert into test_new (id, payload) values (99, 'x') returning *;
 id | payload 
----+---------
 99 | x
(1 row)
 
$ insert into test_new_2 (id, payload) values (99, 'x') returning *;
ERROR:  cannot insert into column "id"
DETAIL:  Column "id" is an identity column defined as GENERATED ALWAYS.
HINT:  Use OVERRIDING SYSTEM VALUE to override.
 
$ insert into test_new_2 (id, payload)
    OVERRIDING SYSTEM VALUE
    values (99, 'x')
    returning *;
 id | payload 
----+---------
 99 | x
(1 row)

So, it can be forced, but by default it will disable any meddling with identity columns.

For COPY command the protection is disabled. So you can:

$ COPY test_new_2 (id, payload) FROM STDIN WITH ( DELIMITER ',' );
90,a
91,b
\.
 
$ select * from test_new_2 where id in (90, 91);
 id | payload 
----+---------
 90 | a
 91 | b
(2 rows)

All in all, it's really cool. If I have identity generated always, it would definitely make my life easier.

Thanks to all involved.

  1. 4 comments

  2. # Miles
    Apr 11, 2017

    How does “generated always” affect backups and restores? Also “COPY” statements. One would think the IDs are important and cannot be changed.

  3. # Stefan Wolf
    Apr 11, 2017

    Indeed, a big step forward. Much “cleaner” than “sequence” primary keys.

  4. # Harold
    Apr 12, 2017

    I am curious about the behavior in your update/delete example. I can see the logic behind not creating another id of ‘4’. But why did the id increment to ‘5’ after the delete of ‘4’?

  5. Apr 12, 2017

    @Harold:
    insert has to generate id before trying to insert – as such, it gets next value from sequence. This happens also in case of failed inserts.

Leave a comment