Waiting for PostgreSQL 11 – Fast ALTER TABLE ADD COLUMN with a non-NULL default

On 28th of March 2018, Andrew Dunstan committed patch:

Fast ALTER TABLE ADD COLUMN with a non-NULL default
 
 
Currently adding a column to a table with a non-NULL default results in
a rewrite of the table. For large tables this can be both expensive and
disruptive. This patch removes the need for the rewrite as long as the
default value is not volatile. The default expression is evaluated at
the time of the ALTER TABLE and the result stored in a new column
(attmissingval) in pg_attribute, and a new column (atthasmissing) is set
to true. Any existing row when fetched will be supplied with the
attmissingval. New rows will have the supplied value or the default and
so will never need the attmissingval.
 
Any time the table is rewritten all the atthasmissing and attmissingval
settings for the attributes are cleared, as they are no longer needed.
 
The most visible code change from this is in heap_attisnull, which
acquires a third TupleDesc argument, allowing it to detect a missing
value if there is one. In many cases where it is known that there will
not be any (e.g.  catalog relations) NULL can be passed for this
argument.
 
Andrew Dunstan, heavily modified from an original patch from Serge
Rielau.
Reviewed by Tom Lane, Andres Freund, Tomas Vondra and David Rowley.
 
Discussion: https://postgr.es/m/-7002-4c27-59f5-@2ndQuadrant.com

This is absolutely amazing.

So, for the longest time, when you did:

ALTER TABLE x ADD COLUMN z text;

it was virtually instantaneous. Get a lock on table, add information about new column to system catalogs, and it's done.

But when you tried:

ALTER TABLE x ADD COLUMN z text DEFAULT 'some value';

then it took long time. How long it did depend on size of table.

This was because postgresql was actually rewriting the whole table, adding the column to each row, and filling it with default value.

Now, thanks to this patch – this is no longer true.

On Pg 10.3:

$ CREATE TABLE z (id int4 PRIMARY KEY);
CREATE TABLE
TIME: 99.873 ms
 
$ INSERT INTO z (id) SELECT generate_series(1,10000000);
INSERT 0 10000000
TIME: 52770.843 ms (00:52.771)
 
$ ALTER TABLE z ADD COLUMN x text DEFAULT 'xx';
ALTER TABLE
TIME: 27271.854 ms (00:27.272)
 
$ SELECT * FROM z LIMIT 2;
 id | x
----+----
  1 | xx
  2 | xx
(2 ROWS)
 
TIME: 1.306 ms

But in new Pg11:

$ CREATE TABLE z (id int4 PRIMARY KEY);
CREATE TABLE
TIME: 10.255 ms
 
$ INSERT INTO z (id) SELECT generate_series(1,10000000);
INSERT 0 10000000
TIME: 13695.293 ms (00:13.695)
 
$ ALTER TABLE z ADD COLUMN x text DEFAULT 'xx';
ALTER TABLE
TIME: 19.509 ms
 
$ SELECT * FROM z LIMIT 2;
 id | x  
----+----
  1 | xx
  2 | xx
(2 ROWS)

This is great. Thanks to all involved.

8 thoughts on “Waiting for PostgreSQL 11 – Fast ALTER TABLE ADD COLUMN with a non-NULL default”

  1. So the new attribute is effectively inserted into the table when it gets SELECT’ed?

  2. A side note: based on your examples, in PG11 CREATE TABLE and INSERT are way faster, it seems?

  3. That’s brilliant, looking forward to forgeting about workarounds for this performance gotcha.

    Makes me wonder: would it make sense to keep this default value in pg_attributes even when the tuple gets updated, to save disk space ? Or does the runtime cost make that unappealing ? Or would distinguishing the ‘alter table add colum’ default from a subsequent ‘alter column’ default be too complicated ?

  4. @Eugene:

    nah, it was just tested on different computers. But you can see comparable speed of update vs. insert

  5. @moltonel:

    I have no idea about internals and complications – for this you’d have to mail pgsql-hackers.

  6. What happens if you want to set the column to NOT NULL also? Are we back to the slow version in that case or does this handle that as well?

    Does this preserve current behavior for default columns where you can explicitly insert a null value despite there being a default? (aka presumably under the scenes PG differentiate between a column being completely missing in an existing tuple and it being set to null in a newer version of a tuple for that table?)

  7. @anonymous:

    not null doesn’t change anything. it is a constraint for new rows. so adding a column with “not null default ‘xxx'” will be fast.

Comments are closed.