How to run short ALTER TABLE without long locking concurrent queries

Recently I've seen case like:

  1. application had to add column to table.
  2. application ran ALTER TABLE ADD COLUMN (without default!)
  3. everything stopped for many MINUTES

Why? How to avoid the problem?

First, let's see what's happening when I try to alter table add column:

=# ALTER TABLE test ADD COLUMN whatever int4;
ALTER TABLE
TIME: 12.662 ms

As you can see adding the column is very fast. So how come it locked other queries for minutes?

Well, let's see locks obtained by alter table:

=# BEGIN;
BEGIN
 
=# ALTER TABLE test ADD COLUMN whatever2 int4;
ALTER TABLE
 
=# SELECT * FROM pg_locks WHERE pid = pg_backend_pid();
   locktype    | DATABASE | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |        mode         | GRANTED | fastpath 
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------------+---------+----------
 relation      |   165725 |    12143 |      |       |            |               |         |       |          | 3/2594             | 32470 | AccessShareLock     | t       | t
 virtualxid    |          |          |      |       | 3/2594     |               |         |       |          | 3/2594             | 32470 | ExclusiveLock       | t       | t
 transactionid |          |          |      |       |            |          1422 |         |       |          | 3/2594             | 32470 | ExclusiveLock       | t       | f
 relation      |   165725 |   166142 |      |       |            |               |         |       |          | 3/2594             | 32470 | AccessExclusiveLock | t       | f
(4 ROWS)
 
=# ROLLBACK;
ROLLBACK

Please note that there is AccessExclusiveLock – and this will lock everything until the lock is gone, which in my case is – until the transaction with alter table will finish.

But alter table add column takes 12ms. So why minutes? Well, it happened that there were some other queries happening on the table before, and then alter table has to wait with continuing till previous locks are gone.

We can see it by doing:

(SESSION 1) =# BEGIN;
BEGIN
 
(SESSION 1) =# SELECT COUNT(*) FROM test;
 COUNT 
-------
     0
(1 ROW)

then, without closing session 1, and without closing its transaction:

(SESSION 2) =# ALTER TABLE test ADD COLUMN whatever2 int4;

and this hangs – because it's waiting for lock from session 1. But – it already obtained AccessExclusiveLock on the table, so no other select can happen:

(SESSION 3) =# depesz=# SELECT * FROM test LIMIT 1;

also hangs.

So, is there any way to add this column without long locking of other queries?

YES. Using the same trick pg_reorg/pg_repack is using.

First we need to set statement timeout, and then run alter table:

=$ printf "SET statement_timeout = 50;\nALTER TABLE test add column whatever2 INT4;\n"
SET statement_timeout = 50;
ALTER TABLE test add column whatever2 INT4;

The timeout will make sure that alter table can't take more than 50 milliseconds.

Then, if I'll run it, via psql, I get:

=$ time printf "SET statement_timeout = 50;\nALTER TABLE test add column whatever2 INT4;\n" | psql -qX
ERROR:  canceling statement due to statement timeout
 
real    0m0.054s
user    0m0.000s
sys     0m0.002s
 
=$ echo $?
0

It failed, fast, which is good. But it returned code 0. Which is mark for success. But we can change that by adding:

=$ time printf "SET statement_timeout = 50;\nALTER TABLE test add column whatever2 INT4;\n" | psql -qX -v ON_ERROR_STOP=1
ERROR:  canceling statement due to statement timeout
 
real    0m0.054s
user    0m0.002s
sys     0m0.000s
 
=$ echo $?
3

And this is what we need. Now I can wrap it in a loop, but to make it slightly more readable, let's first get rid of printf from each call:

=$ printf "SET statement_timeout = 50;\nALTER TABLE test add column whatever2 INT4;\n" > alter.sql

And now, I can:

=$ while true; do date; psql -qX -v ON_ERROR_STOP=1 -f alter.sql && break; sleep 1; done
Thu 26 Sep 2019 03:43:52 PM CEST
psql:alter.sql:2: ERROR:  canceling statement due to statement timeout
Thu 26 Sep 2019 03:43:53 PM CEST
psql:alter.sql:2: ERROR:  canceling statement due to statement timeout
Thu 26 Sep 2019 03:43:54 PM CEST
psql:alter.sql:2: ERROR:  canceling statement due to statement timeout
Thu 26 Sep 2019 03:43:55 PM CEST
psql:alter.sql:2: ERROR:  canceling statement due to statement timeout
Thu 26 Sep 2019 03:43:56 PM CEST
 
=$

The oneliner expanded:

while true
do
    date
    psql -qX -v ON_ERROR_STOP=1 -f alter.sql && break
    sleep 1
done

date is there just so that I'll see when I started, and when it finally worked. Of course one could use another value of sleep.

If I were to use some smarter language than bash I could do it all in one connection, but for these kinds of tasks, I don't really think it matters much to start new psql every second or so.

Obviously adding the column might take long time now – but at no point in time other queries will be locked for over 50 ms.

Please note that if you'd like to do more stuff to the table, you can do it using transaction and alter.sql containing:

BEGIN;
    SET statement_timeout = 50;
    LOCK TABLE ONLY test IN ACCESS EXCLUSIVE MODE;
    SET statement_timeout = 0;
 
    ALTER TABLE test ....;
    -- do whatever you want, timeout is removed.
commit;

But you have to remember that if the lock will be obtained, no other connection will be able to use the table.

12 thoughts on “How to run short ALTER TABLE without long locking concurrent queries”

  1. You can also do a variation of this where you start a transaction and then attempt to take a NOWAIT table lock on the table you need to alter. It will error the transaction immediately if it can’t take the lock and then you can loop with whatever wait time you want.

    Has the downside that you will never succeed if you can’t find a time that doesn’t have a conflicting lock already, but has the upside of you don’t block for 50ms (or whatever your statement_timeout needs to be) repeatedly.

    Still a good idea to put a statement_timeout on this method also just in case your change takes longer than you thought it should, or you attempted to bundle something else in that transaction that you expected to be fast that isn’t.

  2. @Jeremy Schneider:
    well, now that I’ve read it – it looks that lock_time is/would be even better. Forgot about it 🙂

  3. Is there any specific reason why PL/pgSQL LOOP was not used? If I apply my Oracle experience to this task using that lock_timeout Jeremy Schneider pointed out, I would rather do something like this:

    DO
    $do$
    DECLARE
       lock_timeout CONSTANT text := '50ms';
       max_attempts CONSTANT INT := 1000;
       ddl_completed BOOLEAN := FALSE;
    BEGIN 
     
       PERFORM set_config('lock_timeout', lock_timeout, FALSE);
     
       FOR i IN 1..max_attempts LOOP
          BEGIN
             EXECUTE 'ALTER TABLE test add column whatever2 INT4';
             ddl_completed := TRUE;
             EXIT;
          EXCEPTION 
             WHEN lock_not_available THEN
               NULL;
          END;
       END LOOP;
     
       IF ddl_completed THEN
          RAISE INFO 'DDL has been successfully completed';
       ELSE
          RAISE EXCEPTION 'Failed to perform DDL';
       END IF;
    END
    $do$;
  4. @Mikhail Velikikh:

    No real reason, aside form the fact that I somehow try to steer clear of exception handling in plpgsql. Probably in this case it wasn’t necessary.

    On the other hand – my code seems to be shorter 🙂

  5. This is genius, I love it! I took the approach a couple of days ago of doing the DDL change out of hours, but this would have been much better.

  6. There are basically 2 ways to do this:
    1. Using timeout logic as explained in this blog, or
    2. Using NOWAIT feature, shown below:

    DO LANGUAGE plpgsql $$
    BEGIN
    FOR get_lock IN 1 .. 10 LOOP
      BEGIN
        LOCK TABLE mytableparent IN ACCESS EXCLUSIVE MODE NOWAIT;
        ALTER TABLE mytablechild_01 NO INHERIT mytableparent;
        ALTER TABLE mytablechild_02 NO INHERIT mytableparent;
        RAISE INFO 'locked table and DISINHERITED tables successfully';
        EXIT;
      END;
    END LOOP;
    END;
    $$;
  7. @Michael:

    yeah, but NOWAIT will exit immediately, not even leaving the chance to wait for 50ms (or whatever is acceptable). That’s why I didn’t use it.

  8. Good point, @depesz, but the loop variable iteration tries to account for that by trying multiple times before giving up, so you could adjust it higher. Don’t get me wrong, I like your way as well!

  9. @Michael:

    sure, you can add loop, but – the way it’s written will cause “busy-loop” using all of CPU, and if we’d add sleeps – it is very possible that checking for locks would never succeed on busy systems, while a waiting solution might.

  10. @Depesz, not trying to belabor this point too much but in your opinion can you think of a scenario in which using NOWAIT might be a better solution?

  11. @Michael:
    only if you really don’t want to wait. As in: if it locks, do something, otherwise do something else.

Comments are closed.