Waiting for 9.6 – Add simple VACUUM progress reporting.

On 15th of March, Robert Haas committed patch:

Add simple VACUUM progress reporting. 
 
There's a lot more that could be done here yet - in particular, this
reports only very coarse-grained information about the index vacuuming
phase - but even as it stands, the new pg_stat_progress_vacuum can
tell you quite a bit about what a long-running vacuum is actually
doing.
 
Amit Langote and Robert Haas, based on earlier work by Vinayak Pokale
and Rahila Syed.

Not long ago I blogged about progress information report. Now, we got some new functionality in there.

Let's see it. First, I'll need to create some sample tables:

=$ seq 1 20 | \
    sed "s/.*/create table t& as select i, '2000-01-01'::date + '5 minutes'::interval * i as rnd_ts, random() * 1000000 as rnd, repeat('depesz', 100) as payload from generate_series(1,1000000) i;/" | \
    xargs -d$'\n' -n1 -P5 psql -c

This created 20 tables, each with 1 million records. Data look like this:

$ SELECT i, rnd_ts, rnd, substr(payload, 1, 20) || '...' AS payload FROM t1 LIMIT 10;
 i  |       rnd_ts        |       rnd        |         payload         
----+---------------------+------------------+-------------------------
  1 | 2000-01-01 00:05:00 | 58989.2640709877 | depeszdepeszdepeszde...
  2 | 2000-01-01 00:10:00 | 820058.385375887 | depeszdepeszdepeszde...
  3 | 2000-01-01 00:15:00 | 274652.003776282 | depeszdepeszdepeszde...
  4 | 2000-01-01 00:20:00 | 330385.855864733 | depeszdepeszdepeszde...
  5 | 2000-01-01 00:25:00 | 816058.840602636 | depeszdepeszdepeszde...
  6 | 2000-01-01 00:30:00 | 555196.795146912 | depeszdepeszdepeszde...
  7 | 2000-01-01 00:35:00 | 724607.851821929 | depeszdepeszdepeszde...
  8 | 2000-01-01 00:40:00 | 684836.780186743 | depeszdepeszdepeszde...
  9 | 2000-01-01 00:45:00 | 915637.820959091 | depeszdepeszdepeszde...
 10 | 2000-01-01 00:50:00 | 186367.315705866 | depeszdepeszdepeszde...
(10 ROWS)

Now, let's add primary keys to all these tables, and make indexes on rnd_ts, and on rnd:

=$ seq 1 20 | \
    sed "s/.*/alter table t& add primary key (i);\ncreate index i_&_1 on t& (rnd_ts);\ncreate index i_&_2 on t& (rnd);/" | \
    sort -V | \
    xargs -d$'\n' -P5 -n1 psql -c

Now, each table has this schema:

                 TABLE "public.t1"
 COLUMN  |            TYPE             | Modifiers 
---------+-----------------------------+-----------
 i       | INTEGER                     | NOT NULL
 rnd_ts  | TIMESTAMP WITHOUT TIME zone | 
 rnd     | DOUBLE PRECISION            | 
 payload | text                        | 
Indexes:
    "t1_pkey" PRIMARY KEY, btree (i)
    "i_1_1" btree (rnd_ts)
    "i_1_2" btree (rnd)

Now, when I did “vacuum;", in another psql, I could see:

$ SELECT relid::regclass, * FROM pg_stat_progress_vacuum ;
 relid | pid  | datid | datname | relid |     phase     | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples 
-------+------+-------+---------+-------+---------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
 t3    | 6311 | 16386 | depesz  | 16478 | scanning heap |           83334 |             26928 |                  0 |                  0 |        11184810 |               0
(1 ROW)
 
$ SELECT relid::regclass, * FROM pg_stat_progress_vacuum ;
 relid | pid  | datid | datname | relid |     phase     | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples 
-------+------+-------+---------+-------+---------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
 t3    | 6311 | 16386 | depesz  | 16478 | scanning heap |           83334 |             49197 |                  0 |                  0 |        11184810 |               0
(1 ROW)
 
$ SELECT relid::regclass, * FROM pg_stat_progress_vacuum ;
 relid | pid  | datid | datname | relid |        phase        | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples 
-------+------+-------+---------+-------+---------------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
 t3    | 6311 | 16386 | depesz  | 16478 | cleaning up indexes |           83334 |             83334 |              83334 |                  0 |        11184810 |               0
(1 ROW)
 
$ SELECT relid::regclass, * FROM pg_stat_progress_vacuum ;
 relid | pid  | datid | datname | relid |        phase        | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples 
-------+------+-------+---------+-------+---------------------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
 t3    | 6311 | 16386 | depesz  | 16478 | cleaning up indexes |           83334 |             83334 |              83334 |                  0 |        11184810 |               0
(1 ROW)

This is pretty cool. We now have actual, usable, progress information for the vacuum! This is great. Now, if only we could get similar progress information for other operations, that would be really great.

Anyway – thanks a lot 🙂