Which tables should be auto vacuumed or auto analyzed?

Recently I was in a situation where autovacuum couldn't keep up with changes. To solve the problem I finally decided to manually vacuum analyze all tables (manual vacuum/analyze is faster than one ran by autovacuum daemon).

But it irritated me that I didn't have ready way to check which tables are waiting for autovacuum to work on them.

So, I wrote it.

First, let's start with basics. Autovacuum when figuring out whether it should do VACUUM and/or ANALYZE looks at these elements:

In my test case, config variables are:

=$ SELECT name, setting
FROM pg_settings
WHERE name IN ('autovacuum_analyze_scale_factor', 'autovacuum_analyze_threshold', 'autovacuum_vacuum_scale_factor', 'autovacuum_vacuum_threshold')
ORDER BY name;
              name               | setting 
---------------------------------+---------
 autovacuum_analyze_scale_factor | 0.1
 autovacuum_analyze_threshold    | 50
 autovacuum_vacuum_scale_factor  | 0.2
 autovacuum_vacuum_threshold     | 50
(4 ROWS)

Rules are simple:

  • run vacuum if n_dead_tup is larger than reltuples * autovacuum_vacuum_scale_factor + autovacuum_vacuum_threshold
  • run analyze if n_mod_since_analyze is larger than reltuples * autovacuum_analyze_scale_factor + autovacuum_analyze_threshold

Let's do some math. Assuming I have table with 10,000 rows, it will get vacuumed if I delete (or update) 10000 * 0.2 + 50 = 2050 rows. And it will get analyzed if I update (or insert) 10000 * 0.1 + 50 = 1050 rows. Easy. Isn't it?

With math behind, let's see how the values really change. First some test tables:

=$ CREATE TABLE t1 ( id INT4 generated always AS IDENTITY, payload TEXT, PRIMARY KEY (id));
=$ CREATE TABLE t2 ( id INT4 generated always AS IDENTITY, payload TEXT, PRIMARY KEY (id));
=$ CREATE TABLE t3 ( id INT4 generated always AS IDENTITY, payload TEXT, PRIMARY KEY (id));

Immediately after creation stats are obvious:

=$ SELECT c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
FROM pg_class c LEFT OUTER JOIN pg_stat_all_tables s ON c.oid = s.relid
WHERE c.relname ~ '^t[123]$' ORDER BY 1;
 relname | reltuples | n_dead_tup | n_mod_since_analyze 
---------+-----------+------------+---------------------
 t1      |         0 |          0 |                   0
 t2      |         0 |          0 |                   0
 t3      |         0 |          0 |                   0
(3 ROWS)

OK. Let's load some data:

=$ copy t1 (payload) FROM '/usr/share/dict/words';
COPY 102401
 
=$ copy t2 (payload) FROM '/usr/share/dict/words';
COPY 102401
 
=$ copy t3 (payload) FROM '/usr/share/dict/words';
COPY 102401

Immediately afterwards, with autovacuum disabled, stats are a bit more interesting:

=$ SELECT c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
FROM pg_class c LEFT OUTER JOIN pg_stat_all_tables s ON c.oid = s.relid
WHERE c.relname ~ '^t[123]$' ORDER BY 1;
 relname | reltuples | n_dead_tup | n_mod_since_analyze 
---------+-----------+------------+---------------------
 t1      |         0 |          0 |              102401
 t2      |         0 |          0 |                   0
 t3      |         0 |          0 |                   0
(3 ROWS)

Please note that reltuples are still 0 because this value is updated by vacuum.

Let's update some rows:

=$ UPDATE t1 SET payload = 'new ' || payload WHERE random() < 0.01;
UPDATE 1068
 
=$ UPDATE t2 SET payload = 'new ' || payload WHERE random() < 0.1;
UPDATE 10135
 
=$ UPDATE t3 SET payload = 'new ' || payload WHERE random() < 0.5;
UPDATE 51028
 
=$ SELECT c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
FROM pg_class c LEFT OUTER JOIN pg_stat_all_tables s ON c.oid = s.relid
WHERE c.relname ~ '^t[123]$' ORDER BY 1;
 relname | reltuples | n_dead_tup | n_mod_since_analyze 
---------+-----------+------------+---------------------
 t1      |         0 |       1068 |              103469
 t2      |         0 |      10135 |               10135
 t3      |         0 |      51028 |               51028
(3 ROWS)

OK – we see now that n_dead_tup was increased to number of rows updated, and n_mod_since_analyze contains sum of rows inserted and rows updated.

So, finally – delete. But, to be able to see it clearly, first, Let's vacuum/analyze the tables, so counters will be zeroed:

=$ vacuum analyze t1, t2, t3;
VACUUM
 
=$ SELECT c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
FROM pg_class c LEFT OUTER JOIN pg_stat_all_tables s ON c.oid = s.relid
WHERE c.relname ~ '^t[123]$' ORDER BY 1;
 relname | reltuples | n_dead_tup | n_mod_since_analyze 
---------+-----------+------------+---------------------
 t1      |    102401 |          0 |                   0
 t2      |    102401 |          0 |                   0
 t3      |    102401 |          0 |                   0
(3 ROWS)
 
=$ DELETE FROM t1 WHERE random() < 0.01;
DELETE 999
 
=$ DELETE FROM t2 WHERE random() < 0.1;
DELETE 10307
 
=$ DELETE FROM t3 WHERE random() < 0.5;
DELETE 51087
 
=$ SELECT c.relname, c.reltuples, s.n_dead_tup, s.n_mod_since_analyze
FROM pg_class c LEFT OUTER JOIN pg_stat_all_tables s ON c.oid = s.relid
WHERE c.relname ~ '^t[123]$' ORDER BY 1;
 relname | reltuples | n_dead_tup | n_mod_since_analyze 
---------+-----------+------------+---------------------
 t1      |    102401 |        999 |                 999
 t2      |    102401 |      10307 |               10307
 t3      |    102401 |      51087 |               51087
(3 ROWS)

Great. Both n_dead_tup and n_mod_since_analyze were increased.

Now, let's write a query that will show us which tables are waiting for autovacuum to work on them.

First – to put all settings in single place, I'll write this CTE:

WITH s AS (
    SELECT
        current_setting('autovacuum_analyze_scale_factor')::float8  AS  analyze_factor,
        current_setting('autovacuum_analyze_threshold')::float8     AS  analyze_threshold,
        current_setting('autovacuum_vacuum_scale_factor')::float8   AS  vacuum_factor,
        current_setting('autovacuum_vacuum_threshold')::float8      AS  vacuum_threshold
)

Then, based on this I can get list of all tables, with information about whether I should vacuum or analyze:

  1. =$ WITH s AS (
  2.     SELECT
  3.         current_setting('autovacuum_analyze_scale_factor')::float8  AS  analyze_factor,
  4.         current_setting('autovacuum_analyze_threshold')::float8     AS  analyze_threshold,
  5.         current_setting('autovacuum_vacuum_scale_factor')::float8   AS  vacuum_factor,
  6.         current_setting('autovacuum_vacuum_threshold')::float8      AS  vacuum_threshold
  7. )
  8. SELECT
  9.     n.nspname,
  10.     c.relname,
  11.     c.oid AS relid,
  12.     t.n_dead_tup,
  13.     t.n_mod_since_analyze,
  14.     c.reltuples * s.vacuum_factor + s.vacuum_threshold AS v_threshold,
  15.     c.reltuples * s.analyze_factor + s.analyze_threshold AS a_threshold
  16. FROM
  17.     s,
  18.     pg_class c
  19.     JOIN pg_namespace n ON c.relnamespace = n.oid
  20.     JOIN pg_stat_all_tables t ON c.oid = t.relid
  21. WHERE
  22.     c.relkind = 'r'
  23.     AND c.relname ~ '^t[123]$'
  24. ORDER BY c.relname;
  25.  nspname | relname | relid | n_dead_tup | n_mod_since_analyze | v_threshold | a_threshold 
  26. ---------+---------+-------+------------+---------------------+-------------+-------------
  27.  public  | t1      | 17626 |        999 |                 999 |     20530.2 |     10290.1
  28.  public  | t2      | 17636 |      10307 |              112708 |     20530.2 |     10290.1
  29.  public  | t3      | 17646 |      51087 |              153488 |     20530.2 |     10290.1
  30. (3 ROWS)

I included condition (line 23) for relname to avoid showing me all tables in this database when just debugging the query. After it will be done, condition c.relname ~ ‘^t[123]$' will be removed.

Anyway – we have tables, their n_dead_tup, n_mod_since_analyze and thresholds for vacuum and analyze. So,let's show it in nicer way:

=$ WITH s AS (
    SELECT
        current_setting('autovacuum_analyze_scale_factor')::float8  AS  analyze_factor,
        current_setting('autovacuum_analyze_threshold')::float8     AS  analyze_threshold,
        current_setting('autovacuum_vacuum_scale_factor')::float8   AS  vacuum_factor,
        current_setting('autovacuum_vacuum_threshold')::float8      AS  vacuum_threshold
), tt AS (
    SELECT
        n.nspname,
        c.relname,
        c.oid AS relid,
        t.n_dead_tup,
        t.n_mod_since_analyze,
        c.reltuples * s.vacuum_factor + s.vacuum_threshold AS v_threshold,
        c.reltuples * s.analyze_factor + s.analyze_threshold AS a_threshold
    FROM
        s,
        pg_class c
        JOIN pg_namespace n ON c.relnamespace = n.oid
        JOIN pg_stat_all_tables t ON c.oid = t.relid
    WHERE
        c.relkind = 'r'
)
SELECT
    nspname,
    relname,
    relid,
    CASE WHEN n_dead_tup > v_threshold THEN 'VACUUM' ELSE '' END AS do_vacuum,
    CASE WHEN n_mod_since_analyze > a_threshold THEN 'ANALYZE' ELSE '' END AS do_analyze
FROM
    tt
WHERE
    n_dead_tup > v_threshold OR
    n_mod_since_analyze > a_threshold
ORDER BY nspname, relname;
 nspname | relname | relid | do_vacuum | do_analyze 
---------+---------+-------+-----------+------------
 public  | t2      | 17636 |           | ANALYZE
 public  | t3      | 17646 | VACUUM    | ANALYZE
(2 ROWS)

This query can be stored as a view:

=$ CREATE VIEW autovacuum_queue AS
WITH s AS (
    SELECT
        current_setting('autovacuum_analyze_scale_factor')::float8  AS  analyze_factor,
        current_setting('autovacuum_analyze_threshold')::float8     AS  analyze_threshold,
        current_setting('autovacuum_vacuum_scale_factor')::float8   AS  vacuum_factor,
        current_setting('autovacuum_vacuum_threshold')::float8      AS  vacuum_threshold
), tt AS (
    SELECT
        n.nspname,
        c.relname,
        c.oid AS relid,
        t.n_dead_tup,
        t.n_mod_since_analyze,
        c.reltuples * s.vacuum_factor + s.vacuum_threshold AS v_threshold,
        c.reltuples * s.analyze_factor + s.analyze_threshold AS a_threshold
    FROM
        s,
        pg_class c
        JOIN pg_namespace n ON c.relnamespace = n.oid
        JOIN pg_stat_all_tables t ON c.oid = t.relid
    WHERE
        c.relkind = 'r'
)
SELECT
    nspname,
    relname,
    relid,
    CASE WHEN n_dead_tup > v_threshold THEN 'VACUUM' ELSE '' END AS do_vacuum,
    CASE WHEN n_mod_since_analyze > a_threshold THEN 'ANALYZE' ELSE '' END AS do_analyze
FROM
    tt
WHERE
    n_dead_tup > v_threshold OR
    n_mod_since_analyze > a_threshold;
CREATE VIEW

And now, I can see this, with (possibly) some additional info:

=$ SELECT *,
    pg_relation_size(relid),
    pg_total_relation_size(relid)
FROM autovacuum_queue
    ORDER BY pg_total_relation_size(relid) DESC
;
 nspname | relname | relid | do_vacuum | do_analyze | pg_relation_size | pg_total_relation_size 
---------+---------+-------+-----------+------------+------------------+------------------------
 public  | t3      | 17646 | VACUUM    | ANALYZE    |          7143424 |               11804672
 public  | t2      | 17636 |           | ANALYZE    |          5128192 |                8011776
(2 ROWS)

Hope you'll find it useful.

4 thoughts on “Which tables should be auto vacuumed or auto analyzed?”

  1. Great!!

    I tend to feel, this kind of thing should be provided by Postgres itself.

    Some time ago I also needed this. It turned out that colleagues from Avito.ru already built something some time ago, so I took their work and slightly improved, here it is: https://gitlab.com/snippets/1889668. It has various details, but it doesn’t deal with autoanalyze part.

    I also find it useful to put this into monitoring – for example, for GitLab.com it is exported to Prometheus: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/8494#note_251487637.

    Observing the size of “wait queue”, keeping in mind the value of autovacuum_max_workers, can be helpful for autovacuum tuning.

  2. Thanks for this amazing query. I ran this query couple of times and noticed that it was not showing any output, however pg_stat_all_tables recorded autovacuum happening. Then I found that i had set the autovacuum settings at table level.
    Does this query cover that too ? Is my under standing correct ? please correct me If i am wrong.

  3. @Gowtham:

    No, the query doesn’t handle per-table settings. Mostly because I didn’t think of it while writing the query. It should be possible to add it, though.

  4. @Nikolay:

    For monitoring I prefer to simply check number of concurrent backends doing autovacuum (simple “query ~ ‘^autovacuum:'” is enough) and compare it against autovacuum_max_workers. If the numbers are the same for extended period of time – there is a problem.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.